Re: Using kafka with RESTful API

2019-03-18 Thread 1095193...@qq.com
Hi,
A request-response structure is a better fit for your scenario, so you
should still stick with the RESTful API rather than Kafka.
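
If you do end up routing results back through Kafka, the usual way to get the
right result to the right user is to tag each request with a correlation ID
and have the API layer wait for the matching reply. Below is a rough sketch of
that idea using the plain Java clients; the topic names "requests" and
"replies" are hypothetical and this is not a drop-in implementation.

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.*;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.header.Header;

public class KafkaRequestReply {
    // One waiting future per in-flight HTTP request, keyed by correlation ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final KafkaProducer<String, String> producer;

    public KafkaRequestReply(Properties producerProps, Properties consumerProps) {
        // consumerProps needs bootstrap.servers, group.id and String deserializers.
        this.producer = new KafkaProducer<>(producerProps);
        Thread replyListener = new Thread(() -> {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(Collections.singletonList("replies"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(200))) {
                        // Assumes the computing app copies the correlationId header onto its reply.
                        Header h = rec.headers().lastHeader("correlationId");
                        if (h == null) continue;
                        CompletableFuture<String> f =
                                pending.remove(new String(h.value(), StandardCharsets.UTF_8));
                        if (f != null) f.complete(rec.value());  // wakes up the right user's request
                    }
                }
            }
        });
        replyListener.setDaemon(true);
        replyListener.start();
    }

    // Called from the REST endpoint: publish the JSON, then wait for the matching reply.
    public String handle(String requestJson) throws Exception {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);

        ProducerRecord<String, String> record = new ProducerRecord<>("requests", requestJson);
        record.headers().add("correlationId", correlationId.getBytes(StandardCharsets.UTF_8));
        producer.send(record);

        return future.get(30, TimeUnit.SECONDS);  // fail the request instead of waiting forever
    }
}

Note that with more than one API instance you would also need to make sure
each reply reaches the instance holding the waiting request, for example by
giving every instance its own reply topic or partition.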


1095193...@qq.com
 
From: Desmond Lim
Date: 2019-03-19 09:52
To: users
Subject: Using kafka with RESTful API
Hi all,
 
Just started using kafka yesterday and I have this question.
 
I have a RESTful API that gets JSON data from a user (via POST) and passes
it to an app. The app computes the data and returns it to the RESTful API,
and the API displays the results to the user.
 
I'm trying to do this:
 
The RESTful API would send the data to kafka. The app will get the data,
compute and return the results to the user.
 
Sending the data to kafka and on to the app I know how to do; I'm stuck on
the second part. How can I get the results back to the RESTful API?
 
There are 2 scenarios:
 
1. The app returns the result via kafka again, and the RESTful API picks it
up and returns it. But I can't understand how this would work with a queue:
if I have 5 users using it, how can I ensure that the right result gets
returned to the right user?
 
Also, I assume that I need a while loop to check the kafka topic while it
waits for the results.
 
2. The second is not to use kafka at all for returning the results, but I
don't know how that can be done and I think it would make this really
complex (I might be wrong).
 
Any help would be appreciated. Thanks.
 
Desmond


Using kafka with RESTful API

2019-03-18 Thread Desmond Lim
Hi all,

Just started using kafka yesterday and I have this question.

I have a RESTful API that gets JSON data from a user (via POST) and passes
it to an app. The app computes the data and returns it to the RESTful API,
and the API displays the results to the user.

I'm trying to do this:

The RESTful API would send the data to kafka. The app will get the data,
compute and return the results to the user.

Sending the data to kafka and on to the app I know how to do; I'm stuck on
the second part. How can I get the results back to the RESTful API?

There are 2 scenarios:

1. The app returns the result via kafka again, and the RESTful API picks it
up and returns it. But I can't understand how this would work with a queue:
if I have 5 users using it, how can I ensure that the right result gets
returned to the right user?

Also, I assume that I need a while loop to check the kafka topic while it
waits for the results.

2. The second is not to use kafka at all for returning the results, but I
don't know how that can be done and I think it would make this really
complex (I might be wrong).

Any help would be appreciated. Thanks.

Desmond


Re: kafka latency for large message

2019-03-18 Thread Bruce Markey
Hi Nan,

Would you consider other approaches that may actually be a more efficient
solution for you? There is a slide deck, "Handle Large Messages In Apache
Kafka".
For messages this large, one of the approaches suggested is Reference Based
Messaging where you write your large files to an external data store then
produce a small Apache Kafka message with a reference for where to find the
file. This would allow your consumer applications to find the file as
needed rather than storing all that data in the event log.
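
To make that concrete, the producing side could look roughly like the sketch
below. putToObjectStore() is a hypothetical helper standing in for whatever
external store you use (S3, HDFS, a shared filesystem, ...); only the small
reference travels through Kafka.

import java.net.URI;
import java.nio.file.Path;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReferenceBasedProducer {
    // Hypothetical helper: upload the large payload to the external store
    // and return where it landed.
    static URI putToObjectStore(Path largeFile) {
        throw new UnsupportedOperationException("depends on your storage system");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            URI location = putToObjectStore(Path.of(args[0]));
            // The record only carries the reference, so it stays tiny no matter
            // how large the actual file is.
            producer.send(new ProducerRecord<>("large-payload-refs", location.toString()));
        }
    }
}

Consumers then fetch the payload from the store only when they actually need
it, so the event log itself stays small.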

--  bjm

On Thu, Mar 14, 2019 at 1:53 PM Xu, Nan  wrote:

> Hi,
>
> We are using Kafka to send messages, and less than 1% of the messages
> are very big, close to 30M. We understand Kafka is not ideal for sending
> big messages, but because the large-message rate is very low we want to
> let Kafka do it anyway. We still want to get a reasonable latency.
>
> To test, I just set up a topic named "test" on a single-broker local
> Kafka, with only 1 partition and 1 replica, using the following command:
>
> ./kafka-producer-perf-test.sh  --topic test --num-records 200
> --throughput 1 --record-size 3000 --producer.config
> ../config/producer.properties
>
> Producer.config
>
> #Max 40M message
> max.request.size=4000
> buffer.memory=4000
>
> #2M buffer
> send.buffer.bytes=200
>
> 6 records sent, 1.1 records/sec (31.00 MB/sec), 973.0 ms avg latency,
> 1386.0 max latency.
> 6 records sent, 1.0 records/sec (28.91 MB/sec), 787.2 ms avg latency,
> 1313.0 max latency.
> 5 records sent, 1.0 records/sec (27.92 MB/sec), 582.8 ms avg latency,
> 643.0 max latency.
> 6 records sent, 1.1 records/sec (30.16 MB/sec), 685.3 ms avg latency,
> 1171.0 max latency.
> 5 records sent, 1.0 records/sec (27.92 MB/sec), 629.4 ms avg latency,
> 729.0 max latency.
> 5 records sent, 1.0 records/sec (27.61 MB/sec), 635.6 ms avg latency,
> 673.0 max latency.
> 6 records sent, 1.1 records/sec (30.09 MB/sec), 736.2 ms avg latency,
> 1255.0 max latency.
> 5 records sent, 1.0 records/sec (27.62 MB/sec), 626.8 ms avg latency,
> 685.0 max latency.
> 5 records sent, 1.0 records/sec (28.38 MB/sec), 608.8 ms avg latency,
> 685.0 max latency.
>
>
> On the broker, I change the
>
> socket.send.buffer.bytes=2024000
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=2224000
>
> and all others are default.
>
> I am a little surprised to see about 1 s max latency and an average of
> about 0.5 s. My understanding is that Kafka memory-maps the log file and
> lets the system flush it, and all the writes are sequential, so the flush
> should not be affected by message size that much. Batching and the network
> will take longer, but those are memory-based and on the local machine; my
> SSD should do far better than 0.5 seconds. Where did the time get
> consumed? Any suggestions?
>
> Thanks,
> Nan
>
>
>
>
>
>
>
> --
> This message, and any attachments, is for the intended recipient(s) only,
> may contain information that is privileged, confidential and/or proprietary
> and subject to important terms and conditions available at
> http://www.bankofamerica.com/emaildisclaimer.   If you are not the
> intended recipient, please delete this message.
>


Re: kafka latency for large message

2019-03-18 Thread Mike Trienis
It takes time to send that much data over the network. Why would you expect
a smaller latency?
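
That said, if the occasional 30 MB record has to go through Kafka as-is, a
few producer settings are worth experimenting with before concluding the
latency is inherent. The values below are purely illustrative, and the broker
and topic must also allow messages this large (message.max.bytes /
max.message.bytes):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class LargeRecordProducerFactory {
    public static KafkaProducer<byte[], byte[]> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Illustrative starting points, not recommendations.
        props.put("max.request.size", "40000000");  // allow ~40 MB requests
        props.put("buffer.memory", "134217728");    // room for a few large records in flight
        props.put("compression.type", "lz4");       // helps only if the payloads are compressible
        props.put("linger.ms", "0");                // don't hold rare large records waiting for a batch
        props.put("acks", "1");                     // lower produce latency at the cost of durability
        return new KafkaProducer<>(props);
    }
}

Whether compression helps at all depends on how compressible the real
payloads are; for random test payloads it likely will not.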

On Mon, Mar 18, 2019 at 8:05 AM Nan Xu  wrote:

> Can anyone give some suggestions, or an explanation of why Kafka gives
> such a big latency for a large payload?
>
> Thanks,
> Nan
>
> On Thu, Mar 14, 2019 at 3:53 PM Xu, Nan  wrote:
>
> > Hi,
> >
> > We are using Kafka to send messages, and less than 1% of the messages
> > are very big, close to 30M. We understand Kafka is not ideal for sending
> > big messages, but because the large-message rate is very low we want to
> > let Kafka do it anyway. We still want to get a reasonable latency.
> >
> > To test, I just set up a topic named "test" on a single-broker local
> > Kafka, with only 1 partition and 1 replica, using the following command:
> >
> > ./kafka-producer-perf-test.sh  --topic test --num-records 200
> > --throughput 1 --record-size 3000 --producer.config
> > ../config/producer.properties
> >
> > Producer.config
> >
> > #Max 40M message
> > max.request.size=4000
> > buffer.memory=4000
> >
> > #2M buffer
> > send.buffer.bytes=200
> >
> > 6 records sent, 1.1 records/sec (31.00 MB/sec), 973.0 ms avg latency,
> > 1386.0 max latency.
> > 6 records sent, 1.0 records/sec (28.91 MB/sec), 787.2 ms avg latency,
> > 1313.0 max latency.
> > 5 records sent, 1.0 records/sec (27.92 MB/sec), 582.8 ms avg latency,
> > 643.0 max latency.
> > 6 records sent, 1.1 records/sec (30.16 MB/sec), 685.3 ms avg latency,
> > 1171.0 max latency.
> > 5 records sent, 1.0 records/sec (27.92 MB/sec), 629.4 ms avg latency,
> > 729.0 max latency.
> > 5 records sent, 1.0 records/sec (27.61 MB/sec), 635.6 ms avg latency,
> > 673.0 max latency.
> > 6 records sent, 1.1 records/sec (30.09 MB/sec), 736.2 ms avg latency,
> > 1255.0 max latency.
> > 5 records sent, 1.0 records/sec (27.62 MB/sec), 626.8 ms avg latency,
> > 685.0 max latency.
> > 5 records sent, 1.0 records/sec (28.38 MB/sec), 608.8 ms avg latency,
> > 685.0 max latency.
> >
> >
> > On the broker, I change the
> >
> > socket.send.buffer.bytes=2024000
> > # The receive buffer (SO_RCVBUF) used by the socket server
> > socket.receive.buffer.bytes=2224000
> >
> > and all others are default.
> >
> > I am a little surprised to see about 1 s max latency and an average of
> > about 0.5 s. My understanding is that Kafka memory-maps the log file and
> > lets the system flush it, and all the writes are sequential, so the flush
> > should not be affected by message size that much. Batching and the
> > network will take longer, but those are memory-based and on the local
> > machine; my SSD should do far better than 0.5 seconds. Where did the
> > time get consumed? Any suggestions?
> >
> > Thanks,
> > Nan
> >
> >
> >
> >
> >
> >
> >
> > --
> > This message, and any attachments, is for the intended recipient(s) only,
> > may contain information that is privileged, confidential and/or
> proprietary
> > and subject to important terms and conditions available at
> > http://www.bankofamerica.com/emaildisclaimer.   If you are not the
> > intended recipient, please delete this message.
> >
>


-- 
Thanks, Mike


Re: Permanent topic

2019-03-18 Thread Sönke Liebau
Hi Maxim,

I had a quick look; it seems that this was added to the Apache Kafka
documentation only in version 1.1.1 which, being only a patch version,
didn't get a doc update, so the relevant sentence is only in the docs for
2.0 [1].
Additionally, the passage was only added to the docs for the topic-level
setting, not for the broker-level setting, so if you checked there, that is
why you didn't see it :)

I've created a pull request to add this to the broker level setting as well.
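
If you want to verify the effective topic settings programmatically rather
than via --describe, the Java AdminClient can read them back. A minimal
sketch, assuming a broker reachable on localhost:9092 and the topic name from
your example:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class ShowRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "maxim-test");
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            // "-1" means the topic is never cleaned up by time- or size-based retention.
            System.out.println("retention.ms    = " + config.get("retention.ms").value());
            System.out.println("retention.bytes = " + config.get("retention.bytes").value());
        }
    }
}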

Hope that helps?

Best regards,
Sönke

[1] https://kafka.apache.org/20/documentation.html#topicconfigs


On Mon, Mar 18, 2019 at 7:24 PM Maxim Kheyfets 
wrote:

> Good time of the day,
>
> We are using kafka 1.0.1, and want to create a permanent topic. One online
> post suggests setting retention.ms and retention.bytes to -1. The sample
> below shows the system accepts -1 correctly, but I don't see this
> documented explicitly anywhere in the official documentation.
>
> Could you confirm, and/or point me to the right official page?
>
> Thank you,
> Maxim
>
>
> kafka-topics.sh --create --zookeeper zk.local/kafka --replication-factor 3
> --partitions 30 --topic maxim-test
> kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
> --entity-name maxim-test --alter --add-config retention.ms=-1
> kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
> --entity-name maxim-test --alter --add-config retention.bytes=-1
>
> --describe shows it as successful:
> zk.local
>
> Topic:maxim-test PartitionCount:30 ReplicationFactor:3 Configs:retention.ms=-1,retention.bytes=-1
> Topic: msg-opt-in Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 4 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 7 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 8 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 10 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 11 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 13 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 14 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 16 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 17 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 19 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 20 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 22 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 23 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 25 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 26 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 28 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 29 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
>
>
> --
>
>
>
>
> *Maxim Kheyfets*
> Senior DevOps Engineer
>
> maxim.kheyf...@clearme.com | www.clearme.com
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Permanent topic

2019-03-18 Thread Harpreet Singh
https://www.cloudera.com/documentation/enterprise/properties/5-13-x/topics/cm_props_cdh5110_kafka.html

Mentions
log.retention.ms:
The maximum time before a new log segment is rolled out. If both
log.retention.ms and log.retention.bytes are set, a segment is deleted when
either limit is exceeded. The special value of -1 is interpreted as
unlimited. This property is used in Kafka 1.4.0 and later in place of
log.retention.hours.
Regards,
Harpreet

On Mon, Mar 18, 2019 at 2:24 PM Maxim Kheyfets 
wrote:

> Good time of the day,
>
> We are using kafka 1.0.1, and want to create a permanent topic. One online
> post suggests setting retention.ms and retention.bytes to -1. The sample
> below shows the system accepts -1 correctly, but I don't see this
> documented explicitly anywhere in the official documentation.
>
> Could you confirm, and/or point me to the right official page?
>
> Thank you,
> Maxim
>
>
> kafka-topics.sh --create --zookeeper zk.local/kafka --replication-factor 3
> --partitions 30 --topic maxim-test
> kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
> --entity-name maxim-test --alter --add-config retention.ms=-1
> kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
> --entity-name maxim-test --alter --add-config retention.bytes=-1
>
> --describe shows it as successful:
> zk.local
>
> Topic:maxim-test PartitionCount:30 ReplicationFactor:3 Configs:retention.ms=-1,retention.bytes=-1
> Topic: msg-opt-in Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 4 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 7 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 8 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 10 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 11 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 13 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 14 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 16 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 17 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 19 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 20 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 22 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 23 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
> Topic: msg-opt-in Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> Topic: msg-opt-in Partition: 25 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
> Topic: msg-opt-in Partition: 26 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
> Topic: msg-opt-in Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
> Topic: msg-opt-in Partition: 28 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
> Topic: msg-opt-in Partition: 29 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
>
>
> --
>
>
>
>
> *Maxim Kheyfets*
> Senior DevOps Engineer
>
> maxim.kheyf...@clearme.com | www.clearme.com
>


-- 
Regards,
Harpreet Singh
647-309-6132


Permanent topic

2019-03-18 Thread Maxim Kheyfets
Good time of the day,

We are using kafka 1.0.1, and want to create a permanent topic. One online
post suggests setting retention.ms and retention.bytes to -1. The sample
below shows the system accepts -1 correctly, but I don't see this documented
explicitly anywhere in the official documentation.

Could you confirm, and/or point me to the right official page?

Thank you,
Maxim


kafka-topics.sh --create --zookeeper zk.local/kafka --replication-factor 3
--partitions 30 --topic maxim-test
kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
--entity-name maxim-test --alter --add-config retention.ms=-1
kafka-configs.sh --zookeeper zk.local/kafka --entity-type topics
--entity-name maxim-test --alter --add-config retention.bytes=-1

--describe shows it as successful:
zk.local

Topic:maxim-test PartitionCount:30 ReplicationFactor:3 Configs:retention.ms=-1,retention.bytes=-1
Topic: msg-opt-in Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: msg-opt-in Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: msg-opt-in Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: msg-opt-in Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: msg-opt-in Partition: 4 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: msg-opt-in Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: msg-opt-in Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: msg-opt-in Partition: 7 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: msg-opt-in Partition: 8 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: msg-opt-in Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: msg-opt-in Partition: 10 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: msg-opt-in Partition: 11 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: msg-opt-in Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: msg-opt-in Partition: 13 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: msg-opt-in Partition: 14 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: msg-opt-in Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: msg-opt-in Partition: 16 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: msg-opt-in Partition: 17 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: msg-opt-in Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: msg-opt-in Partition: 19 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: msg-opt-in Partition: 20 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: msg-opt-in Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: msg-opt-in Partition: 22 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: msg-opt-in Partition: 23 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1
Topic: msg-opt-in Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: msg-opt-in Partition: 25 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: msg-opt-in Partition: 26 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: msg-opt-in Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3,2
Topic: msg-opt-in Partition: 28 Leader: 2 Replicas: 2,1,3 Isr: 2,1,3
Topic: msg-opt-in Partition: 29 Leader: 3 Replicas: 3,2,1 Isr: 3,2,1


-- 




*Maxim Kheyfets*
Senior DevOps Engineer

maxim.kheyf...@clearme.com | www.clearme.com


Re: Operationalizing Zookeeper and common gotchas

2019-03-18 Thread Patrik Kleindl
Hi Eno
Thanks too, this is indeed helpful
Best regards 
Patrik 

> Am 18.03.2019 um 18:16 schrieb Eno Thereska :
> 
> Hi folks,
> 
> The team here has come up with a couple of clarifying tips for
> operationalizing Zookeeper for Kafka that we found missing from the
> official documentation, and passed them along to share. If you find them
> useful, I'm thinking of putting them on
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ. Meanwhile any
> feedback is appreciated.
> 
> ---
> Operationalizing Zookeeper FAQ
> 
> The discussion below uses a 3-instance Zookeeper cluster as an example. The
> findings apply to a larger cluster as well, but you’ll need to adjust the
> numbers.
> 
> - Does it make sense to have a config with only 2 Zookeeper instances?
> I.e., in zookeeper.properties file have two entries for server 1 and server
> 2 only. A: No. A setup with 2 Zookeeper instances is not fault tolerant to
> even 1 failure. If one of the Zookeeper instances fails, the remaining one
> will not be functional since there is no quorum majority (1 out of 2 is not
> majority). If you do a “stat” command on that remaining instance you’ll see
> the output being “This ZooKeeper instance is not currently serving
> requests”.
> 
> - What if you end up with only 2 running Zookeeper instances, e.g., you
> started with 3 but one failed? Isn’t that the same as the case above? A: No
> it’s not the same scenario. First of all, the 3-instance setup did
> tolerate 1 instance down. The 2 remaining Zookeeper instances will continue
> to function because the quorum majority (2 out of 3) is there.
> 
> - I had a 3 Zookeeper instance setup and one instance just failed. How
> should I recover? A: Restart the failed instance with the same
> configuration it had before (i.e., same “myid” ID file, and same IP
> address). It is not important to recover the data volume of the failed
> instance, but it is a bonus if you do so. Once the instance comes up, it
> will sync with the other 2 Zookeeper instances and get all the data.
> 
> - I had a 3 Zookeeper instance setup and two instances failed. How should I
> recover? Is my Zookeeper cluster even running at that point? A: First of
> all, ZooKeeper is now unavailable and the remaining instance will show
> “This ZooKeeper instance is not currently serving requests” if probed.
> Second, you should make sure this situation is extremely rare. It should be
> possible to recover the first failed instance quickly before the second
> instance fails. Third, bring up the two failed instances one by one without
> changing anything in their config. Similarly to the case above, it is not
> important to recover the data volume of the failed instance, but it is a
> bonus if you do so. Once the instance comes up, it will sync with the other
> 1 ZooKeeper instance and get all the data.
> 
> - I had a 3 Zookeeper instance setup and two instances failed. I can’t
> recover the failed instances for whatever reason. What should I do? A: You
> will have to restart the remaining healthy ZooKeeper in “standalone” mode
> and restart all the brokers and point them to this standalone zookeeper
> (instead of all 3 ZooKeepers).
> 
> - The Zookeeper cluster is unavailable (for any of the reasons mentioned
> above, e.g., no quorum, all instances have failed). What is the impact on
> Kafka applications producing/consuming? What is the impact on admin tools
> to manage topics and the cluster? What is the impact on brokers? A:
> Applications will be able to
> continue producing and consuming, at least for a while. This is true if the
> ZooKeeper cluster is temporarily unavailable but eventually becomes
> available (after a few mins). On the other hand, if the ZooKeeper cluster
> is permanently unavailable, then applications will slowly start to see
> problems with producing/consuming especially if some brokers fail, because
> the partition leaders will not be distributed to other brokers. So taking
> one extreme, if the ZooKeeper cluster is down for a month, it is very
> likely that applications will get produce/consume errors. Admin tools
> (e.g., that create topics, set ACLs or change configs) will not work.
> Brokers will not be impacted from Zookeeper being unavailable. They will
> periodically try to reconnect to the ZooKeeper cluster. If you take care to
> use the same IP address for a recovered Zookeeper instance as it had before
> it failed, brokers will not need to be restarted.
> --
> 
> Cheers,
> Eno


Re: Operationalizing Zookeeper and common gotchas

2019-03-18 Thread Ryanne Dolan
Eno, I found this useful, thanks.

Ryanne

On Mon, Mar 18, 2019, 12:16 PM Eno Thereska  wrote:

> Hi folks,
>
> The team here has come up with a couple of clarifying tips for
> operationalizing Zookeeper for Kafka that we found missing from the
> official documentation, and passed them along to share. If you find them
> useful, I'm thinking of putting them on
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ. Meanwhile any
> feedback is appreciated.
>
> ---
> Operationalizing Zookeeper FAQ
>
> The discussion below uses a 3-instance Zookeeper cluster as an example. The
> findings apply to a larger cluster as well, but you’ll need to adjust the
> numbers.
>
> - Does it make sense to have a config with only 2 Zookeeper instances?
> I.e., in zookeeper.properties file have two entries for server 1 and server
> 2 only. A: No. A setup with 2 Zookeeper instances is not fault tolerant to
> even 1 failure. If one of the Zookeeper instances fails, the remaining one
> will not be functional since there is no quorum majority (1 out of 2 is not
> majority). If you do a “stat” command on that remaining instance you’ll see
> the output being “This ZooKeeper instance is not currently serving
> requests”.
>
> - What if you end up with only 2 running Zookeeper instances, e.g., you
> started with 3 but one failed? Isn’t that the same as the case above? A: No
> it’s not the same scenario. First of all, the 3-instance setup did
> tolerate 1 instance down. The 2 remaining Zookeeper instances will continue
> to function because the quorum majority (2 out of 3) is there.
>
> - I had a 3 Zookeeper instance setup and one instance just failed. How
> should I recover? A: Restart the failed instance with the same
> configuration it had before (i.e., same “myid” ID file, and same IP
> address). It is not important to recover the data volume of the failed
> instance, but it is a bonus if you do so. Once the instance comes up, it
> will sync with the other 2 Zookeeper instances and get all the data.
>
> - I had a 3 Zookeeper instance setup and two instances failed. How should I
> recover? Is my Zookeeper cluster even running at that point? A: First of
> all, ZooKeeper is now unavailable and the remaining instance will show
> “This ZooKeeper instance is not currently serving requests” if probed.
> Second, you should make sure this situation is extremely rare. It should be
> possible to recover the first failed instance quickly before the second
> instance fails. Third, bring up the two failed instances one by one without
> changing anything in their config. Similarly to the case above, it is not
> important to recover the data volume of the failed instance, but it is a
> bonus if you do so. Once the instance comes up, it will sync with the other
> 1 ZooKeeper instance and get all the data.
>
> - I had a 3 Zookeeper instance setup and two instances failed. I can’t
> recover the failed instances for whatever reason. What should I do? A: You
> will have to restart the remaining healthy ZooKeeper in “standalone” mode
> and restart all the brokers and point them to this standalone zookeeper
> (instead of all 3 ZooKeepers).
>
> - The Zookeeper cluster is unavailable (for any of the reasons mentioned
> above, e.g., no quorum, all instances have failed). What is the impact on
> Kafka applications producing/consuming? What is the impact on admin tools
> to manage topics and the cluster? What is the impact on brokers? A:
> Applications will be able to
> continue producing and consuming, at least for a while. This is true if the
> ZooKeeper cluster is temporarily unavailable but eventually becomes
> available (after a few mins). On the other hand, if the ZooKeeper cluster
> is permanently unavailable, then applications will slowly start to see
> problems with producing/consuming especially if some brokers fail, because
> the partition leaders will not be distributed to other brokers. So taking
> one extreme, if the ZooKeeper cluster is down for a month, it is very
> likely that applications will get produce/consume errors. Admin tools
> (e.g., that create topics, set ACLs or change configs) will not work.
> Brokers will not be impacted from Zookeeper being unavailable. They will
> periodically try to reconnect to the ZooKeeper cluster. If you take care to
> use the same IP address for a recovered Zookeeper instance as it had before
> it failed, brokers will not need to be restarted.
> --
>
> Cheers,
> Eno
>


Operationalizing Zookeeper and common gotchas

2019-03-18 Thread Eno Thereska
Hi folks,

The team here has come up with a couple of clarifying tips for
operationalizing Zookeeper for Kafka that we found missing from the
official documentation, and passed them along to share. If you find them
useful, I'm thinking of putting them on
https://cwiki.apache.org/confluence/display/KAFKA/FAQ. Meanwhile any
feedback is appreciated.

---
Operationalizing Zookeeper FAQ

The discussion below uses a 3-instance Zookeeper cluster as an example. The
findings apply to a larger cluster as well, but you’ll need to adjust the
numbers.

- Does it make sense to have a config with only 2 Zookeeper instances?
I.e., in zookeeper.properties file have two entries for server 1 and server
2 only. A: No. A setup with 2 Zookeeper instances is not fault tolerant to
even 1 failure. If one of the Zookeeper instances fails, the remaining one
will not be functional since there is no quorum majority (1 out of 2 is not
majority). If you do a “stat” command on that remaining instance you’ll see
the output being “This ZooKeeper instance is not currently serving
requests”.
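
For reference, that "stat" probe can be scripted directly against the client
port; a rough sketch is below. The host:port values are placeholders, and
newer ZooKeeper releases require "stat" to be whitelisted via
4lw.words.whitelist.

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkStatProbe {
    public static void main(String[] args) throws Exception {
        for (String hostPort : new String[] {"zk1:2181", "zk2:2181", "zk3:2181"}) {
            String[] parts = hostPort.split(":");
            try (Socket socket = new Socket(parts[0], Integer.parseInt(parts[1]))) {
                OutputStream out = socket.getOutputStream();
                out.write("stat".getBytes(StandardCharsets.UTF_8));  // four-letter-word command
                out.flush();
                socket.shutdownOutput();
                // A member with quorum prints its mode and stats; one without quorum prints
                // "This ZooKeeper instance is not currently serving requests".
                System.out.println("--- " + hostPort + " ---");
                System.out.println(new String(socket.getInputStream().readAllBytes(),
                                              StandardCharsets.UTF_8));
            }
        }
    }
}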

- What if you end up with only 2 running Zookeeper instances, e.g., you
started with 3 but one failed? Isn’t that the same as the case above? A: No
it’s not the same scenario. First of all, the 3-instance setup did
tolerate 1 instance down. The 2 remaining Zookeeper instances will continue
to function because the quorum majority (2 out of 3) is there.

- I had a 3 Zookeeper instance setup and one instance just failed. How
should I recover? A: Restart the failed instance with the same
configuration it had before (i.e., same “myid” ID file, and same IP
address). It is not important to recover the data volume of the failed
instance, but it is a bonus if you do so. Once the instance comes up, it
will sync with the other 2 Zookeeper instances and get all the data.

- I had a 3 Zookeeper instance setup and two instances failed. How should I
recover? Is my Zookeeper cluster even running at that point? A: First of
all, ZooKeeper is now unavailable and the remaining instance will show
“This ZooKeeper instance is not currently serving requests” if probed.
Second, you should make sure this situation is extremely rare. It should be
possible to recover the first failed instance quickly before the second
instance fails. Third, bring up the two failed instances one by one without
changing anything in their config. Similarly to the case above, it is not
important to recover the data volume of the failed instance, but it is a
bonus if you do so. Once the instance comes up, it will sync with the other
1 ZooKeeper instance and get all the data.

- I had a 3 Zookeeper instance setup and two instances failed. I can’t
recover the failed instances for whatever reason. What should I do? A: You
will have to restart the remaining healthy ZooKeeper in “standalone” mode
and restart all the brokers and point them to this standalone zookeeper
(instead of all 3 ZooKeepers).

- The Zookeeper cluster is unavailable (for any of the reasons mentioned
above, e.g., no quorum, all instances have failed). What is the impact on
Kafka applications producing/consuming? What is the impact on admin tools to
manage topics and the cluster? What is the impact on brokers? A:
Applications will be able to
continue producing and consuming, at least for a while. This is true if the
ZooKeeper cluster is temporarily unavailable but eventually becomes
available (after a few mins). On the other hand, if the ZooKeeper cluster
is permanently unavailable, then applications will slowly start to see
problems with producing/consuming especially if some brokers fail, because
the partition leaders will not be distributed to other brokers. So taking
one extreme, if the ZooKeeper cluster is down for a month, it is very
likely that applications will get produce/consume errors. Admin tools
(e.g., that create topics, set ACLs or change configs) will not work.
Brokers will not be impacted from Zookeeper being unavailable. They will
periodically try to reconnect to the ZooKeeper cluster. If you take care to
use the same IP address for a recovered Zookeeper instance as it had before
it failed, brokers will not need to be restarted.
--

Cheers,
Eno


Re: kafka latency for large message

2019-03-18 Thread Nan Xu
Can anyone give some suggestions, or an explanation of why Kafka gives such
a big latency for a large payload?

Thanks,
Nan

On Thu, Mar 14, 2019 at 3:53 PM Xu, Nan  wrote:

> Hi,
>
> We are using Kafka to send messages, and less than 1% of the messages
> are very big, close to 30M. We understand Kafka is not ideal for sending
> big messages, but because the large-message rate is very low we want to
> let Kafka do it anyway. We still want to get a reasonable latency.
>
> To test, I just set up a topic named "test" on a single-broker local
> Kafka, with only 1 partition and 1 replica, using the following command:
>
> ./kafka-producer-perf-test.sh  --topic test --num-records 200
> --throughput 1 --record-size 3000 --producer.config
> ../config/producer.properties
>
> Producer.config
>
> #Max 40M message
> max.request.size=4000
> buffer.memory=4000
>
> #2M buffer
> send.buffer.bytes=200
>
> 6 records sent, 1.1 records/sec (31.00 MB/sec), 973.0 ms avg latency,
> 1386.0 max latency.
> 6 records sent, 1.0 records/sec (28.91 MB/sec), 787.2 ms avg latency,
> 1313.0 max latency.
> 5 records sent, 1.0 records/sec (27.92 MB/sec), 582.8 ms avg latency,
> 643.0 max latency.
> 6 records sent, 1.1 records/sec (30.16 MB/sec), 685.3 ms avg latency,
> 1171.0 max latency.
> 5 records sent, 1.0 records/sec (27.92 MB/sec), 629.4 ms avg latency,
> 729.0 max latency.
> 5 records sent, 1.0 records/sec (27.61 MB/sec), 635.6 ms avg latency,
> 673.0 max latency.
> 6 records sent, 1.1 records/sec (30.09 MB/sec), 736.2 ms avg latency,
> 1255.0 max latency.
> 5 records sent, 1.0 records/sec (27.62 MB/sec), 626.8 ms avg latency,
> 685.0 max latency.
> 5 records sent, 1.0 records/sec (28.38 MB/sec), 608.8 ms avg latency,
> 685.0 max latency.
>
>
> On the broker, I change the
>
> socket.send.buffer.bytes=2024000
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=2224000
>
> and all others are default.
>
> I am a little surprised to see about 1 s max latency and an average of
> about 0.5 s. My understanding is that Kafka memory-maps the log file and
> lets the system flush it, and all the writes are sequential, so the flush
> should not be affected by message size that much. Batching and the network
> will take longer, but those are memory-based and on the local machine; my
> SSD should do far better than 0.5 seconds. Where did the time get
> consumed? Any suggestions?
>
> Thanks,
> Nan
>
>
>
>
>
>
>
> --
> This message, and any attachments, is for the intended recipient(s) only,
> may contain information that is privileged, confidential and/or proprietary
> and subject to important terms and conditions available at
> http://www.bankofamerica.com/emaildisclaimer.   If you are not the
> intended recipient, please delete this message.
>


Broker deregisters from ZK, but stays alive and does not rejoin the cluster

2019-03-18 Thread Joe Ammann
Hi all

We're running several clusters (mostly with 3 brokers) with 2.1.1

We quite regularly see the pattern that one of the 3 brokers "detaches" from ZK 
(the broker id is not registered anymore under /brokers/ids). We assume that 
the root cause for this is that the brokers are running on VMs (due to company 
policy, no alternative) and that the VM gets "stalled" for several minutes due 
to missing resources on the VMware ESX host.

This is not new behaviour with 2.1.1, we already saw it with 0.10.2.1 before.

The sequence of events is normally something like the following
- cluster is running ok
- one broker "gets stalled", not pingable anymore
- partitions go to underreplicated
- failed broker comes back and reports that ZK session was expired [1]
- some of the brokers that were ok report leader election problems [2]
- the failed/revived broker logs errors continuously about expired session [3]

This goes on until we restart the broker on the VM that had failed. Until we
do this restart, the failed broker seems to think it is working perfectly ok.
We're monitoring all brokers via JMX, and that one does not report any
problems. It claims in the JMX values to be the leader of a number of
partitions and to have 0 underreplicated partitions, whilst the other brokers
rightfully report via JMX that they in turn do have underreplicated partitions.
This then causes alerts to go off about the brokers that still work in degraded
mode, while the one that is really broken appears green/ok.

Is this in any way expected behaviour, that a Kafka broker gets its ZK session
expired but continues to run (just issuing the errors in [3])? I would have
expected the broker to shut itself down in a similar manner to the way it does
when it's unable to register with ZK on startup.

Any idea how I could best detect this situation in monitoring? I'm thinking
that after polling the broker via JMX, I could also poll ZK to check whether
the /brokers/ids/ node exists. If not, restart that broker.
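
A rough sketch of that check with the plain ZooKeeper Java client; the
connect string, chroot and broker id below are placeholders:

import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistrationCheck {
    public static void main(String[] args) throws Exception {
        String connectString = "zk1:2181,zk2:2181,zk3:2181/kafka";  // placeholder hosts/chroot
        String brokerId = "3";                                      // placeholder broker id
        // A production check would wait for the connect event before querying.
        ZooKeeper zk = new ZooKeeper(connectString, 10_000, event -> { });
        boolean registered;
        try {
            // /brokers/ids/<id> is an ephemeral node, so it disappears when the broker's
            // ZK session expires -- exactly the "detached but still running" condition.
            registered = zk.exists("/brokers/ids/" + brokerId, false) != null;
        } finally {
            zk.close();
        }
        System.out.println("broker " + brokerId + (registered ? " is" : " is NOT") + " registered");
        System.exit(registered ? 0 : 1);  // non-zero exit can drive an alert or an automated restart
    }
}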

BTW: I do know that probably the best answer is: "just run your ZK/Kafka on
hardware, not VMs". We're working on that, but company policies seem to prefer
outages over spending a little money.

-- 
CU, Joe

[1]

[2019-03-18 02:27:13,043] INFO [ZooKeeperClient] Session expired. 
(kafka.zookeeper.ZooKeeperClient)

[2]

[2019-03-18 02:27:20,283] ERROR [Controller id=3 epoch=94562] Controller 3 
epoch 94562 failed to change state for partition __consumer_offsets-4 from 
OnlinePartition to OnlinePartition (state.change.logger) 
kafka.common.StateChangeFailedException: Failed to elect leader for partition 
__consumer_offsets-4 under strategy 
PreferredReplicaPartitionLeaderElectionStrategy
at 
kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
at 
kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
at 
kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
at 
kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
at 
kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
at 
kafka.controller.KafkaController.kafka$controller$KafkaController$$onPreferredReplicaElection(KafkaController.scala:624)
at 
kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:974)
at 
kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:955)
at scala.collection.immutable.Map$Map4.foreach(Map.scala:188)
at 
kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance(KafkaController.scala:955)
at 
kafka.controller.KafkaController$AutoPreferredReplicaLeaderElection$.process(KafkaController.scala:986)
at 
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
at 
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
at 
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at 
kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
 
[3]

[2019-03-18 02:28:3

Re: Broker suddenly becomes unstable after upgrade to 2.1.0

2019-03-18 Thread Joe Ammann
Hi all

On 3/13/19 9:36 AM, Joe Ammann wrote:
> Hi Ankur
>
> On 3/13/19 3:34 AM, Ankur Rana wrote:
>> Hey,
>> I think this is a known issue in Kafka 2.1.0. Check this out
>> https://issues.apache.org/jira/browse/KAFKA-7697
>> It has been fixed in 2.1.1.
> This surely does look like our issue! I should have found that myself..
>
> Thanks, we'll roll out 2.1.1 today and see if that fixes the issue (BTW:
> the same issue happened again last night, this time I took a thread dump
> and surely I found several threads in exactly the state described in the
> ticket)

Thanks again for the quick response I got on the list. We have now
upgraded everything to 2.1.1 and haven't seen this anymore over the past
few days, so I'm fairly sure it was exactly this issue.

-- 
CU, Joe



Re: [VOTE] 2.2.0 RC2

2019-03-18 Thread Jakub Scholz
+1 (non-binding). I used the staged binaries and ran some of my tests
against them. All seems to look good to me.

On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 2.2.0.
>
>  - Added SSL support for custom principal name
>  - Allow SASL connections to periodically re-authenticate
>  - Command line tool bin/kafka-topics.sh adds AdminClient support
>  - Improved consumer group management
>- default group.id is `null` instead of empty string
>  - API improvement
>- Producer: introduce close(Duration)
>- AdminClient: introduce close(Duration)
>- Kafka Streams: new flatTransform() operator in Streams DSL
>- KafkaStreams (and other classed) now implement AutoClosable to
> support try-with-resource
>- New Serdes and default method implementations
>  - Kafka Streams exposed internal client.id via ThreadMetadata
>  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> output `NaN` as default value
> Release notes for the 2.2.0 release:
> https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test, and vote by Thursday, March 14, 9am PST.
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> https://github.com/apache/kafka/releases/tag/2.2.0-rc2
>
> * Documentation:
> https://kafka.apache.org/22/documentation.html
>
> * Protocol:
> https://kafka.apache.org/22/protocol.html
>
> * Jenkins builds for the 2.2 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/
> System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
>
> /**
>
> Thanks,
>
> -Matthias
>
>