Re: Reduce latency
You can configure that to either block or fail: http://kafka.apache.org/documentation.html#producerconfigs. By default it should block.

On Tue, Aug 18, 2015 at 4:57 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

I see. So the internal queue overrides the producer buffer size configuration? When the buffer is full, the producer will block sending, right?

On Tue, Aug 18, 2015 at 3:52 PM, Tao Feng fengta...@gmail.com wrote:

From what I understand, if you set the throughput to -1, ProducerPerformance will push records as fast as possible into an internal per-topic, per-partition queue. In the background there is a sender I/O thread handling the actual record sending. If you push records into the queue faster than the send rate, your queue will grow longer and longer, and eventually the recorded latency will become meaningless for a latency-focused test.

On Tue, Aug 18, 2015 at 11:48 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

I see. Thank you Tao. But now I don't get what Jay said, that my latency test only makes sense if I set a fixed throughput. Why do I need to set a fixed throughput for my test instead of just setting the expected throughput to -1 (as fast as possible)? Thanks.

On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote:

Hi Yuheng, The 1 record/s is just a parameter to ProducerPerformance for your target producer throughput. It only takes effect, by throttling, if you try to send more than 1 record/s. The actual throughput of the test depends on your producer config and your setup. -Tao

On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Also, when I set the target throughput to 1 record/s, the actual test results show I got an average of 579.86 records per second across all my producers. How did that happen? Why is this number not 1? Thanks.

On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Thank you Jay, that really helps!
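A sketch of the buffer settings being discussed (names as in the 0.8.2-era new-producer configs; check the linked producerconfigs page for your version, since `block.on.buffer.full` was later superseded by `max.block.ms`):

```shell
# Passed as key=value pairs on the ProducerPerformance command line,
# or placed in a producer properties file.

# Total memory the producer may use to buffer unsent records.
buffer.memory=67108864

# true (default): block the send() call when the buffer is full;
# false: throw BufferExhaustedException instead of blocking.
block.on.buffer.full=true
```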
Kishore, where can I monitor whether the network is busy on I/O in VisualVM? Thanks. I am running 90 producer processes on 90 physical machines in the experiment.

On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote:

Yuheng, From the command you gave, it looks like you are configuring the perf test to send data as fast as possible (the -1 for target throughput). This means it will always queue up a bunch of unsent data until the buffer is exhausted and then block. The larger the buffer, the bigger the queue. This is where the latency comes from. This is exactly what you would expect and what the buffering is supposed to do. If you want to measure latency, this test doesn't really make sense; you need to measure at some fixed throughput. Instead of -1, enter the target throughput you want to measure latency at (e.g. 10 records/sec). -Jay

On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Thank you Alvaro. How do I use sync producers? I am running the standard ProducerPerformance test from Kafka to measure the latency of each message sent from producer to broker only. The command is:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196

For running producers, where should I put the producer.type=sync configuration? In config/server.properties? Also, does this mean we are using a batch size of 1? Which version of Kafka are you using? Thanks.

On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote:

Are you measuring latency as the time between producer and consumer? In that case, the ack shouldn't affect the latency, because even though your producer is not going to wait for the ack, the consumer will only get the message after it is committed on the server. Regarding latency, my best results occur with sync producers, but the throughput is much lower in that case.
Regarding not flushing to disk, I'm pretty sure that's not an option in Kafka (correct me if I'm wrong). Regards, Alvaro Gareppe

On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Also, the latency results show no major difference when using ack=0 or ack=1.
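Jay's suggestion, applied to the command quoted in this thread: replace the -1 (unthrottled) argument with the target throughput you want to measure latency at. A sketch, keeping the same positional arguments (topic, number of records, record size, target records/sec), with 10000 records/sec as an illustrative value:

```shell
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
  test7 5000 100 10000 \
  acks=1 \
  bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 \
  buffer.memory=67108864 \
  batch.size=8196
```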
Re: Reduce latency
I'm using the last one, but not using ProducerPerformance; I created my own. But I think there is a producer.properties file in the config folder in Kafka... is that configuration not for this tester?
-- Ing. Alvaro Gareppe agare...@gmail.com
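On the question of where producer.type=sync goes: it is an old-producer (Scala client) setting and belongs in the producer's own properties file (e.g. config/producer.properties, loaded by your producer client), not in config/server.properties, which configures the broker. A sketch, reusing the broker host from this thread:

```shell
# config/producer.properties -- old Scala producer only; the new Java
# producer used by ProducerPerformance ignores producer.type.
producer.type=sync
metadata.broker.list=esv4-hcl198.grid.linkedin.com:9092
```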
Re: Reduce latency
Are you measuring latency as the time between producer and consumer? In that case, the ack shouldn't affect the latency, because even though your producer is not going to wait for the ack, the consumer will only get the message after it is committed on the server. Regarding latency, my best results occur with sync producers, but the throughput is much lower in that case. Regarding not flushing to disk, I'm pretty sure that's not an option in Kafka (correct me if I'm wrong). Regards, Alvaro Gareppe

On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

Also, the latency results show no major difference when using ack=0 or ack=1. Why is that?

On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

I am running an experiment where 92 producers are publishing data into 6 brokers and 10 consumers are reading online data simultaneously. What should I do to reduce the latency? Currently when I run the producer performance test, the average latency is around 10s. Should I disable log.flush? How do I do that? Thanks.

-- Ing. Alvaro Gareppe agare...@gmail.com
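On the "should I disable log.flush" question: by default Kafka already leaves flushing to the OS page cache, and only forces an fsync if the flush intervals are set. A sketch of the relevant broker settings (config/server.properties); leaving them unset or commented out, as below, is the low-latency default:

```shell
# Force an fsync after this many messages (unset by default).
#log.flush.interval.messages=10000

# Force an fsync after this many milliseconds (unset by default).
#log.flush.interval.ms=1000
```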
lowlatency on kafka
I'm starting to use Kafka for a low-latency application. I need a topic with an overall end-to-end latency of around 2 or 3 ms (latency from producer to consumer). I want to use an async producer, but I'm not getting it to work that fast. What are the key properties to configure in the producer, the consumer, and the topic to achieve the best latency possible? I can send you what I have configured so far. Thank you --
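Nothing here guarantees 2-3 ms end to end, but these are the usual knobs to start from (a hedged sketch for the 2015-era clients; the values are illustrative, not tuned recommendations):

```shell
# Producer: don't wait to fill batches, don't wait for full replication.
linger.ms=0
batch.size=0
acks=1

# Consumer fetch settings: return fetches as soon as any data is
# available rather than waiting to accumulate bytes.
fetch.min.bytes=1
fetch.wait.max.ms=10
```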
Re: message filtering or selector
Thanks

On Thu, Aug 6, 2015 at 2:20 PM, Grant Henke ghe...@cloudera.com wrote:

I completely agree with Ben's response, especially the invitation to propose and get involved in adding functionality to Kafka. A first step for a change this large would be to thoroughly describe your motivations, needed features, and proposed changes or architecture in a KIP proposal. That way the community can discuss whether features like this belong in Kafka, where they belong, and options for implementation. More information about that process can be found here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

On Thu, Aug 6, 2015 at 11:55 AM, Ben Stopford b...@confluent.io wrote:

I think the short answer here is that, if you need freeform selector semantics as per JMS message selectors, you'd need to wrap the API yourself (or get involved in adding the functionality to Kafka). As Gwen and Grant say, you could synthesize something simpler using topics/partitions to provide separate routing, but it would have to be a relatively simple use case. Kafka will support a large number of topic/partition pairs, but each one incurs a cost, so this route may not be wise for the use case you are describing. B

On 6 Aug 2015, at 16:38, Alvaro Gareppe agare...@gmail.com wrote:

It's not because of throughput; it's more about security. I can't allow all clients to have access to all topic content (in some cases). I know that access control is something that is not implemented yet, but planned. My idea is to plug a customization in there to add security at the selection level too. But if the selector applies only on the client side, I won't get any information on the server side about how the user is planning to select, so I won't be able to restrict or grant access. I am planning to replace ActiveMQ with Kafka, but I need to keep some functionality, like security and selection, that is not yet implemented in Kafka, so I need to get creative with workarounds to be able to use it.
You comment that I can do some custom partitioning in my particular case. But I'm not sure I can do something like that, because even though I know which fields can be used for filtering, I don't know the values. Let's say the message has a property X that I can use as selection criteria. I can create a partitioning scheme based on X, so that would split the topic based on X values, and connect the clients to the specific partition; that could work. But what if I have X and Y as possible selection criteria? Can I split based on two properties? If yes, can I connect based only on X? If I do it like this, the number of partitions I'm going to create is going to be enormous. How is Kafka going to perform? Maybe I'm trying to fit a problem into a system that is not designed for it. I would love to have the amazing performance of Kafka, but sadly I'm not sure it's the best fit for me because of this... Thank you very much, guys, for the responses.

On Thu, Aug 6, 2015 at 12:10 PM, Grant Henke ghe...@cloudera.com wrote:

The filtering logic there is topic filtering, not message filtering. The idea is to subscribe to multiple topics via a regex whitelist or blacklist. This does exist today, since it does not depend on understanding the content of the message, but I don't think it is what you are looking for. As far as message filtering goes: as Gwen said, Kafka as currently implemented is not aware of the content of messages, so there is no selector logic available. However, if you know upfront how you would like to filter the messages, you could write your producer to use multiple topics, or even some custom partitioning, and implement a consumer that can understand and filter based on that logic. That would be an involved and creative implementation based on your use case, though. I would recommend starting simple and just dropping the messages you don't care about on the consumer side.
If throughput becomes a problem, then consider alternatives.

On Thu, Aug 6, 2015 at 9:47 AM, Alvaro Gareppe agare...@gmail.com wrote:

Is this implemented? https://cwiki.apache.org/confluence/display/KAFKA/Consumer+API+changes Is this message filtering on the client or the server side?

On Tue, Aug 4, 2015 at 9:54 PM, Gwen Shapira g...@confluent.io wrote:

The way Kafka is currently implemented, Kafka is not aware of the content of messages, so there is no selector logic available. The way to go is to implement the selector in your client, i.e. your consume() loop will get all messages but throw away those that don't fit your pattern. It may be worthwhile to add a ticket for pluggable selector logic in the new consumer.
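Gwen's consume-and-drop approach can be sketched in shell, with grep standing in for the selector predicate. The topic name and message format below are made up for illustration, and printf simulates the consumer's output so the filtering step can be run standalone:

```shell
# The real pipeline would look something like:
#   bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
#       --topic events | grep 'type=ORDER'
# Simulated here, with printf standing in for the consumer output:
printf 'type=ORDER id=1\ntype=AUDIT id=2\ntype=ORDER id=3\n' \
  | grep 'type=ORDER'
```

Only the lines matching the selector survive; everything else is still received and then discarded, which is exactly the cost Gwen describes: the client reads every message either way.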
Broker side consume-request filtering
Is this discussion open? Because this is exactly what I'm looking for.
Re: message filtering or selector
Is this implemented? https://cwiki.apache.org/confluence/display/KAFKA/Consumer+API+changes Is this message filtering on the client or the server side?

On Tue, Aug 4, 2015 at 9:54 PM, Gwen Shapira g...@confluent.io wrote:

The way Kafka is currently implemented, Kafka is not aware of the content of messages, so there is no selector logic available. The way to go is to implement the selector in your client, i.e. your consume() loop will get all messages but throw away those that don't fit your pattern. It may be worthwhile to add a ticket for pluggable selector logic in the new consumer. I can't guarantee it will happen; there are infinite things that can be plugged into consumers and we need to draw the line somewhere, but it's worth a discussion.

-- Ing. Alvaro Gareppe agare...@gmail.com
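The topic-per-selector-value workaround that Gwen and Grant describe can be sketched with the standard topic tooling: the producer writes each message to a topic derived from the property it would otherwise filter on, and each client subscribes only to the topics it may see. Topic names below are made up for illustration:

```shell
# One topic per value of the selection property X:
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --partitions 1 --replication-factor 1 --topic events.X.red
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --partitions 1 --replication-factor 1 --topic events.X.blue
```

As the thread notes, this only scales to a modest number of distinct values; crossing two properties (X and Y) multiplies the topic count.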
message filtering or selector
Is there a way to implement selector logic in Kafka (similar to JMS selectors)? That is, to allow a message to be consumed only if it contains a certain header or content? I'm evaluating migrating from ActiveMQ to Kafka, and I use the selector logic widely in the application. -- Ing. Alvaro Gareppe agare...@gmail.com
Access control in kafka
Can someone point me to documentation about access control in Kafka? Is there something implemented in the current version, or planned for future versions? I need something that allows me to define which users are allowed to connect to certain topics, and of course user management. Thank you guys in advance! -- Eng. Alvaro Gareppe