Re: Kafka consumer configuration / performance issues
Sorry to bump this up, can anyone provide some input on this? I need to make a call soon on whether Kafka is a good fit for our requirement.

On Tuesday, October 4, 2016 8:57 PM, Shamik Banerjee wrote:
Kafka consumer configuration / performance issues
Hi,

I'm a newbie trying out Kafka as an alternative to AWS SQS. The motivation is primarily to improve performance: Kafka would eliminate the constraint of pulling 10 messages at a time with a cap of 256 KB.

Here's a high-level scenario of my use case. I have a bunch of crawlers that send documents for indexing. The payload is around 1 MB on average. The crawlers call a SOAP endpoint, which in turn runs producer code to submit the messages to a Kafka topic. The consumer app picks up the messages and processes them. On my test box, I've configured the topic with 30 partitions and a replication factor of 2. The two Kafka instances run with 1 ZooKeeper instance. The Kafka version is 0.10.0.

For my testing, I published 7 million messages to the topic. I created a consumer group with 30 consumer threads, one per partition. I was initially under the impression that this would substantially speed up processing compared to what I was getting via SQS. Unfortunately, that was not the case. The processing of the data is complex and takes 1-2 minutes on average to complete. That led to a flurry of partition rebalancing, as the threads were not able to heartbeat in time. I could see a bunch of messages in the log like this:

"Auto offset commit failed for group full_group: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in the poll() with max.poll.records."

This led to the same message being processed multiple times. I tried playing around with the session timeout, max.poll.records and the poll time to avoid this, but that slowed down the overall processing big time.

Here are some of the configuration parameters:

    metadata.max.age.ms = 30
    max.partition.fetch.bytes = 1048576
    bootstrap.servers = [kafkahost1:9092, kafkahost2:9092]
    enable.auto.commit = true
    max.poll.records = 1
    request.timeout.ms = 31
    heartbeat.interval.ms = 10
    auto.commit.interval.ms = 1000
    receive.buffer.bytes = 65536
    fetch.min.bytes = 1
    send.buffer.bytes = 131072
    value.deserializer = class com.autodesk.preprocessor.consumer.serializer.KryoObjectSerializer
    group.id = full_group
    retry.backoff.ms = 100
    fetch.max.wait.ms = 500
    connections.max.idle.ms = 54
    session.timeout.ms = 30
    key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
    metrics.sample.window.ms = 3
    auto.offset.reset = latest

I reduced the consumer poll time to 100 ms. That reduced the rebalancing issues and eliminated duplicate processing, but slowed down the overall process significantly. It ended up taking 35 hours to process all 6 million messages, compared to 25 hours with the SQS-based solution. Each consumer thread retrieved 50-60 messages per poll on average, though some of them polled 0 records at times. I'm not sure why that happens when there is a huge number of messages available in the partition. The same thread was able to pick up messages during the subsequent iteration. Could this be due to rebalancing?

Here's my consumer code:

    while (true) {
        try {
            ConsumerRecords<String, TextAnalysisRequest> records = consumer.poll(100);
            for (ConsumerRecord<String, TextAnalysisRequest> record : records) {
                TextAnalysisRequest textAnalysisObj = record.value();
                if (textAnalysisObj != null) {
                    // Process record
                    PreProcessorUtil.submitPostProcessRequest(textAnalysisObj);
                }
            }
        } catch (Exception ex) {
            LOGGER.error("Error in Full Consumer group worker", ex);
        }
    }

I understand that the record processing is one bottleneck in my case, but I'm sure a few folks here have a similar use case with long processing times.

I thought of doing asynchronous processing by running each processor in its own dedicated thread, or using a thread pool with a large capacity, but I'm not sure whether that would put a big load on the system. At the same time, I've seen a couple of instances where people have used the pause and resume API during processing to avoid the rebalancing issue.

I'm really looking for some advice / best practices here, particularly the recommended configuration settings for heartbeat, request timeout, max poll records, auto commit interval, poll interval, etc. If Kafka is not the right tool for my use case, please let me know as well.

Any pointers will be appreciated.

-Thanks,
Shamik
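The thread-pool idea raised in the post can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the poster's code and not a Kafka API: BoundedWorkerPool and its methods are hypothetical names, and a plain String stands in for TextAnalysisRequest. A bounded queue combined with CallerRunsPolicy makes the submitting (poll-loop) thread run the task itself when the pool is saturated, which caps memory and load instead of queueing work without limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a fixed-size pool with a bounded queue and backpressure.
final class BoundedWorkerPool {

    // When the queue is full, CallerRunsPolicy runs the task on the submitting thread.
    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits every record of one batch, waits for all of them, returns the count processed.
    static int processBatch(List<String> batch, int workers, int queueCapacity) {
        ThreadPoolExecutor pool = create(workers, queueCapacity);
        AtomicInteger done = new AtomicInteger();
        List<Future<?>> inFlight = new ArrayList<>();
        try {
            for (String record : batch) {
                inFlight.add(pool.submit(() -> {
                    record.length();          // placeholder for the real per-record work
                    done.incrementAndGet();
                }));
            }
            for (Future<?> f : inFlight) {
                f.get();                      // block until this record has been processed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("record processing failed", e);
        } finally {
            pool.shutdown();
        }
        return done.get();
    }
}
```

Waiting for the batch still blocks the polling thread, so on its own this does not remove the session-timeout constraint described in the post. With processing moved off the polling thread, auto-commit could also commit offsets for records that are still in flight, so offsets would likely need to be committed only after a batch completes.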
Tools/recommendations to debug performance issues?
We're using 0.8.2.1 processing maybe 1 million messages per hour. Each message includes tracking information with a timestamp for when it was produced, and a timestamp for when it was consumed, to give us roughly the amount of time it spent in Kafka. On average this number is in the seconds and our upper percentiles are in the minutes. What metrics and settings can we look at to figure out why we might be spending so much time in Kafka?
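The measurement described above (produced-at and consumed-at timestamps per message) can be summarised as percentiles. The following is a minimal sketch, assuming the two timestamps are in milliseconds and that producer and consumer clocks are reasonably synchronised; LatencyStats and its method names are hypothetical, and the percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

// Hypothetical helper for summarising time-in-Kafka samples.
final class LatencyStats {

    // Time a message spent between being produced and being consumed.
    static long latencyMs(long producedAtMs, long consumedAtMs) {
        return consumedAtMs - producedAtMs;
    }

    // Nearest-rank percentile, with p in the range (0, 100].
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Comparing a middle percentile with an upper one shows whether the delay is uniform or concentrated in a slow tail, which is the pattern described above (seconds on average, minutes in the upper percentiles).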
Re: Tools/recommendations to debug performance issues?
Have you checked the consumer lag? You can use the offset checker tool to see if there is a lag.

On 14 Sep 2015 18:36, "noah" wrote:
> We're using 0.8.2.1 processing maybe 1 million messages per hour. Each
> message includes tracking information with a timestamp for when it was
> produced, and a timestamp for when it was consumed, to give us roughly the
> amount of time it spent in Kafka. On average this number is in the seconds
> and our upper percentiles are in the minutes.
>
> What metrics and settings can we look at to figure out why we might be
> spending so much time in Kafka?
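Consumer lag, as suggested in this reply, is the per-partition gap between the log end offset and the offset the consumer group has committed. The following is a minimal sketch of that arithmetic only; ConsumerLag and the map-based inputs are hypothetical, and treating a partition with no committed offset as committed at 0 is an assumption made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: computes lag from offsets obtained elsewhere (e.g. from a tool's output).
final class ConsumerLag {

    // lag(partition) = log end offset - committed offset, never below zero.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            // Assumption: a missing committed offset means nothing has been consumed yet.
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    // Sum of the lag over all partitions.
    static long totalLag(Map<Integer, Long> lag) {
        long total = 0L;
        for (long v : lag.values()) {
            total += v;
        }
        return total;
    }
}
```

A lag that keeps growing suggests the consumers cannot keep up with the producers, whereas a lag near zero alongside high end-to-end latency points at fetch or produce delays instead.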
Re: Tools/recommendations to debug performance issues?
Kafka also collects very useful metrics on request times and their breakdown. They are under kafka.network.

On Mon, Sep 14, 2015 at 6:59 AM, Rahul Jain wrote:
> Have you checked the consumer lag? You can use the offset checker tool to
> see if there is a lag.
>
> On 14 Sep 2015 18:36, "noah" wrote:
> > We're using 0.8.2.1 processing maybe 1 million messages per hour. Each
> > message includes tracking information with a timestamp for when it was
> > produced, and a timestamp for when it was consumed, to give us roughly
> > the amount of time it spent in Kafka. On average this number is in the
> > seconds and our upper percentiles are in the minutes.
> >
> > What metrics and settings can we look at to figure out why we might be
> > spending so much time in Kafka?
Re: Performance issues
By increasing partitions and using Kafka from the master branch, I was able to cut the response times in half. But it still seems high, and it looks like there is still a delay between a successful post and the first time the message is seen by the consumers. There are plenty of resources available. Is there a way I can easily check the breakdown of latency at every tier, e.g. producer - broker - consumer?

On Wed, Oct 22, 2014 at 2:37 PM, Neha Narkhede neha.narkh...@gmail.com wrote:
The server.properties file doesn't have all the properties. You can add it there and try your test.

On Wed, Oct 22, 2014 at 11:41 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I can't find this property in the server.properties file. Is that the right place to set this parameter?

On Tue, Oct 21, 2014 at 6:27 PM, Jun Rao jun...@gmail.com wrote:
Could you also set replica.fetch.wait.max.ms in the broker to something much smaller?
Thanks, Jun

On Tue, Oct 21, 2014 at 2:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I set the property to 1 in the consumer code that is passed to createJavaConsumerConnector, but it didn't seem to help:
props.put("fetch.wait.max.ms", fetchMaxWait);

On Tue, Oct 21, 2014 at 1:21 PM, Guozhang Wang wangg...@gmail.com wrote:
This is a consumer config: fetch.wait.max.ms
-- Guozhang

On Tue, Oct 21, 2014 at 11:39 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Is this a parameter I need to set on the Kafka server or on the client side? Also, can you help point out which one exactly is the consumer max wait time in this list? https://kafka.apache.org/08/configuration.html

On Tue, Oct 21, 2014 at 11:35 AM, Jay Kreps jay.kr...@gmail.com wrote:
There was a bug that could lead to the fetch request from the consumer hitting its timeout instead of being immediately triggered by the produce request. To see if you are affected by that, set your consumer max wait time to 1 ms and see if the latency drops to 1 ms (or, alternatively, try with trunk and see if that fixes the problem). The reason I suspect this problem is that the default timeout in the Java consumer is 100 ms.
-Jay

On Tue, Oct 21, 2014 at 11:06 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
This is the version I am using: kafka_2.10-0.8.1.1
I think this is a fairly recent version.

On Tue, Oct 21, 2014 at 10:57 AM, Jay Kreps jay.kr...@gmail.com wrote:
What version of Kafka is this? Can you try the same test against trunk? We fixed a couple of latency-related bugs which may be the cause.
-Jay

On Tue, Oct 21, 2014 at 10:50 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
It's consistently close to 100 ms, which makes me believe that there are some settings I might have to tweak. However, I am not sure how to confirm that assumption :)

On Tue, Oct 21, 2014 at 8:53 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I have a Java test that produces messages, and then a consumer consumes them. Consumers are active all the time. There is 1 consumer for 1 producer. I am measuring the time between the message being successfully written to the queue and the time the consumer picks it up.

On Tue, Oct 21, 2014 at 8:32 AM, Neha Narkhede neha.narkh...@gmail.com wrote:
Can you give more information about the performance test? Which test? Which queue? How did you measure the dequeue latency?

On Mon, Oct 20, 2014 at 5:09 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am running a performance test, and what I am seeing is that messages are taking about 100 ms to pop from the queue itself, which makes the test slow. I am looking for pointers on how I can troubleshoot this issue. There seems to be plenty of CPU and IO available. I am running 22 producers and 22 consumers in the same group.
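The per-tier breakdown asked about at the top of this message (producer - broker - consumer) can be approximated from three timestamps taken by the test itself. This is a rough sketch under stated assumptions, not a Kafka facility: LatencyBreakdown and its parameters are hypothetical, all three timestamps are assumed to come from the same clock (for example one JVM running both producer and consumer, as in the test described in this thread), and the acknowledgement time is assumed to be recorded when the producer's send call completes.

```java
// Hypothetical helper: splits end-to-end latency into a produce leg and a delivery leg.
final class LatencyBreakdown {

    // Producer send started -> broker acknowledged the write.
    static long produceMs(long sendAtMs, long ackAtMs) {
        return ackAtMs - sendAtMs;
    }

    // Broker acknowledged the write -> consumer received the message.
    // This can come out slightly negative if the consumer sees the message
    // before the producer has observed the acknowledgement.
    static long deliveryMs(long ackAtMs, long receivedAtMs) {
        return receivedAtMs - ackAtMs;
    }

    // End-to-end time; equals produceMs + deliveryMs by construction.
    static long totalMs(long sendAtMs, long receivedAtMs) {
        return receivedAtMs - sendAtMs;
    }
}
```

If the delivery leg sits close to the consumer's fetch wait time while the produce leg is small, that would be consistent with the fetch-timeout behaviour discussed in this thread.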
Re: Performance issues
I can't find this property in the server.properties file. Is that the right place to set this parameter?
Re: Performance issues
I have a Java test that produces messages, and then a consumer consumes them. Consumers are active all the time. There is 1 consumer for 1 producer. I am measuring the time between the message being successfully written to the queue and the time the consumer picks it up.
Re: Performance issues
This is the version I am using: kafka_2.10-0.8.1.1
I think this is a fairly recent version.
Re: Performance issues
There was a bug that could lead to the fetch request from the consumer hitting its timeout instead of being immediately triggered by the produce request. To see if you are affected by that, set your consumer max wait time to 1 ms and see if the latency drops to 1 ms (or, alternatively, try with trunk and see if that fixes the problem). The reason I suspect this problem is that the default timeout in the Java consumer is 100 ms.
-Jay
Re: Performance issues
Is this a parameter I need to set on the Kafka server or on the client side? Also, can you help point out which one exactly is the consumer max wait time in this list? https://kafka.apache.org/08/configuration.html
Re: Performance issues
This is a consumer config: fetch.wait.max.ms
-- Guozhang
Re: Performance issues
I set the property to 1 in the consumer code that is passed to createJavaConsumerConnector, but it didn't seem to help:
props.put("fetch.wait.max.ms", fetchMaxWait);
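Setting the consumer property discussed here can be sketched with java.util.Properties alone. This is an illustration rather than the poster's actual code: ConsumerProps and lowLatency are hypothetical names, and the ZooKeeper address and group id passed in are placeholders. The key is a quoted string, and the value is stored as a String because Properties.getProperty only returns values that are Strings.

```java
import java.util.Properties;

// Hypothetical helper that builds properties for the 0.8 high-level consumer.
final class ConsumerProps {

    static Properties lowLatency(String zookeeperConnect, String groupId, int fetchMaxWaitMs) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeperConnect);
        props.put("group.id", groupId);
        // Maximum time the broker holds a fetch request waiting for fetch.min.bytes of data.
        props.put("fetch.wait.max.ms", Integer.toString(fetchMaxWaitMs));
        // Answer a fetch as soon as at least one byte is available.
        props.put("fetch.min.bytes", "1");
        return props;
    }
}
```

The resulting Properties object would then be wrapped in the consumer configuration that is handed to createJavaConsumerConnector, as in the message above.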
Re: Performance issues
Most of the consumer threads seem to be waiting:

ConsumerFetcherThread-groupA_ip-10-38-19-230-1413925671158-3cc3e22f-0-0 prio=10 tid=0x7f0aa84db800 nid=0x5be9 runnable [0x7f0a5a618000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked 0x9515bec0 (a sun.nio.ch.Util$2)
    - locked 0x9515bea8 (a java.util.Collections$UnmodifiableSet)
    - locked 0x95511d00 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:221)
    - locked 0x9515bd28 (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    - locked 0x95293828 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
    - locked 0x9515bcb0 (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:375)
Performance issues
I am running a performance test, and what I am seeing is that messages are taking about 100 ms to pop from the queue itself, which makes the test slow. I am looking for pointers on how I can troubleshoot this issue. There seems to be plenty of CPU and IO available. I am running 22 producers and 22 consumers in the same group.