Re: Kafka consumer configuration / performance issues

2016-10-05 Thread Shamik Banerjee
Sorry to bump this up, but can anyone provide some input on this? I need to
make a call soon on whether Kafka is a good fit for our requirement.




Kafka consumer configuration / performance issues

2016-10-04 Thread Shamik Banerjee
Hi,

  I'm a newbie trying out Kafka as an alternative to AWS SQS. The motivation
is primarily to improve performance, since Kafka would eliminate the
constraint of pulling 10 messages at a time with a cap of 256 KB. Here's a
high-level scenario of my use case. I have a bunch of crawlers which send
documents for indexing. The payload size is around 1 MB on average. The
crawlers call a SOAP endpoint, which in turn runs producer code to submit the
messages to a Kafka queue. The consumer app picks up the messages and
processes them. For my test box, I've configured the topic with 30 partitions
and a replication factor of 2. The two Kafka brokers run alongside a single
ZooKeeper instance. The Kafka version is 0.10.0.
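
For reference, the producer side behind the SOAP endpoint looks roughly like
the sketch below; the topic name, message key, and the Kryo serializer class
are placeholders rather than my exact code.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CrawlDocumentPublisher {
    private final Producer<String, TextAnalysisRequest> producer;

    public CrawlDocumentPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafkahost1:9092,kafkahost2:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Placeholder: whatever Kryo-based Serializer pairs with the
        // KryoObjectSerializer deserializer configured on the consumer side.
        props.put("value.serializer", "com.example.KryoRequestSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Called from the SOAP endpoint for each crawled document (~1 MB payload).
    // For payloads this size, max.request.size on the producer and
    // message.max.bytes on the broker may also need to be raised.
    public void publish(String documentId, TextAnalysisRequest request) {
        producer.send(new ProducerRecord<>("crawl_documents", documentId, request));
    }
}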

For my testing, I published 7 million messages to the queue. I created a
consumer group with 30 consumer threads, one per partition. I was initially
under the impression that this would substantially speed up the processing
compared to what I was getting via SQS. Unfortunately, that was not the case.
In my case, the processing of the data is complex and takes 1-2 minutes on
average to complete. That led to a flurry of partition rebalances, as the
threads were not able to heartbeat on time. I could see a bunch of messages in
the log citing "Auto offset commit failed for group full_group: Commit cannot
be completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between subsequent
calls to poll() was longer than the configured session.timeout.ms, which
typically implies that the poll loop is spending too much time message
processing. You can address this either by increasing the session timeout or
by reducing the maximum size of batches returned in the poll() with
max.poll.records." This led to the same messages being processed multiple
times. I tried playing around with the session timeout, max.poll.records and
the poll time to avoid this, but that slowed down the overall processing big
time. Here are some of the configuration parameters:

metadata.max.age.ms = 30 
max.partition.fetch.bytes = 1048576 
bootstrap.servers = [kafkahost1:9092, kafkahost2:9092] 
enable.auto.commit = true 
max.poll.records = 1 
request.timeout.ms = 31 
heartbeat.interval.ms = 10 
auto.commit.interval.ms = 1000 
receive.buffer.bytes = 65536 
fetch.min.bytes = 1 
send.buffer.bytes = 131072 
value.deserializer = class com.autodesk.preprocessor.consumer.serializer.KryoObjectSerializer
group.id = full_group 
retry.backoff.ms = 100 
fetch.max.wait.ms = 500 
connections.max.idle.ms = 54 
session.timeout.ms = 30 
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
metrics.sample.window.ms = 3 
auto.offset.reset = latest
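
For completeness, the consumer itself is created more or less as in the sketch
below; the topic name is a placeholder, and the values mirror the dump above.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "kafkahost1:9092,kafkahost2:9092");
props.put("group.id", "full_group");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("auto.offset.reset", "latest");
props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
        "com.autodesk.preprocessor.consumer.serializer.KryoObjectSerializer");

KafkaConsumer<String, TextAnalysisRequest> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("crawl_documents")); // placeholder topic name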

I reduced the consumer poll time to 100 ms. That reduced the rebalancing
issues and eliminated the duplicate processing, but it slowed down the overall
process significantly. It ended up taking 35 hours to process all 6 million
messages, compared to 25 hours with the SQS-based solution. Each consumer
thread retrieved 50-60 messages per poll on average, though some of them
polled 0 records at times. I'm not sure about this behavior when there is a
huge number of messages available in the partition; the same thread was able
to pick up messages during the subsequent iteration. Could this be due to
rebalancing?

Here's my consumer code:

while (true) {
    try {
        ConsumerRecords<String, TextAnalysisRequest> records = consumer.poll(100);
        for (ConsumerRecord<String, TextAnalysisRequest> record : records) {
            if (record.value() != null) {
                TextAnalysisRequest textAnalysisObj = record.value();
                if (textAnalysisObj != null) {
                    // Process record
                    PreProcessorUtil.submitPostProcessRequest(textAnalysisObj);
                }
            }
        }
    } catch (Exception ex) {
        LOGGER.error("Error in Full Consumer group worker", ex);
    }
}

I understand that the record-processing step is the main bottleneck in my
case, but I'm sure a few folks here have a similar use case with long
processing times. I've thought about processing asynchronously, either by
spinning up a dedicated thread per record or by using a thread pool with a
large capacity, but I'm not sure whether that would put too much load on the
system. At the same time, I've seen a couple of instances where people have
used the pause and resume APIs to keep polling while the processing runs, in
order to avoid rebalancing issues.
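
Something like the sketch below is what I had in mind for the pause/resume
route. It is a rough sketch only: it assumes the Collection-based pause() and
resume() signatures, and enable.auto.commit=false so that offsets are
committed only after a batch has been fully processed.

import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

ExecutorService worker = Executors.newSingleThreadExecutor();

while (true) {
    ConsumerRecords<String, TextAnalysisRequest> records = consumer.poll(100);
    if (records.isEmpty()) {
        continue;
    }
    Set<TopicPartition> assigned = consumer.assignment();
    consumer.pause(assigned); // stop fetching, but stay in the group
    Future<?> done = worker.submit(() -> {
        for (ConsumerRecord<String, TextAnalysisRequest> record : records) {
            if (record.value() != null) {
                PreProcessorUtil.submitPostProcessRequest(record.value());
            }
        }
    });
    while (!done.isDone()) {
        // Returns no records while paused, but keeps the session alive
        consumer.poll(100);
    }
    consumer.resume(assigned);
    consumer.commitSync(); // commit only after the slow processing finished
}

The idea is that poll() keeps heartbeating while the partitions are paused, so
the group should not rebalance during the 1-2 minutes of processing.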

I'm really looking for some advice / best practices for this situation,
particularly the recommended configuration settings around heartbeat, request
timeout, max poll records, auto commit interval, poll interval, etc. If Kafka
is not the right tool for my use case, please let me know as well.

Any pointers will be appreciated. 

-Thanks,
Shamik


Tools/recommendations to debug performance issues?

2015-09-14 Thread noah
We're using 0.8.2.1, processing maybe 1 million messages per hour. Each
message includes tracking information with a timestamp for when it was
produced, and a timestamp for when it was consumed, to give us roughly the
amount of time it spent in Kafka.  On average this number is in the seconds
and our upper percentiles are in the minutes.

What metrics and settings can we look at to figure out why we might be
spending so much time in Kafka?


Re: Tools/recommendations to debug performance issues?

2015-09-14 Thread Rahul Jain
Have you checked the consumer lag? You can use the offset checker tool to
see if there is a lag.
On 14 Sep 2015 18:36, "noah"  wrote:

> We're using 0.8.2.1 processing maybe 1 million messages per hour. Each
> message includes tracking information with a timestamp for when it was
> produced, and a timestamp for when it was consumed, to give us roughly the
> amount of time it spent in Kafka.  On average this number is in the seconds
> and our upper percentiles are in the minutes.
>
> What metrics and settings can we look at to figure out why we might be
> spending so much time in Kafka?
>


Re: Tools/recommendations to debug performance issues?

2015-09-14 Thread Gwen Shapira
Kafka also collects very useful metrics on request times and their
breakdown.
They are under kafka.network.
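
For example, a small JMX client along the lines of the sketch below can read
them off a broker; the broker host, JMX port, and the exact MBean and
attribute names here are assumptions you may need to adapt.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RequestTimeProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed MBean: total time spent serving consumer fetch requests.
            ObjectName fetchTotalTime = new ObjectName(
                    "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer");
            System.out.println("FetchConsumer TotalTimeMs mean = "
                    + mbs.getAttribute(fetchTotalTime, "Mean"));
            System.out.println("FetchConsumer TotalTimeMs 99th = "
                    + mbs.getAttribute(fetchTotalTime, "99thPercentile"));
        }
    }
}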



On Mon, Sep 14, 2015 at 6:59 AM, Rahul Jain  wrote:

> Have you checked the consumer lag? You can use the offset checker tool to
> see if there is a lag.
> On 14 Sep 2015 18:36, "noah"  wrote:
>
> > We're using 0.8.2.1 processing maybe 1 million messages per hour. Each
> > message includes tracking information with a timestamp for when it was
> > produced, and a timestamp for when it was consumed, to give us roughly
> the
> > amount of time it spent in Kafka.  On average this number is in the
> seconds
> > and our upper percentiles are in the minutes.
> >
> > What metrics and settings can we look at to figure out why we might be
> > spending so much time in Kafka?
> >
>


Re: Performance issues

2014-10-23 Thread Mohit Anchlia
By increasing partitions and using Kafka from the master branch I was able to
cut the response times in half. But it still seems high, and there still
appears to be a delay between a successful post and the first time the message
is seen by the consumers. There are plenty of resources available.

Is there a way I can easily check the breakdown of latency at every tier, e.g.
producer -> broker -> consumer?
On Wed, Oct 22, 2014 at 2:37 PM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 the server.properties file doesn't have all the properties. You can add it
 there and try your test.




Re: Performance issues

2014-10-22 Thread Mohit Anchlia
I can't find this property in the server.properties file. Is that the right
place to set this parameter?
On Tue, Oct 21, 2014 at 6:27 PM, Jun Rao jun...@gmail.com wrote:

 Could you also set replica.fetch.wait.max.ms in the broker to sth much
 smaller?

 Thanks,

 Jun




Re: Performance issues

2014-10-21 Thread Mohit Anchlia
I have a Java test that produces messages and a consumer that then consumes
them. Consumers are active all the time. There is 1 consumer for 1 producer. I
am measuring the time between when the message is successfully written to the
queue and when the consumer picks it up.
On Tue, Oct 21, 2014 at 8:32 AM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 Can you give more information about the performance test? Which test? Which
 queue? How did you measure the dequeue latency.




Re: Performance issues

2014-10-21 Thread Mohit Anchlia
This is the version I am using: kafka_2.10-0.8.1.1

I think this is a fairly recent version.
On Tue, Oct 21, 2014 at 10:57 AM, Jay Kreps jay.kr...@gmail.com wrote:

 What version of Kafka is this? Can you try the same test against trunk? We
 fixed a couple of latency related bugs which may be the cause.

 -Jay

 On Tue, Oct 21, 2014 at 10:50 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  It's consistently close to 100ms which makes me believe that there are some
  settings that I might have to tweak, however, I am not sure how to confirm
  that assumption :)



Re: Performance issues

2014-10-21 Thread Jay Kreps
There was a bug that could lead to the fetch request from the consumer
hitting its timeout instead of being immediately triggered by the produce
request. To see if you are affected by that, set your consumer max wait time
to 1 ms and see if the latency drops to 1 ms (or, alternatively, try with
trunk and see if that fixes the problem).

The reason I suspect this problem is that the default timeout in the Java
consumer is 100ms.

-Jay




Re: Performance issues

2014-10-21 Thread Mohit Anchlia
Is this a parameter I need to set on the Kafka server or on the client side?
Also, can you help point out which one exactly is the consumer max wait time
from this list?

https://kafka.apache.org/08/configuration.html




Re: Performance issues

2014-10-21 Thread Guozhang Wang
This is a consumer config:

fetch.wait.max.ms
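
For the high-level consumer it goes into the properties you pass to
createJavaConsumerConnector, roughly like the sketch below (the ZooKeeper
address and group id are placeholders):

import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

Properties props = new Properties();
props.put("zookeeper.connect", "zkhost:2181"); // placeholder ZooKeeper address
props.put("group.id", "perf-test-group");      // placeholder group id
props.put("fetch.wait.max.ms", "1");           // the consumer max wait time

ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));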


-- 
-- Guozhang


Re: Performance issues

2014-10-21 Thread Mohit Anchlia
I set the property to 1 in the consumer config that is passed to
createJavaConsumerConnector, but it didn't seem to help:

props.put("fetch.wait.max.ms", fetchMaxWait);




Re: Performance issues

2014-10-21 Thread Mohit Anchlia
Most of the consumer threads seem to be waiting:

ConsumerFetcherThread-groupA_ip-10-38-19-230-1413925671158-3cc3e22f-0-0
prio=10 tid=0x7f0aa84db800 nid=0x5be9 runnable [0x7f0a5a618000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked 0x9515bec0 (a sun.nio.ch.Util$2)
        - locked 0x9515bea8 (a java.util.Collections$UnmodifiableSet)
        - locked 0x95511d00 (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:221)
        - locked 0x9515bd28 (a java.lang.Object)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        - locked 0x95293828 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
        - locked 0x9515bcb0 (a java.lang.Object)
        at kafka.utils.Utils$.read(Utils.scala:375)






Performance issues

2014-10-20 Thread Mohit Anchlia
I am running a performance test, and from what I am seeing, messages are
taking about 100ms to pop from the queue itself, which makes the test slow. I
am looking for pointers on how I can troubleshoot this issue.

There seems to be plenty of CPU and IO available. I am running 22 producers
and 22 consumers in the same group.