Re: Reduce latency

2015-08-18 Thread Alvaro Gareppe
You can configure that: when the buffer is full, the producer can either
block or fail. See the producer configs:
http://kafka.apache.org/documentation.html#producerconfigs

By default it should block.
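
To make the two behaviours concrete, here is a minimal Python sketch. It is not Kafka's code: a bounded `queue.Queue` stands in for the producer buffer, and `try_append` is a hypothetical helper. (I believe the relevant producer setting at the time is `block.on.buffer.full`, but check the linked page for your version.)

```python
import queue


def try_append(buffer, record, block_when_full, timeout_s=0.05):
    """Append a record to a bounded buffer (hypothetical helper).

    block_when_full=True  -> wait for free space (only up to timeout_s here,
                             so the demo terminates).
    block_when_full=False -> fail immediately when the buffer is full.
    Returns True if the record was appended, False otherwise.
    """
    try:
        if block_when_full:
            buffer.put(record, block=True, timeout=timeout_s)
        else:
            buffer.put_nowait(record)
        return True
    except queue.Full:
        return False


def demo():
    # Tiny stand-in for the producer buffer: room for two records only,
    # and nothing drains it, so the third append cannot succeed.
    buffer = queue.Queue(maxsize=2)
    return [try_append(buffer, i, block_when_full=False) for i in range(3)]


print(demo())
```

With blocking enabled the caller simply stalls until space frees up, which is why a full buffer shows up as latency rather than as an error.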

On Tue, Aug 18, 2015 at 4:57 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 I see. So the internal queue overrides the producer buffer size
 configuration? When the buffer is full the producer will block sending, right?

 On Tue, Aug 18, 2015 at 3:52 PM, Tao Feng fengta...@gmail.com wrote:

  From what I understand, if you set the throughput to -1,
  ProducerPerformance will push records as fast as possible into an internal
  per-topic, per-partition queue. In the background, a sender I/O thread
  handles the actual sending. If you push records into the queue faster than
  the send rate, the queue grows longer and longer, and eventually the
  measured record latency becomes meaningless for a latency test.
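
The effect can be sketched with a small deterministic simulation in Python. This is only an illustration under simplifying assumptions: a single FIFO queue, a fixed send rate, and no buffer limit (so it never blocks). The function name and the rates are made up for the example.

```python
def simulate_latencies(num_records, produce_rate, send_rate):
    """Return per-record latency (seconds) for a single FIFO queue.

    produce_rate: records/s pushed into the queue by the test.
    send_rate:    records/s the background sender can actually ship.
    """
    service_time = 1.0 / send_rate
    latencies = []
    last_departure = 0.0
    for i in range(num_records):
        arrival = i / produce_rate
        # A record starts sending once it has arrived AND the previous
        # record has left the queue.
        start = max(arrival, last_departure)
        last_departure = start + service_time
        latencies.append(last_departure - arrival)
    return latencies


# Producing faster than the sender can drain: latency keeps growing.
overloaded = simulate_latencies(1000, produce_rate=2000.0, send_rate=1000.0)
# Producing at a fixed rate below the send rate: latency stays flat.
paced = simulate_latencies(1000, produce_rate=500.0, send_rate=1000.0)

print("overloaded first/last:", overloaded[0], overloaded[-1])
print("paced first/last:", paced[0], paced[-1])
```

In the overloaded run the last record waits behind everything queued before it, so the number mostly measures queue depth, not the send path.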
 
 
  On Tue, Aug 18, 2015 at 11:48 AM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   I see. Thank you Tao. But now I don't understand what Jay said, that my
   latency test only makes sense if I set a fixed throughput. Why do I need
   to set a fixed throughput for my test instead of just setting the
   expected throughput to -1 (as much as possible)?
  
   Thanks.
  
   On Tue, Aug 18, 2015 at 2:43 PM, Tao Feng fengta...@gmail.com wrote:
  
Hi Yuheng,
   
The 1 record/s is just a ProducerPerformance parameter for your producer's
target throughput. It only takes effect, by throttling, if you try to send
more than 1 record/s. The actual throughput of the test depends on your
producer config and your setup.
   
-Tao
   
On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
   
 Also, when I set the target throughput to 1 record/s, the actual test
 results show an average of 579.86 records per second among all my
 producers. How did that happen? Why is this number not 1? Thanks.

 On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:

  Thank you Jay, that really helps!
 
  Kishore, where in VisualVM can you monitor whether the network is busy
  on I/O? Thanks. I am running 90 producer processes on 90 physical
  machines in the experiment.
 
  On Tue, Aug 18, 2015 at 1:19 AM, Jay Kreps j...@confluent.io wrote:
 
  Yuheng,
 
  From the command you gave it looks like you are configuring the perf test
  to send data as fast as possible (the -1 for target throughput). This
  means it will always queue up a bunch of unsent data until the buffer is
  exhausted and then block. The larger the buffer, the bigger the queue.
  This is where the latency comes from. This is exactly what you would
  expect and what the buffering is supposed to do.

  If you want to measure latency this test doesn't really make sense, you
  need to measure with some fixed throughput. Instead of -1 enter the
  target throughput you want to measure latency at (e.g. 10 records/sec).
 
  -Jay
 
  On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
 
   Thank you Alvaro,
  
   How to use sync producers? I am running the standard ProducerPerformance
   test from kafka to measure the latency of each message to send from
   producer to broker only.
   The command is like bin/kafka-run-class.sh
   org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1
   acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
   buffer.memory=67108864 batch.size=8196

   For running producers, where should I put the producer.type=sync
   configuration into? The config/server.properties? Also Does this mean we
   are using batch size of 1? Which version of Kafka are you using?
   thanks.
  
   On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com wrote:
  
Are you measuring latency as the time between producer and consumer?

In that case, the ack shouldn't affect the latency, because even though
your producer is not going to wait for the ack, the consumer will only
get the message after it's committed in the server.

About latency, my best results occur with sync producers, but the
throughput is much lower in that case.

About not flushing to disk, I'm pretty sure that it's not an option in
Kafka (correct me if I'm wrong).
   
Regards,
Alvaro Gareppe
   
On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
   
 Also, the latency results show no major difference when using ack=0

Re: Reduce latency

2015-08-13 Thread Alvaro Gareppe
I'm using the latest one, but I'm not using ProducerPerformance; I created
my own. But I think there is a producer.properties file in the config folder
in Kafka. Is that configuration not for this tester?

On Thu, Aug 13, 2015 at 4:18 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:

 Thank you Alvaro,

 How to use sync producers? I am running the standard ProducerPerformance
 test from kafka to measure the latency of each message to send from
 producer to broker only.
 The command is like bin/kafka-run-class.sh
 org.apache.kafka.clients.tools.ProducerPerformance test7 5000 100 -1
 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
 buffer.memory=67108864 batch.size=8196

 For running producers, where should I put the producer.type=sync
 configuration into? The config/server.properties? Also Does this mean we
 are using batch size of 1? Which version of Kafka are you using?
 thanks.

 On Thu, Aug 13, 2015 at 3:01 PM, Alvaro Gareppe agare...@gmail.com
 wrote:

  Are you measuring latency as the time between producer and consumer?

  In that case, the ack shouldn't affect the latency, because even though
  your producer is not going to wait for the ack, the consumer will only
  get the message after it's committed in the server.

  About latency, my best results occur with sync producers, but the
  throughput is much lower in that case.

  About not flushing to disk, I'm pretty sure that it's not an option in
  Kafka (correct me if I'm wrong).
 
  Regards,
  Alvaro Gareppe
 
  On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com
  wrote:
 
   Also, the latency results show no major difference when using ack=0 or
   ack=1. Why is that?
  
   On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com
   wrote:
  
I am running an experiment where 92 producers are publishing data into 6
brokers and 10 consumers are reading online data simultaneously.

What should I do to reduce the latency? Currently when I run the producer
performance test the average latency is around 10s.

Should I disable log.flush? How do I do that? Thanks.
   
  
 
 
 
  --
  Ing. Alvaro Gareppe
  agare...@gmail.com
 




-- 
Ing. Alvaro Gareppe
agare...@gmail.com


Re: Reduce latency

2015-08-13 Thread Alvaro Gareppe
Are you measuring latency as the time between producer and consumer?

In that case, the ack shouldn't affect the latency, because even though your
producer is not going to wait for the ack, the consumer will only get the
message after it's committed in the server.

About latency, my best results occur with sync producers, but the throughput
is much lower in that case.

About not flushing to disk, I'm pretty sure that it's not an option in Kafka
(correct me if I'm wrong).

Regards,
Alvaro Gareppe

On Thu, Aug 13, 2015 at 12:59 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Also, the latency results show no major difference when using ack=0 or
 ack=1. Why is that?

 On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com
 wrote:

  I am running an experiment where 92 producers are publishing data into 6
  brokers and 10 consumers are reading online data simultaneously.

  What should I do to reduce the latency? Currently when I run the producer
  performance test the average latency is around 10s.

  Should I disable log.flush? How do I do that? Thanks.
 




-- 
Ing. Alvaro Gareppe
agare...@gmail.com


lowlatency on kafka

2015-08-11 Thread Alvaro Gareppe
I'm starting to use Kafka for a low-latency application.

I need a topic with an overall processing latency of around 2 or 3 ms
(latency from producer to consumer).

I want to use an async producer, but I'm not getting it to work that fast.
What are the key properties to configure in the producer, consumer, and topic
to achieve the best possible latency?

I can send you what I have configured so far.

Thank you --


Re: message filterin or selector

2015-08-06 Thread Alvaro Gareppe
Thanks

On Thu, Aug 6, 2015 at 2:20 PM, Grant Henke ghe...@cloudera.com wrote:

 I completely agree with Ben's response. Especially the invitation to
 propose and get involved in adding functionality to Kafka. A first step to
 a change this large would be to thoroughly describe your motivations,
 needed features and proposed changes or architecture in a KIP proposal.
 This way the community can discuss if features like this belong in Kafka,
 where they belong, and options for implementation. More information about
 that process can be found here:

 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

 On Thu, Aug 6, 2015 at 11:55 AM, Ben Stopford b...@confluent.io wrote:

  I think the short answer here is that, if you need freeform selector
  semantics as per JMS message selectors, then you'd need to wrap the API
  yourself (or get involved in adding the functionality to Kafka).

  As Gwen and Grant say, you could synthesise something simpler using
  topics/partitions to provide separate routing, but it would have to be a
  relatively simple use case. Kafka will support a large number of
  topic/partition pairs, but each one incurs a cost. Thus this route may
  not be wise for the use case you are describing.
 
  B
   On 6 Aug 2015, at 16:38, Alvaro Gareppe agare...@gmail.com wrote:
  
   It's not because of throughput; it's more about security. I can't allow
   all clients to have access to all the topic content (in some cases).
   I know that access control is not implemented yet, but it is planned. My
   idea is to plug a customisation in there to add security at the selection
   level too. But if the selector applies only on the client side, I won't
   get any information on the server side about how the user is planning to
   select, and therefore I won't be able to restrict or grant access.

   I'm planning to replace ActiveMQ with Kafka, but I need to keep some
   functionality, like security and selection, that is not yet implemented
   in Kafka, so I need to get creative with workarounds to be able to use it.

   You commented that I can do some custom partitioning in my particular
   case. But I'm not sure if I can do something like that, because even
   though I know which fields can be used for filtering, I don't know the
   values. But I don't know...

   Let's say the message has a property X that I can use as a selection
   criterion. I can create a partitioning based on X, which would split the
   topic based on X values, and connect the clients to the specific
   partition; that could work. But what if I have X and Y as possible
   selection criteria? Can I split based on 2 properties? If yes, can I
   connect based only on X?

   If I do it like this, the number of partitions that I'm going to create
   is going to be amazingly large. How is Kafka going to perform?
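
A rough Python sketch of that growth follows. It is not Kafka's partitioner, it assumes the X and Y values are known upfront (which may not hold here), and the value lists and function names are invented for the example.

```python
from itertools import product


def build_partition_map(x_values, y_values):
    """Assign one dedicated partition per (X, Y) combination."""
    return {combo: idx for idx, combo in enumerate(product(x_values, y_values))}


def partitions_for_x(partition_map, x):
    """Partitions a client must read when it selects on X alone."""
    return sorted(p for (px, _), p in partition_map.items() if px == x)


regions = ["eu", "us", "asia"]     # hypothetical values of X
priorities = ["low", "high"]       # hypothetical values of Y

pmap = build_partition_map(regions, priorities)
print(len(pmap))                      # one partition per combination
print(partitions_for_x(pmap, "us"))   # selecting on X spans every Y value
```

The partition count is the product of the number of distinct values per property, and a client selecting on X only still has to read one partition per Y value.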
  
   Maybe I'm trying to fit a problem into a system that is not meant for
   it. I would love to have the amazing performance of Kafka, but sadly
   I'm not sure if it's the best fit for me because of this...
  
  
   Thank you very much guys for the responses
  
   On Thu, Aug 6, 2015 at 12:10 PM, Grant Henke ghe...@cloudera.com
  wrote:
  
   The filtering logic there is topic filtering and not message filtering.
   The idea is to subscribe to multiple topics via a regex whitelist or
   blacklist. This does exist today as it does not depend on understanding
   the content of the message, but I don't think it is what you are looking
   for.

   As far as message filtering goes: as Gwen said, the way Kafka is
   currently implemented is that Kafka is not aware of the content of
   messages, so there is no Selector logic available. However, if you know
   upfront how you would like to filter the messages you could write your
   producer to use multiple topics, or even some custom partitioning, and
   implement a consumer that can understand and filter based on that logic.
   However, that would be an involved and creative implementation based on
   your use case.

   I would recommend starting simple and just dropping the messages you
   don't care about on the consumer side. If throughput becomes a problem,
   then consider alternatives.
  
  
   On Thu, Aug 6, 2015 at 9:47 AM, Alvaro Gareppe agare...@gmail.com wrote:
  
   Is this implemented?

   https://cwiki.apache.org/confluence/display/KAFKA/Consumer+API+changes

   Is this message filtering on the client or the server side?
  
   On Tue, Aug 4, 2015 at 9:54 PM, Gwen Shapira g...@confluent.io wrote:
  
   The way Kafka is currently implemented is that Kafka is not aware of
   the content of messages, so there is no Selector logic available.

   The way to go is to implement the Selector in your client - i.e. your
   consume() loop will get all messages but will throw away those that
   don't fit your pattern.


   It may be worthwhile to add a ticket for pluggable selector logic in
   the new

Broker side consume-request filtering

2015-08-06 Thread Alvaro Gareppe
Is this discussion open? Because this is exactly what I’m looking for.

Re: message filterin or selector

2015-08-06 Thread Alvaro Gareppe
Is this implemented?
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+API+changes

Is this message filtering on the client or the server side?

On Tue, Aug 4, 2015 at 9:54 PM, Gwen Shapira g...@confluent.io wrote:

 The way Kafka is currently implemented is that Kafka is not aware of the
 content of messages, so there is no Selector logic available.

 The way to go is to implement the Selector in your client - i.e. your
 consume() loop will get all messages but will throw away those that don't
 fit your pattern.


 It may be worthwhile to add a ticket for pluggable selector logic in the
 new consumer. I can't guarantee it will happen, there are infinite things
 that can be plugged into consumers and we need to draw the line somewhere,
 but worth a discussion.
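
A minimal Python sketch of that client-side approach: the message shape (a dict with `headers` and `value`), the `select` helper and the sample data are all assumptions made for the illustration, not a Kafka API. In a real client the list would be whatever your consume loop returns.

```python
def select(messages, predicate):
    """Yield only the messages that satisfy the selector predicate."""
    for message in messages:
        if predicate(message):
            yield message
        # Anything else is simply dropped on the client side.


def is_order(message):
    """Hypothetical selector, roughly like the JMS selector "type = 'order'"."""
    return message["headers"].get("type") == "order"


# Stand-in for what a consume loop would fetch from a topic.
fetched = [
    {"headers": {"type": "order"}, "value": "order-1"},
    {"headers": {"type": "audit"}, "value": "audit-1"},
    {"headers": {"type": "order"}, "value": "order-2"},
]

for message in select(fetched, is_order):
    print(message["value"])
```

Every message still crosses the network and is visible to the client before being discarded, so this saves application work but does not restrict what a client can read.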

 On Tue, Aug 4, 2015 at 2:05 PM, Alvaro Gareppe agare...@gmail.com wrote:

  Is there a way to implement selector logic in Kafka (similar to JMS
  selectors)?

  That is, to allow a message to be consumed only if it contains a certain
  header or content?

  I'm evaluating a migration from ActiveMQ to Kafka, and I use selector
  logic widely in the application.
 
  --
  Ing. Alvaro Gareppe
  agare...@gmail.com
 




-- 
Ing. Alvaro Gareppe
agare...@gmail.com


message filterin or selector

2015-08-04 Thread Alvaro Gareppe
Is there a way to implement selector logic in Kafka (similar to JMS
selectors)?

That is, to allow a message to be consumed only if it contains a certain
header or content?

I'm evaluating a migration from ActiveMQ to Kafka, and I use selector
logic widely in the application.

-- 
Ing. Alvaro Gareppe
agare...@gmail.com


Access control in kafka

2015-08-04 Thread Alvaro Gareppe
Can someone point me to documentation about access control in Kafka? Is
there something implemented in the current version, or planned for future
versions?

I need something that allows me to define which users are allowed to connect
to a certain topic, and of course user management.

Thank you guys in advance!

-- 
Eng. Alvaro Gareppe