Hey Krishna,

Let me clarify the current state of things a little:
1. Kafka offers a single producer interface as well as two consumer
interfaces: the low-level "simple consumer", which just makes raw network
requests, and the higher-level interface, which handles fault tolerance,
partition assignment, etc. These have been in all releases and not much
has changed with them.
2. Partitioning in the producer is controlled by the key specified with the
message. This key is used to assign the message to a partition, and this is
the normal mechanism for balancing load. If no key is specified, the producer
connects to a single broker chosen at random and sends all of its traffic
there (to minimize the number of TCP connections). If you have many producers
this still balances traffic, but if you have just one it will not, and you
will want to specify a partitioning key (even a random number will do). This
behavior has confused a lot of people and, in retrospect, seems to have been
a mistake on our part. There is a quick sketch of sending with a key right
after this list.
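For concreteness, here is a rough, untested sketch of sending keyed messages
with the current 0.8 Java producer; the broker address, topic name, and key
below are just placeholders:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list; point this at your own cluster.
        props.put("metadata.broker.list", "localhost:9092");
        // String encoder for both key and value in this sketch.
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        // The key ("user-42" here) is hashed to choose the partition, so
        // messages with the same key land on the same partition; a random
        // key will spread traffic evenly across partitions instead.
        producer.send(new KeyedMessage<String, String>("events", "user-42",
            "some payload"));

        producer.close();
    }
}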

In an effort to simplify these interfaces, as well as improve a lot of other
things, we are working on a replacement producer and consumer. The intention
is that these will replace the existing clients (the current producer as well
as the simple and high-level consumers). This is the KafkaProducer and
KafkaConsumer discussion you are referring to. They are not yet released; the
code is being written right now. The producer is available in beta form on
trunk if you want to try it out, but the consumer does not yet exist, so you
definitely can't use that. :-)
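If you do want to kick the tires on the new producer from trunk, usage
currently looks roughly like the sketch below. Since the code is still in
flux, the config names and signatures may change before release, and the
bootstrap server and topic here are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NewProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap server; point this at your own cluster.
        props.put("bootstrap.servers", "localhost:9092");

        KafkaProducer producer = new KafkaProducer(props);

        // The new producer currently works with byte[] keys and values;
        // as with the old producer, the key determines the partition.
        byte[] key = "user-42".getBytes();
        byte[] value = "some payload".getBytes();
        producer.send(new ProducerRecord("events", key, value));

        producer.close();
    }
}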

Hope that helps!

Cheers,

-Jay

On Wed, Mar 19, 2014 at 2:33 AM, Krishna Raj <reach.krishna...@gmail.com> wrote:

> Hello Experts & Kafka Team,
>
> It's exciting to learn and work with Kafka. I have been going through a
> lot of pages and Q&A.
>
> We are building an infrastructure & topology using Kafka for event
> processing in our application.
>
> We need some advice about designing the Producer and Consumer.
>
> *Please find the attached file / picture below* of the current setup that
> we are thinking of.
>
>
> [image: Inline image 1]
>
> *1) Producer:*
>
> I understand that from 0.8.1, message balancing is done in such a fashion
> that a partition is chosen only after every metadata refresh (the default
> for which is 10 mins).
>
> Questions are:
>
> a. *Is there any mechanism other than changing the metadata refresh?* (I
> understand that implementing this logic using a custom class is no longer
> supported in 0.8.1)
>
> b. We ultimately want messages to be evenly distributed across partitions
> so that the consumers' load is also evenly distributed, which paves the
> way for scalability, reduces lag, and helps us scale easily, as we can
> just add a partition with a corresponding consumer node attached to it. Is
> this advised? And to achieve this, *what is the optimal metadata refresh
> time that does not affect performance?*
>
> *2) Consumer*
>
> a. I was under the impression that SimpleConsumer has more flexibility and
> features. But after reading Neha's JavaDoc below, I like the KafkaConsumer
> features and the reduced need to handle things at a granular level. *Which
> Consumer is advised, SimpleConsumer or KafkaConsumer?*
>
> Neha's KafkaConsumer JavaDoc:
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html
>
> b. To keep track of the offset at each consumer node, I am thinking of
> manually controlling the offset commit (to ensure that a message is
> neither missed nor processed twice). On failure or exception, I would also
> log the current offset to a file or something before exiting, so that when
> I start my consumer again I can resume from the offset where I left off.
> *Is this a good design?*
>
>
> Thanks for your time, and I really appreciate the effort that has gone
> into making Kafka amazing :)
>
> Thanks,
> KR
>
>
>
