When a key is available, you generally include it because you want all
messages with the same key to always end up in the same partition. This
allows all messages with the same key to be processed by the same consumer
(e.g. allowing you to aggregate all data for a single user if you key on
user ID). To accomplish this you always consider all partitions (not just
available partitions) and keep the number of partitions in a topic fixed.
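
To make that concrete, here is a rough sketch of the two strategies side by
side. It is not the actual DefaultPartitioner code: the class and method
names are made up for illustration, and Arrays.hashCode is only a stand-in
for the murmur2 hash so the example runs without the Kafka client on the
classpath. It takes the partition a key normally maps to, pretends that
partition is unavailable, and shows that picking among available partitions
would send the key somewhere else.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names here are hypothetical, not Kafka APIs.
class KeyedPartitionSketch {

    // Clears the sign bit so the value is safe to use with the % operator.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // ASSUMPTION: stand-in for murmur2; any deterministic hash shows the point.
    static int standInHash(byte[] keyBytes) {
        return Arrays.hashCode(keyBytes);
    }

    // Keyed case from the thread: hash modulo the TOTAL number of partitions.
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return toPositive(standInHash(keyBytes)) % numPartitions;
    }

    // The alternative proposed in the question: hash modulo the number of
    // AVAILABLE partitions, then look up the partition id in that list.
    static int partitionAmongAvailable(String key, List<Integer> available) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int index = toPositive(standInHash(keyBytes)) % available.size();
        return available.get(index);
    }

    // Partition ids 0..numPartitions-1 with one partition treated as down.
    static List<Integer> allExcept(int numPartitions, int down) {
        List<Integer> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p != down) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int numPartitions = 6;
        String key = "user-42";

        // Where the key always goes when all partitions are considered.
        int home = partitionForKey(key, numPartitions);

        // Simulate that partition being temporarily unavailable.
        List<Integer> available = allExcept(numPartitions, home);
        int moved = partitionAmongAvailable(key, available);

        System.out.println("all partitions:       " + key + " -> " + home);
        System.out.println("available partitions: " + key + " -> " + moved);
    }
}
```

With the second strategy, messages for the same key end up split across two
partitions as soon as availability changes, so a single consumer no longer
sees all the data for that key.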

The docs on Kafka's design, specifically some notes in the producer and
consumer sections, cover a bit of this:
http://kafka.apache.org/documentation.html#intro_producers

-Ewen

On Mon, Aug 29, 2016 at 10:52 AM, BigData dev <bigdatadev...@gmail.com>
wrote:

> Hi All,
> In the DefaultPartitioner implementation, when the key is null, we get the
> partition number modulo the number of available partitions. Below is the
> code snippet.
>
> if (availablePartitions.size() > 0) {
>     int part = Utils.toPositive(nextValue) % availablePartitions.size();
>     return availablePartitions.get(part).partition();
> }
> Whereas when the key is not null, we get the partition number modulo the
> total number of partitions.
>
> return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
>
> If some partitions are not available, then the producer will not be able
> to publish messages to those partitions.
>
> Shouldn't we do the same here and consider only the available partitions?
>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java#L67
>
> Could anyone help clarify this issue?
>
>
> Thanks,
> Bharat
>



-- 
Thanks,
Ewen
