Thanks for the clarification. My experience with Kafka was with 0.6 and 0.7,
and I'm pretty sure each broker had N partitions: we were load balancing
brokers using a hardware load balancer with a partition count of 1, and each
broker had a partition 0.

I guess this has changed in 0.8 then, or my memory is all messed up!

Anyway, if there are N partitions in total spread over P brokers, the Samza
job's partitions match the topic's partitions and everything is good.
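
To check my understanding, here is a minimal Python sketch of the model as
described in the reply quoted below. It is not Samza's API: the function name
and the dict layout are made up for illustration, and it assumes task i reads
partition i of every input topic that has a partition i. Ids are numbered
from 0, so "tasks 1-4" and "tasks 5-6" in the quoted reply are tasks 0-3 and
4-5 here.

```python
# Hypothetical sketch of the task/partition model from the quoted reply.
# None of these names come from Samza itself.

def task_assignments(topic_partitions):
    """Map each task id to the (topic, partition) pairs it would consume.

    topic_partitions: dict of topic name -> partition count for that topic.
    Assumption: task i reads partition i of every topic that has one, so the
    number of tasks equals the largest partition count among the inputs.
    The number of brokers (P) plays no role in the result.
    """
    num_tasks = max(topic_partitions.values(), default=0)
    return {
        task: [
            (topic, task)
            for topic, count in sorted(topic_partitions.items())
            if task < count
        ]
        for task in range(num_tasks)
    }


# Topic A has 4 partitions and topic B has 6, as in the quoted example.
assignments = task_assignments({"A": 4, "B": 6})
for task, inputs in assignments.items():
    print(task, inputs)
```

With a single topic of N partitions this gives N tasks, however many brokers
host those partitions.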

Mathias.




On Fri, Aug 23, 2013 at 11:17 PM, Chris Riccomini
<[email protected]> wrote:

> Hey Mathias,
>
> Kafka's topic:partition mapping is not quite what you describe. If a topic
> were to have 4 partitions, and a cluster had 4 machines, you would not end
> up with 4*4=16 partitions. You would end up with 4 partitions, one on each
> box. Using your notation, a topic T with N partitions would have N
> partitions (not P*N). These N partitions will be distributed as fairly as
> possible across all Kafka brokers.
>
> If a Samza job were to read from this 4 partition topic, it would have 4
> tasks, each consuming one partition of the topic.
>
> If a Samza job were reading from Topic A (4 partitions) and Topic B (6
> partitions), then the Samza job would have 6 tasks. Tasks 1-4 would read
> messages from both topics, and tasks 5-6 would receive messages only from
> Topic B.
>
> This is described in some detail in the docs:
>
>
> http://samza.incubator.apache.org/learn/documentation/0.7.0/container/task-runner.html
>
> The motivation for this partitioning model is that it allows us to
> guarantee that two topics with the same partition count, and same
> partitioning key, will be delivered to a single Samza task. For example,
> if you have an AdView topic and an AdClick topic with the same partition
> count, and both partitioned by member_id, then the Samza task that
> receives the AdView events for member 0 will also receive the AdClick
> events for member 0. This behavior enables aggregation and joining of data.
>
> Cheers,
> Chris
>
> On 8/23/13 12:42 PM, "Mathias Herberts" <[email protected]>
> wrote:
>
> >Hi,
> >
> >first of all kudos for putting Samza into the Apache Incubator, it's good
> >to have yet another approach to stream processing.
> >
> >IIRC in a multi-node Kafka cluster (let's assume P nodes), a topic T with N
> >partitions will have N partitions on each node, so the total number of
> >partitions will be P*N.
> >
> >My question relates to the notion of partition in the Samza stream linked
> >to T: will the Samza partition count be N or P*N?
> >
> >Mathias.
>
>
