Ah, one thing that is worth mentioning is that the way partitions are handled changed between 0.7 and 0.8. What you describe sounds like the 0.7 partitioning behavior. Samza only supports 0.8. This change was actual made to better support partitioned stream processing which requires hard partition guarantees and highly available partitions (otherwise your partitioning will change when the partition count changes as you add machines or they die). As of 0.8 the number of partitions is set at topic creation time and is independent of the number of servers (as Chris describes).
Hopefully that helps explain. -Jay On Fri, Aug 23, 2013 at 12:42 PM, Mathias Herberts < [email protected]> wrote: > Hi, > > first of all kudos for putting Samza into the Apache Incubator, it's good > to have yet another approach to stream processing. > > IIRC in a multi-node Kafka cluster (let's assume P nodes), a topic T with N > partitions will have N partitions on each node, so the total number of > partitions will be P*N. > > My question relates to the notion of partition in the Samza stream linked > to T, will the Samza partition number be N or P*N ? > > Mathias. >
