Hi Dotan,

A Samza job creates one instance of your StreamTask class for each input partition. There is no particular limit on the number of partitions you can have. The main constraint is that each partition requires a file handle on the Kafka brokers, so if you want to go beyond a few hundred partitions, you'll need to be careful.
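As a rough illustration of the one-instance-per-partition rule, here is a self-contained Java simulation. It does not use the Samza API: `CountingTask` is a hypothetical stand-in for a StreamTask implementation, and the partition numbers of the incoming messages are made up. The point it shows is that each partition gets its own object, so any state (here, a message counter) is kept separately per partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation only (NOT the Samza API): one task instance per input partition,
// each holding its own state.
public class TaskPerPartition {

    // Hypothetical stand-in for a user's StreamTask implementation.
    static class CountingTask {
        private long messagesSeen = 0;

        void process(String message) {
            messagesSeen++; // state is private to this partition's instance
        }

        long messagesSeen() {
            return messagesSeen;
        }
    }

    // Creates exactly one task instance for each partition id 0..partitionCount-1.
    static Map<Integer, CountingTask> createTasks(int partitionCount) {
        Map<Integer, CountingTask> tasks = new LinkedHashMap<>();
        for (int p = 0; p < partitionCount; p++) {
            tasks.put(p, new CountingTask());
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<Integer, CountingTask> tasks = createTasks(4);
        // Each message is routed to the task instance that owns its partition.
        int[] partitionsOfIncomingMessages = {0, 1, 1, 3, 3, 3};
        for (int p : partitionsOfIncomingMessages) {
            tasks.get(p).process("msg");
        }
        for (Map.Entry<Integer, CountingTask> e : tasks.entrySet()) {
            System.out.println("Partition " + e.getKey() + ": " + e.getValue().messagesSeen());
        }
    }
}
```

Running `main` prints one line per partition: partition 0 saw 1 message, partition 1 saw 2, partition 2 saw 0 and partition 3 saw 3.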
The number of containers is independent of the number of input partitions. If you configure the job to use one container, all StreamTask instances run in the same JVM and are multiplexed onto a single thread. If you configure two containers, roughly half of the StreamTask instances run in one JVM and half in the other, and so on.

If what you mean is several different tasks running in sequence within the same container (i.e. one task consumes another one's output), that isn't supported by Samza right now. Every task's output has to be written to a stream. You can build your own mechanism for composing pieces of logic within the same container, but Samza deliberately provides a low-level interface that doesn't include such a mechanism.

Hope that helps,
Martin

On 25 Nov 2014, at 06:58, Dotan Patrich <[email protected]> wrote:

> Hi,
>
> We run a topology that contains multiple tasks and plan to add more to it
> in the near future. However, one of the key design issues I'm
> considering is how granular each Samza task should be: on the one
> hand, having granular tasks helps with integrating them at different parts of the
> topology; on the other hand, each task has its own basic JVM memory
> requirement, which restricts how many tasks a machine can host.
>
> One thing I noticed in the documentation is that each Samza container can
> host several tasks:
> "The SamzaContainer is responsible for managing the startup, execution, and
> shutdown of one or more StreamTask
> <http://samza.incubator.apache.org/learn/documentation/0.7.0/api/overview.html>
> instances"
>
> I thought this could be some sort of workaround for the memory concerns I
> have (assuming the CPU consumption of the streaming tasks works out OK).
> Can anyone share how to host several tasks in a single container? Are those
> only task instances for different partitions, or can they be different tasks
> altogether?
>
> Thanks,
> Dotan
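To make the point about containers in the reply above concrete, here is a small self-contained Java sketch that spreads task instances across a chosen number of containers. It is a simulation, not Samza code: round-robin assignment is an assumption picked for illustration and is not necessarily the grouping Samza itself uses, and the single-threaded multiplexing inside each container is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: distributes task instances across containers.
// Round-robin is an illustrative assumption, not necessarily Samza's algorithm.
public class ContainerGrouping {

    // Returns one list of task names per container.
    static List<List<String>> assign(List<String> taskNames, int containerCount) {
        if (containerCount < 1) {
            throw new IllegalArgumentException("containerCount must be >= 1");
        }
        List<List<String>> containers = new ArrayList<>();
        for (int c = 0; c < containerCount; c++) {
            containers.add(new ArrayList<>());
        }
        // Task i goes to container i mod containerCount, so sizes differ by at most one.
        for (int i = 0; i < taskNames.size(); i++) {
            containers.get(i % containerCount).add(taskNames.get(i));
        }
        return containers;
    }

    public static void main(String[] args) {
        List<String> tasks = new ArrayList<>();
        for (int p = 0; p < 5; p++) {
            tasks.add("Partition " + p); // one task instance per input partition
        }
        System.out.println("1 container:  " + assign(tasks, 1));
        System.out.println("2 containers: " + assign(tasks, 2));
    }
}
```

With five tasks, one container holds all five, while two containers split them three and two. Changing the container count changes only how the task instances are packed into JVMs, not how many task instances exist.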
