Hi, Bruno, The number of partitions consumed by a single task is also configurable via the partition assignment policies (job.systemstreampartition. grouper.factory). By default, there are two partition assignment policies implemented: org.apache.samza.container.grouper.stream.GroupByPartitionFactory and org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory. The detailed explanation is available here: http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
Thanks! -Yi On Mon, Sep 14, 2015 at 3:37 PM, <bruno.bona...@gmail.com> wrote: > Hi Yi, > > Does a single task consume from a single partition or it consumes from > more/all partitions? > > Thanks > Bruno > > > On 14 Sep 2015, at 23:22, Yi Pan <nickpa...@gmail.com> wrote: > > > > Hi, Bruno, > > > > The number of containers are configurable in YarnJobFactory via > > yarn.container.count. > > Each container is a single threaded model and you can run multiple tasks > in > > a single container. > > At maximum, you can have as many containers as the number of tasks in > this > > config to achieve 1 task / thread. > > > > Hope that clarifies the config a bit more for you. > > > > Thanks! > > > > -Yi > > > > On Mon, Sep 14, 2015 at 3:16 PM, Bruno Bonacci <bruno.bona...@gmail.com> > > wrote: > > > >> Thanks Yan for writing me back, > >> > >> That's ok for ThreadJobFactory and ProcessJobFactory but what about the > >> YarnJobFactory? > >> How many task/executors will be spawning? > >> > >> > >> Bruno > >> > >>> On Mon, Sep 14, 2015 at 7:08 PM, Yan Fang <yanfang...@gmail.com> > wrote: > >>> > >>> Hi Bruno, > >>> > >>> AFAIK, there is no existing JobFactory that brings as many threads as > the > >>> partition number. But I think nothing stops you to implement this: you > >> can > >>> get the partition information from the JobCoordinator, and then bring > as > >>> many threads as the partition/task number. > >>> > >>> Since the two local factories (ThreadJobFactory and ProcessJobFactory) > >> are > >>> mainly for development, there is no additional document. But most of > the > >>> code here > >>> < > >> > https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local > >>> is > >>> self-explained. > >>> > >>> Thanks, > >>> > >>> Fang, Yan > >>> yanfang...@gmail.com > >>> > >>> On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci < > bruno.bona...@gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> I'm looking for additional documentation on the different RUNTIME > >>>> EXECUTION MODELS of the different `job.factory.class`. > >>>> > >>>> I'm particularly interested on how each factory (ThreadJobFactory, > >>>> ProcessJobFactory and YarnJobFactory) will create tasks consume and > >>> process > >>>> messages out of Kafka and the thread model used. > >>>> > >>>> I did a few tests with the ThreadJob factory consuming out of a kafka > >>>> topic with 5 partitions and I was expecting that it would use multiple > >>>> threads to consume/process the different partitions, however it is > >>>> using only one thread at runtime. > >>>> > >>>> Is there any way to tell Samza to use multiple processing threads (1 > >> per > >>>> partition)?? > >>>> > >>>> > >>>> Thanks > >>>> Bruno > >> >