Slots are like "resource groups" that execute entire pipelines, so they frequently hold more than one operator.
What you can try as a workaround is to decrease the number of slots per machine, which causes the operators to be spread across more machines. If this is a crucial issue for your use case, it should be simple to add a "preference to spread out" to the scheduler...

On Tue, Dec 1, 2015 at 3:26 PM, Kashmar, Ali <ali.kash...@emc.com> wrote:
> Is there a way to make a task cluster-parallelizable? I.e. make sure the
> parallel instances of the task are distributed across the cluster. When I
> run my Flink job with a parallelism of 16, all the parallel tasks are
> assigned to the first task manager.
>
> - Ali
>
> On 2015-11-30, 2:18 PM, "Ufuk Celebi" <u...@apache.org> wrote:
>
> >> On 30 Nov 2015, at 17:47, Kashmar, Ali <ali.kash...@emc.com> wrote:
> >> Do the parallel instances of each task get distributed across the
> >> cluster or is it possible that they all run on the same node?
> >
> > Yes, slots are requested from all nodes of the cluster. But keep in mind
> > that multiple tasks (forming a local pipeline) can be scheduled to the
> > same slot (1 slot can hold many tasks).
> >
> > Have you seen this?
> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/job_scheduling.html
> >
> >> If they can all run on the same node, what happens when that node
> >> crashes? Does the job manager recreate them using the remaining open
> >> slots?
> >
> > What happens: the job manager tries to restart the program with the same
> > parallelism. Thus, if you have enough free slots available in your
> > cluster, this works smoothly (so yes, the remaining/available slots are
> > used).
> >
> > With a YARN cluster, the task manager containers are restarted
> > automatically. In standalone mode, you have to take care of this yourself.
> >
> > Does this help?
> >
> > Ufuk
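The workaround above amounts to a configuration change on each task manager. A minimal sketch of what that could look like in `flink-conf.yaml`, assuming the standard `taskmanager.numberOfTaskSlots` and `parallelism.default` keys (the concrete values here are illustrative, not from the thread):

```yaml
# flink-conf.yaml (applies per task manager)
#
# Fewer slots per machine means a job with parallelism 16 must
# request slots from more task managers, spreading the parallel
# instances across the cluster instead of packing one node.
taskmanager.numberOfTaskSlots: 2

# Default parallelism used when the job does not set its own.
parallelism.default: 16
```

With 2 slots per machine, a parallelism-16 job needs slots from at least 8 task managers; the exact numbers depend on your cluster size and memory per slot.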