Hi,

I looked through the scheduler documentation but could not find an answer to my
question. Suppose I have a big file on a 40-node Hadoop cluster, and since it is
a big file, every node holds at least one chunk of it. If I write a Flink job
that filters the file, and the job has a parallelism of 4 (less than 40), how
does data locality work? Do some tasks read some chunks from remote nodes? Or
does the scheduler keep the maximum parallelism at 4 but still place the tasks
across the nodes that hold the data?
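To make the scenario concrete, here is a minimal sketch of the kind of job I
mean, using the Flink DataSet API. The HDFS paths and the filter predicate are
just placeholders:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class FilterJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Parallelism 4, although the file's chunks are spread over 40 nodes.
        env.setParallelism(4);

        // Placeholder input path; the real file spans every node in the cluster.
        DataSet<String> lines = env.readTextFile("hdfs:///path/to/bigfile");

        // Placeholder predicate; the question is where these filter tasks run
        // relative to the chunks they read.
        lines.filter(line -> line.contains("ERROR"))
             .writeAsText("hdfs:///path/to/output");

        env.execute("Filter job with parallelism 4");
    }
}
```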

Regards
