Hi, I've looked through some scheduler documentation but couldn't find an answer to my question. Suppose I have a big file on a 40-node Hadoop cluster, and since the file is big, every node holds at least one chunk (HDFS block) of it. If I write a Flink job that filters this file, and the job has a parallelism of 4 (i.e., less than 40), how does data locality work? Do some tasks read some chunks from remote nodes? Or does the scheduler keep the parallelism at 4 while still placing tasks on every node?
Regards