Hi list,

How does the NN place blocks on the disks within a single node? Does it spread adjacent blocks of a single file horizontally across the disks? For example, let's say I have four DNs, each with 4 disks (and forget about replication). If I copy a file consisting of 16 blocks of 128 MB each to the cluster, will each disk hold exactly one block of the file?
This matters when I run a job over this file and its sixteen blocks, since the cluster could then use its full I/O capacity. This leads me to another question (which might be better off on mapred-user): does the JT schedule its tasks to make maximal use of the I/O capacity? Would it try to process blocks that reside on a disk that is not currently being read from or written to, or does it just use a randomized strategy?

Cheers,
Evert