block placement

Evert Lammerts Thu, 30 Jun 2011 06:04:38 -0700

Hi list,

How does the NN place blocks on the disks within a single node? Does it spread 
out adjacent blocks of a single file horizontally over the disks? For example, 
let's say I have four DNs and each has four disks. (And forget about replication.) 
If I copy a file consisting of 16 blocks of 128MB each to the cluster, will each 
disk have exactly one block of the file?


This matters if I run a job over this file and its sixteen blocks, since with 
one block per disk the cluster could use its maximum I/O capacity.

This leads me to another question (which might be better off on mapred-user). 
Does the JT schedule its tasks to make maximal use of I/O capacity? Would it try 
to process blocks that reside on a disk that is not currently being read from 
or written to? Or does it just use a randomized strategy?

Cheers,
Evert
