> But you could do all this with larger blocks as well. Having a large block
> size only says that a block CAN be that long, not that it MUST be that long.
No, you cannot. Imagine a streaming server where users send real-time generated data to your server, and each file is no more than 100 MB. Assume a user has no more than 10 MB of local cache space, so the user cannot hold more than 10 MB of data while generating it. The user therefore caches the data and streams it to your server. As each chunk of data accumulates, your server writes that chunk to Hadoop, gets confirmation from Hadoop, and sends an ack to the user so that the user can delete the data from his cache (because the data is now persisted). This way you make the system tolerant to the failure of your servers.

How would you do the same thing with a block size of 100 MB? What am I missing?

Cagdas

On Fri, May 2, 2008 at 1:20 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Also, you said that the average size was ~40 MB (20 x 2 MB blocks). If that
> is so, then you should be able to radically decrease the number of blocks
> with a larger block size.
>
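For what it's worth, the chunk-and-ack protocol described above can be sketched roughly as follows. This is a minimal illustration, not real Hadoop code: the class names, the chunk size, and the in-memory stand-in for HDFS persistence are all assumptions; a real server would write each chunk through the HDFS client API and wait for its confirmation before acking.

```python
CHUNK_SIZE = 2 * 1024 * 1024    # illustrative 2 MB chunks, well under the cache limit
CACHE_LIMIT = 10 * 1024 * 1024  # the user's 10 MB of local cache space


class StreamingServer:
    """Receives chunks, persists them, and acks so the client can evict."""

    def __init__(self):
        self.persisted = []  # stand-in for an append-only HDFS file

    def receive_chunk(self, chunk: bytes) -> bool:
        # Real system: write the chunk to Hadoop and wait for Hadoop's
        # confirmation before returning. Here we just append in memory.
        self.persisted.append(chunk)
        return True  # ack: the chunk is durable, safe to evict


class StreamingClient:
    """Generates data, caches it locally, evicts only after an ack."""

    def __init__(self, server: StreamingServer):
        self.server = server
        self.cache = bytearray()

    def generate(self, data: bytes) -> None:
        # Data must fit in the local cache until the server acks it.
        if len(self.cache) + len(data) > CACHE_LIMIT:
            raise RuntimeError("local cache overflow; server too slow")
        self.cache.extend(data)
        while len(self.cache) >= CHUNK_SIZE:
            chunk = bytes(self.cache[:CHUNK_SIZE])
            if self.server.receive_chunk(chunk):  # blocks until acked
                del self.cache[:CHUNK_SIZE]       # evict persisted data

    def flush(self) -> None:
        # Send whatever is left at end of stream.
        if self.cache and self.server.receive_chunk(bytes(self.cache)):
            self.cache.clear()


server = StreamingServer()
client = StreamingClient(server)
for _ in range(25):                   # stream 50 MB in 2 MB pieces
    client.generate(b"x" * CHUNK_SIZE)
client.flush()
```

The point is that the client's cache never needs to hold more than one unacked chunk's worth of data, which only works if a chunk can be persisted (and acked) at a size far below 100 MB.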