But you could do all this with larger blocks as well. A large block size only says that a block CAN be that large, not that it MUST be.
Also, you said that the average size was ~40MB (20 x 2MB blocks). If that is so, then you should be able to radically decrease the number of blocks with a larger block size.

On 5/2/08 1:17 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:

> fault tolerance. As files are uploaded into our server, we can continuously
> write the data in small chunks, and if our server fails, we can tolerate this
> failure by switching our user to another server so the user can continue to
> write. Otherwise we have to wait on the server until we get the whole file
> to write it to Hadoop (if the server fails, then we lose all the data), or we
> need the user to cache all the data he is generating, which is not feasible
> for our requirements.
>
> I appreciate your comment on this.
>
> Cagdas
>
> On Fri, May 2, 2008 at 1:09 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Why did you pick such a small block size?
>>
>> Why not go with the default of 64MB?
>>
>> That would give you only 10 million blocks for your 600TB.
>>
>> I don't see any advantage to the tiny block size.
>>
>> On 5/2/08 1:06 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks, Doug, for your answers. Our interest is in the distributed file
>>> system part rather than MapReduce.
>>> I must confess that our block size is not as large as most people
>>> configure. I would appreciate your and others' input.
>>>
>>> Do you think these numbers are suitable?
>>>
>>> We will have 5 million files, each having 20 blocks of 2MB. With the
>>> minimum replication of 3, we would have 300 million blocks.
>>> 300 million blocks would store 600TB. At ~10TB/node, this means a
>>> 60-node system.
>>>
>>> Do you think these numbers are suitable for Hadoop DFS?
>>>
>>> Cagdas
>>>
>>>> At ~100MB per block, 100M blocks would store 10PB. At ~1TB/node, this
>>>> means a ~10,000 node system, larger than Hadoop currently supports
>>>> well (for this and other reasons).
>>>>
>>>> Doug
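The block-count arithmetic in the thread can be sketched as a short back-of-the-envelope calculation (a hypothetical helper, not Hadoop code). Note that Ted's "only 10 million blocks" figure comes from dividing the raw 600TB by 64MB; counting per-file blocks instead gives ~15M, because each ~40MB file still occupies one partially-filled 64MB block per replica — which also illustrates the point that a block CAN be 64MB, not that it MUST be:

```python
def total_blocks(num_files, avg_file_bytes, block_bytes, replication):
    """Blocks per file (ceiling division) times files times replicas."""
    blocks_per_file = -(-avg_file_bytes // block_bytes)  # ceil without math.ceil
    return num_files * blocks_per_file * replication

MB = 1024 * 1024

num_files = 5_000_000
avg_file = 40 * MB      # ~40MB average file (20 x 2MB blocks)
replication = 3

small = total_blocks(num_files, avg_file, 2 * MB, replication)
large = total_blocks(num_files, avg_file, 64 * MB, replication)

print(small)  # 300000000 -- the 300 million blocks from the thread
print(large)  # 15000000  -- each 40MB file fits in a single 64MB block
```

With 2MB blocks the NameNode must track 300 million block replicas; raising the block size to the 64MB default cuts that by a factor of 20, since each file then fits in a single (partial) block.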