But you could do all this with larger blocks as well.  Having a large block
size only means that a block CAN be that large, not that it MUST be: the
configured block size is an upper bound, and a short file (or the last block
of a file) only takes up the space it actually needs.
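
A rough sketch (untested, and the path and sizes are made up) of how the
block size can even be chosen per file at create time, so a large
cluster-wide default does not force anything on individual writers:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
      // The cluster-wide default comes from the dfs.block.size property.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Override the block size for just this file: 64MB blocks,
      // replication 3, 4KB io buffer.  (Path is hypothetical.)
      FSDataOutputStream out = fs.create(
          new Path("/user/example/data.bin"),
          true,                // overwrite
          4096,                // io buffer size
          (short) 3,           // replication
          64L * 1024 * 1024);  // block size
      out.write(new byte[1024]);  // a 1KB file still only stores 1KB
      out.close();
    }
  }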

Also, you said that the average file size was ~40MB (20 x 2MB blocks).  If
that is so, then a larger block size would radically decrease the number of
blocks, since each ~40MB file would fit in a single 64MB block.
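
To make that concrete, a back-of-the-envelope sketch (the file count,
average size, and replication factor are taken from your numbers; the
per-file rounding is just my arithmetic):

  // Rough replicated block counts for 5M files of ~40MB each.
  public class BlockCount {
    public static void main(String[] args) {
      long files = 5000000L;                  // 5 million files
      long bytesPerFile = 40L * 1024 * 1024;  // ~40MB average
      short replication = 3;

      System.out.println("2MB blocks:  "
          + totalBlocks(files, bytesPerFile, 2L * 1024 * 1024, replication));
      System.out.println("64MB blocks: "
          + totalBlocks(files, bytesPerFile, 64L * 1024 * 1024, replication));
    }

    static long totalBlocks(long files, long bytesPerFile,
                            long blockSize, short replication) {
      long blocksPerFile = (bytesPerFile + blockSize - 1) / blockSize;  // ceiling
      return files * blocksPerFile * replication;
    }
  }

That comes out to 300 million replicated blocks at 2MB versus about 15
million at 64MB (a bit more than a straight 600TB / 64MB division, since
each ~40MB file still needs at least one block), which is still roughly a
20x drop in the number of blocks the namenode has to track.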


On 5/2/08 1:17 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:

> fault tolerance. As files are uploaded into our server, we can continuously
> write the data in small chunks, and if our server fails, we can tolerate the
> failure by switching the user to another server so that he can continue to
> write. Otherwise we have to wait on the server until we get the whole file
> to write it to Hadoop (if the server fails, then we lose all the data), or we
> need the user to cache all the data he is generating, which is not feasible
> for our requirements.
> 
> I appreciate your comment on this.
> 
> Cagdas
> 
> On Fri, May 2, 2008 at 1:09 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Why did you pick such a small block size?
>> 
>> Why not go with the default of 64MB?
>> 
>> That would give you only 10 million blocks for your 600TB.
>> 
>> I don't see any advantage to the tiny block size.
>> 
>> On 5/2/08 1:06 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:
>> 
>>> Thanks, Doug, for your answers. Our interest is more in the distributed
>>> file system part than in MapReduce.
>>> I must confess that our block size is not as large as what a lot of
>>> people configure. I would appreciate your and others' input.
>>> 
>>> Do you think these numbers are suitable?
>>> 
>>> We will have 5 million files, each having 20 blocks of 2MB. With the
>>> minimum replication of 3, we would have 300 million blocks.
>>> 300 million blocks would store 600TB. At ~10TB/node, this means a
>>> 60-node system.
>>> 
>>> Do you think these numbers are suitable for Hadoop DFS?
>>> 
>>> Cagdas
>>> 
>>> 
>>> 
>>>> At ~100MB per block, 100M blocks would store 10PB.  At ~1TB/node, this
>>>> means a ~10,000 node system, larger than Hadoop currently supports well
>>>> (for this and other reasons).
>>>> 
>>>> Doug
>>>> 
>>>> 
>>> 
>> 
>> 
> 
