Re: [Beowulf] Re: failure trends in a large disk drive population (google fileing system)

momentics Mon, 19 Feb 2007 19:24:51 -0800

On 2/19/07, matt jones <[EMAIL PROTECTED]> wrote:

if one fails there
are still 3, if another there are still 2. i've also read somewhere else
that if one fails, it can automatically recreate the image from the
remaining ones on a spare node.


[...]

this approach is rather ott, but it works and works well.



Not sure about the Google folks, but we use a reliability model to
calculate the number of nodes and their physical locations (with
continuous scheduling), so that the system meets the expected
reliability coefficient specified by the system
operator/deployer/configurator (for the EE, SW and HW parts).
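
To make the node-count calculation concrete, here is a minimal sketch. It is
not our actual model: it assumes every replica fails independently with the
same probability over the period of interest, and the function name and
parameters are made up for illustration. It finds the smallest number of
replicas for which at least one copy survives with the target probability.

```python
def replicas_needed(p_fail: float, target: float) -> int:
    """Smallest n such that 1 - p_fail**n >= target.

    Assumption (illustrative only): replicas fail independently, each with
    probability p_fail, so all n copies are lost with probability p_fail**n.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    # Add replicas until the chance that at least one survives meets the target.
    while 1.0 - p_fail ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical figures: 5% failure probability per node, "four nines" target.
    print(replicas_needed(0.05, 0.9999))
```

Real failures are correlated (shared rack, power, firmware batch), which is
one reason the physical locations matter as well as the count.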

An HDD is an unreliable part of the system, with a roughly known
(actually, expected) reliability. Moreover, as we know, most HDDs
expose SMART metrics, which are a good way to correct the live
coefficients in the math model being used. The upshot is to use
adaptive techniques.
So Google is probably doing the same thing - a good company anyhow... ta-da! :)
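
A rough sketch of what "correcting live coefficients" from SMART data could
look like. Everything here is an assumption for illustration: the attribute
names, the multipliers and the simple multiplicative rule are placeholders,
not measured values and not our production code.

```python
def adjusted_failure_probability(base_p, smart, multipliers):
    """Scale a baseline failure probability using SMART counters.

    base_p      -- expected failure probability for a healthy drive
    smart       -- dict of SMART counter name -> current raw value
    multipliers -- dict of counter name -> risk factor applied when the
                   counter is non-zero (hypothetical placeholder values)
    """
    p = base_p
    for attr, factor in multipliers.items():
        # Only counters that have actually fired raise the estimate.
        if smart.get(attr, 0) > 0:
            p *= factor
    # A probability can never exceed 1.
    return min(p, 1.0)


if __name__ == "__main__":
    # Placeholder numbers, chosen only to show the mechanics.
    factors = {"reallocated_sectors": 4.0, "scan_errors": 10.0}
    drive = {"reallocated_sectors": 5, "scan_errors": 0}
    print(adjusted_failure_probability(0.03, drive, factors))
```

The adjusted value can then replace the static per-drive figure in the
reliability model, so the required number of replicas is recomputed as drives
age.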

[EMAIL PROTECTED] – http://sgrid.sourceforge.net/

//
(the perfect doc - the amazing work)

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
