Although this thread is wandering a bit, I strongly disagree that it is inappropriate to discuss vendor-specific features (or the features of competing compute platforms) on general@. The topic has become the factors that influence hardware purchasing choices, and one of those is how the system deals with disk failure. Comparing and contrasting with other platforms is healthy for the Hadoop project!
On 6/30/11 9:47 PM, "Ian Holsman" <had...@holsman.net> wrote:

>
>On Jul 1, 2011, at 2:08 PM, M. C. Srivas wrote:
>
>> On Thu, Jun 30, 2011 at 5:24 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>>
>>> I'd advise you to look at "stock hadoop" again. This used to be true, but
>>> was fixed a long while back by HDFS-457 and several followup JIRAs.
>>>
>>> If MapR does something fancier, I'm sure we'd be interested to hear about
>>> it so we can compare the approaches.
>>>
>>> -Todd
>>>
>>>
>> MapR tracks disk responsiveness. In other words, a moving histogram of
>> IO-completion times is maintained internally, and if a disk starts getting
>> really slow, it is pre-emptively taken offline so it does not create long
>> tails for running jobs (and the data on the disk is re-replicated using
>> whatever re-replication policy is in place). One of the benefits of
>> managing the disks directly instead of through ext3 / xfs / or other ...
>>
>> All these stats can be fed into Ganglia (or pushed out centrally via a text
>> file that can be pulled out using NFS) if historical info about disk
>> behavior (and failures) needs to be preserved.
>>
>> - Srivas.
>
>While I am intrigued about how MapR performs internally, I don't think
>this is the forum for it.
>please keep MapR (and other vendor specific discussions) on their
>respective support forums.
>
>Thanks!
>
>Ian.
>
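
For anyone who wants something concrete to compare approaches against, here is a minimal Python sketch of the general idea Srivas describes above: keep a moving histogram of IO-completion times per disk and flag the disk once its tail latency gets too high. This is not MapR's code and not how HDFS handles failed volumes; the class name, the bucket boundaries, the window size, and the thresholds are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from collections import deque

# Hypothetical bucket upper bounds in milliseconds; one extra open-ended
# bucket catches anything slower than the last bound.
BUCKET_BOUNDS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]


class MovingLatencyHistogram:
    """Histogram over the most recent `window` IO completion times."""

    def __init__(self, window=1000):
        self.window = window
        self.samples = deque()  # bucket index of each sample, oldest first
        self.counts = [0] * (len(BUCKET_BOUNDS_MS) + 1)

    def record(self, latency_ms):
        idx = bisect_left(BUCKET_BOUNDS_MS, latency_ms)
        self.samples.append(idx)
        self.counts[idx] += 1
        if len(self.samples) > self.window:
            # Slide the window: forget the oldest sample.
            self.counts[self.samples.popleft()] -= 1

    def percentile_bound_ms(self, pct):
        """Upper bound (ms) of the bucket holding the pct-th percentile."""
        total = len(self.samples)
        if total == 0:
            return 0.0
        target = total * pct / 100.0
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                if idx < len(BUCKET_BOUNDS_MS):
                    return float(BUCKET_BOUNDS_MS[idx])
                return float("inf")
        return float("inf")


def is_slow(hist, pct=99, limit_ms=200, min_samples=100):
    """True once enough samples exist and the tail latency exceeds the limit."""
    if len(hist.samples) < min_samples:
        return False
    return hist.percentile_bound_ms(pct) > limit_ms
```

In a real system the caller would react to is_slow() by taking the disk offline and letting the existing re-replication policy recover the data; that part is deliberately left out here because it depends entirely on the platform.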