If you are interested in Hadoop's distributed filesystem, HDFS [1], you might also be interested in Tahoe [2].
The downside of systems like Hadoop and Tahoe, compared with S3, is that you have to manage the machines and services yourself rather than paying someone else to do it in the cloud. I guess for some people that is an upside, though. Or has Yahoo built a service model around Hadoop?

//Ed

[1] http://hadoop.apache.org/core/docs/current/hdfs_design.html
[2] http://allmydata.org/trac/tahoe