Thank you, Brian. I found your paper "Using Hadoop as grid storage," and it was very useful.
One thing I did not understand in it is your file-usage pattern: do you deal with small or large files, and how often do you delete them? My question, in part, is whether HDFS can be used as a regular file system with frequent file deletes. Does it not become fragmented and unreliable?

Thank you,
Mark

On Thu, Dec 2, 2010 at 7:10 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
>
> On Dec 2, 2010, at 5:16 AM, Steve Loughran wrote:
>
> > On 02/12/10 03:01, Mark Kerzner wrote:
> >> Hi, guys,
> >>
> >> I see that there is MountableHDFS <http://wiki.apache.org/hadoop/MountableHDFS>,
> >> and I know that it works, but my questions are as follows:
> >>
> >> - How reliable is it for large storage?
> >> - Is it not hiding the regular design questions - we are dealing with
> >>   NameServers after all, but are trying to use it as a regular file system?
> >> - For example, HDFS is not optimized for many small files that get
> >>   written and deleted, but a mounted system will lure one in this direction.
> >
> > Like you say, it's not a conventional posix fs, it hates small files,
> > where other things may be better.
>
> I would comment that it's extremely reliable. There's at least one slow
> memory leak in fuse-dfs that I haven't been able to squash, and I typically
> remount things after a month or two of *heavy* usage.
>
> Across all the nodes in our cluster, we probably do a few billion HDFS
> operations per day over FUSE.
>
> Brian
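For context, the mount-and-remount routine Brian describes could be sketched roughly as below. The namenode address, mount point, and the `fuse_dfs_wrapper.sh` invocation are assumptions for illustration (the wrapper script is the one documented on the MountableHDFS wiki page), not details taken from this thread; the actual mount commands are left as comments since they require a running cluster.

```shell
#!/bin/sh
# Hypothetical sketch: mount HDFS through fuse-dfs, and remount
# periodically to release memory lost to the slow fuse-dfs leak
# Brian mentions. NAMENODE and MOUNTPOINT are made-up values.

NAMENODE="dfs://namenode.example.com:9000"
MOUNTPOINT="/mnt/hdfs"

# Initial mount, per the MountableHDFS wiki page:
#   fuse_dfs_wrapper.sh "$NAMENODE" "$MOUNTPOINT"
#
# Periodic remount (e.g. monthly, from cron), matching Brian's
# "remount after a month or two of heavy usage" workaround:
#   fusermount -u "$MOUNTPOINT"
#   fuse_dfs_wrapper.sh "$NAMENODE" "$MOUNTPOINT"

echo "remount plan: $NAMENODE at $MOUNTPOINT"
```

A cron entry pointing at such a script would automate the workaround rather than relying on a manual remount.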