If you are looking for a large distributed file system with POSIX locking, look at:
GlusterFS, Lustre, OCFS2, Red Hat GFS

Edward

On Fri, Nov 13, 2009 at 5:07 PM, Michael Thomas <tho...@hep.caltech.edu> wrote:
> Hi Dmitry,
>
> I still stand by my original statement. We do use fuse_dfs for reading data
> on all of the worker nodes. We don't use it much for writing data, but only
> because our project's data model was never designed to use a posix
> filesystem for writing data, only reading.
>
> --Mike
>
> On 11/13/2009 02:04 PM, Dmitry Pushkarev wrote:
>>
>> Mike,
>>
>> I guess what I said referred to the use of fuse_dfs as a general solution.
>> If we were to use native APIs, that'd be perfect. But we basically need to
>> mount it as a place where programs can simultaneously dump large amounts
>> of data.
>>
>> -----Original Message-----
>> From: Michael Thomas [mailto:tho...@hep.caltech.edu]
>> Sent: Friday, November 13, 2009 2:00 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Alternative distributed filesystem.
>>
>> On 11/13/2009 01:56 PM, Dmitry Pushkarev wrote:
>>>
>>> Dear Hadoop users,
>>>
>>> One of our hadoop clusters is being converted to SGE to run a very
>>> specific application, and we're thinking about how to utilize the huge
>>> hard drives that are there. Since there will be no Hadoop installed on
>>> these nodes, we're looking for an alternative distributed filesystem
>>> that will have decent concurrent read/write performance (compared to
>>> HDFS) for large amounts of data. Using single file storage - like NAS
>>> RAID arrays - proved to be very ineffective when someone is pushing
>>> gigabytes of data onto them.
>>>
>>> What other systems can we look at? We would like the FS to be mounted on
>>> every node and open source; hopefully we'd also like POSIX compliance and
>>> decent random access performance (though that isn't critical).
>>>
>>> HDFS doesn't fit the bill because mounting it via fuse_dfs and using it
>>> without any mapred jobs (i.e. data will typically be pushed from 1-2
>>> nodes at most, at different times) seems slightly "ass-backward" to say
>>> the least.
>>
>> I would hardly call it ass-backwards. I know of at least 3 HPC clusters
>> that use only the HDFS component of Hadoop to serve 500TB+ of data to
>> 100+ worker nodes.
>>
>> As a cluster filesystem, HDFS works pretty darn well.
>>
>> --Mike
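For readers following along, the fuse_dfs setup Mike describes looks roughly like the sketch below. The hostname, port, and mount point are illustrative placeholders, not values from this thread; the wrapper script name and options vary by Hadoop build, so check your distribution's contrib/fuse-dfs documentation.

```shell
#!/bin/sh
# Hedged sketch of mounting HDFS via fuse_dfs (hadoop-0.20-era contrib).
# namenode.example.com:9000 and /mnt/hdfs are placeholder assumptions.

# Create the local mount point.
mkdir -p /mnt/hdfs

# Mount the cluster's HDFS namespace read-mostly via FUSE.
# fuse_dfs_wrapper.sh ships in src/contrib/fuse-dfs of the Hadoop tree.
fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

# After this, ordinary POSIX tools on the worker node can read the data:
ls /mnt/hdfs
```

As the thread notes, this works well for concurrent reads across many worker nodes; writes through the FUSE layer are where the "ass-backward" concern applies.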