Mike, 

I guess what I said referred to using fuse_dfs as a general solution. If we
were to use the native APIs, that would be perfect. But we basically need to
mount it as a place where programs can simultaneously dump large amounts of
data.
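
For what it's worth, here is a minimal sketch of what I mean by the native
API route, writing through Hadoop's FileSystem client directly; the namenode
URI, destination path, and class name are just placeholders for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsDump {
        public static void main(String[] args) throws Exception {
            // Point the client at the namenode (placeholder URI).
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);

            // Stream data straight into HDFS; each writer node could run
            // its own copy of this concurrently, no fuse mount involved.
            Path dest = new Path("/data/dump/part-0001");
            FSDataOutputStream out = fs.create(dest);
            try {
                byte[] buf = new byte[64 * 1024];
                // ... fill buf from the producing program and write in a loop ...
                out.write(buf);
            } finally {
                out.close();
            }
            fs.close();
        }
    }

The catch is that our programs expect an ordinary mounted path rather than a
Java client, which is the only reason fuse comes up at all.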

-----Original Message-----
From: Michael Thomas [mailto:tho...@hep.caltech.edu] 
Sent: Friday, November 13, 2009 2:00 PM
To: common-user@hadoop.apache.org
Subject: Re: Alternative distributed filesystem.

On 11/13/2009 01:56 PM, Dmitry Pushkarev wrote:
> Dear Hadoop users,
>
> One of our Hadoop clusters is being converted to SGE to run a very specific
> application, and we're thinking about how to utilize the huge hard drives
> that are there. Since there will be no Hadoop installed on these nodes,
> we're looking for an alternative distributed filesystem with decent
> concurrent read/write performance (compared to HDFS) for large amounts of
> data. Using a single file store - like a NAS RAID array - proved very
> ineffective when someone is pushing gigabytes of data onto it.
>
> What other systems can we look at? We would like the FS to be mounted on
> every node and to be open source; ideally we'd also have POSIX compliance
> and decent random access performance (though that isn't critical).
>
> HDFS doesn't fit the bill because mounting it via fuse_dfs and using it
> without any mapred jobs (i.e. data will typically be pushed from 1-2 nodes
> at most, at different times) seems slightly "ass-backward" to say the least.

I would hardly call it ass-backwards.  I know of at least 3 HPC clusters 
that use only the HDFS component of Hadoop to serve 500TB+ of data to 
100+ worker nodes.

As a cluster filesystem, HDFS works pretty darn well.

--Mike


