Re: Using HDFS to serve www requests

2009-03-28 Thread phil cryer
"Yes. IMHO GlusterFS advertises benchmarks vs Luster."
You're right, I've found those now, thanks for the reply - it helped

P

On Fri, Mar 27, 2009 at 5:04 PM, Edward Capriolo  wrote:
>>>but does Sun's Lustre follow in the steps of Gluster then
>
> Yes. IMHO GlusterFS advertises benchmarks vs Lustre.
>
> The main difference is that GlusterFS is a FUSE (userspace) filesystem,
> while Lustre has to be patched into the kernel or loaded as a module.
>


Re: Using HDFS to serve www requests

2009-03-26 Thread phil cryer
> When you say that you have huge images, how big is "huge?"

Yes, we're looking at some images around 100 MB in size, but nothing
like what you're speaking of.  This helps me understand Hadoop's usage
better; unfortunately it won't be the fit I was hoping for.

> You can use the API or the FUSE module to mount hadoop but that is not
> a direct goal of hadoop. Hope that helps.

Very interesting, and yes, that does indeed help.  Not to veer off
thread too much, but does Sun's Lustre follow in the steps of Gluster
then?  I know Lustre requires kernel patches to install, so it sits at
a different level than the others, but I have seen some articles about
large-scale clusters built with Lustre and want to look at that as
another option.

Again, thanks for the info.  If anyone has general information on
cluster software, or knows of a more appropriate list, I'd appreciate
the advice.

Thanks

P

On Thu, Mar 26, 2009 at 12:32 PM, Edward Capriolo  wrote:
> It is a little more natural to connect to HDFS from Apache Tomcat.
> This lets you skip the FUSE mounts and just use the HDFS API.
>
> I have modified this code to run inside tomcat.
> http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
>
> I will not testify to how well this setup will perform under internet
> traffic, but it does work.
>
> GlusterFS is more like a traditional POSIX filesystem. It supports
> locking and appends and you can do things like put the mysql data
> directory on it.
>
> GlusterFS is geared toward storing data to be accessed with low
> latency. Nodes (bricks) are normally connected via Gig-E or
> InfiniBand. The GlusterFS volume is mounted directly on a Unix system.
>
> HDFS is a userspace file system, so the latency is higher. Nodes are
> connected by Gig-E. It is closely coupled with MapReduce.
>
> You can use the API or the FUSE module to mount hadoop but that is not
> a direct goal of hadoop. Hope that helps.
>
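
For reference, the read side of the wiki example above boils down to
opening a stream and copying bytes in chunks.  Here is a minimal,
self-contained stand-in for that copy loop using plain java.io; in the
real Tomcat setup, `in` would come from
org.apache.hadoop.fs.FileSystem#open(Path) and `out` would be the
servlet response stream (those parts are assumptions and need
hadoop-core on the classpath, so this sketch swaps in in-memory streams
to stay runnable):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class HdfsCopySketch {
    // Copy everything from in to out in 4 KB chunks, as the wiki
    // example does; returns the number of bytes moved.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        for (int n = in.read(buf); n > 0; n = in.read(buf)) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for fs.open(path) and response.getOutputStream().
        InputStream in = new ByteArrayInputStream("some image bytes".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out) + " bytes copied");
    }
}
```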


Using HDFS to serve www requests

2009-03-26 Thread phil cryer
This is somewhat of a noob question, I know, but after learning about
Hadoop, testing it in a small cluster, and running MapReduce jobs on
it, I'm still not sure whether Hadoop is the right distributed file
system to serve web requests.  In other words, can we (and should we)
serve images and data from HDFS, using something like FUSE to mount a
filesystem that Apache could serve the images from?  We have huge
images, hence the need for a distributed file system; they go in, get
stored with lots of metadata, and are redundant with Hadoop/HDFS - but
is it the right way to serve web content?

I looked at GlusterFS before; they had Apache and Lighttpd modules
that made this simple.  Does HDFS have something like that, do people
just use the FUSE option as I described, or is this not a good use of
Hadoop?
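
For the record, the FUSE route I'm describing would look roughly like
this (the fuse_dfs invocation is from Hadoop's contrib/fuse-dfs, and
the hostname, port, mount point, and paths are all made up, so treat
this as a sketch of the idea rather than a tested recipe):

```shell
# Mount HDFS at /mnt/hdfs via the contrib fuse-dfs module (assumed syntax).
fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

# Then point Apache at the mount, e.g. in httpd.conf (Apache 2.2 style):
#   Alias /images /mnt/hdfs/images
#   <Directory /mnt/hdfs/images>
#       Order allow,deny
#       Allow from all
#   </Directory>
```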

Thanks

P