HDFS itself has some facilities for serving data over HTTP: https://issues.apache.org/jira/browse/HADOOP-5010. YMMV.
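For what it's worth, you can also drive reads over HDFS's built-in HTTP interface from the normal FileSystem API using the hftp:// scheme. A minimal sketch, assuming a namenode at a made-up hostname (50070 is the usual namenode web port; the file path is a placeholder too):

// Minimal sketch: reading a file over HDFS's read-only HTTP interface
// via the hftp:// scheme. Hostname and path are placeholders.
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HftpRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The hftp:// scheme resolves to the read-only HTTP filesystem
    FileSystem fs = FileSystem.get(
        URI.create("hftp://namenode.example.com:50070/"), conf);
    InputStream in = fs.open(new Path("/images/sample.jpg"));
    try {
      // Copy the file to stdout in 4 KB chunks
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}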
On Thu, Mar 26, 2009 at 3:47 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

> On Mar 26, 2009, at 8:55 PM, phil cryer wrote:
>
>>> When you say that you have huge images, how big is "huge"?
>>
>> Yes, we're looking at some images that are 100MB in size, but
>> nothing like what you're speaking of. This helps me understand
>> Hadoop's usage better; unfortunately it won't be the fit I was
>> hoping for.
>
> I wouldn't split hairs between 100MB and 1GB. However, if you want to
> serve the files via Apache, the extra FUSE layer may make it less
> reliable. It wouldn't be too bad to whip up a Tomcat webapp that goes
> through Hadoop...
>
> It really depends on your hardware level and redundancy. If you have
> the money to get the hardware necessary for a Lustre-based solution,
> do that. If you have enough money to load up your pre-existing
> cluster with lots of disk, HDFS might be better. Certainly HDFS will
> be outperformed by Lustre if you have lots of reliable hardware,
> especially in terms of latency.
>
> Brian
>
>>> You can use the API or the FUSE module to mount Hadoop, but that is
>>> not a direct goal of Hadoop. Hope that helps.
>>
>> Very interesting, and yes, that does indeed help. Not to veer off
>> thread too much, but does Sun's Lustre follow in the footsteps of
>> Gluster, then? I know Lustre requires kernel patches to install, so
>> it sits at a different level than the others, but I have seen some
>> articles about large-scale clusters built with Lustre and want to
>> look at that as another option.
>>
>> Again, thanks for the info. If anyone has general information on
>> cluster software, or knows of a more appropriate list, I'd appreciate
>> the advice.
>>
>> Thanks
>>
>> P
>>
>> On Thu, Mar 26, 2009 at 12:32 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> It is a little more natural to connect to HDFS from Apache Tomcat.
>>> This lets you skip the FUSE mounts and just use the HDFS API.
>>>
>>> I have modified this code to run inside Tomcat:
>>> http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
>>>
>>> I will not testify to how well this setup performs under internet
>>> traffic, but it does work.
>>>
>>> GlusterFS is more like a traditional POSIX filesystem. It supports
>>> locking and appends, and you can do things like put the MySQL data
>>> directory on it.
>>>
>>> GlusterFS is geared toward storing data to be accessed with low
>>> latency. Nodes (bricks) are normally connected via GigE or
>>> InfiniBand. The GlusterFS volume is mounted directly on a Unix
>>> system.
>>>
>>> Hadoop is a user-space filesystem. The latency is higher. Nodes are
>>> connected by GigE. It is closely coupled with MapReduce.
>>>
>>> You can use the API or the FUSE module to mount Hadoop, but that is
>>> not a direct goal of Hadoop. Hope that helps.
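For anyone who wants to try the Tomcat route Edward describes above, here is a rough, untested sketch of a servlet that streams a file out of HDFS through the FileSystem API (no FUSE mount involved). The namenode address and the request-path-to-HDFS-path mapping are placeholders, not anything from the wiki example:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Illustrative servlet: maps the request path straight onto an HDFS
// path and streams the bytes back to the client.
public class HdfsImageServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    Configuration conf = new Configuration();
    // Placeholder namenode address
    conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
    FileSystem fs = FileSystem.get(conf);

    // e.g. GET /images/foo.jpg -> HDFS path /images/foo.jpg
    String pathInfo = req.getPathInfo();
    if (pathInfo == null) {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST);
      return;
    }
    Path file = new Path(pathInfo);
    if (!fs.exists(file)) {
      resp.sendError(HttpServletResponse.SC_NOT_FOUND);
      return;
    }

    resp.setContentType("application/octet-stream");
    InputStream in = fs.open(file);
    OutputStream out = resp.getOutputStream();
    try {
      // Stream in 4 KB chunks without buffering the whole file
      IOUtils.copyBytes(in, out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}

As Edward says, no promises about how this holds up under real internet traffic; fronting it with a caching proxy would probably matter more than the servlet itself.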