In our case, we looked at the problem and decided that Hadoop wasn't
feasible for our real-time needs in any case.  There were several issues:

- first of all, map-reduce itself didn't seem very plausible for real-time
applications.  That left hbase and hdfs as the capabilities offered by
hadoop (for real-time stuff).

- hbase was far too immature to consider using.  Also, the read rate from
hbase is not that impressive compared, say, to a bank of a dozen or more
memcaches.

- hdfs won't handle nearly the volume of files that we need to work with.
In our main delivery system (one of many needs), we have nearly a billion
(=10^9) files that we have to be able to export at high data rates.  That
just isn't feasible in hadoop without lots of extra work.

The upshot is that we use hadoop extensively for batch operations where it
really shines.  The other nice effect is that we don't have to worry all
that much about HA (at least not real-time HA) since we don't do real-time
with hadoop.  


On 2/28/08 9:53 PM, "dhruba Borthakur" <[EMAIL PROTECTED]> wrote:

> I agree with Joydeep. For batch processing, it is sufficient to make the
> application not assume that HDFS is always up and active. However, for
> real-time applications that are not batch-centric, it might not be
> sufficient. There are a few things that HDFS could do to better handle
> Namenode outages:
> 
> 1. Make Clients handle transient Namenode downtime. This requires that
> Namenode restarts are fast, clients can handle long Namenode outages,
> etc.etc.
> 2. Design HDFS Namenode to be a set of two, an active one and a passive
> one. The active Namenode could continuously forward transactions to the
> passive one. In case of failure of the active Namenode, the passive
> could take over. This type of High-Availability would probably be very
> necessary for non-batch-type-applications.
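[A minimal sketch of option 1 above, the idea of clients riding out a
transient Namenode outage. This is illustrative Python, not actual Hadoop
client code; the names `with_retries` and `namenode_op` are made up for
the example, and a real client would wrap RPCs to the Namenode this way.]

```python
# Sketch: retry a Namenode operation with exponential backoff so a short
# outage looks like extra latency to the application, not a failure.
import time

def with_retries(op, attempts=5, base_delay=0.01):
    """Call op(); on ConnectionError, back off exponentially and retry."""
    for i in range(attempts):
        try:
            return op()
        except ConnectionError:
            if i == attempts - 1:
                raise  # the outage outlasted our patience
            time.sleep(base_delay * (2 ** i))

# Simulate a Namenode that is unreachable for the first two calls.
calls = {"n": 0}
def namenode_op():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("namenode unreachable")
    return "lease granted"

print(with_retries(namenode_op))  # survives the transient outage
```

Note this only covers short outages; option 2 (an active/passive Namenode
pair) is what you'd need when the downtime exceeds any reasonable retry
window.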
> 
> Thanks,
> dhruba
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 28, 2008 6:06 PM
> To: core-user@hadoop.apache.org
> Subject: RE: long write operations and data recovery
> 
> We have had a lot of peace of mind by building a data pipeline that does
> not assume that hdfs is always up and running. If the application is
> primarily non real-time log processing - I would suggest
> batch/incremental copies of data to hdfs that can catch up automatically
> in case of failures/downtimes.
> 
> We have an rsync-like map-reduce job that monitors log directories and
> keeps pulling new data in (and I suspect a lot of other users do similar
> stuff as well). Might be a useful notion to generalize and put in
> contrib.
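[The catch-up behavior described above can be sketched as follows. This
is an assumption-laden illustration using local directories to stand in
for the log source and the HDFS sink; a real version would go through the
Hadoop FileSystem API and run as a map-reduce job. The function name
`catch_up_copy` is invented for the example.]

```python
# Sketch: copy only files the destination doesn't have yet, so re-running
# the job after a failure/downtime automatically catches up.
import os
import shutil

def catch_up_copy(src_dir, dst_dir):
    """Copy any file in src_dir that dst_dir does not yet have."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        dst = os.path.join(dst_dir, name)
        if not os.path.exists(dst):  # skip files already delivered
            shutil.copy2(os.path.join(src_dir, name), dst)
            copied.append(name)
    return copied
```

Because the copy is idempotent, the pipeline never has to assume hdfs was
up when the logs were written; it just converges on the next run.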
> 
> 
> -----Original Message-----
> From: Steve Sapovits [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 28, 2008 4:54 PM
> To: core-user@hadoop.apache.org
> Subject: Re: long write operations and data recovery
> 
> 
>> How does replication affect this?  If there's at least one replicated
>>  client still running, I assume that takes care of it?
> 
> Never mind -- I get this now after reading the docs again.
> 
> My remaining point of failure question concerns name nodes.  The docs
> say manual intervention is still required if a name node goes down.
> How is this typically managed in production environments?  It would
> seem even a short name node outage in a data intensive environment
> would lead to data loss (no name node to give the data to).
