Re: long write operations and data recovery

Jason Venner Thu, 28 Feb 2008 22:53:42 -0800

Us too.

Pulling in data from external machines, followed by a pipeline of simple map/reduce jobs, is our standard pattern.


Joydeep Sen Sarma wrote:

We have gained a lot of peace of mind by building a data pipeline that does
not assume HDFS is always up and running. If the application is
primarily non-real-time log processing, I would suggest
batch/incremental copies of data to HDFS that can catch up automatically
after failures or downtime.

We have an rsync-like map-reduce job that monitors log directories and
keeps pulling in new data (and I suspect a lot of other users do similar
things as well). It might be a useful notion to generalize and put in
contrib.
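The thread does not include the code for the rsync-like job, so what follows is only a minimal single-process sketch of the catch-up idea, not that job. It assumes local directories stand in for the log source and for HDFS, that log files are rotated (closed) before being picked up, and that a JSON manifest records what has already been copied; `sync_new_files` and the manifest format are hypothetical. A file enters the manifest only after its copy succeeds, so if the destination is down the run stops and the remaining files are retried on the next run.

```python
import json
import os
import shutil
from typing import Callable, List, Set


def _load_manifest(path: str) -> Set[str]:
    # Names of files already copied; empty on the very first run.
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def _save_manifest(path: str, done: Set[str]) -> None:
    # Write to a temp file and rename, so a crash never leaves a half-written manifest.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def sync_new_files(src_dir: str, dst_dir: str, manifest_path: str,
                   copy_fn: Callable[[str, str], object] = shutil.copy2) -> List[str]:
    """Hypothetical incremental copier: copy files not yet in the manifest.

    Assumes every file in src_dir is complete (already rotated) and that
    manifest_path lives outside src_dir.
    """
    done = _load_manifest(manifest_path)
    copied: List[str] = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if name in done or not os.path.isfile(src):
            continue
        try:
            # Copy under a temporary name, then rename, so readers never see a partial file.
            tmp = os.path.join(dst_dir, "." + name + ".tmp")
            copy_fn(src, tmp)
            os.replace(tmp, os.path.join(dst_dir, name))
        except OSError:
            # Destination unavailable: stop here; unrecorded files stay pending for the next run.
            break
        done.add(name)
        copied.append(name)
        # Record progress after each file so a crash loses at most one copy.
        _save_manifest(manifest_path, done)
    return copied
```

Run from cron or a loop, each invocation picks up whatever arrived or failed since the last one, which is the "catch up automatically" behavior described above.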


-----Original Message-----

From: Steve Sapovits [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 28, 2008 4:54 PM
To: core-user@hadoop.apache.org
Subject: Re: long write operations and data recovery

How does replication affect this? If there's at least one replicated
client still running, I assume that takes care of it?


Never mind -- I get this now after reading the docs again.

My remaining point-of-failure question concerns name nodes. The docs
say manual intervention is still required if a name node goes down. How is this
typically managed in production environments? It would seem even a short name node
outage in a data-intensive environment would lead to data loss (no name node to give
the data to).
