We have had a lot of peace of mind by building a data pipeline that does not assume HDFS is always up and running. If the application is primarily non-real-time log processing, I would suggest batch/incremental copies of data into HDFS that can catch up automatically in case of failures or downtime.

We have an rsync-like map-reduce job that monitors log directories and keeps pulling new data in (and I suspect a lot of other users do similar things as well). It might be a useful notion to generalize and put in contrib.
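As a rough illustration (not the actual job we run), a minimal single-process sketch of the catch-up copy could look like the code below. It assumes logs arrive as immutable, uniquely named files in a local directory, treats a file as copied once something with the same name and length exists in HDFS, and is meant to be run periodically (e.g. from cron) so that a run that fails during an HDFS outage simply catches up on the next one. Class and path names are made up for the example.

    // Incremental "catch-up" copy of local log files into HDFS.
    // If HDFS is down, the run fails and the next run picks up where it left off.
    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogCatchUpCopier {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: LogCatchUpCopier <local-log-dir> <hdfs-target-dir>");
                System.exit(1);
            }
            File localDir = new File(args[0]);
            Path targetDir = new Path(args[1]);

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(targetDir.toUri(), conf);
            fs.mkdirs(targetDir);

            File[] candidates = localDir.listFiles();
            if (candidates == null) {
                System.err.println("Not a readable directory: " + localDir);
                System.exit(1);
            }

            for (File logFile : candidates) {
                if (!logFile.isFile()) {
                    continue;
                }
                Path src = new Path(logFile.getAbsolutePath());
                Path dst = new Path(targetDir, logFile.getName());

                // Skip files that already made it into HDFS on an earlier run.
                if (fs.exists(dst) && fs.getFileStatus(dst).getLen() == logFile.length()) {
                    continue;
                }
                // Copy the file, overwriting any partial copy left by an interrupted run.
                fs.copyFromLocalFile(false /* delSrc */, true /* overwrite */, src, dst);
                System.out.println("Copied " + src + " -> " + dst);
            }
        }
    }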
-----Original Message-----
From: Steve Sapovits [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 28, 2008 4:54 PM
To: core-user@hadoop.apache.org
Subject: Re: long write operations and data recovery

> How does replication affect this? If there's at least one replicated
> client still running, I assume that takes care of it?

Never mind -- I get this now after reading the docs again.

My remaining point-of-failure question concerns name nodes. The docs say
manual intervention is still required if a name node goes down. How is this
typically managed in production environments? It would seem even a short
name node outage in a data-intensive environment would lead to data loss
(no name node to give the data to).

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]