We have had a lot of peace of mind by building a data pipeline that does not assume HDFS is always up and running. If the application is primarily non-real-time log processing, I would suggest batch/incremental copies of data into HDFS that can catch up automatically in case of failures or downtime.

We have an rsync-like map-reduce job that monitors log directories and keeps pulling new data in (and I suspect a lot of other users do similar things as well). It might be a useful notion to generalize and put in contrib.
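As a rough illustration (not the actual job we run), a minimal single-process sketch of the catch-up copy could look like the code below. It assumes logs arrive as immutable, uniquely named files in a local directory, treats a file as copied once something with the same name and length exists in HDFS, and is meant to be run periodically (e.g. from cron) so that a run that fails during an HDFS outage simply catches up on the next one. Class and path names are made up for the example.

    // Incremental "catch-up" copy of local log files into HDFS.
    // If HDFS is down, the run fails and the next run picks up where it left off.
    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogCatchUpCopier {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: LogCatchUpCopier <local-log-dir> <hdfs-target-dir>");
                System.exit(1);
            }
            File localDir = new File(args[0]);
            Path targetDir = new Path(args[1]);

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(targetDir.toUri(), conf);
            fs.mkdirs(targetDir);

            File[] candidates = localDir.listFiles();
            if (candidates == null) {
                System.err.println("Not a readable directory: " + localDir);
                System.exit(1);
            }

            for (File logFile : candidates) {
                if (!logFile.isFile()) {
                    continue;
                }
                Path src = new Path(logFile.getAbsolutePath());
                Path dst = new Path(targetDir, logFile.getName());

                // Skip files that already made it into HDFS on an earlier run.
                if (fs.exists(dst) && fs.getFileStatus(dst).getLen() == logFile.length()) {
                    continue;
                }
                // Copy the file, overwriting any partial copy left by an interrupted run.
                fs.copyFromLocalFile(false /* delSrc */, true /* overwrite */, src, dst);
                System.out.println("Copied " + src + " -> " + dst);
            }
        }
    }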
-----Original Message-----
From: Steve Sapovits [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 28, 2008 4:54 PM
To: core-user@hadoop.apache.org
Subject: Re: long write operations and data recovery

> How does replication affect this? If there's at least one replicated
> client still running, I assume that takes care of it?

Never mind -- I get this now after reading the docs again.

My remaining point-of-failure question concerns name nodes. The docs say
manual intervention is still required if a name node goes down. How is this
typically managed in production environments? It would seem even a short
name node outage in a data-intensive environment would lead to data loss
(no name node to give the data to).

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]