Ted Dunning wrote:
> In our case, we looked at the problem and decided that Hadoop wasn't
> feasible for our real-time needs in any case. There were several
> issues:
> - first of all, map-reduce itself didn't seem very plausible for
>   real-time applications. That left HBase and HDFS as the capabilities
>   offered by Hadoop (for real-time stuff)
We'll be using map-reduce batch mode, so we're okay there.
> The upshot is that we use Hadoop extensively for batch operations,
> where it really shines. The other nice effect is that we don't have
> to worry all that much about HA (at least not real-time HA) since we
> don't do real-time with Hadoop.
What I'm struggling with is the write side of things. We'll have a huge
amount of data to write that's essentially in a log format. It would seem
that writing it outside of HDFS and then trying to batch-import it would
be a losing battle: you'd need the distributed nature of HDFS to handle
very large write volumes directly, and couldn't easily take some other
flat storage model and feed it in as a secondary step without having the
HDFS side start to lag behind.
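
For what it's worth, here's a minimal sketch of what those direct writes
might look like through the stock FileSystem client API. The namenode
URI, path, and record format are all made up for illustration:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLogWriter {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Normally picked up from core-site.xml; set here only so
            // the sketch is self-contained (hostname is hypothetical).
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);

            // One file per writer per time window keeps the workload
            // append-only, which fits log-style data.
            Path logFile = new Path("/logs/events-"
                    + System.currentTimeMillis() + ".log");
            FSDataOutputStream out = fs.create(logFile);
            try {
                out.writeBytes("1212690000\timpression\tsite=example.com\n");
            } finally {
                out.close();
            }
        }
    }

The catch is that HDFS really wants large, write-once files, so each
writer would have to batch records into reasonably sized files rather
than open a stream per record.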
One realization is that the NameNode could go down, so we'll have to
keep a backup store to fall back on during temporary outages, but most
of the writes would be direct HDFS updates.
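
A hedged sketch of that fallback, assuming the backup store is just a
local spool directory that a catch-up job drains into HDFS later (the
directory and class/method names are hypothetical):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FallbackLogWriter {
        // Hypothetical local spool for batches that can't reach HDFS.
        private static final String SPOOL_DIR = "/var/spool/hdfs-backlog/";

        // One batch of log records per file.
        public static void writeBatch(byte[] batch, String fileName)
                throws IOException {
            try {
                // Normal path: write straight to HDFS.
                FileSystem fs = FileSystem.get(new Configuration());
                FSDataOutputStream out = fs.create(new Path("/logs/" + fileName));
                try {
                    out.write(batch);
                } finally {
                    out.close();
                }
            } catch (IOException hdfsDown) {
                // NameNode unreachable (or any other HDFS failure):
                // append to the local spool and let the catch-up job
                // copy the backlog in once HDFS is healthy again.
                FileOutputStream spool =
                        new FileOutputStream(SPOOL_DIR + fileName, true);
                try {
                    spool.write(batch);
                } finally {
                    spool.close();
                }
            }
        }
    }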
The alternative would seem to be ending up with a set of distributed
files without any unifying distributed file system (e.g., lots of Apache
web logs on many, many individual boxes) and then having to come up with
some way to funnel those back into HDFS.
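
Even that might not be a losing battle, though: each box could run a
periodic funnel job that pushes its rotated logs into HDFS. Something
like the following (directory names invented), or just "hadoop fs -put"
from cron:

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogFunnel {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical directory of rotated Apache logs on this box.
            File[] logs = new File("/var/log/apache/rotated").listFiles();
            if (logs == null) return;
            for (File f : logs) {
                // copyFromLocalFile leaves the source in place; delete it
                // separately once the copy is known to have succeeded.
                fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
                                     new Path("/logs/imported/" + f.getName()));
            }
        }
    }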
--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]