Ted Dunning wrote:
> In our case, we looked at the problem and decided that Hadoop wasn't
> feasible for our real-time needs in any case. There were several
> issues:
> - first of all, map-reduce itself didn't seem very plausible for
>   real-time applications. That left HBase and HDFS as the capabilities
>   offered by Hadoop (for real-time stuff)
We'll be using map-reduce batch mode, so we're okay there.
> The upshot is that we use Hadoop extensively for batch operations,
> where it really shines. The other nice effect is that we don't have
> to worry all that much about HA (at least not real-time HA) since we
> don't do real-time with Hadoop.
What I'm struggling with is the write side of things. We'll have a huge
amount of data to write that's essentially in a log format. It would seem
that writing it outside of HDFS and then trying to batch-import it would
be a losing battle: you'd need the distributed nature of HDFS to handle
very large write volumes directly, and couldn't easily take some other
flat storage model and feed it in as a secondary step without having the
HDFS side start to lag behind.
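
For what it's worth, here's a minimal sketch of what those direct writes
might look like through the stock FileSystem client API. The namenode
URI, path, and record format are all made up for illustration:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLogWriter {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Normally picked up from core-site.xml; set here only so
            // the sketch is self-contained (hostname is hypothetical).
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);

            // One file per writer per time window keeps the workload
            // append-only, which fits log-style data.
            Path logFile = new Path("/logs/events-"
                    + System.currentTimeMillis() + ".log");
            FSDataOutputStream out = fs.create(logFile);
            try {
                out.writeBytes("1212690000\timpression\tsite=example.com\n");
            } finally {
                out.close();
            }
        }
    }

The catch is that HDFS really wants large, write-once files, so each
writer would have to batch records into reasonably sized files rather
than open a stream per record.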
One realization is that the NameNode could go down, so we'll have to
keep a backup store to fall back on during temporary outages, but most
of the writes would be direct HDFS updates.
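
A hedged sketch of that fallback, assuming the backup store is just a
local spool directory that a catch-up job drains into HDFS later (the
directory and class/method names are hypothetical):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FallbackLogWriter {
        // Hypothetical local spool for batches that can't reach HDFS.
        private static final String SPOOL_DIR = "/var/spool/hdfs-backlog/";

        // One batch of log records per file.
        public static void writeBatch(byte[] batch, String fileName)
                throws IOException {
            try {
                // Normal path: write straight to HDFS.
                FileSystem fs = FileSystem.get(new Configuration());
                FSDataOutputStream out = fs.create(new Path("/logs/" + fileName));
                try {
                    out.write(batch);
                } finally {
                    out.close();
                }
            } catch (IOException hdfsDown) {
                // NameNode unreachable (or any other HDFS failure):
                // append to the local spool and let the catch-up job
                // copy the backlog in once HDFS is healthy again.
                FileOutputStream spool =
                        new FileOutputStream(SPOOL_DIR + fileName, true);
                try {
                    spool.write(batch);
                } finally {
                    spool.close();
                }
            }
        }
    }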
The alternative would seem to be ending up with a set of distributed
files without any unifying distributed file system (e.g., lots of Apache
web logs on many, many individual boxes) and then having to come up with
some way to funnel those back into HDFS.
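
Even that might not be a losing battle, though: each box could run a
periodic funnel job that pushes its rotated logs into HDFS. Something
like the following (directory names invented), or just "hadoop fs -put"
from cron:

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogFunnel {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical directory of rotated Apache logs on this box.
            File[] logs = new File("/var/log/apache/rotated").listFiles();
            if (logs == null) return;
            for (File f : logs) {
                // copyFromLocalFile leaves the source in place; delete it
                // separately once the copy is known to have succeeded.
                fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
                                     new Path("/logs/imported/" + f.getName()));
            }
        }
    }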
--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]