On Wed, Feb 25, 2009 at 10:09 PM, Edward Capriolo wrote:
>>> Is anyone using Hadoop as more of a near/almost real-time processing
>>> of log data for their systems to aggregate stats, etc?
>>
>> We do, although "near realtime" is a pretty relative subject and your
>> mileage may vary. For example, startups / shutdowns of Hadoop jobs are
>> pretty expensive and it could take anything from 5-10 seconds up to
>> several minutes to get the job started, and almost the same thing goes
>> for job finalization. Generally, if your "near realtime" would tolerate
>> a 3-5 minute lag, it's possible to use Hadoop.
>
> I was thinking about this. Assuming your datasets are small, would
> running a local jobtracker or even running the MiniMR cluster from
> the test case be an interesting way to run small jobs confined to one
> CPU?

Yeah, but what's the point of using Hadoop then? I.e., wouldn't we lose all the parallelism?

--
WBR, Mikhail Yakshin