Re: Using Hadoop for near real-time processing of log data

Mikhail Yakshin Wed, 25 Feb 2009 11:15:25 -0800

On Wed, Feb 25, 2009 at 10:09 PM, Edward Capriolo wrote:
>>> Is anyone using Hadoop for near/almost real-time processing of log
>>> data for their systems, to aggregate stats, etc.?
>>
>> We do, although "near realtime" is a pretty relative term and your
>> mileage may vary. For example, starting up and shutting down Hadoop jobs
>> is pretty expensive: it can take anywhere from 5-10 seconds to several
>> minutes to get a job started, and roughly the same goes for job
>> finalization. Generally, if your "near realtime" can tolerate a
>> 3-5 minute lag, it's possible to use Hadoop.
>
> I was thinking about this. Assuming your datasets are small, would
> running a local jobtracker, or even running the MiniMRCluster from
> the test cases, be an interesting way to run small jobs confined to one
> CPU?


Yeah, but what's the point of using Hadoop then? I.e., wouldn't we lose
all the parallelism?
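The trade-offs in this thread can be put into rough numbers. Below is a minimal sketch, assuming jobs run as periodic micro-batches that pay a fixed startup plus finalization overhead (values loosely picked from the ranges quoted above) and an ideal, evenly split workload with no shuffle or skew costs. All function names and numbers are hypothetical illustrations, not Hadoop APIs.

```python
# Back-of-the-envelope latency model for batch log processing with Hadoop.
# Every number and function name here is a hypothetical illustration.


def worst_case_lag(batch_interval_s, startup_s, processing_s, finalize_s):
    """Delay for a log line that arrives just after a batch is cut:
    it waits one full interval, then pays job startup, processing
    and finalization."""
    return batch_interval_s + startup_s + processing_s + finalize_s


def keeps_up(batch_interval_s, startup_s, processing_s, finalize_s):
    """A job must finish within one interval, or batches pile up."""
    return startup_s + processing_s + finalize_s <= batch_interval_s


def cluster_time(work_s, parallel_tasks, overhead_s):
    """Fixed startup + finalization overhead, work split evenly (ideal case)."""
    return overhead_s + work_s / parallel_tasks


def break_even_work(parallel_tasks, overhead_s):
    """Serial work above which the cluster beats one local process:
    w > overhead + w / n  <=>  w > overhead * n / (n - 1)."""
    if parallel_tasks < 2:
        raise ValueError("need at least 2 parallel tasks to gain anything")
    return overhead_s * parallel_tasks / (parallel_tasks - 1)


if __name__ == "__main__":
    # Assumed: 2-minute batches, 30 s startup and finalization, 60 s processing.
    print(worst_case_lag(120, 30, 60, 30))   # 240 seconds, i.e. 4 minutes
    print(keeps_up(120, 30, 60, 30))         # True
    # Assumed: one local (serial) run vs 10 parallel tasks, 60 s job overhead.
    print(cluster_time(30, 10, 60))          # 63.0, worse than 30 s locally
    print(cluster_time(3600, 10, 60))        # 420.0, far better than 3600 s
    print(round(break_even_work(10, 60), 1)) # 66.7
```

With these assumed numbers the batch lag lands inside the 3-5 minute window mentioned above, and a single local process only wins for jobs whose serial work is around a minute or less; real shuffle and skew costs would likely push that break-even point higher.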

-- 
WBR, Mikhail Yakshin
