Hi,

Is queue-like structure supported from HDFS where stream of data is
processed when it's generated?
Specifically, I will have stream of data coming; and data independent
operation needs to be applied to it (so only Map function, reducer is
identity).
I wish to distribute data among nodes using HDFS and start processing it as
it arrives, preferably in single MR job.

I agree that it can be done by starting new MR job for each batch of data,
but is starting many MR jobs frequently for small data chunks a good idea?
(Consider new batch arrives after every few sec and processing of one batch
takes few mins)

Thanks,
-- 
Saumitra S. Shahapure

Reply via email to