Hi,

Does HDFS support a queue-like structure in which a stream of data is processed as it is generated? Specifically, I will have a stream of data coming in, and a data-independent operation needs to be applied to it (so only a Map function is needed; the reducer is the identity). I would like to distribute the data among nodes using HDFS and start processing it as it arrives, preferably in a single MR job.
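
As a rough illustration of the processing model described above, here is a minimal sketch in plain Python. It is not Hadoop or HDFS code and makes no claim about what either supports; `process_record`, `worker`, and `run_stream` are hypothetical names, and the uppercase transform is only a stand-in for the real per-record operation. Batches are placed on a queue as they arrive, and long-lived workers apply the map-only operation without a new job being started for each batch.

```python
import queue
import threading


def process_record(record):
    # Hypothetical per-record operation (stand-in for the Map function).
    # It depends only on the record itself, so no reduce step is needed.
    return record.upper()


def worker(batches, results):
    # Long-lived worker: keeps pulling batches until it sees the sentinel.
    while True:
        batch = batches.get()
        if batch is None:  # sentinel: no more data for this worker
            break
        results.append([process_record(r) for r in batch])


def run_stream(incoming, num_workers=2):
    # `incoming` is any iterable of batches (each batch is a list of records).
    batches = queue.Queue()
    results = []  # list.append is thread-safe in CPython
    workers = [
        threading.Thread(target=worker, args=(batches, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for batch in incoming:  # batches are enqueued as they "arrive"
        batches.put(batch)
    for _ in workers:  # one sentinel per worker
        batches.put(None)
    for w in workers:
        w.join()
    return results
```

Calling `run_stream([["a", "b"], ["c"]])` processes each batch independently; the order of the processed batches in the result is not guaranteed, since the workers run concurrently.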

I agree that this can be done by starting a new MR job for each batch of data, but is frequently starting many MR jobs for small chunks of data a good idea? (Consider that a new batch arrives every few seconds and that processing one batch takes a few minutes.)

Thanks,
-- Saumitra S. Shahapure