Not directly. You may want to look at the Kafka project (http://sna-projects.com/kafka/): we use it as a queue and periodically load the data into HDFS via an MR job. See this presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation

-Jakob
On Fri, Jun 24, 2011 at 10:12 AM, Saumitra Shahapure <saumitra.offic...@gmail.com> wrote:
> Hi,
>
> Is queue-like structure supported from HDFS where stream of data is
> processed when it's generated?
> Specifically, I will have stream of data coming; and data independent
> operation needs to be applied to it (so only Map function, reducer is
> identity).
> I wish to distribute data among nodes using HDFS and start processing it as
> it arrives, preferably in single MR job.
>
> I agree that it can be done by starting new MR job for each batch of data,
> but is starting many MR jobs frequently for small data chunks a good idea?
> (Consider new batch arrives after every few sec and processing of one batch
> takes few mins)
>
> Thanks,
> --
> Saumitra S. Shahapure
>
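
For illustration, the queue-then-periodic-load idea from the reply can be sketched roughly as follows. This is only a minimal sketch in plain Python and does not use the Kafka or Hadoop APIs: the BatchSpooler class, its parameters, and the batch-NNNNN.txt file naming are all hypothetical, and a local directory stands in for HDFS. The point is that records arriving every few seconds are accumulated into larger batch files, so a job can be run per batch file (or per group of files) rather than per small chunk.

```python
import os
import tempfile
import time


class BatchSpooler:
    """Hypothetical helper: buffers incoming records and writes them out
    as batch files once a size or age threshold is reached."""

    def __init__(self, out_dir, max_records=1000, max_age_sec=60.0,
                 clock=time.monotonic):
        self.out_dir = out_dir          # stand-in for an HDFS directory
        self.max_records = max_records  # flush when this many records are buffered
        self.max_age_sec = max_age_sec  # or when the oldest buffered record is this old
        self.clock = clock              # injectable so the age rule can be tested
        self.buffer = []
        self.opened_at = None
        self.batch_no = 0
        self.written = []               # paths of completed batch files

    def add(self, record):
        """Buffer one record; flush if either threshold is reached.
        Note the age check only runs when a record arrives."""
        if not self.buffer:
            self.opened_at = self.clock()
        self.buffer.append(record)
        too_many = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.opened_at >= self.max_age_sec
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Write the buffered records as one batch file; return its path,
        or None if there was nothing to write."""
        if not self.buffer:
            return None
        path = os.path.join(self.out_dir, "batch-%05d.txt" % self.batch_no)
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            for record in self.buffer:
                f.write(record + "\n")
        # Rename last so a reader never picks up a half-written batch file.
        os.rename(tmp_path, path)
        self.written.append(path)
        self.buffer = []
        self.batch_no += 1
        return path


if __name__ == "__main__":
    spool_dir = tempfile.mkdtemp()
    spooler = BatchSpooler(spool_dir, max_records=10, max_age_sec=3600.0)
    for i in range(25):
        spooler.add("record-%d" % i)
    spooler.flush()  # write out the final partial batch
    for path in spooler.written:
        print(path)
```

In a real setup the queue would handle the buffering and the periodic MR job would do the load into HDFS; the thresholds here just show how the batch size can be decoupled from the arrival rate.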