Not directly, but you may want to look at the Kafka project
(http://sna-projects.com/kafka/). We use it as a queue and periodically
load the queued data into HDFS with an MR job.  See this
presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
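
To illustrate the general idea of queueing small arrivals and handing them on
in larger batches, here is a rough sketch in Python. It is not Kafka's API and
not our actual loader: BatchingBuffer, its parameters, and the flush callback
are hypothetical names made up for this example, and the callback merely
stands in for "write a file to HDFS and run one MR job over it".

```python
import time
from typing import Callable, List, Optional


class BatchingBuffer:
    """Hypothetical helper: collects records and hands them on in larger batches."""

    def __init__(self,
                 flush: Callable[[List[str]], None],
                 max_records: int = 1000,
                 max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush              # stand-in for "load batch and run one job"
        self._max_records = max_records  # flush once this many records are buffered
        self._max_age_s = max_age_s      # or once the oldest record is this old
        self._clock = clock              # injectable so the age rule can be tested
        self._records: List[str] = []
        self._first_at: Optional[float] = None

    def add(self, record: str) -> None:
        """Buffer one record; flush if the size or age threshold is reached."""
        if not self._records:
            self._first_at = self._clock()
        self._records.append(record)
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the callback as one batch, if any."""
        if self._records:
            batch, self._records = self._records, []
            self._first_at = None
            self._flush(batch)


if __name__ == "__main__":
    batches: List[List[str]] = []
    buf = BatchingBuffer(batches.append, max_records=3)
    for i in range(7):
        buf.add(f"record-{i}")
    buf.flush()  # push out the final partial batch
    print(len(batches), "batches")
```

The point of the thresholds is that the number of jobs is decided by batch
size and age rather than by how often data arrives, so arrivals every few
seconds do not translate into a new job every few seconds.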
-Jakob



On Fri, Jun 24, 2011 at 10:12 AM, Saumitra Shahapure
<saumitra.offic...@gmail.com> wrote:
> Hi,
>
> Does HDFS support a queue-like structure, where a stream of data is
> processed as it is generated?
> Specifically, I will have a stream of data coming in, and a data-independent
> operation needs to be applied to it (so only a Map function; the reducer is
> the identity).
> I wish to distribute the data among nodes using HDFS and start processing it as
> it arrives, preferably in a single MR job.
>
> I agree that it can be done by starting a new MR job for each batch of data,
> but is frequently starting many MR jobs for small data chunks a good idea?
> (Consider that a new batch arrives every few seconds and processing one batch
> takes a few minutes.)
>
> Thanks,
> --
> Saumitra S. Shahapure
>
