Re: Queue support from HDFS

Jakob Homan Fri, 24 Jun 2011 12:32:53 -0700

Not directly, but you may wish to take a look at the Kafka project
(http://sna-projects.com/kafka/), which we use as a queue and then
bring the data periodically into HDFS via an MR job.  See this
presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
-Jakob
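
The queue-then-periodic-load pattern described above can be sketched in a few lines. This is only an illustration under assumptions, not the actual Kafka-to-HDFS loader: the MicroBatcher class, its parameters, and the flush callback are hypothetical names. The idea is to buffer incoming records and hand them off in larger batches, triggered by size or by age, so that one MR job covers many small arrivals instead of one job per arrival.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper: buffers records and flushes them by size or age."""

    def __init__(self, flush: Callable[[List[str]], None],
                 max_records: int = 1000, max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # `flush` is an assumed callback; in a real setup it might write the
        # batch to a file in HDFS and launch a map-only job over it.
        self._flush = flush
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[str] = []
        self._first_at: Optional[float] = None  # arrival time of oldest record

    def add(self, record: str) -> None:
        """Buffer one record, then flush if a threshold has been reached."""
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush when the buffer is full or its oldest record is too old."""
        if not self._buffer:
            return
        too_big = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the callback and reset the buffer."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._flush(batch)
```

A caller would invoke add() for each record read from the queue and call maybe_flush() from a timer, so that a quiet stream still gets flushed once the age limit passes. The two thresholds trade latency against per-job startup overhead.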




On Fri, Jun 24, 2011 at 10:12 AM, Saumitra Shahapure
<saumitra.offic...@gmail.com> wrote:
> Hi,
>
> Does HDFS support a queue-like structure, where a stream of data is
> processed as it is generated?
> Specifically, I will have a stream of data coming in, and a data-independent
> operation needs to be applied to it (so only a Map function; the reducer is
> the identity).
> I wish to distribute the data among nodes using HDFS and start processing it
> as it arrives, preferably in a single MR job.
>
> I agree that it can be done by starting a new MR job for each batch of data,
> but is frequently starting many MR jobs for small data chunks a good idea?
> (Consider that a new batch arrives every few seconds and processing one batch
> takes a few minutes.)
>
> Thanks,
> --
> Saumitra S. Shahapure
>
