What is the best way to use the Hadoop output data

Huy Phan Thu, 25 Jun 2009 03:03:04 -0700

Hi everybody, I'm working on a Hadoop project that processes log files.
In the reduce phase, as usual, I store the output in HDFS, but I also want to
send that output data to a message queue using HTTP POST requests.
I'm wondering whether this approach has any performance killers. I posted
the question to the IRC channel and someone told me that there may be a
bottleneck.
I then thought about running a cron task to fetch the output data and send it
to the MQ, but I'm not sure that's the best way because it isn't synchronized
with the MapReduce process.
Is there any way to spawn a process directly from Hadoop after all the
MapReduce tasks finish?
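
To make the last idea concrete, here is a rough sketch of a wrapper that runs the job and forwards the output only after it has finished. It is an illustration under assumptions, not a tested setup: it assumes the job's driver blocks until the job completes (so the `hadoop jar` command returns only at the end), and the function names, jar name, HDFS path, and queue URL are all hypothetical.

```python
import subprocess
import urllib.request


def post_line(url, line):
    """Send one output record as the body of an HTTP POST request."""
    req = urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def run_then_forward(job_cmd, cat_cmd, send, run=subprocess.run):
    """Run the job; only if it succeeds, read its output and forward each line.

    Returns the number of records forwarded.
    """
    job = run(job_cmd)  # blocks until the command exits
    if job.returncode != 0:
        return 0  # job failed: forward nothing
    out = run(cat_cmd, capture_output=True, text=True)
    if out.returncode != 0:
        return 0  # output could not be read
    sent = 0
    for line in out.stdout.splitlines():
        if line:
            send(line)
            sent += 1
    return sent


# Hypothetical usage (jar, path, and URL are placeholders):
# run_then_forward(
#     ["hadoop", "jar", "logjob.jar", "/logs/in", "/logs/out"],
#     ["hadoop", "fs", "-cat", "/logs/out/part-*"],
#     lambda line: post_line("http://mq.example.com/publish", line),
# )
```

Compared with a cron task, this stays in step with the job because forwarding starts only once the job command has returned successfully, and the reducers themselves never wait on HTTP calls.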
