You could take a look at Chukwa, which essentially collects your logs and
drops them into HDFS:
http://wiki.apache.org/hadoop/Chukwa
The last time I tried Chukwa, it wasn't yet in a usable state. If that's
still the case, you can use Scribe to collect all of your logs in a single
place, and then write a quick Python script to persist those logs to HDFS.
Learn more about Scribe here:
http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/
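For the "quick Python script" part, here's a minimal sketch of what I have in
mind, assuming Scribe writes its files into a local directory and that the
`hadoop` CLI is on the PATH (the directory paths and the HDFS destination are
just illustrative, adjust them to your Scribe store layout):

```python
import os
import subprocess

def hdfs_put_command(local_path, hdfs_dir, hadoop_bin="hadoop"):
    """Build the `hadoop fs -put` command line for one log file."""
    dest = hdfs_dir.rstrip("/") + "/" + os.path.basename(local_path)
    return [hadoop_bin, "fs", "-put", local_path, dest]

def persist_logs(log_dir, hdfs_dir):
    """Copy every file under log_dir into hdfs_dir, and delete the
    local copy only after the upload succeeds, so local disk is freed
    without losing anything."""
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        rc = subprocess.call(hdfs_put_command(path, hdfs_dir))
        if rc == 0:
            os.remove(path)

# Example (hypothetical paths):
# persist_logs("/var/log/scribe/hadoop", "/logs/scribe")
```

You'd run something like this from cron; anything fancier (compression,
partitioning by date) can be layered on top of the same loop.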
Alex
On Tue, Nov 18, 2008 at 2:37 PM, Nathan Marz [EMAIL PROTECTED] wrote:
We find that after about 400 to 500 jobs run in succession on our Hadoop
cluster, the disk space on each machine is quickly used up by logs for all
the tasks. What do people do to manage these logs? Does Hadoop have anything
built in for managing them, or do we have to delete/move the logs with a
home-cooked method?
Thanks,
Nathan Marz
Rapleaf