Hadoop-folk,
How have people gone about collecting debug/error log information from
streaming jobs in Hadoop?
I understand that if I write to stderr (and it's not a counter/status
line), it ends up on the node's local disk, in:
/var/log/hadoop/userlogs/<task attempt>/stderr
However, I'd really like to collect those in some central location,
for processing. Possibly via splunk (which we use right now),
possibly some other means.
- Do people write a custom log4j appender? (Does log4j even control
writes to that stderr file? I can't tell -- it somewhat looks like it doesn't.)
- Or, maybe write cron jobs that run on the slaves and periodically
push logs somewhere?
- Are people outside of Facebook using scribe?
Any ideas / experiences appreciated.
Thanks,
-Dan Milstein