You might also look at Chukwa -- this was precisely the original problem Chukwa was designed to solve, and we're pretty much there. Chukwa is a particularly natural fit if you want your logs stored in HDFS.
On Mon, Sep 28, 2009 at 2:18 PM, Dan Milstein <[email protected]> wrote:

> Hadoop-folk,
>
> How have people gone about collecting debug/error log information from
> streaming jobs, in Hadoop?
>
> I'm clear that, if I write to stderr (and it's not a counter/status line),
> then it goes onto the node's local disk, in:
>
> /var/log/hadoop/userlogs/<task attempt>/stderr
>
> However, I'd really like to collect those in some central location for
> processing. Possibly via Splunk (which we use right now), possibly some
> other means.
>
> - Do people write a custom log4j appender? (Does log4j even control writes
> to that stderr file? I can't tell -- it somewhat looks like no.)
>
> - Or maybe write cron jobs that run on the slaves and periodically push
> logs somewhere?
>
> - Are people outside of Facebook using Scribe?
>
> Any ideas / experiences appreciated.
>
> Thanks,
> -Dan Milstein

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
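
For what it's worth, here is a minimal sketch of the cron-push option Dan mentions. The userlogs path is the one from his message; the HDFS destination, the per-host/per-attempt layout, and the script itself are all hypothetical, just to show the shape of the approach:

#!/usr/bin/env python
# Hypothetical cron-driven pusher: copies each task attempt's stderr from a
# slave's local userlogs directory into a central HDFS directory. The HDFS
# destination below is an assumption, not any fixed Hadoop convention.
import os
import socket
import subprocess

USERLOGS = "/var/log/hadoop/userlogs"   # local task-attempt logs (from Dan's message)
HDFS_DEST = "/logs/userlogs"            # hypothetical central HDFS location

def push_stderr_logs():
    if not os.path.isdir(USERLOGS):
        return
    host = socket.gethostname()
    for attempt in os.listdir(USERLOGS):
        stderr_path = os.path.join(USERLOGS, attempt, "stderr")
        if not os.path.isfile(stderr_path) or os.path.getsize(stderr_path) == 0:
            continue
        dest_dir = "%s/%s/%s" % (HDFS_DEST, host, attempt)
        # Standard "hadoop fs" shell commands; failures on already-pushed
        # attempts are simply ignored, so repeated cron runs are harmless.
        subprocess.call(["hadoop", "fs", "-mkdir", dest_dir])
        subprocess.call(["hadoop", "fs", "-put", stderr_path, dest_dir + "/stderr"])

if __name__ == "__main__":
    push_stderr_logs()

Writing into per-host, per-attempt directories keeps slaves from clobbering each other; anything beyond that (deduplication, compaction, cleaning up logs that were already shipped) is the sort of bookkeeping Chukwa's agents are meant to handle for you.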
