We could consider using markers to attach more metadata about the relevance of a particular log message.
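To make the marker idea concrete, here is a minimal sketch of how relevance could be kept separate from severity. It uses only the Java standard library; the names `MarkerSketch`, `Relevance`, `Event`, and `operatorView` are hypothetical and are not Accumulo or slf4j API. With slf4j itself, the same idea would presumably go through `MarkerFactory.getMarker(...)` and the marker-accepting overloads on `Logger`, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch; every name here is hypothetical, not Accumulo or slf4j API.
final class MarkerSketch {

    // The slf4j severity levels, most severe first.
    enum Level { ERROR, WARN, INFO, DEBUG, TRACE }

    // Hypothetical relevance markers, independent of severity.
    enum Relevance { OPERATOR, DEVELOPER }

    // A log event that carries a relevance marker alongside its level.
    static final class Event {
        final Level level;
        final Relevance relevance;
        final String message;

        Event(Level level, Relevance relevance, String message) {
            this.level = level;
            this.relevance = relevance;
            this.message = message;
        }
    }

    // Select what an operator-facing view would show: every event marked
    // OPERATOR, regardless of the level it was logged at.
    static List<String> operatorView(List<Event> events) {
        List<String> shown = new ArrayList<>();
        for (Event e : events) {
            if (e.relevance == Relevance.OPERATOR) {
                shown.add(e.level + " " + e.message);
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(Level.DEBUG, Relevance.OPERATOR, "balancer has not converged"));
        events.add(new Event(Level.DEBUG, Relevance.DEVELOPER, "migration candidates computed"));
        for (String line : operatorView(events)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that a balancing message could stay at DEBUG and still be surfaced to operators by its marker, instead of being promoted to WARN.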
On Fri, Apr 18, 2014 at 10:46 PM, Sean Busbey <bus...@cloudera.com> wrote:

> I also try to limit what goes at higher warning levels. One of my goals
> over the next few months is to improve our current logging. It sounds like
> this is a good time to make sure we're on the same page.
>
> We're going to have to train users on something (esp. since our current
> logging is very noisy). The short version I like is "Info and more severe
> are for operators; info and less severe are for developers."
>
> Here's what I usually use as a guideline (constrained to slf4j levels):
>
>
> = ERROR
>
> Something is wrong and an operator needs to do something, preferably very
> soon. In other words, if I was on call I'd expect to get paged.
>
> = WARN
>
> Something is amiss, but not of immediate concern. An operator who is on
> call but not busy at the moment might want to investigate some kind of
> underlying issue, but the system will continue to function within some
> reasonable bound.
>
> = INFO
>
> Summary information about normal operations that is safe to ignore. GC
> information, throughput stats, that kind of thing.
>
> = DEBUG
>
> Low level information that is not normally useful, but will help determine
> the cause of a system malfunction. Usually something a developer or tier 3
> supporter would want when something was going wrong (e.g. stack traces).
>
> = TRACE
>
> Detailed low level information at a volume that probably can't be gathered
> in production.
>
>
> Eric, do those all sound reasonable? I want to make sure we have a common
> basis before I get into the specifics of this case.
>
> -Sean
>
> On Fri, Apr 18, 2014 at 8:21 PM, Eric Newton <eric.new...@gmail.com> wrote:
>
> > -1
> >
> > I would hesitate to put *any* message at WARN. It is normal for balancing
> > to take a little while, especially for some of my users who have their own
> > balancing algorithm.
> >
> > Users feel the need to fix the problem; after all, it's there in big scary
> > yellow on the monitor page. I don't like training users to ignore scary
> > yellow. Is it a problem, or not?
> >
> > Alternatively, put the balance info into the master status, and display it.
> > Like GC collection time... hey, I've been migrating these tablets for a
> > long time... turn yellow/red.
> >
> > -Eric
> >
> > On Fri, Apr 18, 2014 at 4:03 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> > > At the moment all of our logs about problems balancing are at DEBUG.
> > >
> > > Given the impact to a cluster when this happens (skewing load onto few
> > > servers, in some cases severely), I'd like to raise it to WARN so that it
> > > surfaces for operators in the Monitor and in the non-debug log.
> > >
> > > Thought I'd do a quick lazy consensus check before filing a jira and
> > > taking care of it.
> > >
> > > --
> > > Sean
>
> --
> Sean