We can already alter log levels at runtime. This lets an operator temporarily raise the log verbosity for the affected components, even in production. Doing this on a server-by-server basis should have minimal impact on overall cluster performance. Maybe this needs to be better documented? Maybe we need a script that makes it easier, or it could be managed via a new shell command?
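
To make "temporarily raise the level for one component" concrete, here is a rough sketch. It uses java.util.logging only so that it is self-contained; it is a stand-in, not HBase's actual logging setup. The class name `TemporaryLogLevel` and the logger name in the usage note are made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper: raises the verbosity of one named logger and
 * restores the previous setting when closed. Illustrative only.
 */
final class TemporaryLogLevel implements AutoCloseable {
    private final Logger logger;   // strong reference so the logger is not collected
    private final Level previous;  // may be null, meaning "inherit from parent"

    TemporaryLogLevel(String loggerName, Level temporary) {
        this.logger = Logger.getLogger(loggerName);
        this.previous = logger.getLevel();
        logger.setLevel(temporary);
    }

    @Override
    public void close() {
        // Passing null is allowed and restores inheritance from the parent logger.
        logger.setLevel(previous);
    }
}
```

An operator-facing script or shell command could wrap the same idea: `try (TemporaryLogLevel t = new TemporaryLogLevel("demo.component", Level.FINE)) { ... }` turns on detailed output for one component only, and the old level comes back when the block exits.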

On Saturday, October 25, 2014, Andrew Purtell <apurt...@apache.org> wrote:

> On Sat, Oct 25, 2014 at 6:34 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
> > Even if debug is disabled in production, it could be enabled on a
> > non-production system for reproducing the problem, no?
>
> In my experience, often enough, no.
>
> I do hear the complaint that Hadoop ecosystem projects are quite operator
> unfriendly because error messages most often come in the form of a
> stacktrace. It's a totally valid point. I think we could certainly improve
> the exception message printed ahead of the stacktrace in a large number of
> cases.
>
> > On Oct 25, 2014 7:11 AM, "Qiang Tian" <tian...@gmail.com> wrote:
> >
> > > perhaps case by case is better. stacktrace is one of most important
> > > problem determination methods. debug is mostly disabled in production,
> > > we may lose important clues.
> > >
> > > On Sat, Oct 25, 2014 at 1:14 PM, Sean Busbey <bus...@cloudera.com>
> > > wrote:
> > >
> > > > Hi!
> > > >
> > > > Right now we have many failure paths where we send stack traces to
> > > > log files at ERROR / WARN. In an effort to make things easier to
> > > > operate, I'd like to propose we move towards:
> > > >
> > > > * INFO/WARN/ERROR : description of failure and if possible an action
> > > >   an operator could take to fix/diagnose
> > > > * DEBUG : information needed to handle failures that require
> > > >   developer action, i.e. stack traces
> > > >
> > > > I figure this can go as one or more subtasks off of HBASE-12341, but
> > > > wanted to float things here before I get started.
> > > >
> > > > --
> > > > Sean
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
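
For reference, the split proposed at the start of the thread (operator-facing description at INFO/WARN/ERROR, stack trace at DEBUG) could look roughly like the sketch below. It uses java.util.logging as a self-contained stand-in, with WARNING playing the role of WARN and FINE the role of DEBUG. `FailureLogging`, `reportFailure`, and the messages are hypothetical and not taken from HBase.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper illustrating the proposed two-level failure logging. */
final class FailureLogging {
    private FailureLogging() {}

    /**
     * Logs a short, actionable description for operators at WARNING and the
     * full stack trace for developers at FINE (the DEBUG equivalent here).
     */
    static void reportFailure(Logger log, String what, String suggestedAction, Throwable cause) {
        // Operator-facing line: what failed, why, and what to try next.
        log.log(Level.WARNING, "{0}: {1}. Suggested action: {2}",
                new Object[] {what, cause.getMessage(), suggestedAction});
        // Developer-facing detail: only emitted when FINE is enabled for this logger.
        log.log(Level.FINE, "Stack trace for: " + what, cause);
    }
}
```

With the logger at its normal level only the first line is written; raising that one logger to FINE on the affected server brings the stack trace back without a restart or code change.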