We have the ability to alter log levels at runtime. This allows an
operator to temporarily increase the log level for afflicted components,
even in production. Doing this on a server-by-server basis should have
minimal impact on overall cluster performance. Maybe this needs to be
better documented? Maybe we need a script that makes this easier, or
could it be managed via a new shell command?
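
A script along those lines could start as small as the sketch below. It
is written in Python for brevity and assumes that each server's info (web
UI) port serves a Hadoop-style `/logLevel` servlet accepting `log` and
`level` query parameters. The helper names, host and port are
hypothetical.

```python
"""Sketch: change one logger's level on a list of servers at runtime.

Assumes a Hadoop-style /logLevel servlet on each server's info port that
accepts `log` and `level` query parameters. Helper names are hypothetical.
"""
from typing import Dict, Iterable, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# log4j 1.x level names accepted by the servlet (assumption).
VALID_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def log_level_url(host: str, info_port: int, logger: str, level: str) -> str:
    """Build the /logLevel request URL for one server (no network access)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    query = urlencode({"log": logger, "level": level})
    return f"http://{host}:{info_port}/logLevel?{query}"


def set_log_level(servers: Iterable[Tuple[str, int]], logger: str, level: str,
                  timeout: float = 10.0) -> Dict[Tuple[str, int], int]:
    """Send the request to each (host, info_port) pair; return HTTP status codes."""
    statuses = {}
    for host, port in servers:
        url = log_level_url(host, port, logger, level)
        with urlopen(url, timeout=timeout) as resp:
            statuses[(host, port)] = resp.status
    return statuses
```

For example, `set_log_level([("rs1.example.com", 16030)],
"org.apache.hadoop.hbase.regionserver", "DEBUG")` would raise one
RegionServer to DEBUG for that package, and a second call with the
previous level sets it back. Working server by server keeps the extra
logging confined to the afflicted components.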

On Saturday, October 25, 2014, Andrew Purtell <apurt...@apache.org> wrote:

>
> On Sat, Oct 25, 2014 at 6:34 AM, Sean Busbey <bus...@cloudera.com>
> wrote:
>
> > Even if debug is disabled in production, it could be enabled on a
> > non-production system for reproducing the problem, no?
> >
>
> In my experience, often enough, no.
>
> I do hear the complaint that Hadoop ecosystem projects are quite
> operator-unfriendly because error messages most often come in the form of
> a stack trace. It's a totally valid point. I think we could certainly
> improve the exception message printed ahead of the stack trace in a large
> number of cases.
>
>
>
> On Sat, Oct 25, 2014 at 6:34 AM, Sean Busbey <bus...@cloudera.com>
> wrote:
>
> > Even if debug is disabled in production, it could be enabled on a
> > non-production system for reproducing the problem, no?
> >
> > --
> > Sean
> > On Oct 25, 2014 7:11 AM, "Qiang Tian" <tian...@gmail.com> wrote:
> >
> > > Perhaps case by case is better. The stack trace is one of the most
> > > important problem-determination tools. DEBUG is mostly disabled in
> > > production, so we may lose important clues.
> > >
> > >
> > > On Sat, Oct 25, 2014 at 1:14 PM, Sean Busbey <bus...@cloudera.com>
> > > wrote:
> > >
> > > > Hi!
> > > >
> > > > Right now we have many failure paths where we send stack traces to log
> > > > files at ERROR / WARN. In an effort to make things easier to operate,
> > > > I'd like to propose we move towards:
> > > >
> > > > * INFO/WARN/ERROR: a description of the failure and, if possible, an
> > > >   action an operator could take to fix or diagnose it
> > > > * DEBUG: information needed to handle failures that require developer
> > > >   action, i.e. stack traces
> > > >
> > > > I figure this can go as one or more subtasks off of HBASE-12341, but
> > > > wanted to float things here before I get started.
> > > >
> > > > --
> > > > Sean
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
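
The split Sean proposes in the quoted thread (an operator-facing
description and suggested action at INFO/WARN/ERROR, with the stack trace
only at DEBUG) could look roughly like the sketch below. It uses Python's
standard `logging` module rather than HBase's Java logging, and
`report_failure`, the logger name and the failing operation are all
hypothetical.

```python
import logging


def report_failure(logger: logging.Logger, what: str, action: str,
                   exc: BaseException) -> None:
    """Log a failure using the proposed operator/developer split (hypothetical helper)."""
    # Operator-facing: what failed, the exception message, and a next step.
    # Emitted at WARN, so it shows up with a typical production configuration.
    logger.warning("%s failed: %s. Suggested action: %s", what, exc, action)
    # Developer-facing: the full traceback, emitted only when DEBUG is enabled
    # for this logger (for example after raising its level at runtime).
    logger.debug("Stack trace for failed %s", what, exc_info=exc)


def open_region(name: str) -> None:
    """Hypothetical operation that fails, standing in for a real code path."""
    raise OSError(f"cannot read store files for region {name}")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("example.regionserver")  # hypothetical logger name
    try:
        open_region("r1")
    except OSError as e:
        report_failure(log, "Opening region r1",
                       "check that the region's HDFS files are readable", e)
```

At INFO only the one-line summary is written; once the logger is raised
to DEBUG, the traceback follows it. In HBase's Java code the equivalent
would be a `LOG.warn(...)` carrying the message plus a `LOG.debug(..., e)`
carrying the throwable, which also pairs naturally with changing log
levels at runtime.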
