On 16/12/10 22:15, Luke Kanies wrote:
> On Dec 16, 2010, at 5:00 AM, Brice Figureau wrote:
>>[snipped myself]
>>
>> In short: is this a good idea? is there a better solution?
> 
> I guess I want to separate the question a bit, between is it a good
> idea to have instrumentation, and is the proposed solution a good
> one?
> 
> To the first, I say easily yes - some form of instrumentation in
> Puppet beyond the basic benchmarking of very limited things we have
> would be fantastic.
> 
> To the second, I guess there are a few questions I'd like to think
> about before being able to answer it.  Feel free to ignore any of
> these if you think they're irrelevant; they're just what popped into
> my head as a means of breaking it down.
> 
> * Are there other examples of this problem being solved in the ruby
> community that we can crib off of? It'd be great not to have to
> reinvent the wheel, especially in terms of design.  I really wish we
> could just add dtrace probes for it, but that only works on solaris
> and os x.

To my (extremely limited) knowledge, there is none beyond the
ruby-dtrace[1] project.

> * Is there a difference between instrumentation to discover the
> problem behind stuck processes, and instrumentation to generally know
> what's going on in Puppet?  I.e., people probably would like some
> more understanding of what's slow without having to open a debugger,
> but this probably doesn't require the extra step of threads reporting
> separately.

My proposal is a dumb and simple system. It's not designed to cover
every use case, but it should at least keep people from firing up gdb to
inspect stack traces when they're looking at an apparently stuck process.

> * Do you have an idea of how you might implement the methods that do
> the instrumentation?

OK, maybe I was a bit too enthusiastic, and what I propose can't really
be called instrumentation. It's more a visual signal for understanding
what takes a long time in a process.
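A minimal sketch of what such a visual signal could look like (this is hypothetical, not the actual patch; the `probe` helper, its label, and its output format are all mine):

```ruby
require 'benchmark'

# Hypothetical helper, not the actual patch: wrap an operation and print
# how long it took, so a slow step is visible without attaching a debugger.
def probe(label)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  warn format("probe %-20s %.3fs", label, elapsed)
  result
end

# Example: wrapping a manifest-parsing step (names illustrative)
probe("parse site.pp") { sleep 0.05 }
```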

> * How do you decide where to put the instrumentation?  And how
> worried are you about doing so at the subsystem level (e.g.,
> networking, parser, indirector) vs. individual classes?

My overall plan was simply to cover various high-level aspects of the
master (i.e. parsing of individual files, some parts of the compiler, file
serving) and resource evaluation (i.e. in the transaction).
That should be no more than 10 different places where I'd add my "probes".

> I know this is a lot of background, but it's a top-level
> cross-cutting design, and I think it's hard enough that I've thought
> about it for a long time and never delivered on anything.

Yes, I've been thinking about this for a long time. I failed to find a
global way to solve the problem, so I thought that a small but possibly
useful attempt would be way better than nothing :)

Anyway, even if this never gets merged upstream, I'll produce the patch
for those willing to use it (like myself) ;-)

Now, if we want to think about a more general system, I don't think I'd
use dtrace. Of course it's elegant because it doesn't use any system
resources, but it's not portable enough to justify tying our whole
codebase to it. And I don't think accumulating a couple hundred (if not
fewer) instrumentation metrics would be such a performance issue.
So one solution would be to set the probes ourselves (like I did in my
earlier example). Each of these probe "blocks" would accumulate CPU
time/memory and call-site counts (and/or whatever else we can get from
Ruby using something like proc-wait3). These could be accumulated in a
thread-safe array. Then sending a signal to a puppet process would dump
this array to the log or to a given file, or we could get it through the
status indirection.
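A rough sketch of that idea in modern Ruby (hypothetical code, not Puppet's API; the class name, signal choice, and output format are assumptions, and it only accumulates wall-clock time, not the CPU/memory counters mentioned above):

```ruby
# Hypothetical sketch: probe blocks that accumulate per-probe call counts
# and elapsed time in a thread-safe structure, dumped on demand.
class Probes
  @stats = Hash.new { |h, k| h[k] = { count: 0, time: 0.0 } }
  @mutex = Mutex.new

  # Wrap a block of code; the block's return value is passed through.
  def self.measure(name)
    start = Time.now
    yield
  ensure
    elapsed = Time.now - start
    @mutex.synchronize do
      @stats[name][:count] += 1
      @stats[name][:time]  += elapsed
    end
  end

  # Dump accumulated stats to an IO (the log, a file, ...).
  def self.dump(io = $stderr)
    @mutex.synchronize do
      @stats.each do |name, s|
        io.puts format("%-30s calls=%d time=%.3fs", name, s[:count], s[:time])
      end
    end
  end
end

# Sending SIGUSR2 to the process dumps the table, as described above
# (signal name is an assumption; guard for platforms without USR2).
Signal.trap("USR2") { Probes.dump } if Signal.list.key?("USR2")
```

Usage would be `Probes.measure("compile") { compiler.compile(node) }` around each of the ten-or-so call sites.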

The complex question, as you asked it, is where to put the probes :)
I think this could be an ongoing process that we refine from release to
release, starting with the high-level ones I described.
It's also possible to act transversally in the indirector, the network
and the transaction layers with a fully generic system. That would cover
almost every aspect of a puppet process.
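A fully generic, transversal hook might look something like this (a hypothetical sketch, not Puppet code; the module, class, and method names are illustrative, and it just prints timings rather than plugging into a real probe store):

```ruby
# Hypothetical sketch: wrap arbitrary methods of a class so a whole layer
# (indirector, network, transaction) is instrumented without touching each
# call site.
module Instrumentable
  def instrument(method_name)
    original = instance_method(method_name)
    define_method(method_name) do |*args, &blk|
      start = Time.now
      begin
        original.bind(self).call(*args, &blk)
      ensure
        warn format("%s#%s took %.3fs", self.class, method_name, Time.now - start)
      end
    end
  end
end

# Illustrative use: instrument one method of a (made-up) compiler class.
class Compiler
  extend Instrumentable

  def compile(node)
    # ... real compilation work would go here ...
    node
  end
  instrument :compile
end
```

The same `instrument :method` call could then be sprinkled across each layer's entry points.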

But, as with defining metrics, the important question to ask ourselves
(and the users) is: what would be interesting to measure? What questions
should the instrumentation system answer?

I should be able to propose a proof-of-concept implementation of the
generic system later if needed. Maybe we could then refine the system
based on some real code to discuss?

Thanks!

[1]: https://github.com/chrisa/ruby-dtrace
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.
