First, I confess that I'm pretty uncomfortable answering Oracle
support questions here.  Oracle has very aggressively monetized the
operating system with the notion that its support was of tremendous
value; that Oracle has to essentially rely on the broader community
for that support speaks volumes to the actual value that Oracle is
providing its customers in this regard.  Put more bluntly: let no one
question the vitality of illumos when it is we in the illumos
community who are fielding a support call about your operating
system...

Now, on to the meat.  The problem is a bug in the script,
albeit one that is due to some subtle semantics:  threads that exit
will fire sched:::on-cpu but never fire sched:::off-cpu.  (That is, a
thread exits while on the CPU; it does not give up the CPU to die.)
This can, of course, be easily confirmed with DTrace itself; try this
experiment:

 # dtrace -q -n 'sched:::on-cpu,sched:::off-cpu,proc:::lwp-exit
       /pid == $target/{printf("%d: %s\n", timestamp, probename)}' \
       -c "/usr/bin/sleep 1" | sort -n

When I run this on my illumos machine, I see:

  153116965615211: on-cpu
  153116965637283: off-cpu
  153116965876279: on-cpu
  153116966272791: off-cpu
  153116966908637: on-cpu
  153116966979203: off-cpu
  153116967125718: on-cpu
  153116967215732: off-cpu
  153116967377948: on-cpu
  153116967447287: off-cpu
  153116967573339: on-cpu
  153116968106208: off-cpu
  153116968166568: on-cpu
  153116968181498: off-cpu
  153116968347277: on-cpu
  153116968367427: off-cpu
  153116980796598: on-cpu
  153116980810590: off-cpu
  153116980969177: on-cpu
  153116981580498: off-cpu
  153116981746760: on-cpu
  153116981761285: off-cpu
  153116981909940: on-cpu
  153116981931360: off-cpu
  153116996876523: on-cpu
  153116996890628: off-cpu
  153116997047335: on-cpu
  153116999245574: off-cpu
  153117999387773: on-cpu
  153117999472949: lwp-exit

So the script is leaking a dynamic variable whenever a thread that has
been off CPU exits -- and it's not at all surprising that dynamic
variable drops are only seen after 24 hours and only under heavy load.
To fix this, the customer should add a clause for proc:::lwp-exit
that zeroes out their thread-local variable....
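
A minimal sketch of that fix, assuming the script stamps a
thread-local variable in sched:::on-cpu and clears it in
sched:::off-cpu.  The script itself isn't quoted in this thread, so
the variable name self->ts and the @oncpu aggregation are
hypothetical stand-ins for whatever it actually uses:

```d
/*
 * Hypothetical reconstruction of the pattern described above; the
 * names self->ts and @oncpu are assumptions, not the customer's.
 */
sched:::on-cpu
{
	/* Allocates a dynamic variable for this thread. */
	self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
	@oncpu[execname] = sum(timestamp - self->ts);

	/* Assigning zero frees the dynamic variable. */
	self->ts = 0;
}

/*
 * An exiting thread never fires sched:::off-cpu, so without this
 * clause its self->ts is never freed.
 */
proc:::lwp-exit
/self->ts/
{
	self->ts = 0;
}
```

Assigning zero to a thread-local variable is how D releases its
underlying storage, so the extra clause should be all that is needed
to stop the leak in a script shaped like this one.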

        - Bryan
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org