Konstantin,

Concerning i/o counters we collect them in rucollect() in for loop and update in various places, for example in vfs_bio.c. Rusage of an exiting threads is handled in exit1() -> ruadd().

Your version of the patch certainly is more robust, however I'm still concerning about following things, which could be done better:

1) Zeroing of thread runtime statistic looks fine if we use it only for whole process statistic calculating, but now (when it's also used as is for the thread statistic) it should be handled independently, i.e. RUSAGE_{SELF,CHILDREN} calls should not affect the thread structures somehow. So I suppose we need to introduce some new counters to proc structure and counters update code (it was stopped me to go on this way).

2) With first in mind, getruasge(RUSAGE_THREAD) is called in current thread and doesn't affect or use information from other threads, so it definitely should use less number of locks, may be with atomic operations for read-update-write operations. In fact the same getting per-thread statistic in Linux is done _without_ locks at all (however Linux uses different process/thread model).

If we're in time and it really looks like a problem (is getrusage() ever a hotspot?) I can try to prepare the patch with less locking on this weekend.

Also I still don't understand the sanity check in calccru1() for negativeness of tu ( (int64_t)tu < 0 ) - is it reachable? It seems that it is possible only after about 300 thousand years of uptime... Could you please explain this?

Should I write further about the patch to svn-src-all@ (sorry, but I'm new in FreeBSD mailing) ?

Kostik Belousov wrote:
On Mon, May 03, 2010 at 08:13:00PM +0000, Alexander Krizhanovsky wrote:
Kostik,

thank you very much for the review!

Kostik Belousov wrote:
On Mon, Apr 19, 2010 at 12:46:48AM +0000, Alexander Krizhanovsky wrote:
Hi all!

This patch implements per-thread rusage statistic (like RUSAGE_THREAD in Linux and RUSAGE_LWP in OpenSolaris).

Unfortunately, we have to acquire a number of locks to read/update system and user times for current thread rusage information because it's also used for whole process statistic and needs to be zeroed.

Any comments are very appreciated.

It's tested against 8.0-RELEASE. Appropriate PR is submitted.

--
Alexander Krizhanovsky
NatSys Lab. (http://natsys-lab.com)
tel: +7 (916) 717-3899, +7 (499) 747-6304
e-mail: a...@natsys-lab.com

I think this should be somewhat modified before commit.

First, please separate patch into two pieces. One would introduce
ruxagg_tlock() helper and use it in existing locations instead of
three-liners. Possibly, add K&R->ANSI C kern_getrusage definition
change.

Second would actually add RUSAGE_THREAD. There, please follow
the style(9), in particular, do not initialize the local variables
at the declaration, instead, use explicit initialization in the code.
Done. I have also introduced third patch for casting microseconds to timeval and replaced a few appropriate places in whole kernel code. patch-rusage-thread-03052010.txt is incremental for patch-ruxagg_tlock-03052010.txt, which is by turn incremental for patch-usec2timeval-03052010.txt.

I have made following change for calcru1():
-       if ((int64_t)tu < 0) {
-               /* XXX: this should be an assert /phk */
- printf("calcru: negative runtime of %jd usec for pid %d (%s)\n",
-                   (intmax_t)tu, p->p_pid, p->p_comm);
-               tu = ruxp->rux_tu;
-       }
+       KASSERT((int64_t)tu < 0, ("calcru: negative runtime of %jd usec "
+                               "for pid %d (%s)\n",
+                               (intmax_t)tu, p->p_pid, p->p_comm));

tu can become negative at about 300 thousand years of uptime, so this condition seems quite unrealistic and indeed should be replaced by an assertion. However I didn't understand why it isn't done so far and the comment only was added. Did I miss something?

Should calctru() share the code with calcru1() ? I am esp. concerned
with sanity check like negative runtime etc. Possibly, this code
>from calcru1() should be separated into function used from both
calcru1() and calctru().

Man page for getrusage(2) should be updated.
It's added to patch-rusage-thread-03052010.txt.

In the end - shoud I raise a new PR (the original one is kern/145813)?

Thanks for the submission.

It was some time, I already committed ruxagg_tlock() part, that caused
some feedback, and I reworked the rest of the patch myself. See svn-src@
for some discussion, and the latest version that I intend to commit
shortly is below. Your comments are welcome.

Lets discuss third patch after this is landed.

diff --git a/lib/libc/sys/getrusage.2 b/lib/libc/sys/getrusage.2
index bdf5d45..423503f 100644
--- a/lib/libc/sys/getrusage.2
+++ b/lib/libc/sys/getrusage.2
@@ -28,7 +28,7 @@
 .\"     @(#)getrusage.2   8.1 (Berkeley) 6/4/93
 .\" $FreeBSD$
 .\"
-.Dd June 4, 1993
+.Dd May 1, 2010
 .Dt GETRUSAGE 2
 .Os
 .Sh NAME
@@ -42,6 +42,7 @@
 .In sys/resource.h
 .Fd "#define      RUSAGE_SELF      0"
 .Fd "#define      RUSAGE_CHILDREN -1"
+.Fd "#define      RUSAGE_THREAD   1"
 .Ft int
 .Fn getrusage "int who" "struct rusage *rusage"
 .Sh DESCRIPTION
@@ -49,11 +50,12 @@ The
 .Fn getrusage
 system call
 returns information describing the resources utilized by the current
-process, or all its terminated child processes.
+thread, the current process, or all its terminated child processes.
 The
 .Fa who
 argument is either
-.Dv RUSAGE_SELF
+.Dv RUSAGE_THREAD ,
+.Dv RUSAGE_SELF ,
 or
 .Dv RUSAGE_CHILDREN .
 The buffer to which
@@ -175,6 +177,10 @@ The
 .Fn getrusage
 system call appeared in
 .Bx 4.2 .
+The
+.Dv RUSAGE_THREAD
+facility first appeared in
+.Fx 8.1 .
 .Sh BUGS
 There is no way to obtain information about a child process
 that has not yet terminated.
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 49a3097..6c50f1f 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -901,7 +901,7 @@ fill_kinfo_thread(struct thread *td, struct kinfo_proc *kp, 
int preferthread)
        kp->ki_pri.pri_user = td->td_user_pri;
if (preferthread) {
-               kp->ki_runtime = cputick2usec(td->td_runtime);
+               kp->ki_runtime = td->td_rux.rux_runtime;
                kp->ki_pctcpu = sched_pctcpu(td);
                kp->ki_estcpu = td->td_estcpu;
        }
diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c
index a3ed75d..0bc78d0 100644
--- a/sys/kern/kern_resource.c
+++ b/sys/kern/kern_resource.c
@@ -76,7 +76,7 @@ static void   calcru1(struct proc *p, struct rusage_ext *ruxp,
                    struct timeval *up, struct timeval *sp);
 static int     donice(struct thread *td, struct proc *chgp, int n);
 static struct uidinfo *uilookup(uid_t uid);
-static void    ruxagg_tlock(struct proc *p, struct thread *td);
+static void    ruxagg(struct proc *p, struct thread *td);
/*
  * Resource controls and accounting.
@@ -630,7 +630,7 @@ lim_cb(void *arg)
                return;
        PROC_SLOCK(p);
        FOREACH_THREAD_IN_PROC(p, td) {
-               ruxagg_tlock(p, td);
+               ruxagg(p, td);
        }
        PROC_SUNLOCK(p);
        if (p->p_rux.rux_runtime > p->p_cpulimit * cpu_tickrate()) {
@@ -841,7 +841,7 @@ calcru(struct proc *p, struct timeval *up, struct timeval 
*sp)
        FOREACH_THREAD_IN_PROC(p, td) {
                if (td->td_incruntime == 0)
                        continue;
-               ruxagg_tlock(p, td);
+               ruxagg(p, td);
        }
        calcru1(p, &p->p_rux, up, sp);
 }
@@ -961,6 +961,16 @@ kern_getrusage(struct thread *td, int who, struct rusage 
*rup)
                calccru(p, &rup->ru_utime, &rup->ru_stime);
                break;
+ case RUSAGE_THREAD:
+               PROC_SLOCK(p);
+               ruxagg(p, td);
+               PROC_SUNLOCK(p);
+               thread_lock(td);
+               *rup = td->td_ru;
+               calcru1(p, &td->td_rux, &rup->ru_utime, &rup->ru_stime);
+               thread_unlock(td);
+               break;
+
        default:
                error = EINVAL;
        }
@@ -1001,7 +1011,7 @@ ruadd(struct rusage *ru, struct rusage_ext *rux, struct 
rusage *ru2,
  * Aggregate tick counts into the proc's rusage_ext.
  */
 void
-ruxagg(struct rusage_ext *rux, struct thread *td)
+ruxagg_locked(struct rusage_ext *rux, struct thread *td)
 {
THREAD_LOCK_ASSERT(td, MA_OWNED);
@@ -1010,18 +1020,19 @@ ruxagg(struct rusage_ext *rux, struct thread *td)
        rux->rux_uticks += td->td_uticks;
        rux->rux_sticks += td->td_sticks;
        rux->rux_iticks += td->td_iticks;
-       td->td_incruntime = 0;
-       td->td_uticks = 0;
-       td->td_iticks = 0;
-       td->td_sticks = 0;
 }
static void
-ruxagg_tlock(struct proc *p, struct thread *td)
+ruxagg(struct proc *p, struct thread *td)
 {
thread_lock(td);
-       ruxagg(&p->p_rux, td);
+       ruxagg_locked(&p->p_rux, td);
+       ruxagg_locked(&td->td_rux, td);
+       td->td_incruntime = 0;
+       td->td_uticks = 0;
+       td->td_iticks = 0;
+       td->td_sticks = 0;
        thread_unlock(td);
 }
@@ -1039,7 +1050,7 @@ rufetch(struct proc *p, struct rusage *ru)
        *ru = p->p_ru;
        if (p->p_numthreads > 0)  {
                FOREACH_THREAD_IN_PROC(p, td) {
-                       ruxagg_tlock(p, td);
+                       ruxagg(p, td);
                        rucollect(ru, &td->td_ru);
                }
        }
diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
index 9be4c2f..b32c584 100644
--- a/sys/kern/kern_thread.c
+++ b/sys/kern/kern_thread.c
@@ -432,7 +432,7 @@ thread_exit(void)
        PROC_UNLOCK(p);
        thread_lock(td);
        /* Save our tick information with both the thread and proc locked */
-       ruxagg(&p->p_rux, td);
+       ruxagg_locked(&p->p_rux, td);
        PROC_SUNLOCK(p);
        td->td_state = TDS_INACTIVE;
 #ifdef WITNESS
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index fb31cfc..e32e494 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -173,6 +173,27 @@ struct kdtrace_thread;
 struct cpuset;
/*
+ * XXX: Does this belong in resource.h or resourcevar.h instead?
+ * Resource usage extension.  The times in rusage structs in the kernel are
+ * never up to date.  The actual times are kept as runtimes and tick counts
+ * (with control info in the "previous" times), and are converted when
+ * userland asks for rusage info.  Backwards compatibility prevents putting
+ * this directly in the user-visible rusage struct.
+ *
+ * Locking for p_rux: (cj) means (j) for p_rux and (c) for p_crux.
+ * Locking for td_rux: (t) for all fields.
+ */
+struct rusage_ext {
+       u_int64_t       rux_runtime;    /* (cj) Real time. */
+       u_int64_t       rux_uticks;     /* (cj) Statclock hits in user mode. */
+       u_int64_t       rux_sticks;     /* (cj) Statclock hits in sys mode. */
+       u_int64_t       rux_iticks;     /* (cj) Statclock hits in intr mode. */
+       u_int64_t       rux_uu;         /* (c) Previous user time in usec. */
+       u_int64_t       rux_su;         /* (c) Previous sys time in usec. */
+       u_int64_t       rux_tu;         /* (c) Previous total time in usec. */
+};
+
+/*
  * Kernel runnable context (thread).
  * This is what is put to sleep and reactivated.
  * Thread context.  Processes may have multiple threads.
@@ -219,7 +240,8 @@ struct thread {
        u_int           td_estcpu;      /* (t) estimated cpu utilization */
        int             td_slptick;     /* (t) Time at sleep. */
        int             td_blktick;     /* (t) Time spent blocked. */
-       struct rusage   td_ru;          /* (t) rusage information */
+       struct rusage   td_ru;          /* (t) rusage information. */
+       struct rusage_ext td_rux;       /* (t) Internal rusage information. */
        uint64_t        td_incruntime;  /* (t) Cpu ticks to transfer to proc. */
        uint64_t        td_runtime;     /* (t) How many cpu ticks we've run. */
        u_int           td_pticks;      /* (t) Statclock hits for profiling */
@@ -426,26 +448,6 @@ do {                                                       
                \
 #define        TD_SET_CAN_RUN(td)      (td)->td_state = TDS_CAN_RUN
/*
- * XXX: Does this belong in resource.h or resourcevar.h instead?
- * Resource usage extension.  The times in rusage structs in the kernel are
- * never up to date.  The actual times are kept as runtimes and tick counts
- * (with control info in the "previous" times), and are converted when
- * userland asks for rusage info.  Backwards compatibility prevents putting
- * this directly in the user-visible rusage struct.
- *
- * Locking: (cj) means (j) for p_rux and (c) for p_crux.
- */
-struct rusage_ext {
-       u_int64_t       rux_runtime;    /* (cj) Real time. */
-       u_int64_t       rux_uticks;     /* (cj) Statclock hits in user mode. */
-       u_int64_t       rux_sticks;     /* (cj) Statclock hits in sys mode. */
-       u_int64_t       rux_iticks;     /* (cj) Statclock hits in intr mode. */
-       u_int64_t       rux_uu;         /* (c) Previous user time in usec. */
-       u_int64_t       rux_su;         /* (c) Previous sys time in usec. */
-       u_int64_t       rux_tu;         /* (c) Previous total time in usec. */
-};
-
-/*
  * Process structure.
  */
 struct proc {
diff --git a/sys/sys/resource.h b/sys/sys/resource.h
index 9af96af..e703744 100644
--- a/sys/sys/resource.h
+++ b/sys/sys/resource.h
@@ -56,6 +56,7 @@
#define RUSAGE_SELF 0
 #define        RUSAGE_CHILDREN -1
+#define        RUSAGE_THREAD   1
struct rusage {
        struct timeval ru_utime;        /* user time used */
diff --git a/sys/sys/resourcevar.h b/sys/sys/resourcevar.h
index 21728aa..95a9b49 100644
--- a/sys/sys/resourcevar.h
+++ b/sys/sys/resourcevar.h
@@ -131,7 +131,7 @@ void         rucollect(struct rusage *ru, struct rusage 
*ru2);
 void    rufetch(struct proc *p, struct rusage *ru);
 void    rufetchcalc(struct proc *p, struct rusage *ru, struct timeval *up,
            struct timeval *sp);
-void    ruxagg(struct rusage_ext *rux, struct thread *td);
+void    ruxagg_locked(struct rusage_ext *rux, struct thread *td);
 int     suswintr(void *base, int word);
 struct uidinfo
        *uifind(uid_t uid);


--
Alexander Krizhanovsky
NatSys Lab. (http://natsys-lab.com)
tel: +7 (916) 717-3899, +7 (499) 747-6304
e-mail: a...@natsys-lab.com

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Reply via email to