Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-07 Thread Stephane Eranian
David,

On Wed, Jun 06, 2007 at 03:53:06PM -0700, David Rientjes wrote:
> On Wed, 6 Jun 2007, Stephane Eranian wrote:
> 
> > > Is this ia64 patch the one you mentioned that you did not post to LKML 
> > > because it was too large in patch 0?  Is there any way you could break 
> > > that patch up itself and post it for comments?
> > > 
> > Yes, this is the patch. It would be hard to break up in pieces.
> > The reason it is big is because it has to remove the older IA-64-only
> > implementation which was all in a single file whose size
> > was bigger than 100kB. It is hard to break this, unless I explicitely
> > remove the 'remove old file' diff from the patch.
> > 
> 
> Could you provide it online somewhere so it can be downloaded, reviewed, 
> and tested?

All the perfmon patches are available at:
http://perfmon2.sf.net
-- 

-Stephane
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-06 Thread David Rientjes
On Wed, 6 Jun 2007, Stephane Eranian wrote:

> > Is this ia64 patch the one you mentioned that you did not post to LKML 
> > because it was too large in patch 0?  Is there any way you could break 
> > that patch up itself and post it for comments?
> > 
> Yes, this is the patch. It would be hard to break up in pieces.
> The reason it is big is because it has to remove the older IA-64-only
> implementation which was all in a single file whose size
> was bigger than 100kB. It is hard to break this, unless I explicitely
> remove the 'remove old file' diff from the patch.
> 

Could you provide it online somewhere so it can be downloaded, reviewed, 
and tested?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-06 Thread Stephane Eranian
David,

On Tue, Jun 05, 2007 at 06:34:56PM -0700, David Rientjes wrote:
> 
> > > > +int pfm_get_task(struct pfm_context *ctx, pid_t pid, struct 
> > > > task_struct **task)
> > > > +{
> > > 
> > > This function could be marked static even though it's exported through 
> > > perfmon.h in patch 13.  It is unreferenced elsewhere.
> > > 
> > No because it is used in another module on IA-64 (for compatibility with 
> > older versions).
> > 
> 
> Is this ia64 patch the one you mentioned that you did not post to LKML 
> because it was too large in patch 0?  Is there any way you could break 
> that patch up itself and post it for comments?
> 
Yes, this is the patch. It would be hard to break up in pieces.
The reason it is big is because it has to remove the older IA-64-only
implementation which was all in a single file whose size
was bigger than 100kB. It is hard to break this, unless I explicitely
remove the 'remove old file' diff from the patch.


> > > Why can't this be done with just struct task_struct *task as the third 
> > > formal and change the assignment later to task = p?
> > > 
> > Because we need to carry the errno back: ESRCH or EPERM.
> > 
> 
> Your formal is "struct task_struct **task" yet the only actual to this 
> function is the memory address of a pointer to a single struct task_struct 
> (i.e. it's never passed an array of struct task_struct pointers, which 
> "struct task_struct **task" is).
> 
> And since you only ever use this has *task to get the pointer, you can 
> change the formal to just be "struct task_struct *task" and then pass in a 
> pointer to a single struct task_struct.
> 
I must be missing something here. I am modifying the address of task * in
the function. This is my second return value.

int pfm_get_task(void **p)
{
*p = 0x1000;
return 0;
}

int main(void)
{
void *p;

p = 0x2000;
printf("p=%p\n", p);
pfm_get_task(&p);
printf("p=%p\n", p);
return 0;
}
I am not passing a pointer to an array of struct ask *, but merely the address 
of a pointer
to struct task *. 

> > > > +
> > > > +asmlinkage long sys_pfm_write_pmcs(int fd, struct pfarg_pmc __user 
> > > > *ureq, int count)
> > > > +{
> > > > +   struct pfm_context *ctx;
> > > > +   struct file *filp;
> > > > +   struct pfarg_pmc pmcs[PFM_PMC_STK_ARG];
> > > > +   struct pfarg_pmc *req;
> > > > +   void *fptr;
> > > > +   unsigned long flags;
> > > > +   size_t sz;
> > > > +   int ret, fput_needed;
> > > > +
> > > 
> > > Could this have a stack overflow on powerpc?
> > > 
> > The PFM_PMC_STK_ARG is per-arch,  so you could chose a very low value. 
> > I think it is set to 4. pfarg_pmc s 48 bytes and pfarg_pmd is 176 bytes
> > regardless of LP64 vs. ILP32.
> > 
> 
> Stack overflows like that are annoying to track down and powerpc has the 
> highest PFM_PMC_STK_ARG of the entire patchset.
> 
The function using this is a system call, so it is not too deep in the call 
stack
and then the perfmon function never go very deep.


> 
> I'm looking forward to seeing the next patchset and I'll give it a 
> thorough test run on x86_64.  It'd probably be best to base that patchset 
> off 2.6.22 when it's released.
> 
Very good thanks. I still need to go through all your other comments.

-- 

-Stephane
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-05 Thread David Rientjes
On Tue, 5 Jun 2007, Stephane Eranian wrote:

> > > +static int pfm_task_incompatible(struct pfm_context *ctx, struct 
> > > task_struct *task)
> > > +{
> > > + /*
> > > +  * no kernel task or task not owned by caller
> > > +  */
> > > + if (!task->mm) {
> > > + PFM_DBG("cannot attach to kernel thread [%d]", task->pid);
> > > + return -EPERM;
> > > + }
> > 
> > This isn't a sufficient check for whether a task is owned by the caller.
> > 
> 
> The comment is missing a part. The ownership test is done by 
> ptrace_may_attahc().
> The above test is about checking for a kernel-only thread.
> 

Ok.

> > > +int pfm_get_task(struct pfm_context *ctx, pid_t pid, struct task_struct 
> > > **task)
> > > +{
> > 
> > This function could be marked static even though it's exported through 
> > perfmon.h in patch 13.  It is unreferenced elsewhere.
> > 
> No because it is used in another module on IA-64 (for compatibility with 
> older versions).
> 

Is this ia64 patch the one you mentioned that you did not post to LKML 
because it was too large in patch 0?  Is there any way you could break 
that patch up itself and post it for comments?

> > Why can't this be done with just struct task_struct *task as the third 
> > formal and change the assignment later to task = p?
> > 
> Because we need to carry the errno back: ESRCH or EPERM.
> 

Your formal is "struct task_struct **task" yet the only actual to this 
function is the memory address of a pointer to a single struct task_struct 
(i.e. it's never passed an array of struct task_struct pointers, which 
"struct task_struct **task" is).

And since you only ever use this has *task to get the pointer, you can 
change the formal to just be "struct task_struct *task" and then pass in a 
pointer to a single struct task_struct.

> > > + if (check_mask & PFM_CMD_STOPPED) {
> > > +
> > > + spin_unlock_irqrestore(&ctx->lock, local_flags);
> > > +
> > > + /*
> > > +  * check that the thread is ptraced AND STOPPED
> > > +  */
> > > + ret = ptrace_check_attach(task, 0);
> > > +
> > > + spin_lock_irqsave(&ctx->lock, new_flags);
> > > +
> > > + /*
> > > +  * flags may be different than when we released the lock
> > > +  */
> > > + *flags = new_flags;
> > 
> > You can't do this, you'll need to either separate these functions out by 
> > having pfm_check_task_state() indicate by a return value that 
> > ptrace_check_attach() should be checked or that we've already failed, or 
> > come up with a non-locking solution.
> > 
> Are you worried about the the local_flags vs. new flags?
> I think your solution would be cleaner, I will see what I can do.
> 

It should be simple if this is broken down into two smaller functions, the 
latter of which is called only upon a successful return of the former.

> > > +
> > > +asmlinkage long sys_pfm_write_pmcs(int fd, struct pfarg_pmc __user 
> > > *ureq, int count)
> > > +{
> > > + struct pfm_context *ctx;
> > > + struct file *filp;
> > > + struct pfarg_pmc pmcs[PFM_PMC_STK_ARG];
> > > + struct pfarg_pmc *req;
> > > + void *fptr;
> > > + unsigned long flags;
> > > + size_t sz;
> > > + int ret, fput_needed;
> > > +
> > 
> > Could this have a stack overflow on powerpc?
> > 
> The PFM_PMC_STK_ARG is per-arch,  so you could chose a very low value. 
> I think it is set to 4. pfarg_pmc s 48 bytes and pfarg_pmd is 176 bytes
> regardless of LP64 vs. ILP32.
> 

Stack overflows like that are annoying to track down and powerpc has the 
highest PFM_PMC_STK_ARG of the entire patchset.

> Thanks for your feedback.
> 

I'm looking forward to seeing the next patchset and I'll give it a 
thorough test run on x86_64.  It'd probably be best to base that patchset 
off 2.6.22 when it's released.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-05 Thread Stephane Eranian
David,

On Mon, Jun 04, 2007 at 07:38:23AM -0700, David Rientjes wrote:
> On Tue, 29 May 2007, Stephane Eranian wrote:
> > +static int pfm_task_incompatible(struct pfm_context *ctx, struct 
> > task_struct *task)
> > +{
> > +   /*
> > +* no kernel task or task not owned by caller
> > +*/
> > +   if (!task->mm) {
> > +   PFM_DBG("cannot attach to kernel thread [%d]", task->pid);
> > +   return -EPERM;
> > +   }
> 
> This isn't a sufficient check for whether a task is owned by the caller.
> 

The comment is missing a part. The ownership test is done by 
ptrace_may_attahc().
The above test is about checking for a kernel-only thread.

> > +
> > +   /*
> > +* cannot block in self-monitoring mode
> > +*/
> > +   if (ctx->flags.block && task == current) {
> > +   PFM_DBG("cannot load a in blocking mode on self for [%d]",
> > +   task->pid);
> > +   return -EINVAL;
> > +   }
> 
> Poor description of trying block notification on self.
> 

Fixed.

> > +
> > +   if (task->exit_state == EXIT_ZOMBIE || task->exit_state == EXIT_DEAD) {
> > +   PFM_DBG("cannot attach to zombie/dead task [%d]", task->pid);
> > +   return -EBUSY;
> > +   }
> > +   return 0;
> > +}
> > +
> > +/*
> > + * This function  is used in per-thread mode only AND when not
> > + * self-monitoring. It finds the task to monitor and checks
> > + * that the caller has persmissions to attach. It also checks
> > + * that the task is stopped via ptrace so that we can safely
> > + * modify its state.
> > + *
> > + * task refcount is increment when succesful.
> > + */
> > +int pfm_get_task(struct pfm_context *ctx, pid_t pid, struct task_struct 
> > **task)
> > +{
> 
> This function could be marked static even though it's exported through 
> perfmon.h in patch 13.  It is unreferenced elsewhere.
> 
No because it is used in another module on IA-64 (for compatibility with older 
versions).

> Why can't this be done with just struct task_struct *task as the third 
> formal and change the assignment later to task = p?
> 
Because we need to carry the errno back: ESRCH or EPERM.

> > +
> > +int pfm_check_task_state(struct pfm_context *ctx, int check_mask,
> > +unsigned long *flags)
> > +{
> 
> Another function that can be static.
> 
It cannot for the same reason described above.

> You need to mention the fact that ctx->lock needs to be locked before 
> calling this function.
> 
Fixed.

> No need to send a pointer to flags, just send the unsigned long value 
> itself, if necessary.  All callers currently send the address of an 
> automatic anyway.  Unlocking and then relocking ctx->lock by returning the 
> new flags is inappropriate.
> 
How can you guarantee that when you grab the spin_lock_irqsave() you'll
get the same flags as when you entered the syscall? I did tha tbecause I ran 
into
an issue at some point (I think it was on Itanium).

> > +   if (check_mask & PFM_CMD_STOPPED) {
> > +
> > +   spin_unlock_irqrestore(&ctx->lock, local_flags);
> > +
> > +   /*
> > +* check that the thread is ptraced AND STOPPED
> > +*/
> > +   ret = ptrace_check_attach(task, 0);
> > +
> > +   spin_lock_irqsave(&ctx->lock, new_flags);
> > +
> > +   /*
> > +* flags may be different than when we released the lock
> > +*/
> > +   *flags = new_flags;
> 
> You can't do this, you'll need to either separate these functions out by 
> having pfm_check_task_state() indicate by a return value that 
> ptrace_check_attach() should be checked or that we've already failed, or 
> come up with a non-locking solution.
> 
Are you worried about the the local_flags vs. new flags?
I think your solution would be cleaner, I will see what I can do.


> > +
> > +   if (ret)
> > +   return ret;
> > +   /*
> > +* we must recheck to verify if state has changed
> > +*/
> > +   if (ctx->state != state) {
> > +   PFM_DBG("old_state=%d new_state=%d",
> > +   state,
> > +   ctx->state);
> > +   goto recheck;
> 
> This should be unnecessary when you no longer drop the ctx->lock in this 
> function.
> 
I cannot keep ctx->lock help and unmask interrupts because there is a race
condition when a PMU interrupt (which grabs ctx->lock) or a timer tick
which may cause an event set switch (which grabs ctx->lock). Yet, 
ptrace_check_attach()
needs to be called with interrupt unmasked, thus I have to drop everything.

> > +int pfm_get_args(void __user *ureq, size_t sz, size_t lsz, void *laddr,
> > +void **req, void **ptr_free)
> > +{
> 
> Another function that can be static.
> 
Cannot for the same reaons described above.

> Needs a comment to say that *req and *ptr_free get kmalloc'd and requires 
> a free() of at least one of them upon return.
> 
Done.

> > +int pfm_get_smpl_arg(cha

Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-06-04 Thread David Rientjes
On Tue, 29 May 2007, Stephane Eranian wrote:

> --- linux-2.6.22.base/perfmon/perfmon_syscalls.c  1969-12-31 
> 16:00:00.0 -0800
> +++ linux-2.6.22/perfmon/perfmon_syscalls.c   2007-05-29 03:24:14.0 
> -0700
> @@ -0,0 +1,991 @@
> +/*
> + * perfmon_syscalls.c: perfmon2 system call interface
> + *
> + * This file implements the perfmon2 interface which
> + * provides access to the hardware performance counters
> + * of the host processor.
> + *
> + * The initial version of perfmon.c was written by
> + * Ganesh Venkitachalam, IBM Corp.
> + *
> + * Then it was modified for perfmon-1.x by Stephane Eranian and
> + * David Mosberger, Hewlett Packard Co.
> + *
> + * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
> + * by Stephane Eranian, Hewlett Packard Co.
> + *
> + * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
> + * Contributed by Stephane Eranian <[EMAIL PROTECTED]>
> + *David Mosberger-Tang <[EMAIL PROTECTED]>
> + *
> + * More information about perfmon available at:
> + *   http://perfmon2.sf.net
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
> + * 02111-1307 USA
> +  */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Context locking rules:
> + * -
> + *   - any thread with access to the file descriptor of a context can
> + * potentially issue perfmon calls
> + *
> + *   - calls must be serialized to guarantee correctness
> + *
> + *   - as soon as a context is attached to a thread or CPU, it may be
> + * actively monitoring. On some architectures, such as IA-64, this
> + * is true even though the pfm_start() call has not been made. This
> + * comes from the fact that on some architectures, it is possible to
> + * start/stop monitoring from userland.
> + *
> + *   - If monitoring is active, then there can PMU interrupts. Because
> + * context accesses must be serialized, the perfmon system calls
> + * must mask interrupts as soon as the context is attached.
> + *
> + *   - perfmon system calls that operate with the context unloaded cannot
> + * assume it is actually unloaded when they are called. They first need
> + * to check and for that they need interrupts masked. Then if the context
> + * is actually unloaded, they can unmask interrupts.
> + *
> + *   - interrupt masking holds true for other internal perfmon functions as
> + * well. Except for PMU interrupt handler because those interrupts cannot
> + * be nested.
> + *
> + *   - we mask ALL interrupts instead of just the PMU interrupt because we
> + * also need to protect against timer interrupts which could trigger
> + * a set switch.
> + */
> +
> +/*
> + * cannot attach if :
> + *   - kernel task
> + *   - task not owned by caller
> + *   - task is dead or zombie
> + *   - cannot use blocking notification when self-monitoring
> + */
> +static int pfm_task_incompatible(struct pfm_context *ctx, struct task_struct 
> *task)
> +{
> + /*
> +  * no kernel task or task not owned by caller
> +  */
> + if (!task->mm) {
> + PFM_DBG("cannot attach to kernel thread [%d]", task->pid);
> + return -EPERM;
> + }

This isn't a sufficient check for whether a task is owned by the caller.

> +
> + /*
> +  * cannot block in self-monitoring mode
> +  */
> + if (ctx->flags.block && task == current) {
> + PFM_DBG("cannot load a in blocking mode on self for [%d]",
> + task->pid);
> + return -EINVAL;
> + }

Poor description of trying block notification on self.

> +
> + if (task->exit_state == EXIT_ZOMBIE || task->exit_state == EXIT_DEAD) {
> + PFM_DBG("cannot attach to zombie/dead task [%d]", task->pid);
> + return -EBUSY;
> + }
> + return 0;
> +}
> +
> +/*
> + * This function  is used in per-thread mode only AND when not
> + * self-monitoring. It finds the task to monitor and checks
> + * that the caller has persmissions to attach. It also checks
> + * that the task is stopped via ptrace so that we can safely
> + * modify its state.
> + *
> + * task refcount is increment when succesful.
> + */
> +int pfm_get_task(struct pfm_context *ctx, pid_t pid, struct task_struct 
> **task)
> +{

This function could be marked static even though it's exporte

Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-05-31 Thread Stephane Eranian
Christoph,

On Thu, May 31, 2007 at 04:21:34PM +0100, Christoph Hellwig wrote:
> On Tue, May 29, 2007 at 06:48:17AM -0700, Stephane Eranian wrote:
> > sys_pfm_create_context():
> > - create a new perfmon2 context and returns a file descriptor in
> >   the pfarg_ctx_t parameters. This is the first call an application
> >   must make to do monitoring 
> > - rewritten to pass sampling format identification as a string
> > - file descriptor is now returned by call
> > 
> > sys_pfm_write_pmcs():
> > - program the PMU configuration registers. Accepts vector of arguments
> >   of type pfarg_pmc_t
> > 
> > sys_pfm_write_pmds():
> > - program the PMU data registers. Accepts a vector of arguments of type
> >   pfarg_pmd_t
> > 
> > sys_pfm_read_pmds():
> > - read the PMU data registers.  Accepts a vector of arguments of type
> >   pfarg_pmd_t
> 
> This kind of interface doesn't make any sense at all.  Information should
> be read and written from filedescriptors using the read and write family
> syscalls and through the VFS instead of adding tons of system calls.
> 

They are all using file descriptors already.
We use read() for receiving overflow notifications. Write is not used.

> I fear we need to write down the requirements first and then come up
> with something better.  E.g. for per-task sampling an interface centered
> around a few files in /proc// would fit very nicely:
> 
>   /proc//perfmon_pmcs
>   /proc//perfmon_pmds
>   Obvious
>   /proc//perfmon_ctl
>   Can get control commands as ascii sets written to
> 
You don't want to do parsing because usually, when sampling, you have
to reprogram the registers on the fly and this is on the critical path.
Information has to be exchanged in binary format.

-- 

-Stephane
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/22] 2.6.22-rc3 perfmon2 : new system calls support

2007-05-31 Thread Christoph Hellwig
On Tue, May 29, 2007 at 06:48:17AM -0700, Stephane Eranian wrote:
> sys_pfm_create_context():
>   - create a new perfmon2 context and returns a file descriptor in
> the pfarg_ctx_t parameters. This is the first call an application
> must make to do monitoring 
>   - rewritten to pass sampling format identification as a string
>   - file descriptor is now returned by call
> 
> sys_pfm_write_pmcs():
>   - program the PMU configuration registers. Accepts vector of arguments
> of type pfarg_pmc_t
>   
> sys_pfm_write_pmds():
>   - program the PMU data registers. Accepts a vector of arguments of type
> pfarg_pmd_t
> 
> sys_pfm_read_pmds():
>   - read the PMU data registers.  Accepts a vector of arguments of type
> pfarg_pmd_t

This kind of interface doesn't make any sense at all.  Information should
be read and written from filedescriptors using the read and write family
syscalls and through the VFS instead of adding tons of system calls.

I fear we need to write down the requirements first and then come up
with something better.  E.g. for per-task sampling an interface centered
around a few files in /proc// would fit very nicely:

  /proc//perfmon_pmcs
  /proc//perfmon_pmds
Obvious
  /proc//perfmon_ctl
Can get control commands as ascii sets written to


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/