[PATCH 0/2] Cell OProfile: SPU mutex lock fix, version 4

2008-08-01 Thread Carl Love
Patch 1 is version 4 of the SPU mutex lock fix.

Patch 2 is a patch to fix an issue found in OProfile while implementing
version 4 of the SPU mutex lock fix.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 1/2] Cell OProfile: SPU mutex lock fix, version 4

2008-08-01 Thread Carl Love
Version 4 of the SPU mutex lock fix.

The issue is the SPU code is not holding the kernel mutex lock while
adding samples to the kernel buffer.

This patch creates per SPU buffers to hold the data.  Data
is added to the buffers from in interrupt context.  The data
is periodically pushed to the kernel buffer via a new Oprofile
function oprofile_put_buff(). The oprofile_put_buff() function
is called via a work queue enabling the funtion to acquire the
mutex lock.  

The existing user controls for adjusting the per CPU buffer 
size is used to control the size of the per SPU buffers.  
Similarly, overflows of the SPU buffers are reported by 
incrementing the per CPU buffer stats.  This eliminates the 
need to have architecture specific controls for the per SPU 
buffers which is not acceptable to the OProfile user tool
maintainer.

The export of the oprofile add_event_entry() is removed as it 
is no longer needed given this patch.
 
Note, this patch has not addressed the issue of indexing arrays
by the spu number.  This still needs to be fixed as the spu
numbering is not guarenteed to be 0 to max_num_spus-1.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>
Signed-off-by: Maynard Johnson   <[EMAIL PROTECTED]>


Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -37,11 +37,24 @@ static int work_enabled;
 void free_cpu_buffers(void)
 {
int i;
- 
+
for_each_online_cpu(i)
vfree(per_cpu(cpu_buffer, i).buffer);
 }
 
+unsigned long oprofile_get_cpu_buffer_size(void)
+{
+   return fs_cpu_buffer_size;
+}
+
+void oprofile_cpu_buffer_inc_smpl_lost(void)
+{
+   struct oprofile_cpu_buffer *cpu_buf
+   = &__get_cpu_var(cpu_buffer);
+
+   cpu_buf->sample_lost_overflow++;
+}
+
 int alloc_cpu_buffers(void)
 {
int i;
Index: Cell_kernel_6_26_2008/include/linux/oprofile.h
===
--- Cell_kernel_6_26_2008.orig/include/linux/oprofile.h
+++ Cell_kernel_6_26_2008/include/linux/oprofile.h
@@ -84,13 +84,6 @@ int oprofile_arch_init(struct oprofile_o
 void oprofile_arch_exit(void);
 
 /**
- * Add data to the event buffer.
- * The data passed is free-form, but typically consists of
- * file offsets, dcookies, context information, and ESCAPE codes.
- */
-void add_event_entry(unsigned long data);
-
-/**
  * Add a sample. This may be called from any context. Pass
  * smp_processor_id() as cpu.
  */
@@ -160,5 +153,14 @@ int oprofilefs_ulong_from_user(unsigned 
 
 /** lock for read/write safety */
 extern spinlock_t oprofilefs_lock;
+
+/**
+ * Add the contents of a circular buffer to the event buffer.
+ */
+void oprofile_put_buff(unsigned long *buf,unsigned int start,
+   unsigned int stop, unsigned int max);
+
+unsigned long oprofile_get_cpu_buffer_size(void);
+void oprofile_cpu_buffer_inc_smpl_lost(void);
  
 #endif /* OPROFILE_H */
Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_profiler.c
===
--- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_profiler.c
+++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_profiler.c
@@ -23,12 +23,11 @@
 
 static u32 *samples;
 
-static int spu_prof_running;
+int spu_prof_running;
 static unsigned int profiling_interval;
 
 #define NUM_SPU_BITS_TRBUF 16
 #define SPUS_PER_TB_ENTRY   4
-#define SPUS_PER_NODE   8
 
 #define SPU_PC_MASK 0x
 
@@ -208,6 +207,7 @@ int start_spu_profiling(unsigned int cyc
 
spu_prof_running = 1;
hrtimer_start(&timer, kt, HRTIMER_MODE_REL);
+   schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);
 
return 0;
 }
Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -35,7 +35,106 @@ static DEFINE_SPINLOCK(buffer_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
 int spu_prof_num_nodes;
-int last_guard_val[MAX_NUMNODES * 8];
+int last_guard_val[MAX_NUMNODES * SPUS_PER_NODE];
+static int spu_ctx_sw_seen[MAX_NUMNODES * SPUS_PER_NODE];
+static int sync_start_registered;
+static int delayed_work_init;
+
+struct spu_buffers spu_buff;
+struct delayed_work spu_work;
+static unsigned max_spu_buff;
+
+static void spu_buff_add(unsigned long int value, int spu)
+{
+   /* spu buff is a circular buffer.  Add entries to the
+* head.  Head is the index to store the next value.
+* The buffer is full when there is one available entry
+* in the queue, i.e. head and tail can't be equal.
+* That way we can tell the differen

[PATCH 2/2] Cell OProfile: SPU mutex lock fix, version 4

2008-08-01 Thread Carl Love

If an error occurs on opcontrol start, the event and per cpu buffers 
are released.  If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer 
exist.  This results in a kernel oops.  The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -38,8 +38,12 @@ void free_cpu_buffers(void)
 {
int i;
 
-   for_each_online_cpu(i)
-   vfree(per_cpu(cpu_buffer, i).buffer);
+   for_each_online_cpu(i) {
+   if (per_cpu(cpu_buffer, i).buffer) {
+   vfree(per_cpu(cpu_buffer, i).buffer);
+   per_cpu(cpu_buffer, i).buffer = NULL;
+   }
+   }
 }
 
 unsigned long oprofile_get_cpu_buffer_size(void)
Index: Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/event_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
@@ -92,7 +92,10 @@ out:
 
 void free_event_buffer(void)
 {
-   vfree(event_buffer);
+   if (event_buffer)
+   vfree(event_buffer);
+
+   event_buffer = NULL;
 }
 
  


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/2] Cell OProfile: SPU mutex lock fix, version 4

2008-08-08 Thread Carl Love

On Fri, 2008-08-08 at 18:08 +0200, Arnd Bergmann wrote:
> On Friday 01 August 2008, Carl Love wrote:
> > The issue is the SPU code is not holding the kernel mutex lock while
> > adding samples to the kernel buffer.
> 
> Thanks for your patch, and sorry for not replying earlier.
> It looks good from a functionality perspective, I just have
> some style comments left that I hope we can address quickly.
> 
> Since this is still a bug fix (though a rather large one), I
> guess we can should get it into 2.6.27-rc3.
> 
>   Arnd <><
> 
> > Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> > ===
> > --- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
> > +++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> > @@ -35,7 +35,106 @@ static DEFINE_SPINLOCK(buffer_lock);
> >  static DEFINE_SPINLOCK(cache_lock);
> >  static int num_spu_nodes;
> >  int spu_prof_num_nodes;
> > -int last_guard_val[MAX_NUMNODES * 8];
> > +int last_guard_val[MAX_NUMNODES * SPUS_PER_NODE];
> > +static int spu_ctx_sw_seen[MAX_NUMNODES * SPUS_PER_NODE];
> > +static int sync_start_registered;
> > +static int delayed_work_init;
> 
> You don't need the delayed_work_init variable. Just initialize
> the work queue in your init function to be sure it's always
> right.
> 
> I think you should also try to remove sync_start_registered,
> these global state variables can easily get annoying when you
> try to change something.
> AFAICT, spu_sync_stop does not get called unless spu_sync_start
> was successful, so the sync_start_registered variable is
> redundant.
> 


I was able to remove the delayed_work_init variable with a little code
restructuring.  

I worked on the sync_start_registered variable.  I had put it in because
I thought that the call to spu_switch_event_unregister() call for a non
registered event was giving me a kernel error.  I retested this and it
does look like it is OK.  So it looks safe to remove the
sync_start_registered variable.  

The rest of your comments were fairly easy to address.


> > +static int oprofile_spu_buff_create(void)
> > +{
> > +   int spu;
> > +
> > +   max_spu_buff = oprofile_get_cpu_buffer_size();
> > +
> > +   for (spu = 0; spu < num_spu_nodes; spu++) {
> > +   /* create circular buffers to store the data in.
> > +* use locks to manage accessing the buffers
> > +*/
> > +   spu_buff.index[spu].head = 0;
> > +   spu_buff.index[spu].tail = 0;
> > +
> > +   /*
> > +* Create a buffer for each SPU.  Can't reliably
> > +* create a single buffer for all spus due to not
> > +* enough contiguous kernel memory.
> > +*/
> > +
> > +   spu_buff.buff[spu] = kzalloc((max_spu_buff
> > + * sizeof(unsigned long)),
> > +GFP_KERNEL);
> > +
> > +   if (!spu_buff.buff[spu]) {
> > +   printk(KERN_ERR "SPU_PROF: "
> > +  "%s, line %d:  oprofile_spu_buff_create " \
> > +  "failed to allocate spu buffer %d.\n",
> > +  __FUNCTION__, __LINE__, spu);
> 
> The formatting of the printk line is a little unconventional. You certainly
> don't need the '\' at the end of the line.
> Also, please don't use __FUNCTION__ in new code, but instead of the
> standard c99 __func__ symbol. The __LINE__ macro is fine.
> 
> > +
> > +   /* release the spu buffers that have been allocated */
> > +   while (spu >= 0) {
> > +   if (spu_buff.buff[spu]) {
> > +   kfree(spu_buff.buff[spu]);
> > +   spu_buff.buff[spu] = 0;
> > +   }
> > +   spu--;
> > +   }
> > +   return 1;
> > +   }
> > +   }
> > +   return 0;
> > +}
> 
> The convention for a function would be to return -ENOMEM here instead
> of 1.
> 
> >  /* The main purpose of this function is to synchronize
> >   * OProfile with SPUFS by registering to be notified of
> >   * SPU task switches.
> > @@ -372,30 +521,50 @@ static int number_of_online_nodes(void)
> >   */
> >  int spu_sync_start(void)
> >  {
> > -   int k;
> > +   int

Re: [PATCH 1/2] Repost Cell OProfile: SPU mutex lock fix, version 4

2008-08-08 Thread Carl Love
Version 4 of the SPU mutex lock fix.

Updated to address Arnd's comments.

The issue is the SPU code is not holding the kernel mutex lock while
adding samples to the kernel buffer.

This patch creates per SPU buffers to hold the data.  Data
is added to the buffers from in interrupt context.  The data
is periodically pushed to the kernel buffer via a new Oprofile
function oprofile_put_buff(). The oprofile_put_buff() function
is called via a work queue enabling the funtion to acquire the
mutex lock.  

The existing user controls for adjusting the per CPU buffer 
size is used to control the size of the per SPU buffers.  
Similarly, overflows of the SPU buffers are reported by 
incrementing the per CPU buffer stats.  This eliminates the 
need to have architecture specific controls for the per SPU 
buffers which is not acceptable to the OProfile user tool
maintainer.

The export of the oprofile add_event_entry() is removed as it 
is no longer needed given this patch.
 
Note, this patch has not addressed the issue of indexing arrays
by the spu number.  This still needs to be fixed as the spu
numbering is not guarenteed to be 0 to max_num_spus-1.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>
Signed-off-by: Maynard Johnson   <[EMAIL PROTECTED]>


Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -37,11 +37,24 @@ static int work_enabled;
 void free_cpu_buffers(void)
 {
int i;
- 
+
for_each_online_cpu(i)
vfree(per_cpu(cpu_buffer, i).buffer);
 }
 
+unsigned long oprofile_get_cpu_buffer_size(void)
+{
+   return fs_cpu_buffer_size;
+}
+
+void oprofile_cpu_buffer_inc_smpl_lost(void)
+{
+   struct oprofile_cpu_buffer *cpu_buf
+   = &__get_cpu_var(cpu_buffer);
+
+   cpu_buf->sample_lost_overflow++;
+}
+
 int alloc_cpu_buffers(void)
 {
int i;
Index: Cell_kernel_6_26_2008/include/linux/oprofile.h
===
--- Cell_kernel_6_26_2008.orig/include/linux/oprofile.h
+++ Cell_kernel_6_26_2008/include/linux/oprofile.h
@@ -84,13 +84,6 @@ int oprofile_arch_init(struct oprofile_o
 void oprofile_arch_exit(void);
 
 /**
- * Add data to the event buffer.
- * The data passed is free-form, but typically consists of
- * file offsets, dcookies, context information, and ESCAPE codes.
- */
-void add_event_entry(unsigned long data);
-
-/**
  * Add a sample. This may be called from any context. Pass
  * smp_processor_id() as cpu.
  */
@@ -160,5 +153,14 @@ int oprofilefs_ulong_from_user(unsigned 
 
 /** lock for read/write safety */
 extern spinlock_t oprofilefs_lock;
+
+/**
+ * Add the contents of a circular buffer to the event buffer.
+ */
+void oprofile_put_buff(unsigned long *buf,unsigned int start,
+   unsigned int stop, unsigned int max);
+
+unsigned long oprofile_get_cpu_buffer_size(void);
+void oprofile_cpu_buffer_inc_smpl_lost(void);
  
 #endif /* OPROFILE_H */
Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_profiler.c
===
--- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_profiler.c
+++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_profiler.c
@@ -23,12 +23,11 @@
 
 static u32 *samples;
 
-static int spu_prof_running;
+int spu_prof_running;
 static unsigned int profiling_interval;
 
 #define NUM_SPU_BITS_TRBUF 16
 #define SPUS_PER_TB_ENTRY   4
-#define SPUS_PER_NODE   8
 
 #define SPU_PC_MASK 0x
 
@@ -208,6 +207,7 @@ int start_spu_profiling(unsigned int cyc
 
spu_prof_running = 1;
hrtimer_start(&timer, kt, HRTIMER_MODE_REL);
+   schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);
 
return 0;
 }
Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -35,7 +35,102 @@ static DEFINE_SPINLOCK(buffer_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
 int spu_prof_num_nodes;
-int last_guard_val[MAX_NUMNODES * 8];
+
+struct spu_buffer spu_buff[MAX_NUMNODES * SPUS_PER_NODE];
+struct delayed_work spu_work;
+static unsigned max_spu_buff;
+
+static void spu_buff_add(unsigned long int value, int spu)
+{
+   /* spu buff is a circular buffer.  Add entries to the
+* head.  Head is the index to store the next value.
+* The buffer is full when there is one available entry
+* in the queue, i.e. head and tail can't be equal.
+* That way we can tell the difference between the
+* buffer being full versus empty.
+*
+*  ASSUPTION: the b

Re: [PATCH 2/2] Repost Cell OProfile: SPU mutex lock fix, version 4

2008-08-08 Thread Carl Love
Updated to address Arnd's comments.

If an error occurs on opcontrol start, the event and per cpu buffers 
are released.  If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer 
exist.  This results in a kernel oops.  The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -38,8 +38,10 @@ void free_cpu_buffers(void)
 {
int i;
 
-   for_each_online_cpu(i)
+   for_each_online_cpu(i) {
vfree(per_cpu(cpu_buffer, i).buffer);
+   per_cpu(cpu_buffer, i).buffer = NULL;
+   }
 }
 
 unsigned long oprofile_get_cpu_buffer_size(void)
Index: Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
===
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/event_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
@@ -93,6 +93,8 @@ out:
 void free_event_buffer(void)
 {
vfree(event_buffer);
+
+   event_buffer = NULL;
 }
 
  


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Cbe-oss-dev] please pull cell merge branch

2008-08-11 Thread Carl Love

On Mon, 2008-08-11 at 09:18 +0200, Arnd Bergmann wrote:
> On Monday 11 August 2008, Paul Mackerras wrote:
> > Arnd Bergmann writes:
> > > I've fixed one last bug in Carl's update for cell-oprofile (int flags
> > > instead of unsigned long flags) and made sure the comments fit the
> > > usual style. This fixes a long-standing bug that prevented us from
> > > using oprofile on SPUs in many real-world scenarios. 
> >
> > Since you're touching generic oprofile code here, do you have an ack 
> > from the oprofile maintainer?  Or is the oprofile maintainer listed in
> > MAINTAINERS (Robert Richter) no longer active or something?
> > 
> 
> Sorry about that. I had assumed that at the very least Carl had had
> the oprofile list on Cc and that people there just didn't care.
> 
> Robert has just recently taken over maintainership for oprofile,
> so I assume that he is indeed active and interested in the patches.
> I'll send them out again for his review on oprofile-list.
> 
>   Arnd <><

Sorry, my mistake.  I did mean to post both patches to the OProfile list
as well.  I was planning on following up with Robert on the patches this
week since I had not heard from him.  

   Carl Love

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

2008-08-20 Thread Carl Love

On Wed, 2008-08-20 at 14:39 +0200, Robert Richter wrote:
> On 20.08.08 14:05:31, Arnd Bergmann wrote:
> > On Wednesday 20 August 2008, Robert Richter wrote:
> > > I am fine with the changes with the exception of removing
> > > add_event_entry() from include/linux/oprofile.h. Though there is no
> > > usage of the function also in other architectures anymore, this change
> > > in the API should be discussed on the oprofile mailing list. Please
> > > separate the change in a different patch and submit it to the mailing
> > > list. If there are no objections then, this change can go upstream as
> > > well.
> > 
> > As an explanation, the removal of add_event_entry is the whole point
> > of this patch. add_event_entry must only be called with buffer_mutex
> > held, but buffer_mutex itself is not exported.
> 
> Thanks for pointing this out.
> 

We originally added add_event_entry() to include/linux/oprofile.h
specifically because it was needed for the CELL SPU support.  As it
turns out it the approach was not completely thought through.  We were
using the function call without holding the mutex lock.  As we
discovered later, this can result in corrupting the data put into the
event buffer.  So exposing the function without a way to hold the mutex
lock is actually a really bad idea as it would encourage others to fall
into the same mistake that we made.  So, as Arnd said, the whole point
of this patch is to come up with a correct approach to adding the data.

> > I'm pretty sure that no other user of add_event_entry exists, as it
> > was exported specifically for the SPU support and that never worked.
> > Any other (theoretical) code using it would be broken in the same way
> > and need a corresponding fix.
> > 
> > We can easily leave the declaration in place, but I'd recommend removing
> > it eventually. If you prefer to keep it, how about marking it as
> > __deprecated?
> 
> No, since this is broken by design we remove it. The patch can go
> upstream as it is.
> 
> Thanks,

> -Robert
> 

It really is best to remove it.  Thank you for taking the time to review
and comment on the patch.

   Carl Love

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

2008-08-21 Thread Carl Love

On Thu, 2008-08-21 at 20:20 +1000, Michael Ellerman wrote:
> On Thu, 2008-08-21 at 10:14 +0200, Arnd Bergmann wrote:
> > On Thursday 21 August 2008, Paul Mackerras wrote:
> > > Arnd Bergmann writes:
> > > 
> > > > Paul, any chance we can still get this into 2.6.27?
> > > 
> > > Possibly.  We'll need a really good explanation for Linus as to why
> > > this is needed (what regression or serious bug this fixes) and why it
> > > is late.  Can you send me something explaining that?
> > 
> > The patch does not fix a regression, the spu-oprofile code basically never
> > worked. With the current code in Linux, samples in the profile buffer
> > can get corrupted because reader and writer to that buffer use different
> > locks for accessing it.
> 
> Actually for me it worked[1] a reasonable amount of the time, enough to
> be useful. So the spu-oprofile code has always been broken in this way,
> but it's not always fatal. So the patch doesn't fix a regression, but it
> fixes a serious user-visible bug, which makes it legit rc4 material
> IMHO.
> 
> [1] that was late last year, so possibly a kernel or two ago.

The bug came in the original OProfile SPU support that was put out about
2 years ago.  The way the code was there was a window in which you may
get corruption.  It was not until Jan 08 when we got the first report of
the bug from Michael and identified it.  Since then there have been
three or four more people who have hit and reported the bug.  I am
seeing the bug show up more frequently with the latest couple of weekly
SDK 3.1 kernels.  It would seem that the kernel may have changed such
that the timing is more likely to hit the bug.  For the Beta SDK 3.1
release the IVT team was not able to complete their OProfile testing due
to the bug. 
> 
> cheers
> 
> -
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> ___ oprofile-list mailing list 
> [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/oprofile-list

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Cbe-oss-dev] [PATCH] Cell OProfile: Incorrect local array size in activate spu profiling function

2008-10-24 Thread Carl Love

The size of the pm_signal_local array should be equal to the
number of SPUs being configured in the call.  Currently, the
array is of size 4 (NR_PHYS_CTRS) but being indexed by a for 
loop from 0 to 7 (NUM_SPUS_PER_NODE).  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_10_24_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -830,7 +830,7 @@ static int calculate_lfsr(int n)
 static int pm_rtas_activate_spu_profiling(u32 node)
 {
int ret, i;
-   struct pm_signal pm_signal_local[NR_PHYS_CTRS];
+   struct pm_signal pm_signal_local[NUM_SPUS_PER_NODE];
 
/*
 * Set up the rtas call to configure the debug bus to


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Cbe-oss-dev] [PATCH] Cell OProfile: SPU mutex lock fix

2008-03-25 Thread Carl Love
This patch fixes a bug in the code that records the SPU data and
context switches.  The buffer_mutex lock must be held when the
kernel is adding data to the buffer between the kernel and the
OProfile daemon.  The lock is not being held in the current code
base.  This patch fixes the bug using work queues.  The data to 
be passed to the daemon is caputured by the interrupt handler.  
The workqueue function is invoked to grab the buffer_mutex lock
and add the data to the buffer.  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
===
--- linux-2.6.25-rc4.orig/arch/powerpc/oprofile/cell/spu_profiler.c
+++ linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pr_util.h"
 
 #define TRACE_ARRAY_SIZE 1024
@@ -32,9 +33,19 @@ static unsigned int profiling_interval;
 
 #define SPU_PC_MASK 0x
 
+/* The generic OProfile code uses the buffer_mutex to protect the buffer
+ * between the kernel and the daemon.  The SPU code needs to use the buffer
+ * to ensure that the kernel SPU writes complete as a single block before
+ * being consumed by the daemon.
+ */
+extern struct mutex buffer_mutex;
+
 static DEFINE_SPINLOCK(sample_array_lock);
 unsigned long sample_array_lock_flags;
 
+struct work_struct spu_record_wq;
+extern struct workqueue_struct *oprofile_spu_wq;
+
 void set_spu_profiling_frequency(unsigned int freq_khz, unsigned int 
cycles_reset)
 {
unsigned long ns_per_cyc;
@@ -123,14 +134,14 @@ static int cell_spu_pc_collection(int cp
return entry;
 }
 
-
-static enum hrtimer_restart profile_spus(struct hrtimer *timer)
-{
-   ktime_t kt;
+static void profile_spus_record_samples (struct work_struct *ws) {
+   /* This routine is called via schedule_work() to record the
+* spu data.  It must be run in a normal kernel mode to
+* grab the OProfile mutex lock.
+*/
int cpu, node, k, num_samples, spu_num;
 
-   if (!spu_prof_running)
-   goto stop;
+   mutex_lock(&buffer_mutex);
 
for_each_online_cpu(cpu) {
if (cbe_get_hw_thread_id(cpu))
@@ -170,6 +181,20 @@ static enum hrtimer_restart profile_spus
smp_wmb();  /* insure spu event buffer updates are written */
/* don't want events intermingled... */
 
+   mutex_unlock(&buffer_mutex);
+}
+
+static enum hrtimer_restart profile_spus(struct hrtimer *timer)
+{
+   ktime_t kt;
+
+
+   if (!spu_prof_running)
+   goto stop;
+
+   /* schedule the funtion to record the data */
+   schedule_work(&spu_record_wq);
+
kt = ktime_set(0, profiling_interval);
if (!spu_prof_running)
goto stop;
@@ -209,6 +234,10 @@ int start_spu_profiling(unsigned int cyc
spu_prof_running = 1;
hrtimer_start(&timer, kt, HRTIMER_MODE_REL);
 
+   /* setup the workqueue for recording the SPU data */
+   INIT_WORK(&spu_record_wq,
+ profile_spus_record_samples);
+
return 0;
 }
 
Index: linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- linux-2.6.25-rc4.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -27,15 +27,44 @@
 #include 
 #include 
 #include 
+#include 
 #include "pr_util.h"
 
 #define RELEASE_ALL 
 
-static DEFINE_SPINLOCK(buffer_lock);
+extern struct mutex buffer_mutex;
+extern struct workqueue_struct *oprofile_spu_wq;
+extern int calls_to_record_switch;
+
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
+
 int spu_prof_num_nodes;
 int last_guard_val[MAX_NUMNODES * 8];
+int cnt_swtch_processed_flag[MAX_NUMNODES * 8];
+
+struct spus_profiling_code_data_s {
+   int num_spu_nodes;
+   struct work_struct spu_prof_code_wq;
+} spus_profiling_code_data;
+
+struct spu_context_switch_data_s {
+   struct spu *spu;
+   unsigned long spu_cookie;
+   unsigned long app_dcookie;
+   unsigned int offset;
+   unsigned long objectId;
+   int valid_entry;
+} spu_context_switch_data;
+
+int calls_to_record_switch = 0;
+int record_spu_start_flag = 0;
+
+struct spus_cntxt_sw_data_s {
+   int num_spu_nodes;
+   struct spu_context_switch_data_s spu_data[MAX_NUMNODES * 8];
+   struct work_struct spu_cntxt_work;
+} spus_cntxt_sw_data;
 
 /* Container for caching information about an active SPU task. */
 struct cached_info {
@@ -44,6 +73,8 @@ struct cached_info {
struct kref cache_ref;
 };
 
+struct workqueue_struct *oprofile_spu_wq;
+
 static struct cached_info *spu_info[MAX_NUMNODES * 8];
 
 static void destroy_cached_info(struct kref *kref)
@@ -283,39 +314,90 @@ fail_no_image_cookie:
  * passed SPU and records 

Re: [Cbe-oss-dev] [PATCH] Cell OProfile: SPU mutex lock fix

2008-04-02 Thread Carl Love

On Wed, 2008-04-02 at 07:21 +0200, Arnd Bergmann wrote:
> On Tuesday 25 March 2008, Carl Love wrote:
> > This patch fixes a bug in the code that records the SPU data and
> > context switches.  The buffer_mutex lock must be held when the
> > kernel is adding data to the buffer between the kernel and the
> > OProfile daemon.  The lock is not being held in the current code
> > base.  This patch fixes the bug using work queues.  The data to 
> > be passed to the daemon is caputured by the interrupt handler.  
> > The workqueue function is invoked to grab the buffer_mutex lock
> > and add the data to the buffer.  
> 
> So what was the exact bug you're fixing with this? There was no
> buffer_mutex before, so why do you need it now? Can't this be a
> spinlock so you can get it from interrupt context instead of
> using a workqueue?

The generic OProfile code defines a mutex lock, called buffer_mutex, to
protect the kernel/daemon data buffer from being writen by the kernal
and simultaneously read by the Daemon.  When adding a PPU sample the
oprofile routine  oprofile_add_ext_sample(pc, regs, i, is_kernel) is
called from the interrupt context to request the sample be stored.  The
generic oprofile code takes care of passing the data to a non interrupt
context where the mutex lock is held and the necessary sequence of data
is written into the kernel/daemon data buffer.  However, OProfile does
not have any built in functions for handling the SPU.  Hence, we have to
implement the code to capture the data in the interrupt context, pass it
to a non interrupt context and put it into the buffer.  This was not
done correctly in the original implementation.  Specifically, the mutex
lock was not being held.  

Writing data to the OProfile buffer consists of a sequence of items.
For example when writing an SPU entry, first comes the escape code so
the daemon knows this is a new entry.  The next item is the SPU context
switch code which says the data which will follow is the information
about a new context.  There is a different code to identify the data as
an address sample.  Finally the data about the SPU context switch is
entered into the buffer.  The issue is the OProfile daemon is read all
of the entire sequence of items then process the data.  Without the
mutex lock, the daemon may read part of the sequence try to process it
before everything is written into the buffer.  When the daemon reads
again, it doesn't see the escape code as the first item and isn't smart
enough to realize it is part of a previous sequence.  The generic
OProfile code defines the mutex lock and calls it buffer_mutex.  The
OProfile kernel/daemon API uses the mutex lock. The mutex lock can only
be held in a non interrupt context.  The current implementation uses a
spin lock to make sure the kernel writes each sequence if items into the
buffer but since the API does not use a spin lock we have no way to
prevent the daemon from reading the buffer until the entire sequence of
items has been written to the buffer.  Hence the need to hold the
buffer_mutex lock which prevents the daemon from accessing the buffer.

> 
> > Index: linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
> > ===
> > --- linux-2.6.25-rc4.orig/arch/powerpc/oprofile/cell/spu_profiler.c
> > +++ linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
> > @@ -16,6 +16,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "pr_util.h"
> >  
> 
> Please keep #include statements in alphabetical order, with all linux/ files
> before the asm/ files.
> 
> >  #define TRACE_ARRAY_SIZE 1024
> > @@ -32,9 +33,19 @@ static unsigned int profiling_interval;
> >  
> >  #define SPU_PC_MASK 0x
> >  
> > +/* The generic OProfile code uses the buffer_mutex to protect the buffer
> > + * between the kernel and the daemon.  The SPU code needs to use the buffer
> > + * to ensure that the kernel SPU writes complete as a single block before
> > + * being consumed by the daemon.
> > + */
> > +extern struct mutex buffer_mutex;
> > +
> >  static DEFINE_SPINLOCK(sample_array_lock);
> >  unsigned long sample_array_lock_flags;
> >  
> > +struct work_struct spu_record_wq;
> > +extern struct workqueue_struct *oprofile_spu_wq;
> > +
> >  void set_spu_profiling_frequency(unsigned int freq_khz, unsigned int 
> > cycles_reset)
> >  {
> > unsigned long ns_per_cyc;
> 
> Never put extern statements in the implementation, they describe the
> interface between two parts of the code and should be inside of a
> common header.
> 
> Why do you want to have your own workqueue instead of using the

Re: [Cbe-oss-dev] [PATCH] Cell OProfile: SPU mutex lock fix

2008-04-02 Thread Carl Love

On Wed, 2008-04-02 at 09:42 -0700, Carl Love wrote:
> On Wed, 2008-04-02 at 07:21 +0200, Arnd Bergmann wrote:
> > On Tuesday 25 March 2008, Carl Love wrote:
> > > This patch fixes a bug in the code that records the SPU data and
> > > context switches.  The buffer_mutex lock must be held when the
> > > kernel is adding data to the buffer between the kernel and the
> > > OProfile daemon.  The lock is not being held in the current code
> > > base.  This patch fixes the bug using work queues.  The data to 
> > > be passed to the daemon is caputured by the interrupt handler.  
> > > The workqueue function is invoked to grab the buffer_mutex lock
> > > and add the data to the buffer.  
> > 
> > So what was the exact bug you're fixing with this? There was no
> > buffer_mutex before, so why do you need it now? Can't this be a
> > spinlock so you can get it from interrupt context instead of
> > using a workqueue?
> 
> The generic OProfile code defines a mutex lock, called buffer_mutex, to
> protect the kernel/daemon data buffer from being writen by the kernal
> and simultaneously read by the Daemon.  When adding a PPU sample the
> oprofile routine  oprofile_add_ext_sample(pc, regs, i, is_kernel) is
> called from the interrupt context to request the sample be stored.  The
> generic oprofile code takes care of passing the data to a non interrupt
> context where the mutex lock is held and the necessary sequence of data
> is written into the kernel/daemon data buffer.  However, OProfile does
> not have any built in functions for handling the SPU.  Hence, we have to
> implement the code to capture the data in the interrupt context, pass it
> to a non interrupt context and put it into the buffer.  This was not
> done correctly in the original implementation.  Specifically, the mutex
> lock was not being held.  
> 
> Writing data to the OProfile buffer consists of a sequence of items.
> For example when writing an SPU entry, first comes the escape code so
> the daemon knows this is a new entry.  The next item is the SPU context
> switch code which says the data which will follow is the information
> about a new context.  There is a different code to identify the data as
> an address sample.  Finally the data about the SPU context switch is
> entered into the buffer.  The issue is the OProfile daemon is read all
> of the entire sequence of items then process the data.  Without the
> mutex lock, the daemon may read part of the sequence try to process it
> before everything is written into the buffer.  When the daemon reads
> again, it doesn't see the escape code as the first item and isn't smart
> enough to realize it is part of a previous sequence.  The generic
> OProfile code defines the mutex lock and calls it buffer_mutex.  The
> OProfile kernel/daemon API uses the mutex lock. The mutex lock can only
> be held in a non interrupt context.  The current implementation uses a
> spin lock to make sure the kernel writes each sequence if items into the
> buffer but since the API does not use a spin lock we have no way to
> prevent the daemon from reading the buffer until the entire sequence of
> items has been written to the buffer.  Hence the need to hold the
> buffer_mutex lock which prevents the daemon from accessing the buffer.
> 
> > 
> > > Index: linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
> > > ===
> > > --- linux-2.6.25-rc4.orig/arch/powerpc/oprofile/cell/spu_profiler.c
> > > +++ linux-2.6.25-rc4/arch/powerpc/oprofile/cell/spu_profiler.c
> > > @@ -16,6 +16,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include "pr_util.h"
> > >  
> > 
> > Please keep #include statements in alphabetical order, with all linux/ files
> > before the asm/ files.
> > 
> > >  #define TRACE_ARRAY_SIZE 1024
> > > @@ -32,9 +33,19 @@ static unsigned int profiling_interval;
> > >  
> > >  #define SPU_PC_MASK   0x
> > >  
> > > +/* The generic OProfile code uses the buffer_mutex to protect the buffer
> > > + * between the kernel and the daemon.  The SPU code needs to use the 
> > > buffer
> > > + * to ensure that the kernel SPU writes complete as a single block before
> > > + * being consumed by the daemon.
> > > + */
> > > +extern struct mutex buffer_mutex;
> > > +
> > >  static DEFINE_SPINLOCK(sample_array_lock);
> > >  unsigned long sample_array_lock_flags;
> > >  
> > > +struct work_struct sp

Re: [Cbe-oss-dev] [PATCH] Cell OProfile: SPU mutex lock fix

2008-04-08 Thread Carl Love

On Fri, 2008-04-04 at 08:38 +0200, Arnd Bergmann wrote: 
> On Wednesday 02 April 2008, Carl Love wrote:
> > On Wed, 2008-04-02 at 07:21 +0200, Arnd Bergmann wrote:
> > > On Tuesday 25 March 2008, Carl Love wrote:
> > > > This patch fixes a bug in the code that records the SPU data and
> > > > context switches.  The buffer_mutex lock must be held when the
> > > > kernel is adding data to the buffer between the kernel and the
> > > > OProfile daemon.  The lock is not being held in the current code
> > > > base.  This patch fixes the bug using work queues.  The data to 
> > > > be passed to the daemon is caputured by the interrupt handler.  
> > > > The workqueue function is invoked to grab the buffer_mutex lock
> > > > and add the data to the buffer.  
> > > 
> > > So what was the exact bug you're fixing with this? There was no
> > > buffer_mutex before, so why do you need it now? Can't this be a
> > > spinlock so you can get it from interrupt context instead of
> > > using a workqueue?
> > 
> > The generic OProfile code defines a mutex lock, called buffer_mutex, to
> > protect the kernel/daemon data buffer from being writen by the kernal
> > and simultaneously read by the Daemon.  When adding a PPU sample the
> > oprofile routine  oprofile_add_ext_sample(pc, regs, i, is_kernel) is
> > called from the interrupt context to request the sample be stored.  The
> > generic oprofile code takes care of passing the data to a non interrupt
> > context where the mutex lock is held and the necessary sequence of data
> > is written into the kernel/daemon data buffer.  However, OProfile does
> > not have any built in functions for handling the SPU.  Hence, we have to
> > implement the code to capture the data in the interrupt context, pass it
> > to a non interrupt context and put it into the buffer.  This was not
> > done correctly in the original implementation.  Specifically, the mutex
> > lock was not being held.  
> 
> Ok, I see.
> 
> However, I'm pretty sure that the switch notification does not get
> called from an atomic context, so you don't need a workqueue for
> bringing that into a process context. Doing the context switch
> notification directly from the scheduler sounds much better regarding
> the impact on the measurement.

Our first thought to fix the bug was to just grab the mutex lock when
adding the switch notification data to the queue.  The kernel gives us
an oops message saying something along the line of "could not call mutex
lock in interrupt context".  Hence we had to go to work queues so we
could access the lock outside of the SPU switch notification context.

Secondly, it is my understanding that if the schedule_work() call tries
to queue the same work function multiple times the subsequent requests
are dropped.  Thus we were not able to pass the context switch data as
part of the schedule work requests.  This forced us to have an array to
store the data for each SPU.   
> 
> > > Never put extern statements in the implementation, they describe the
> > > interface between two parts of the code and should be inside of a
> > > common header.
> > > 
> > > Why do you want to have your own workqueue instead of using the
> > > global one?
> > 
> > It is important that the data get context switch data get recorded as
> > quickly as possible to avoid dropping data unnecessarily.  The PC
> > counter data for each SPU is ignored until the context switch record is
> > put into the kernel/daemon buffer.  The API documentation says that
> > using a private workqueue has better performance then using the global
> > workqueue.  There is a comment in the code about this, perhaps it is not
> > clear enough.
> 
> This sounds like an unrelated bug in the implementation. The PC
> data should *not* be ignored in any case. As long as the records
> get stored in the right order, everything should be fine here.

Until the OProfile sees the context switch record, it does not know what
to do with the PC samples and just drops them.  The thought was using a
private work queue might help get the context switch records processed a
little earlier.  It probably doesn't make that much difference.  I can
just use the generic work queue.  
> 
> 
> > > This looks like you want to use a delayed_work rather than building your
> > > own out of hrtimer and work. Is there any point why you want to use
> > > an hrtimer?
> > 
> > The current implementation uses the hrtimer to schedule when to read the
> > trace buffer the next time.  This patch does not change how the

[Cbe-oss-dev] [PATCH] Reworked Cell OProfile: SPU mutex lock fix

2008-04-23 Thread Carl Love
This is a reworked patch to fix the SPU data storage.  Currently, the 
SPU escape sequences and program counter data is being added directly 
into the kernel buffer without holding the buffer_mutex lock.  This 
patch changes how the data is stored.  A new function,
oprofile_add_value, is added into the oprofile driver to allow adding
generic data to the per cpu buffers.  This enables a series of calls
to the oprofile_add_value to enter the needed SPU escape sequences 
and SPU program data into the kernel buffer via the per cpu buffers
without any additional processing. The oprofile_add_value function is
generic so it could be used by other architecures as well provided
the needed postprocessing was added to opreport.

Finally, this patch backs out the changes previously added to the 
oprofile generic code for handling the architecture specific 
ops.sync_start and ops.sync_stop that allowed the architecture
to skip the per CPU buffer creation.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
===
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/pr_util.h
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
@@ -20,11 +20,6 @@
 #include 
 #include 
 
-/* Defines used for sync_start */
-#define SKIP_GENERIC_SYNC 0
-#define SYNC_START_ERROR -1
-#define DO_GENERIC_SYNC 1
-
 struct spu_overlay_info {  /* map of sections within an SPU overlay */
unsigned int vma;   /* SPU virtual memory address from elf */
unsigned int size;  /* size of section from elf */
@@ -85,7 +80,7 @@ void stop_spu_profiling(void);
 int spu_sync_start(void);
 
 /* remove the hooks */
-int spu_sync_stop(void);
+void spu_sync_stop(void);
 
 /* Record SPU program counter samples to the oprofile event buffer. */
 void spu_sync_buffer(int spu_num, unsigned int *samples,
Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -31,11 +31,13 @@
 
 #define RELEASE_ALL 
 
-static DEFINE_SPINLOCK(buffer_lock);
+static DEFINE_SPINLOCK(add_value_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
+int per_cpu_buffers_exist=0;
 int spu_prof_num_nodes;
 int last_guard_val[MAX_NUMNODES * 8];
+static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
 
 /* Container for caching information about an active SPU task. */
 struct cached_info {
@@ -289,6 +291,7 @@ static int process_context_switch(struct
int retval;
unsigned int offset = 0;
unsigned long spu_cookie = 0, app_dcookie;
+   int cpu_buf;
 
retval = prepare_cached_spu_info(spu, objectId);
if (retval)
@@ -303,17 +306,28 @@ static int process_context_switch(struct
goto out;
}
 
-   /* Record context info in event buffer */
-   spin_lock_irqsave(&buffer_lock, flags);
-   add_event_entry(ESCAPE_CODE);
-   add_event_entry(SPU_CTX_SWITCH_CODE);
-   add_event_entry(spu->number);
-   add_event_entry(spu->pid);
-   add_event_entry(spu->tgid);
-   add_event_entry(app_dcookie);
-   add_event_entry(spu_cookie);
-   add_event_entry(offset);
-   spin_unlock_irqrestore(&buffer_lock, flags);
+   /* Record context info in event buffer.  Note, there are 4x more
+* SPUs then CPUs.  Map the SPU events/data for a given SPU to
+* the same CPU buffer.  Need to ensure the cntxt switch data and
+* samples stay in order.
+*/
+   cpu_buf = spu->number >> 2;
+   spin_lock_irqsave(&add_value_lock, flags);
+   oprofile_add_value(ESCAPE_CODE, cpu_buf);
+   oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
+   oprofile_add_value(spu->number, cpu_buf);
+   oprofile_add_value(spu->pid, cpu_buf);
+   oprofile_add_value(spu->tgid, cpu_buf);
+   oprofile_add_value(app_dcookie, cpu_buf);
+   oprofile_add_value(spu_cookie, cpu_buf);
+   oprofile_add_value(offset, cpu_buf);
+
+   /* Set flag to indicate SPU PC data can now be written out.  If
+* the SPU program counter data is seen before an SPU context
+* record is seen, the postprocessing will fail.
+*/
+   spu_ctx_sw_seen[spu->number] = 1;
+   spin_unlock_irqrestore(&add_value_lock, flags);
smp_wmb();  /* insure spu event buffer updates are written */
/* don't want entries intermingled... */
 out:
@@ -363,41 +377,55 @@ static int number_of_online_nodes(void)
 /* The main purpose of this function is to synchronize
  * OProfile with SPUFS by registering to be notified of
  * SPU task switches.
- *
- * NOTE: When profiling SPUs, we must ensure that only
- * spu_sync_start

[Cbe-oss-dev] [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-04-30 Thread Carl Love
This is a reworked patch to fix the SPU data storage.  Currently, the 
SPU escape sequences and program counter data is being added directly 
into the kernel buffer without holding the buffer_mutex lock.  This 
patch changes how the data is stored.  A new function,
oprofile_add_value, is added into the oprofile driver to allow adding
generic data to the per cpu buffers.  This enables a series of calls
to the oprofile_add_value to enter the needed SPU escape sequences 
and SPU program data into the kernel buffer via the per cpu buffers
without any additional processing. The oprofile_add_value function is
generic so it could be used by other architecures as well provided
the needed postprocessing was added to opreport.

Finally, this patch backs out the changes previously added to the 
oprofile generic code for handling the architecture specific 
ops.sync_start and ops.sync_stop that allowed the architecture
to skip the per CPU buffer creation.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
===
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/pr_util.h
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
@@ -20,11 +20,6 @@
 #include 
 #include 
 
-/* Defines used for sync_start */
-#define SKIP_GENERIC_SYNC 0
-#define SYNC_START_ERROR -1
-#define DO_GENERIC_SYNC 1
-
 struct spu_overlay_info {  /* map of sections within an SPU overlay */
unsigned int vma;   /* SPU virtual memory address from elf */
unsigned int size;  /* size of section from elf */
@@ -85,7 +80,7 @@ void stop_spu_profiling(void);
 int spu_sync_start(void);
 
 /* remove the hooks */
-int spu_sync_stop(void);
+void spu_sync_stop(void);
 
 /* Record SPU program counter samples to the oprofile event buffer. */
 void spu_sync_buffer(int spu_num, unsigned int *samples,
Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===
---
Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -31,11 +31,12 @@
 
 #define RELEASE_ALL 
 
-static DEFINE_SPINLOCK(buffer_lock);
+static DEFINE_SPINLOCK(add_value_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
 int spu_prof_num_nodes;
 int last_guard_val[MAX_NUMNODES * 8];
+static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
 
 /* Container for caching information about an active SPU task. */
 struct cached_info {
@@ -289,6 +290,7 @@ static int process_context_switch(struct
int retval;
unsigned int offset = 0;
unsigned long spu_cookie = 0, app_dcookie;
+   int cpu_buf;
 
retval = prepare_cached_spu_info(spu, objectId);
if (retval)
@@ -303,17 +305,28 @@ static int process_context_switch(struct
goto out;
}
 
-   /* Record context info in event buffer */
-   spin_lock_irqsave(&buffer_lock, flags);
-   add_event_entry(ESCAPE_CODE);
-   add_event_entry(SPU_CTX_SWITCH_CODE);
-   add_event_entry(spu->number);
-   add_event_entry(spu->pid);
-   add_event_entry(spu->tgid);
-   add_event_entry(app_dcookie);
-   add_event_entry(spu_cookie);
-   add_event_entry(offset);
-   spin_unlock_irqrestore(&buffer_lock, flags);
+   /* Record context info in event buffer.  Note, there are 4x more
+* SPUs then CPUs.  Map the SPU events/data for a given SPU to
+* the same CPU buffer.  Need to ensure the cntxt switch data and
+* samples stay in order.
+*/
+   cpu_buf = spu->number >> 2;
+   spin_lock_irqsave(&add_value_lock, flags);
+   oprofile_add_value(ESCAPE_CODE, cpu_buf);
+   oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
+   oprofile_add_value(spu->number, cpu_buf);
+   oprofile_add_value(spu->pid, cpu_buf);
+   oprofile_add_value(spu->tgid, cpu_buf);
+   oprofile_add_value(app_dcookie, cpu_buf);
+   oprofile_add_value(spu_cookie, cpu_buf);
+   oprofile_add_value(offset, cpu_buf);
+
+   /* Set flag to indicate SPU PC data can now be written out.  If
+* the SPU program counter data is seen before an SPU context
+* record is seen, the postprocessing will fail.
+*/
+   spu_ctx_sw_seen[spu->number] = 1;
+   spin_unlock_irqrestore(&add_value_lock, flags);
smp_wmb();  /* insure spu event buffer updates are written */
/* don't want entries intermingled... */
 out:
@@ -333,7 +346,6 @@ static int spu_active_notify(struct noti
unsigned long flags;
struct spu *the_spu = data;
 
-   pr_debug("SPU event notification arrived\n");
if (!val) {
spin_lock_irqsave(&cache_lock, flags);
retval =

[PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-05-01 Thread Carl Love
Sorry, looks like my mailer mangled the file.

This is a reworked patch to fix the SPU data storage.  Currently, the 
SPU escape sequences and program counter data is being added directly 
into the kernel buffer without holding the buffer_mutex lock.  This 
patch changes how the data is stored.  A new function,
oprofile_add_value, is added into the oprofile driver to allow adding
generic data to the per cpu buffers.  This enables a series of calls
to the oprofile_add_value to enter the needed SPU escape sequences 
and SPU program data into the kernel buffer via the per cpu buffers
without any additional processing. The oprofile_add_value function is
generic so it could be used by other architecures as well provided
the needed postprocessing was added to opreport.

Finally, this patch backs out the changes previously added to the 
oprofile generic code for handling the architecture specific 
ops.sync_start and ops.sync_stop that allowed the architecture
to skip the per CPU buffer creation.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
===
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/pr_util.h
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
@@ -20,11 +20,6 @@
 #include 
 #include 
 
-/* Defines used for sync_start */
-#define SKIP_GENERIC_SYNC 0
-#define SYNC_START_ERROR -1
-#define DO_GENERIC_SYNC 1
-
 struct spu_overlay_info {  /* map of sections within an SPU overlay */
unsigned int vma;   /* SPU virtual memory address from elf */
unsigned int size;  /* size of section from elf */
@@ -85,7 +80,7 @@ void stop_spu_profiling(void);
 int spu_sync_start(void);
 
 /* remove the hooks */
-int spu_sync_stop(void);
+void spu_sync_stop(void);
 
 /* Record SPU program counter samples to the oprofile event buffer. */
 void spu_sync_buffer(int spu_num, unsigned int *samples,
Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -31,11 +31,12 @@
 
 #define RELEASE_ALL 
 
-static DEFINE_SPINLOCK(buffer_lock);
+static DEFINE_SPINLOCK(add_value_lock);
 static DEFINE_SPINLOCK(cache_lock);
 static int num_spu_nodes;
 int spu_prof_num_nodes;
 int last_guard_val[MAX_NUMNODES * 8];
+static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
 
 /* Container for caching information about an active SPU task. */
 struct cached_info {
@@ -289,6 +290,7 @@ static int process_context_switch(struct
int retval;
unsigned int offset = 0;
unsigned long spu_cookie = 0, app_dcookie;
+   int cpu_buf;
 
retval = prepare_cached_spu_info(spu, objectId);
if (retval)
@@ -303,17 +305,28 @@ static int process_context_switch(struct
goto out;
}
 
-   /* Record context info in event buffer */
-   spin_lock_irqsave(&buffer_lock, flags);
-   add_event_entry(ESCAPE_CODE);
-   add_event_entry(SPU_CTX_SWITCH_CODE);
-   add_event_entry(spu->number);
-   add_event_entry(spu->pid);
-   add_event_entry(spu->tgid);
-   add_event_entry(app_dcookie);
-   add_event_entry(spu_cookie);
-   add_event_entry(offset);
-   spin_unlock_irqrestore(&buffer_lock, flags);
+   /* Record context info in event buffer.  Note, there are 4x more
+* SPUs then CPUs.  Map the SPU events/data for a given SPU to
+* the same CPU buffer.  Need to ensure the cntxt switch data and
+* samples stay in order.
+*/
+   cpu_buf = spu->number >> 2;
+   spin_lock_irqsave(&add_value_lock, flags);
+   oprofile_add_value(ESCAPE_CODE, cpu_buf);
+   oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
+   oprofile_add_value(spu->number, cpu_buf);
+   oprofile_add_value(spu->pid, cpu_buf);
+   oprofile_add_value(spu->tgid, cpu_buf);
+   oprofile_add_value(app_dcookie, cpu_buf);
+   oprofile_add_value(spu_cookie, cpu_buf);
+   oprofile_add_value(offset, cpu_buf);
+
+   /* Set flag to indicate SPU PC data can now be written out.  If
+* the SPU program counter data is seen before an SPU context
+* record is seen, the postprocessing will fail.
+*/
+   spu_ctx_sw_seen[spu->number] = 1;
+   spin_unlock_irqrestore(&add_value_lock, flags);
smp_wmb();  /* insure spu event buffer updates are written */
/* don't want entries intermingled... */
 out:
@@ -333,7 +346,6 @@ static int spu_active_notify(struct noti
unsigned long flags;
struct spu *the_spu = data;
 
-   pr_debug("SPU event notification arrived\n");
if (!val) {
spin_lock_irqsave(&cache_loc

Re: [Cbe-oss-dev] [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-05-07 Thread Carl Love

On Wed, 2008-05-07 at 18:54 +0200, Jochen Roth wrote:
> Carl,
> 
> I applied your patch on Jeremy's latest kernel.org spufs tree.
> 
>  > +void oprofile_add_value(unsigned long value, int cpu) {
>  > +  struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[cpu];
> 
> Shouldn't it be
>   struct oprofile_cpu_buffer *cpu_buf = &per_cpu(cpu_buffer, cpu);

No, I don't think so.  Take a look at the other functions in
drivers/oprofile/cpu_buffer.c.  For example oprofile_add_trace().  You
will see that the cpu_buffer is not accessed using the per_cpu
construct.  Not sure why the compiler would complain about the
oprofile_add_value() function but not one of the other functions like
oprofile_add_trace().

What was the compiler error that you saw?

I will try getting Jeremy's kernel and applying the patch there to see
if it works.


> 
> At least my compiler complained about that.
> 
> I booted a kernel with your patches applied and the system locked up 
> anyway:
> 
> Unable to handle kernel paging request for data at address 
> 0xd04fe9a8
> Faulting instruction address: 0xd0330ad8
> cpu 0x0: Vector: 300 (Data Access) at [c0003c337680]
>  pc: d0330ad8: .alloc_cpu_buffers+0x7c/0x12c [oprofile]
>  lr: d0330abc: .alloc_cpu_buffers+0x60/0x12c [oprofile]
>  sp: c0003c337900
> msr: 90009032
> dar: d04fe9a8
>   dsisr: 4200
>current = 0xc0003e128600
>paca= 0xc05b3480
>  pid   = 2356, comm = oprofiled
> enter ? for help
> [c0003c3379a0] d03302ac .oprofile_setup+0x2c/0x134 [oprofile]
> [c0003c337a30] d033176c .event_buffer_open+0x7c/0xc8 [oprofile]
> [c0003c337ac0] c00d6ca8 .__dentry_open+0x190/0x308
> [c0003c337b70] c00e7a5c .do_filp_open+0x3c4/0x8e8
> [c0003c337d00] c00d6a1c .do_sys_open+0x80/0x14c
> [c0003c337db0] c01192f4 .compat_sys_open+0x24/0x38
> [c0003c337e30] c00076b4 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 0ff006f8
> SP (ffc6f6a0) is in userspace
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] Fix for OProfile callgraph for Power 64 bit user apps

2008-05-07 Thread Carl Love

The following patch fixes the 64 bit user code backtrace 
which currently may hang the system.  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: linux-2.6.25.1/arch/powerpc/oprofile/backtrace.c
===
--- linux-2.6.25.1.orig/arch/powerpc/oprofile/backtrace.c   2008-05-01 
16:45:25.0 -0500
+++ linux-2.6.25.1/arch/powerpc/oprofile/backtrace.c2008-05-07 
11:29:02.0 -0500
@@ -53,19 +53,40 @@
 #ifdef CONFIG_PPC64
 static unsigned long user_getsp64(unsigned long sp, int is_first)
 {
-   unsigned long stack_frame[3];
+   unsigned long stk_frm_lr;
+   unsigned long stk_frm_sp;
+   unsigned long size;
+
+   /* Issue the __copy_from_user_inatomic() third argument currently
+* only takes sizes 1, 2, 4 or 8 bytes.  Don't read more then the 
+* first 48 bytes of the stack frame.  That is all that is 
+* guaranteed to exist.  Reading more may cause the system to hang.
+*
+* 64 bit stack frame layout:
+* 0-7   bytes is the pointer to previous stack 
+* 8-15  bytes condition register save area
+* 16-23 bytes link register save area
+*/
+   size = sizeof(unsigned long);
+   if (!access_ok(VERIFY_READ, (void __user *)sp, size))
+   return 0;
 
-   if (!access_ok(VERIFY_READ, (void __user *)sp, sizeof(stack_frame)))
+   if (__copy_from_user_inatomic(&stk_frm_sp, (void __user *)sp,
+   size))
return 0;
 
-   if (__copy_from_user_inatomic(stack_frame, (void __user *)sp,
-   sizeof(stack_frame)))
+   /* get the LR from the user stack */
+   if (!access_ok(VERIFY_READ, (void __user *)(sp+16), size))
+   return 0;
+
+   if (__copy_from_user_inatomic(&stk_frm_lr, (void __user *)(sp+16),
+   size))
return 0;
 
if (!is_first)
-   oprofile_add_trace(STACK_LR64(stack_frame));
+   oprofile_add_trace(stk_frm_lr);
 
-   return STACK_SP(stack_frame);
+   return stk_frm_sp;
 }
 #endif
 


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Cbe-oss-dev] [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-05-08 Thread Carl Love

On Thu, 2008-05-08 at 09:48 +0200, Jochen Roth wrote:
> >> Unable to handle kernel paging request for data at address 
> >> 0xd04fe9a8
> >> Faulting instruction address: 0xd0330ad8
> >> cpu 0x0: Vector: 300 (Data Access) at [c0003c337680]
> >>  pc: d0330ad8: .alloc_cpu_buffers+0x7c/0x12c [oprofile]
> >>  lr: d0330abc: .alloc_cpu_buffers+0x60/0x12c [oprofile]
> >>  sp: c0003c337900
> >> msr: 90009032
> >> dar: d04fe9a8
> >>   dsisr: 4200
> >>current = 0xc0003e128600
> >>paca= 0xc05b3480
> >>  pid   = 2356, comm = oprofiled
> >> enter ? for help
> >> [c0003c3379a0] d03302ac .oprofile_setup+0x2c/0x134 [oprofile]
> >> [c0003c337a30] d033176c .event_buffer_open+0x7c/0xc8 [oprofile]
> >> [c0003c337ac0] c00d6ca8 .__dentry_open+0x190/0x308
> >> [c0003c337b70] c00e7a5c .do_filp_open+0x3c4/0x8e8
> >> [c0003c337d00] c00d6a1c .do_sys_open+0x80/0x14c
> >> [c0003c337db0] c01192f4 .compat_sys_open+0x24/0x38
> >> [c0003c337e30] c00076b4 syscall_exit+0x0/0x40
> >> --- Exception: c01 (System Call) at 0ff006f8
> >> SP (ffc6f6a0) is in userspace
> >>
> 
> Btw, I get this crash even without the patch applied. So it might be 
> something unrelated. Shall I open a separate bugzilla for that?
> 
> > A couple of things, first I am not finding a 2.6.25 Jeremy tree that you
> > refer to.  The best that I see is a 2.6.23 tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/jermey/xen.git.  Can you
> > either send me a location to get the jeremy tree you used or
> > tar/compress it an email it to me.  A link would be preferred.  

I pulled down the jeremy tree.  I tried compiling and booting it.  It
also crashes for me without my patch. The tree is clearly bad.

I looked at the jeremy tree.  The oprofile code is different then the
Arnd tree.  My patch is consistent with the code in Arnd's tree.  The
oprofile cpu_buffer has been changed in the Jeremy tree from what is in
Arnd's tree.

It is my understanding that Arnd is the official CELL kernel maintainer.
I have no idea where the jeremy tree comes from.  My patch was done
against the latest Arnd tree.  I would recommend you use Arnd's tree.
Let me know if you find any issues with my patch with Arnd's tree.

> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jk/spufs.git
> 
> > 
> > I use Arnd's git tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/arnd/cell-2.6.git
> 
> I'll clone Arnd's git and apply your patch.
> 
> > The second thing is the function that you changed is used when adding a
> > sample to the CPU buffer.  It is only called when OProfile is running.
> > The crash that you sent above looks like it occurred when the module was
> > loading since it failed in the alloc_cpu_buffer call.  This would appear
> > to me to be completely separate from the oprofile_add_value() function.
> > The patch does make some changes in the oprofile arch specific sync init
> > code.  Perhaps something is failing as part of the sync start
> > (drivers/oprofile/oprof.c) and resulting in the crash.  There are no
> > buffer creation specific changes in the patch so I am puzzled by the
> > crash.  I would like to see if I can reproduce the crash on my system.
> > I have a qs20.  What are you running on?
> > 
> 
> Thanks for the explanation. It also crashed before applying your patch.
> I'm using a QS21.
> 
> --
> Kind regards,
> Jochen

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 1/4] powerpc: fix for OProfile callgraph for Power 64 bit user apps

2008-05-15 Thread Carl Love

On Thu, 2008-05-15 at 13:58 +1000, Paul Mackerras wrote:
> Michael Ellerman writes:
> 
> > __copy_from_user_inatomic() accepts any value for n, it just has a
> > special case for 1, 2, 4 and 8 - but it should still work for other
> > values.
> 
> There is a bug in __copy_from_user_inatomic that
> 
> http://patchwork.ozlabs.org/linuxppc/patch?id=18418
> 
> fixes, and I'm about to send that Linus-wards.
> 
> Carl, to what extent does that fix eliminate the need for the changes
> in your patch?
> 
> Paul.

Paul:

I will have to apply the above mentioned patch to see if it fixes the
issue.  What I found was when the original code tried to copy three
unsigned long into stack_frame[3], i.e. 24 bytes my system consistently
failed.  The comment about 48 bytes is a note that is all that is
guaranteed to be there according to the API.  I found by restricting the
copy to the values in the case statement, things worked.  

  Carl Love

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Fix for OProfile callgraph for Power 64 bit user apps

2008-05-15 Thread Carl Love

On Thu, 2008-05-15 at 20:47 +1000, Paul Mackerras wrote:
> Carl Love writes:
> 
> > The following patch fixes the 64 bit user code backtrace 
> > which currently may hang the system.  
> 
> What exactly is wrong with it?
> 
> Having now taken a much closer look, I now don't think Nate Case's
> patch addresses this, since it only affects constant size arguments
> <= 8 to copy_{to,from}_user_inatomic.
> 
> However, I don't see why your patch fixes anything.  It means we do
> two access_ok calls and two __copy_from_user_inatomic calls, for 8
> bytes, at sp and at sp + 16, rather than doing one access_ok and
> __copy_from_user_inatomic for 24 bytes at sp.  Why does that make any
> difference (apart from being slower)?
> 
> Paul

When I tried testing the oprofile call graph on a 64 bit app the system
consistently hung.  I was able to isolate it to the
__copy_from_user_inatomic() call.  When I made the change in my patch to
make sure I was only requesting one of the values (8bytes) listed in the
case statement this fixed the issue.  I do not know nor was I able to
figure out why the __copy_from_user_inatomic() call failed trying to
read 24 bytes.  The system would hang and any attempt to use printk to
see what was going on failed as the output of the print would not go to
the console before the system hangs.  

I backed out my patch, put in Nate's patch.  The call graph test ran
fine.  I then backed out Nate's patch to go back and try to re-validate
that the system still hangs with the original code and it is not
hanging.  Not sure why it now seems to work.  I have done some other
work on the system but I don't see how that would have changed this.
Argh, I hate chasing phantom bugs!  I was working on 2.6.21. I believe
the 2.6.21 kernel had not been changed. Let me load the latest 2.6.25
and start over with a pristine kernel and see if I can reproduce the
hang.  Sorry for all the hassle.

 Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-05-15 Thread Carl Love

On Thu, 2008-05-15 at 17:39 +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2008, Carl Love wrote:
> 
> > Finally, this patch backs out the changes previously added to the 
> > oprofile generic code for handling the architecture specific 
> > ops.sync_start and ops.sync_stop that allowed the architecture
> > to skip the per CPU buffer creation.
> 
> Thanks for your patch, it looks a lot better than your previous version.
> It would be good to have an oprofile person look into the changes
> to the common code though.
> 
> Just a few more comments/questions this time:
> 
> > -static DEFINE_SPINLOCK(buffer_lock);
> > +static DEFINE_SPINLOCK(add_value_lock);
> >  static DEFINE_SPINLOCK(cache_lock);
> >  static int num_spu_nodes;
> >  int spu_prof_num_nodes;
> >  int last_guard_val[MAX_NUMNODES * 8];
> > +static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
> >  
> >  /* Container for caching information about an active SPU task. */
> >  struct cached_info {
> 
> I noticed now that you are indexing arrays by SPU number. This is not
> a good idea, because you make assumptions about the system that may
> not be true. Better pass around 'struct spu' pointers and put those
> values in there if you need them. 

Hmm, well, we have arrays for last_guard_val[], spu_info[] as well that
are indexed by spu number.  So this would seem to be a more general
issue then just spu_ctx_seen[].  Not sure exactly what you are
suggesting with passing around 'struct spu' as a solution.  Are you
suggesting that a field be added to the structure for the spu context
seen flag?  Not sure that is a good idea.  We will need to clarify how
you propose to fix this not only for spu_ctx_sw_seen but the other
arrays as well that are already being indexed by spu number.

> Then again, the last_guard_val looks
> like you should really store that information in the context instead
> of the physical SPU, but that is something else to work on.
> 
> Let's leave this one as it is for now and fix the current bug at hand,
> but please remember to fix the assumptions in the code later.
> 
> > @@ -289,6 +290,7 @@ static int process_context_switch(struct
> > int retval;
> > unsigned int offset = 0;
> > unsigned long spu_cookie = 0, app_dcookie;
> > +   int cpu_buf;
> >  
> > retval = prepare_cached_spu_info(spu, objectId);
> > if (retval)
> > @@ -303,17 +305,28 @@ static int process_context_switch(struct
> > goto out;
> > }
> >  
> > -   /* Record context info in event buffer */
> > -   spin_lock_irqsave(&buffer_lock, flags);
> > -   add_event_entry(ESCAPE_CODE);
> > -   add_event_entry(SPU_CTX_SWITCH_CODE);
> > -   add_event_entry(spu->number);
> > -   add_event_entry(spu->pid);
> > -   add_event_entry(spu->tgid);
> > -   add_event_entry(app_dcookie);
> > -   add_event_entry(spu_cookie);
> > -   add_event_entry(offset);
> > -   spin_unlock_irqrestore(&buffer_lock, flags);
> > +   /* Record context info in event buffer.  Note, there are 4x more
> > +* SPUs then CPUs.  Map the SPU events/data for a given SPU to
> > +* the same CPU buffer.  Need to ensure the cntxt switch data and
> > +* samples stay in order.
> > +*/
> > +   cpu_buf = spu->number >> 2;
> 
> The ratio of CPUs versus SPUs is anything but fixed, and the CPU numbers
> may not start at 0. If you have a hypervisor (think PS3), or you are
> running a non-SMP kernel, this is practically always wrong.
> Please use get_cpu()/put_cpu() to read the current CPU number instead.
> This one needs to be fixed now.
> 
> > +   spin_lock_irqsave(&add_value_lock, flags);
> > +   oprofile_add_value(ESCAPE_CODE, cpu_buf);
> > +   oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
> > +   oprofile_add_value(spu->number, cpu_buf);
> > +   oprofile_add_value(spu->pid, cpu_buf);
> > +   oprofile_add_value(spu->tgid, cpu_buf);
> > +   oprofile_add_value(app_dcookie, cpu_buf);
> > +   oprofile_add_value(spu_cookie, cpu_buf);
> > +   oprofile_add_value(offset, cpu_buf);
> > +
> > +   /* Set flag to indicate SPU PC data can now be written out.  If
> > +* the SPU program counter data is seen before an SPU context
> > +* record is seen, the postprocessing will fail.
> > +*/
> > +   spu_ctx_sw_seen[spu->number] = 1;
> > +   spin_unlock_irqrestore(&add_value_lock, flags);
> > smp_wmb();  /* insure spu event buffer updates are written */
> > /* don't want entries intermingled... */
> >  out:
> 
> How does the spi

Re: [PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

2008-05-16 Thread Carl Love

On Fri, 2008-05-16 at 16:22 +0200, Arnd Bergmann wrote:
> On Thursday 15 May 2008, Carl Love wrote:
> > On Thu, 2008-05-15 at 17:39 +0200, Arnd Bergmann wrote:
> > > 
> > > I noticed now that you are indexing arrays by SPU number. This is not
> > > a good idea, because you make assumptions about the system that may
> > > not be true. Better pass around 'struct spu' pointers and put those
> > > values in there if you need them. 
> > 
> > Hmm, well, we have arrays for last_guard_val[], spu_info[] as well that
> > are indexed by spu number.  So this would seem to be a more general
> > issue then just spu_ctx_seen[].  Not sure exactly what you are
> > suggesting with passing around 'struct spu' as a solution.  Are you
> > suggesting that a field be added to the structure for the spu context
> > seen flag?  Not sure that is a good idea.  We will need to clarify how
> > you propose to fix this not only for spu_ctx_sw_seen but the other
> > arrays as well that are already being indexed by spu number.
> 
> I'd suggest you add fields to struct spu, either directly to it, if
> it's short, or a pointer to your own struct.
> 
> As I said, let's not do that now.
> 
> > So, you can either add
> > a context switch sequence to a given CPU buffer or you can add an SPU
> > program counter to the CPU buffer not both at the same time.  The lock
> > is held until the entire SPU context switch entry is added to the CPU
> > queue to make sure entries are not intermingled.
> 
> My point was that oprofile collecting samples for the CPU could possibly
> add entries to the same queue, which would race with this.

Ah, I see what you are getting at.  That we would be collecting
profiling data on a a CPU (i.e. a PPU thread) and SPU cycle profiling
data at the same time.  No, due to hardware constraints on SPU cycle
profiling, the hardware cannot be configured to do both PPU profiling
and SPU CYCLE profiling at a time. 

I am working on SPU event profiling. The hardware will only support
event profiling on one SPU per node at a time. I will have to think
about it more but my initial thought is supporting PPU profiling and SPU
event profiling will again not work due to hardware constraints.  

> 
> > When the trace buffer is read, each 128 bit entry contains the 16 bit
> > SPU PC for each of the eight SPUs on the node.  Secondly, we really want
> > to keep the timing of the context switches and storing the SPU PC values
> > as close as possible.  Clearly there is some noise as the switch will
> > occur while the trace buffer is filling.
> 
> can't you just empty the trace buffer every time you encounter a context
> switch, even in the absence of a timer interrupt? That would prevent
> all of the noise.

That is a good thought.  It should help to keep the timing more accurate
and prevent the issue mentioned below about processing multiple trace
buffers of data before processing the context switch info.
> 
> > Storing the all of the SPU PC 
> > values, into the current CPU buffer causes two issues.  One is that one
> > of the buffers will be much fuller then the rest increasing the
> > probability of overflowing the CPU buffer and dropping data.  As it is,
> > in user land, I have to increase the size of the CPU buffers to
> > accommodate four SPUs worth of data in a given CPU buffer.  Secondly,
> > the SPU context switch info for a given SPU would be going to a
> > different CPU buffer then the data. 
> 
> In practice, the calling CPUs should be well spread around by the way
> that spufs works, so I don't think it's a big problem.
> Did you observe this problem, or just expect it to happen?

I did see the problem initially.  The first version of
oprofile_add_value() did not take the CPU buffer argument.  Rather
internally the oprofile_add_value() stored the data to the current CPU
buffer by calling smp_processor_id().  When processing the trace buffer,
the function would extract the PC value for each of the 8 SPUs on the
node, then store them in the current CPU buffer.  It would do this for
all of the samples in the trace buffer, typically about 250 (Maximum is
1024).  So you would put 8 * 350 samples all into the same CPU buffer
and nothing in the other CPU buffers for that node.  In order to ensure
I don't drop any samples, I had to increase the size of each CPU buffer
more then when I distribute them to the two CPU buffers in each node.  I
changed the oprofile_add_value() to take a CPU buffer to better utilize
the CPU buffers and to avoid the ordering issue with the SPU context
switch data.  I didn't do a detailed study to see exactly how much more
the C

Re: [PATCH] Fix for OProfile callgraph for Power 64 bit user apps

2008-05-16 Thread Carl Love

On Thu, 2008-05-15 at 11:01 -0700, Carl Love wrote:
> On Thu, 2008-05-15 at 20:47 +1000, Paul Mackerras wrote:
> > Carl Love writes:
> > 
> > > The following patch fixes the 64 bit user code backtrace 
> > > which currently may hang the system.  
> > 
> > What exactly is wrong with it?
> > 
> > Having now taken a much closer look, I now don't think Nate Case's
> > patch addresses this, since it only affects constant size arguments
> > <= 8 to copy_{to,from}_user_inatomic.
> > 
> > However, I don't see why your patch fixes anything.  It means we do
> > two access_ok calls and two __copy_from_user_inatomic calls, for 8
> > bytes, at sp and at sp + 16, rather than doing one access_ok and
> > __copy_from_user_inatomic for 24 bytes at sp.  Why does that make any
> > difference (apart from being slower)?
> > 
> > Paul
> 
> When I tried testing the oprofile call graph on a 64 bit app the system
> consistently hung.  I was able to isolate it to the
> __copy_from_user_inatomic() call.  When I made the change in my patch to
> make sure I was only requesting one of the values (8bytes) listed in the
> case statement this fixed the issue.  I do not know nor was I able to
> figure out why the __copy_from_user_inatomic() call failed trying to
> read 24 bytes.  The system would hang and any attempt to use printk to
> see what was going on failed as the output of the print would not go to
> the console before the system hangs.  
> 
> I backed out my patch, put in Nate's patch.  The call graph test ran
> fine.  I then backed out Nate's patch to go back and try to re-validate
> that the system still hangs with the original code and it is not
> hanging.  Not sure why it now seems to work.  I have done some other
> work on the system but I don't see how that would have changed this.
> Argh, I hate chasing phantom bugs!  I was working on 2.6.21. I believe
> the 2.6.21 kernel had not been changed. Let me load the latest 2.6.25
> and start over with a pristine kernel and see if I can reproduce the
> hang.  Sorry for all the hassle.
> 
>  Carl Love
> 

I installed the latest 2.6.25 kernel and tested OProfile call graph on
the 64 bit user application.  I did not see any hangs for the tests that
I ran.  I tried things multiple times.  So, I guess we should drop the
OProfile callgraph patch.  Clearly if there still is a problem it is not
in how the OProfile call graph code is written but is probably in the
underlying calls, i.e. __copy_from_user_inatomic().  I will continue to
test the functionality and see if I can find a example where the system
will hang so we can investigate the underlying cause.

Thank you for your time on this. 

   Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 1/5] powerpc: fix for OProfile callgraph for Power 64 bit user apps

2008-06-10 Thread Carl Love

On Tue, 2008-06-10 at 10:30 +1000, Michael Ellerman wrote:
> On Tue, 2008-06-10 at 09:41 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2008-06-09 at 16:26 -0700, [EMAIL PROTECTED] wrote:
> > > From: Carl Love <[EMAIL PROTECTED]>
> > > 
> > > Fix the 64 bit user code backtrace which currently may hang the system.
> > > 
> > > Signed-off-by: Carl Love <[EMAIL PROTECTED]>
> > > Cc: Maynard Johnson <[EMAIL PROTECTED]>
> > 
> > That looks weird. I doubt it's the right fix for the problem. Paul,
> > I remember this was discussed earlier, did we come up with a proper fix
> > already ?
> 
> We decided there probably wasn't a bug there at all:
> 
> http://ozlabs.org/pipermail/linuxppc-dev/2008-May/056449.html
> 
> Haven't heard from Carl if he reproduced the hang and traced it to
> something else.
> 
> cheers
> 

After a discussion of this patch, the general consensus was that the
issue is with the underlying copy from user call.  After developing and
posting the patch, I had tried to go back and reproduce it again and was
not able to. My system had changed slightly and I could get it to fail.
I still have this on my to do list to get back to working on the Powerpc
callgraph support as there are some other fundamental issues with the
code.  Specifically the call back traces are not always correct for leaf
nodes.  This was actually the issue I was working on when I found the
issue with the 64 bit applications hanging the system.  

The patch should be dropped.

Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] powerpc/oprofile: fix potential buffer overrun in op_model_cell.c

2010-06-02 Thread Carl Love

Denis:

I have reviewed the change and agree to it.  Thanks for catching that.

  Carl Love



   
 Denis Kirjanov
  To
   a...@arndb.de,  
 06/01/2010 12:43  b...@kernel.crashing.org,   
 PMpau...@samba.org, jkos...@suse.cz
cc
   linuxppc-...@ozlabs.org,
   oprofile-l...@lists.sf.net  
   Subject
   [PATCH] powerpc/oprofile: fix   
   potential buffer overrun in 
   op_model_cell.c 
   
   
   
   
   
   




Fix potential initial_lfsr buffer overrun.
Writing past the end of the buffer could happen when index == ENTRIES

Signed-off-by: Denis Kirjanov 
---
 arch/powerpc/oprofile/op_model_cell.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/oprofile/op_model_cell.c
b/arch/powerpc/oprofile/op_model_cell.c
index 2c9e522..7fd90d0 100644
--- a/arch/powerpc/oprofile/op_model_cell.c
+++ b/arch/powerpc/oprofile/op_model_cell.c
@@ -1077,7 +1077,7 @@ static int calculate_lfsr(int n)
 index = ENTRIES-1;

 /* make sure index is valid */
-if ((index > ENTRIES) || (index < 0))
+if ((index >= ENTRIES) || (index < 0))
 index = ENTRIES-1;

 return initial_lfsr[index];

--


___
oprofile-list mailing list
oprofile-l...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oprofile-list
<><><>___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[UDATED PATCH] Cell OProfile: Incorrect local array size in activate spu profiling function

2008-10-28 Thread Carl Love
Updated the patch to address comments by Michael Ellerman.  
Specifically, changed upper limit in for loop to 
ARRAY_SIZE() macro and added a check to make sure the 
number of events specified by the user, which is used as
the max for indexing various arrays, is no bigger then the
declared size of the arrays.

The size of the pm_signal_local array should be equal to the
number of SPUs being configured in the array.  Currently, the
array is of size 4 (NR_PHYS_CTRS) but being indexed by a for 
loop from 0 to 7 (NUM_SPUS_PER_NODE).  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_10_24_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -582,6 +582,13 @@ static int cell_reg_setup(struct op_coun
 
num_counters = num_ctrs;
 
+   if (unlikely(num_ctrs > NR_PHYS_CTRS)) {
+   printk(KERN_ERR
+  "%s: Oprofile, number of specified events " \
+  "exceeds number of physical counters\n",
+  __func__);
+   return -EIO;
+   }
pm_regs.group_control = 0;
pm_regs.debug_bus_control = 0;
 
@@ -830,13 +837,13 @@ static int calculate_lfsr(int n)
 static int pm_rtas_activate_spu_profiling(u32 node)
 {
int ret, i;
-   struct pm_signal pm_signal_local[NR_PHYS_CTRS];
+   struct pm_signal pm_signal_local[NUM_SPUS_PER_NODE];
 
/*
 * Set up the rtas call to configure the debug bus to
 * route the SPU PCs.  Setup the pm_signal for each SPU
 */
-   for (i = 0; i < NUM_SPUS_PER_NODE; i++) {
+   for (i = 0; i < ARRAY_SIZE(pm_signal_local); i++) {
pm_signal_local[i].cpu = node;
pm_signal_local[i].signal_group = 41;
/* spu i on word (i/2) */


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [UPDATED PATCH VER2] Cell OProfile: Incorrect local array size in activate spu profiling function

2008-10-29 Thread Carl Love
Updated the patch to address comments by Michael Ellerman.  
Specifically, changed upper limit in for loop to 
ARRAY_SIZE() macro and added a check to make sure the 
number of events specified by the user, which is used as
the max for indexing various arrays, is no bigger then the
declared size of the arrays.

The size of the pm_signal_local array should be equal to the
number of SPUs being configured in the array.  Currently, the
array is of size 4 (NR_PHYS_CTRS) but being indexed by a for 
loop from 0 to 7 (NUM_SPUS_PER_NODE).  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_10_24_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_10_24_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -582,6 +582,13 @@ static int cell_reg_setup(struct op_coun
 
num_counters = num_ctrs;
 
+   if (unlikely(num_ctrs > NR_PHYS_CTRS)) {
+   printk(KERN_ERR
+  "%s: Oprofile, number of specified events " \
+  "exceeds number of physical counters\n",
+  __func__);
+   return -EIO;
+   }
pm_regs.group_control = 0;
pm_regs.debug_bus_control = 0;
 
@@ -830,13 +837,13 @@ static int calculate_lfsr(int n)
 static int pm_rtas_activate_spu_profiling(u32 node)
 {
int ret, i;
-   struct pm_signal pm_signal_local[NR_PHYS_CTRS];
+   struct pm_signal pm_signal_local[NUM_SPUS_PER_NODE];
 
/*
 * Set up the rtas call to configure the debug bus to
 * route the SPU PCs.  Setup the pm_signal for each SPU
 */
-   for (i = 0; i < NUM_SPUS_PER_NODE; i++) {
+   for (i = 0; i < ARRAY_SIZE(pm_signal_local); i++) {
pm_signal_local[i].cpu = node;
pm_signal_local[i].signal_group = 41;
/* spu i on word (i/2) */
@@ -848,7 +855,7 @@ static int pm_rtas_activate_spu_profilin
 
ret = rtas_ibm_cbe_perftools(SUBFUNC_ACTIVATE,
 PASSTHRU_ENABLE, pm_signal_local,
-(NUM_SPUS_PER_NODE
+(ARRAY_SIZE(pm_signal_local)
  * sizeof(struct pm_signal)));
 
if (unlikely(ret)) {


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Patch 0/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-24 Thread Carl Love

This patch set consists of two kernel patches and one user level patch
to add SPU event based profiling support to OProfile for the IBM Cell
processor.  The first patch in the series is the user level patch that
adds the needed events and event checking to the user tool.  The second
patch is the first of two kernel patches.  It makes some structural
changes to the kernel code to make it easier to add the specific
functions for doing SPU event profiling.  The first kernel patch does
not make any functional changes.  The third patch in the series is the
second kernel patch where the actual SPU event profiling code support is
added to the kernel.

  Carl Love

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Patch 1/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-24 Thread Carl Love
This patch adds the SPU events for the IBM Cell processor to the 
list of available events to the user level tool.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: oprofile-0.9.4/events/ppc64/cell-be/events
===
--- oprofile-0.9.4.orig/events/ppc64/cell-be/events
+++ oprofile-0.9.4/events/ppc64/cell-be/events
@@ -108,12 +108,42 @@ event:0xdbe counters:0,1,2,3 um:PPU_0_cy
 event:0xdbf counters:0,1,2,3 um:PPU_0_cycles  minimum:1
name:stb_not_empty  : At least one store gather buffer not empty.
 
 # Cell BE Island 4 - Synergistic Processor Unit (SPU)
-
+#
+# OPROFILE FOR CELL ONLY SUPPORTS PROFILING ON ONE SPU EVENT AT A TIME
+#
 # CBE Signal Group 41 - SPU (NClk)
+event:0x1004 counters:0 um:SPU_02_cycles  minimum:1
name:dual_instrctn_commit   : Dual instruction committed.
+event:0x1005 counters:0 um:SPU_02_cycles  minimum:1
name:sngl_instrctn_commit   : Single instruction committed.
+event:0x1006 counters:0 um:SPU_02_cycles  minimum:1
name:ppln0_instrctn_commit  : Pipeline 0 instruction committed.
+event:0x1007 counters:0 um:SPU_02_cycles  minimum:1
name:ppln1_instrctn_commit  : Pipeline 1 instruction committed.
+event:0x1008 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:instrctn_ftch_stll : Instruction fetch stall.
+event:0x1009 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:lcl_strg_bsy   : Local storage busy.
+event:0x100A counters:0 um:SPU_02_cycles  minimum:1
name:dma_cnflct_ld_st   : DMA may conflict with load or store.
+event:0x100B counters:0 um:SPU_02_cycles  minimum:1
name:str_to_lcl_strg: Store instruction to local storage issued.
+event:0x100C counters:0 um:SPU_02_cycles  minimum:1
name:ld_frm_lcl_strg: Load intruction from local storage issued.
+event:0x100D counters:0 um:SPU_02_cycles  minimum:1
name:fpu_exctn  : Floating-Point Unit (FPU) exception.
+event:0x100E counters:0 um:SPU_02_cycles  minimum:1
name:brnch_instrctn_commit  : Branch instruction committed.
+event:0x100F counters:0 um:SPU_02_cycles  minimum:1
name:change_of_flow : Non-sequential change of the SPU program counter, 
which can be caused by branch, asynchronous interrupt, stalled wait on channel, 
error correction code (ECC) error, and so forth.
+event:0x1010 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_not_tkn  : Branch not taken.
+event:0x1011 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_mss_prdctn   : Branch miss prediction; not exact. Certain other code 
sequences can cause additional pulses on this signal (see Note 2).
+event:0x1012 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_hnt_mss_prdctn   : Branch hint miss prediction; not exact. 
Certain other code sequences can cause additional pulses on this signal (see 
Note 2).
+event:0x1013 counters:0 um:SPU_02_cycles  minimum:1
name:instrctn_seqnc_err : Instruction sequence error.
+event:0x1015 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl_wrt : Stalled waiting on any blocking channel write 
(see Note 3).
+event:0x1016 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl0: Stalled waiting on External Event Status 
(Channel 0) (see Note 3).
+event:0x1017 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl3: Stalled waiting on Signal Notification 1 
(Channel 3) (see Note 3).
+event:0x1018 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl4: Stalled waiting on Signal Notification 2 
(Channel 4) (see Note 3).
+event:0x1019 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl21   : Stalled waiting on DMA Command Opcode or 
ClassID Register (Channel 21) (see Note 3).
+event:0x101A counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl24   : Stalled waiting on Tag Group Status (Channel 
24) (see Note 3).
+event:0x101B counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl25   : Stalled waiting on List Stall-and-Notify Tag 
Status (Channel 25) (see Note 3).
+event:0x101C counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl28   : Stalled waiting on PPU Mailbox (Channel 28) 
(see Note 3).
+event:0x1022 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl29   : Stalled waiting on SPU Mailbox (Channel 29) 
(see Note 3).
+
 
 # CBE Signal Group 42 - SPU Trigger (NClk)
+event:0x10A1 counters:0 um:SPU_Trigger_cycles_or_edges minimum:1   
name:stld_wait_chnl_op  : Stalled wait

[Patch 2/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-24 Thread Carl Love

This patch basically rearranges the code a bit to make it easier to
just add the needed SPU event based profiling routines.   The second 
kernel patch contains the new spu event based profiling code.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_11_10_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -40,14 +40,11 @@
 #include "../platforms/cell/interrupt.h"
 #include "cell/pr_util.h"
 
-static void cell_global_stop_spu(void);
+static void cell_global_stop_spu_cycles(void);
 
-/*
- * spu_cycle_reset is the number of cycles between samples.
- * This variable is used for SPU profiling and should ONLY be set
- * at the beginning of cell_reg_setup; otherwise, it's read-only.
- */
-static unsigned int spu_cycle_reset;
+#define PPU_PROFILING0
+#define SPU_PROFILING_CYCLES 1
+#define SPU_PROFILING_EVENTS 2
 
 #define NUM_SPUS_PER_NODE8
 #define SPU_CYCLES_EVENT_NUM 2 /*  event number for SPU_CYCLES */
@@ -66,6 +63,15 @@ static unsigned int spu_cycle_reset;
 
 #define MAX_SPU_COUNT 0xFF /* maximum 24 bit LFSR value */
 
+/*
+ * spu_cycle_reset is the number of cycles between samples.
+ * This variable is used for SPU profiling and should ONLY be set
+ * at the beginning of cell_reg_setup; otherwise, it's read-only.
+ */
+static unsigned int spu_cycle_reset;
+static unsigned int profiling_mode;
+
+
 struct pmc_cntrl_data {
unsigned long vcntr;
unsigned long evnts;
@@ -541,44 +547,32 @@ static void start_virt_cntrs(void)
add_timer(&timer_virt_cntr);
 }
 
-/* This function is called once for all cpus combined */
-static int cell_reg_setup(struct op_counter_config *ctr,
+static int cell_reg_setup_spu_cycles(struct op_counter_config *ctr,
struct op_system_config *sys, int num_ctrs)
 {
int i, j, cpu;
-   spu_cycle_reset = 0;
-
-   if (ctr[0].event == SPU_CYCLES_EVENT_NUM) {
-   spu_cycle_reset = ctr[0].count;
-
-   /*
-* Each node will need to make the rtas call to start
-* and stop SPU profiling.  Get the token once and store it.
-*/
-   spu_rtas_token = rtas_token("ibm,cbe-spu-perftools");
-
-   if (unlikely(spu_rtas_token == RTAS_UNKNOWN_SERVICE)) {
-   printk(KERN_ERR
-  "%s: rtas token ibm,cbe-spu-perftools unknown\n",
-  __func__);
-   return -EIO;
-   }
-   }
 
-   pm_rtas_token = rtas_token("ibm,cbe-perftools");
+   spu_cycle_reset = ctr[0].count;
 
/*
-* For all events excetp PPU CYCLEs, each node will need to make
-* the rtas cbe-perftools call to setup and reset the debug bus.
-* Make the token lookup call once and store it in the global
-* variable pm_rtas_token.
+* Each node will need to make the rtas call to start
+* and stop SPU profiling.  Get the token once and store it.
 */
-   if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
+   spu_rtas_token = rtas_token("ibm,cbe-spu-perftools");
+
+   if (unlikely(spu_rtas_token == RTAS_UNKNOWN_SERVICE)) {
printk(KERN_ERR
-  "%s: rtas token ibm,cbe-perftools unknown\n",
+  "%s: rtas token ibm,cbe-spu-perftools unknown\n",
   __func__);
return -EIO;
}
+   return 0;
+}
+
+static int cell_reg_setup_ppu(struct op_counter_config *ctr,
+   struct op_system_config *sys, int num_ctrs)
+{
+   int i, j, cpu;
 
num_counters = num_ctrs;
 
@@ -665,6 +659,42 @@ static int cell_reg_setup(struct op_coun
 }
 
 
+/* This function is called once for all cpus combined */
+static int cell_reg_setup(struct op_counter_config *ctr,
+   struct op_system_config *sys, int num_ctrs)
+{
+   int ret;
+
+   spu_cycle_reset = 0;
+
+
+   /*
+* For all events except PPU CYCLEs, each node will need to make
+* the rtas cbe-perftools call to setup and reset the debug bus.
+* Make the token lookup call once and store it in the global
+* variable pm_rtas_token.
+*/
+   pm_rtas_token = rtas_token("ibm,cbe-perftools");
+
+   if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
+   printk(KERN_ERR
+  "%s: rtas token ibm,cbe-perftools unknown\n",
+  __func__);
+   return -EIO;
+   }
+
+   if (ctr[0].event == SPU_CYCLES_EVENT_NUM) {
+   profiling_mode = SPU_PROFILING_CYCLES;
+   

[Patch 3/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-24 Thread Carl Love

This is the second of the two kernel  patches for adding SPU profiling 
for the IBM Cell processor.  This patch contains the spu event profiling
setup, start and stop routines.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_11_10_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -41,10 +41,19 @@
 #include "cell/pr_util.h"
 
 static void cell_global_stop_spu_cycles(void);
+static void cell_global_stop_spu_events(void);
 
 #define PPU_PROFILING0
 #define SPU_PROFILING_CYCLES 1
 #define SPU_PROFILING_EVENTS 2
+#define PHYS_SPU_INIT0 /* Which spu in the node to start
+   * profiling on.
+   */
+#define SPU_EVENT_NUM_START  4100
+#define SPU_EVENT_NUM_STOP   4399
+#define SPU_PROFILE_EVENT_ADDR  4363  /* spu, address trace, decimal */
+#define SPU_PROFILE_EVENT_ADDR_MASK_A   0x146 /* sub unit set to zero */
+#define SPU_PROFILE_EVENT_ADDR_MASK_B   0x186 /* sub unit set to zero */
 
 #define NUM_SPUS_PER_NODE8
 #define SPU_CYCLES_EVENT_NUM 2 /*  event number for SPU_CYCLES */
@@ -58,11 +67,17 @@ static void cell_global_stop_spu_cycles(
 #define NUM_THREADS 2 /* number of physical threads in
   * physical processor
   */
-#define NUM_DEBUG_BUS_WORDS 4
+#define NUM_TRACE_BUS_WORDS 4
 #define NUM_INPUT_BUS_WORDS 2
 
 #define MAX_SPU_COUNT 0xFF /* maximum 24 bit LFSR value */
 
+/* Minumum HW interval timer setting to send value to trace buffer is 10 cycle.
+ * To configure counter to send value every N cycles set counter to
+ * 2^32 - 1 - N.
+ */
+#define NUM_INTERVAL_CYC  0x - 10
+
 /*
  * spu_cycle_reset is the number of cycles between samples.
  * This variable is used for SPU profiling and should ONLY be set
@@ -70,7 +85,7 @@ static void cell_global_stop_spu_cycles(
  */
 static unsigned int spu_cycle_reset;
 static unsigned int profiling_mode;
-
+static int spu_evnt_phys_spu_indx;
 
 struct pmc_cntrl_data {
unsigned long vcntr;
@@ -111,6 +126,8 @@ struct pm_cntrl {
u16 trace_mode;
u16 freeze;
u16 count_mode;
+   u16 spu_addr_trace;
+   u8  trace_buf_ovflw;
 };
 
 static struct {
@@ -128,7 +145,7 @@ static struct {
 #define GET_INPUT_CONTROL(x) ((x & 0x0004) >> 2)
 
 static DEFINE_PER_CPU(unsigned long[NR_PHYS_CTRS], pmc_values);
-
+static unsigned long spu_pm_cnt[MAX_NUMNODES * NUM_SPUS_PER_NODE];
 static struct pmc_cntrl_data pmc_cntrl[NUM_THREADS][NR_PHYS_CTRS];
 
 /*
@@ -158,6 +175,7 @@ static u32 hdw_thread;
 
 static u32 virt_cntr_inter_mask;
 static struct timer_list timer_virt_cntr;
+static struct timer_list timer_spu_event_swap;
 
 /*
  * pm_signal needs to be global since it is initialized in
@@ -171,7 +189,7 @@ static int spu_rtas_token;   /* token fo
 static u32 reset_value[NR_PHYS_CTRS];
 static int num_counters;
 static int oprofile_running;
-static DEFINE_SPINLOCK(virt_cntr_lock);
+static DEFINE_SPINLOCK(cntr_lock);
 
 static u32 ctr_enabled;
 
@@ -242,7 +260,6 @@ static int pm_rtas_activate_signals(u32 
i = 0;
for (j = 0; j < count; j++) {
if (pm_signal[j].signal_group != PPU_CYCLES_GRP_NUM) {
-
/* fw expects physical cpu # */
pm_signal_local[i].cpu = node;
pm_signal_local[i].signal_group
@@ -265,7 +282,6 @@ static int pm_rtas_activate_signals(u32 
return -EIO;
}
}
-
return 0;
 }
 
@@ -339,7 +355,7 @@ static void set_pm_event(u32 ctr, int ev
p->bit = signal_bit;
}
 
-   for (i = 0; i < NUM_DEBUG_BUS_WORDS; i++) {
+   for (i = 0; i < NUM_TRACE_BUS_WORDS; i++) {
if (bus_word & (1 << i)) {
pm_regs.debug_bus_control |=
(bus_type << (30 - (2 * i)));
@@ -373,12 +389,16 @@ static void write_pm_cntrl(int cpu)
if (pm_regs.pm_cntrl.stop_at_max == 1)
val |= CBE_PM_STOP_AT_MAX;
 
-   if (pm_regs.pm_cntrl.trace_mode == 1)
+   if (pm_regs.pm_cntrl.trace_mode != 0)
val |= CBE_PM_TRACE_MODE_SET(pm_regs.pm_cntrl.trace_mode);
 
+   if (pm_regs.pm_cntrl.trace_buf_ovflw == 1)
+   val |= CBE_PM_TRACE_BUF_OVFLW(pm_regs.pm_cntrl.trace_buf_ovflw);
if (pm_regs.pm_cntrl.freeze == 1)
val |= CBE_PM_FREEZE_ALL_CTRS;
 
+   val |= CBE_PM_SPU_ADDR_TRACE_SET(pm_regs.pm_cntrl.spu_addr_trace);
+
/*
 * Routine set_count_mode must be called previously to set
 * the count mode based on the user selection of user and kernel.
@@ -447,7 +467,7 @@ static

Re: [Patch 0/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-25 Thread Carl Love

On Tue, 2008-11-25 at 17:00 +0100, Arnd Bergmann wrote:
> On Tuesday 25 November 2008, Carl Love wrote:
> > This patch set consists of two kernel patches and one user level patch
> > to add SPU event based profiling support to OProfile for the IBM Cell
> > processor.  The first patch in the series is the user level patch that
> > adds the needed events and event checking to the user tool.  The second
> > patch is the first of two kernel patches.  It makes some structural
> > changes to the kernel code to make it easier to add the specific
> > functions for doing SPU event profiling.  The first kernel patch does
> > not make any functional changes.  The third patch in the series is the
> > second kernel patch where the actual SPU event profiling code support is
> > added to the kernel.
> 
> Thanks for your submission!
> 
> I can't comment on the oprofile user code, but I have some comments
> on the implementation in the third patch.
> 
> Are the patches interdependent, or will old versions of the oprofile
> tool work with new kernels and vice versa?
> 
>   Arnd <><

There are two cases:1) new kernel code and old user tool. This works
fine, user is not able to do SPU events as the user code doesn't support
them.  Case 2) old kernel code and new user tool.  I realized that I
hadn't tested this case.  So, I just did.  What happens is the event
counters get setup for the SPU event but the kernel code treats it as if
it is a PPU event.  OProfile runs, you get a report for the PPU
processors listing the SPU event as the PPU event used.  Unfortunately,
the report is all garbage and not obvious to the naive user that it is
garbage.  Looks like we will need to put a check into the user code to
make sure it does not try to do SPU event profiling if the kernel
doesn't support it.  Argh!

  Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Patch 3/3] OProfile SPU event profiling support for IBM Cell processor

2008-11-25 Thread Carl Love

On Tue, 2008-11-25 at 16:58 +0100, Arnd Bergmann wrote:



> >  struct pmc_cntrl_data {
> > unsigned long vcntr;
> > @@ -111,6 +126,8 @@ struct pm_cntrl {
> > u16 trace_mode;
> > u16 freeze;
> > u16 count_mode;
> > +   u16 spu_addr_trace;
> > +   u8  trace_buf_ovflw;
> >  };
> >  
> >  static struct {
> > @@ -128,7 +145,7 @@ static struct {
> >  #define GET_INPUT_CONTROL(x) ((x & 0x0004) >> 2)
> >  
> >  static DEFINE_PER_CPU(unsigned long[NR_PHYS_CTRS], pmc_values);
> > -
> > +static unsigned long spu_pm_cnt[MAX_NUMNODES * NUM_SPUS_PER_NODE];
> >  static struct pmc_cntrl_data pmc_cntrl[NUM_THREADS][NR_PHYS_CTRS];
> 
> Can't you add this data to an existing data structure, e.g. struct spu,
> rather than adding a new global?

The data structure above holds flags for how the performance counters
are to be setup on each node.  This is not per SPU data.  It is per
system data.  The flags are setup once and then used when configuring
the various performance counter control registers on each node via one
of the PPU threads on each node.



> > +
> > +void stop_spu_profiling_events(void)
> > +{
> > +   spu_prof_running = 0;
> >  }
> 
> Is this atomic? What if two CPUs access the spu_prof_running variable at
> the same time?
> 
>   Arnd <><

When the user issues the command opcontrol --start then a series of
functions gets called by one CPU in the system that eventually gets down
to making the assignment spu_prof_running = 1.  Similarly, the variable
is set to 0 when the user executes the command opcontrol --stop.  In
each case, only one CPU in the system is executing the code to change
the value of the flag.  Once OProfile is started, you can't change the
event being profiled until after oprofile is stopped.  Hence, you can't
get the situation where spu_prof_running is set by the start and not
reset by the next stop command.  Now, as for multiple users issuing
opcontrol --start and/or opcontrol --stop at the same time, there is no
protection at the user level to prevent this.  If this occurs, then what
happens will depend on the order the OProfile file system serializes the
writes file for start/stop.  It is up to the users to not step on each
other.  If they do, the chances are they both will get bad data.

The other things you noted for this patch were easily fixed up.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Patch 0/3] Overview, OProfile SPU event profiling support for IBM Cell processor

2008-12-01 Thread Carl Love
This is a rework of the previously posted set of patches.

Patch 1 is the user level patch to add the SPU events to the user
OProfile tool.  

Patch 2 is a kernel patch to do code clean up and restructuring to make
it easier to add the new SPU event profiling support.  This patch makes
no functional changes.

Patch 3 is a kernel patch to add the SPU event profiling support.

   Carl Love

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[Patch 1/3] User OProfile support for the IBM CELL processor SPU event profiling

2008-12-01 Thread Carl Love

This patch adds the SPU event profiling support for the IBM Cell
processor to the list of available events. The opcontrol script
patches include a test to see if there is a new cell specific file
in the kernel oprofile file system.  If the file exists, then the
kernel supports SPU event profiling.  

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: oprofile-cvs/events/ppc64/cell-be/events
===
--- oprofile-cvs.orig/events/ppc64/cell-be/events
+++ oprofile-cvs/events/ppc64/cell-be/events
@@ -108,12 +108,42 @@ event:0xdbe counters:0,1,2,3 um:PPU_0_cy
 event:0xdbf counters:0,1,2,3 um:PPU_0_cycles  minimum:1
name:stb_not_empty  : At least one store gather buffer not empty.
 
 # Cell BE Island 4 - Synergistic Processor Unit (SPU)
-
+#
+# OPROFILE FOR CELL ONLY SUPPORTS PROFILING ON ONE SPU EVENT AT A TIME
+#
 # CBE Signal Group 41 - SPU (NClk)
+event:0x1004 counters:0 um:SPU_02_cycles  minimum:1
name:dual_instrctn_commit   : Dual instruction committed.
+event:0x1005 counters:0 um:SPU_02_cycles  minimum:1
name:sngl_instrctn_commit   : Single instruction committed.
+event:0x1006 counters:0 um:SPU_02_cycles  minimum:1
name:ppln0_instrctn_commit  : Pipeline 0 instruction committed.
+event:0x1007 counters:0 um:SPU_02_cycles  minimum:1
name:ppln1_instrctn_commit  : Pipeline 1 instruction committed.
+event:0x1008 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:instrctn_ftch_stll : Instruction fetch stall.
+event:0x1009 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:lcl_strg_bsy   : Local storage busy.
+event:0x100A counters:0 um:SPU_02_cycles  minimum:1
name:dma_cnflct_ld_st   : DMA may conflict with load or store.
+event:0x100B counters:0 um:SPU_02_cycles  minimum:1
name:str_to_lcl_strg: Store instruction to local storage issued.
+event:0x100C counters:0 um:SPU_02_cycles  minimum:1
name:ld_frm_lcl_strg: Load intruction from local storage issued.
+event:0x100D counters:0 um:SPU_02_cycles  minimum:1
name:fpu_exctn  : Floating-Point Unit (FPU) exception.
+event:0x100E counters:0 um:SPU_02_cycles  minimum:1
name:brnch_instrctn_commit  : Branch instruction committed.
+event:0x100F counters:0 um:SPU_02_cycles  minimum:1
name:change_of_flow : Non-sequential change of the SPU program counter, 
which can be caused by branch, asynchronous interrupt, stalled wait on channel, 
error correction code (ECC) error, and so forth.
+event:0x1010 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_not_tkn  : Branch not taken.
+event:0x1011 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_mss_prdctn   : Branch miss prediction; not exact. Certain other code 
sequences can cause additional pulses on this signal (see Note 2).
+event:0x1012 counters:0 um:SPU_02_cycles  minimum:1
name:brnch_hnt_mss_prdctn   : Branch hint miss prediction; not exact. 
Certain other code sequences can cause additional pulses on this signal (see 
Note 2).
+event:0x1013 counters:0 um:SPU_02_cycles  minimum:1
name:instrctn_seqnc_err : Instruction sequence error.
+event:0x1015 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl_wrt : Stalled waiting on any blocking channel write 
(see Note 3).
+event:0x1016 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl0: Stalled waiting on External Event Status 
(Channel 0) (see Note 3).
+event:0x1017 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl3: Stalled waiting on Signal Notification 1 
(Channel 3) (see Note 3).
+event:0x1018 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl4: Stalled waiting on Signal Notification 2 
(Channel 4) (see Note 3).
+event:0x1019 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl21   : Stalled waiting on DMA Command Opcode or 
ClassID Register (Channel 21) (see Note 3).
+event:0x101A counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl24   : Stalled waiting on Tag Group Status (Channel 
24) (see Note 3).
+event:0x101B counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl25   : Stalled waiting on List Stall-and-Notify Tag 
Status (Channel 25) (see Note 3).
+event:0x101C counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl28   : Stalled waiting on PPU Mailbox (Channel 28) 
(see Note 3).
+event:0x1022 counters:0 um:SPU_02_cycles_or_edges minimum:1
name:stlld_wait_on_chnl29   : Stalled waiting on SPU Mailbox (Chan

[Patch 2/3] Kernel patch, IBM CELL processor OProfile cleanup and restructuring

2008-12-01 Thread Carl Love

This patch restructures and cleans up the code a bit to make it 
easier to add new functionality later.  The patch makes no 
functional changes to the existing code.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>


Index: Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
===
--- Cell_kernel_11_10_2008.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008/arch/powerpc/oprofile/op_model_cell.c
@@ -40,14 +40,9 @@
 #include "../platforms/cell/interrupt.h"
 #include "cell/pr_util.h"
 
-static void cell_global_stop_spu(void);
-
-/*
- * spu_cycle_reset is the number of cycles between samples.
- * This variable is used for SPU profiling and should ONLY be set
- * at the beginning of cell_reg_setup; otherwise, it's read-only.
- */
-static unsigned int spu_cycle_reset;
+#define PPU_PROFILING0
+#define SPU_PROFILING_CYCLES 1
+#define SPU_PROFILING_EVENTS 2
 
 #define NUM_SPUS_PER_NODE8
 #define SPU_CYCLES_EVENT_NUM 2 /*  event number for SPU_CYCLES */
@@ -66,6 +61,14 @@ static unsigned int spu_cycle_reset;
 
 #define MAX_SPU_COUNT 0xFF /* maximum 24 bit LFSR value */
 
+/*
+ * spu_cycle_reset is the number of cycles between samples.
+ * This variable is used for SPU profiling and should ONLY be set
+ * at the beginning of cell_reg_setup; otherwise, it's read-only.
+ */
+static unsigned int spu_cycle_reset;
+static unsigned int profiling_mode;
+
 struct pmc_cntrl_data {
unsigned long vcntr;
unsigned long evnts;
@@ -122,7 +125,6 @@ static struct {
 #define GET_INPUT_CONTROL(x) ((x & 0x0004) >> 2)
 
 static DEFINE_PER_CPU(unsigned long[NR_PHYS_CTRS], pmc_values);
-
 static struct pmc_cntrl_data pmc_cntrl[NUM_THREADS][NR_PHYS_CTRS];
 
 /*
@@ -165,7 +167,7 @@ static int spu_rtas_token;   /* token fo
 static u32 reset_value[NR_PHYS_CTRS];
 static int num_counters;
 static int oprofile_running;
-static DEFINE_SPINLOCK(virt_cntr_lock);
+static DEFINE_SPINLOCK(cntr_lock);
 
 static u32 ctr_enabled;
 
@@ -367,7 +369,7 @@ static void write_pm_cntrl(int cpu)
if (pm_regs.pm_cntrl.stop_at_max == 1)
val |= CBE_PM_STOP_AT_MAX;
 
-   if (pm_regs.pm_cntrl.trace_mode == 1)
+   if (pm_regs.pm_cntrl.trace_mode != 0)
val |= CBE_PM_TRACE_MODE_SET(pm_regs.pm_cntrl.trace_mode);
 
if (pm_regs.pm_cntrl.freeze == 1)
@@ -441,7 +443,7 @@ static void cell_virtual_cntr(unsigned l
 * not both playing with the counters on the same node.
 */
 
-   spin_lock_irqsave(&virt_cntr_lock, flags);
+   spin_lock_irqsave(&cntr_lock, flags);
 
prev_hdw_thread = hdw_thread;
 
@@ -527,7 +529,7 @@ static void cell_virtual_cntr(unsigned l
cbe_enable_pm(cpu);
}
 
-   spin_unlock_irqrestore(&virt_cntr_lock, flags);
+   spin_unlock_irqrestore(&cntr_lock, flags);
 
mod_timer(&timer_virt_cntr, jiffies + HZ / 10);
 }
@@ -541,44 +543,30 @@ static void start_virt_cntrs(void)
add_timer(&timer_virt_cntr);
 }
 
-/* This function is called once for all cpus combined */
-static int cell_reg_setup(struct op_counter_config *ctr,
+static int cell_reg_setup_spu_cycles(struct op_counter_config *ctr,
struct op_system_config *sys, int num_ctrs)
 {
-   int i, j, cpu;
-   spu_cycle_reset = 0;
-
-   if (ctr[0].event == SPU_CYCLES_EVENT_NUM) {
-   spu_cycle_reset = ctr[0].count;
-
-   /*
-* Each node will need to make the rtas call to start
-* and stop SPU profiling.  Get the token once and store it.
-*/
-   spu_rtas_token = rtas_token("ibm,cbe-spu-perftools");
-
-   if (unlikely(spu_rtas_token == RTAS_UNKNOWN_SERVICE)) {
-   printk(KERN_ERR
-  "%s: rtas token ibm,cbe-spu-perftools unknown\n",
-  __func__);
-   return -EIO;
-   }
-   }
-
-   pm_rtas_token = rtas_token("ibm,cbe-perftools");
+   spu_cycle_reset = ctr[0].count;
 
/*
-* For all events excetp PPU CYCLEs, each node will need to make
-* the rtas cbe-perftools call to setup and reset the debug bus.
-* Make the token lookup call once and store it in the global
-* variable pm_rtas_token.
+* Each node will need to make the rtas call to start
+* and stop SPU profiling.  Get the token once and store it.
 */
-   if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
+   spu_rtas_token = rtas_token("ibm,cbe-spu-perftools");
+
+   if (unlikely(spu_rtas_token == RTAS_UNKNOWN_SERVICE)) {
printk(KERN_ERR
-  "%s: rtas token ibm,cbe-perftools unknown\n",
+  "%s: rtas toke

[Patch 3/3] Kernel patch, IBM CELL processor add OProfile SPU event profiling support

2008-12-01 Thread Carl Love

This patch adds the SPU event based profiling funcitonality for the
IBM Cell processor.  Previously, the CELL OProfile kernel code supported
PPU event, PPU cycle profiling and SPU cycle profiling.   The addition of
SPU event profiling allows the users to identify where in their SPU code
various SPU evnets are occuring.  This should help users further identify
issues with their code.  Note, SPU profiling has some limitations due to HW
constraints.  Only one event at a time can be used for profiling and SPU event 
profiling must be time sliced across all of the SPUs in a node.  

The patch adds a new arch specific file to the OProfile file system. The
file has bit 0 set to indicate that the kernel supports SPU event profiling.
The user tool must check this file/bit to make sure the kernel supports
SPU event profiling before trying to do SPU event profiling.  The user tool
check is part of the user tool patch for SPU event profiling.

Signed-off-by: Carl Love <[EMAIL PROTECTED]>

Index: Cell_kernel_11_10_2008-new-patches/arch/powerpc/oprofile/op_model_cell.c
===
--- 
Cell_kernel_11_10_2008-new-patches.orig/arch/powerpc/oprofile/op_model_cell.c
+++ Cell_kernel_11_10_2008-new-patches/arch/powerpc/oprofile/op_model_cell.c
@@ -44,6 +44,12 @@
 #define SPU_PROFILING_CYCLES 1
 #define SPU_PROFILING_EVENTS 2
 
+#define SPU_EVENT_NUM_START  4100
+#define SPU_EVENT_NUM_STOP   4399
+#define SPU_PROFILE_EVENT_ADDR  4363  /* spu, address trace, decimal */
+#define SPU_PROFILE_EVENT_ADDR_MASK_A   0x146 /* sub unit set to zero */
+#define SPU_PROFILE_EVENT_ADDR_MASK_B   0x186 /* sub unit set to zero */
+
 #define NUM_SPUS_PER_NODE8
 #define SPU_CYCLES_EVENT_NUM 2 /*  event number for SPU_CYCLES */
 
@@ -61,6 +67,12 @@
 
 #define MAX_SPU_COUNT 0xFF /* maximum 24 bit LFSR value */
 
+/* Minumum HW interval timer setting to send value to trace buffer is 10 cycle.
+ * To configure counter to send value every N cycles set counter to
+ * 2^32 - 1 - N.
+ */
+#define NUM_INTERVAL_CYC  0x - 10
+
 /*
  * spu_cycle_reset is the number of cycles between samples.
  * This variable is used for SPU profiling and should ONLY be set
@@ -68,6 +80,7 @@
  */
 static unsigned int spu_cycle_reset;
 static unsigned int profiling_mode;
+static int spu_evnt_phys_spu_indx;
 
 struct pmc_cntrl_data {
unsigned long vcntr;
@@ -108,6 +121,8 @@ struct pm_cntrl {
u16 trace_mode;
u16 freeze;
u16 count_mode;
+   u16 spu_addr_trace;
+   u8  trace_buf_ovflw;
 };
 
 static struct {
@@ -125,6 +140,7 @@ static struct {
 #define GET_INPUT_CONTROL(x) ((x & 0x0004) >> 2)
 
 static DEFINE_PER_CPU(unsigned long[NR_PHYS_CTRS], pmc_values);
+static unsigned long spu_pm_cnt[MAX_NUMNODES * NUM_SPUS_PER_NODE];
 static struct pmc_cntrl_data pmc_cntrl[NUM_THREADS][NR_PHYS_CTRS];
 
 /*
@@ -154,6 +170,7 @@ static u32 hdw_thread;
 
 static u32 virt_cntr_inter_mask;
 static struct timer_list timer_virt_cntr;
+static struct timer_list timer_spu_event_swap;
 
 /*
  * pm_signal needs to be global since it is initialized in
@@ -372,9 +389,13 @@ static void write_pm_cntrl(int cpu)
if (pm_regs.pm_cntrl.trace_mode != 0)
val |= CBE_PM_TRACE_MODE_SET(pm_regs.pm_cntrl.trace_mode);
 
+   if (pm_regs.pm_cntrl.trace_buf_ovflw == 1)
+   val |= CBE_PM_TRACE_BUF_OVFLW(pm_regs.pm_cntrl.trace_buf_ovflw);
if (pm_regs.pm_cntrl.freeze == 1)
val |= CBE_PM_FREEZE_ALL_CTRS;
 
+   val |= CBE_PM_SPU_ADDR_TRACE_SET(pm_regs.pm_cntrl.spu_addr_trace);
+
/*
 * Routine set_count_mode must be called previously to set
 * the count mode based on the user selection of user and kernel.
@@ -563,9 +584,184 @@ static int cell_reg_setup_spu_cycles(str
return 0;
 }
 
+/* Unfortunately, the hardware will only support event profiling
+ * on one SPU per node at a time.  Therefore, we must time slice
+ * the profiling across all SPUs in the node.  Note, we do this
+ * in parallel for each node.  The following routine is called
+ * periodically based on kernel timer to switch which SPU is
+ * being monitored in a round robbin fashion.
+ */
+static void spu_evnt_swap(unsigned long data)
+{
+   int node;
+   int cur_phys_spu, nxt_phys_spu, cur_spu_evnt_phys_spu_indx;
+   unsigned long flags;
+   int cpu;
+   int ret;
+   u32 interrupt_mask;
+
+
+   /* enable interrupts on cntr 0 */
+   interrupt_mask = CBE_PM_CTR_OVERFLOW_INTR(0);
+
+   hdw_thread = 0;
+
+   /* Make sure spu event interrupt handler and spu event swap
+* don't access the counters simultaneously.
+*/
+   spin_lock_irqsave(&cntr_lock, flags);
+
+   cur_spu_evnt_phys_spu_indx = spu_evnt_phys_spu_indx;
+
+   if (++(spu_evnt_phys_spu_indx) == NUM_SPUS_PER_NODE)
+   spu_evnt_phys_spu_indx = 0;
+
+   

Re: [Cbe-oss-dev] [Patch 1/3] User OProfile support for the IBM CELL processor SPU event profiling

2008-12-02 Thread Carl Love

On Tue, 2008-12-02 at 12:02 +1100, Michael Ellerman wrote:
> On Mon, 2008-12-01 at 16:18 -0800, Carl Love wrote:
> > This patch adds the SPU event profiling support for the IBM Cell
> > processor to the list of available events. The opcontrol script
> > patches include a test to see if there is a new cell specific file
> > in the kernel oprofile file system.  If the file exists, then the
> > kernel supports SPU event profiling.  
> > 
> > Signed-off-by: Carl Love <[EMAIL PROTECTED]>
> > 
> 
> > Index: oprofile-cvs/doc/oprofile.xml
> > ===
> > --- oprofile-cvs.orig/doc/oprofile.xml
> > +++ oprofile-cvs/doc/oprofile.xml
> > @@ -123,6 +123,10 @@ For information on how to use OProfile's
> > of 2.6.22 or more recent.  Additionally, full 
> > support of SPE profiling requires a BFD library
> > from binutils code dated January 2007 or later.  To 
> > ensure the proper BFD support exists, run
> > the configure utility with 
> > --with-target=cell-be.
> > +
> > +  Profiling the Cell Broadband Engine using SPU events 
> > requires a kernel version of 2.6.TBD or
>   
>  ^^^
> 
> Not sure if you missed that, or it's still TBD, but careful it doesn't
> get merged like that.
> 
> cheers
> 
Michael:

Yes, I am aware of it.  I felt it was important to post user and kernel
so everyone knows that user and kernel development is done.  I have
talked with the OProfile maintainer about these patches already.  The
plan is to update the user patch with the correct kernel version, once
we know which version the patches will be going into.  

I should have mentioned this specifically in patch 0/3.  I will need a
final spin of the user tool patch once the kernel patches are accepted.

Thanks.

Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Patch 0/3] Overview, OProfile SPU event profiling support for IBM Cell processor

2009-01-08 Thread Carl Love

On Thu, 2009-01-08 at 16:48 +0100, Robert Richter wrote:
> On 01.12.08 16:18:26, Carl Love wrote:
> > This is a rework of the previously posted set of patches.
> > 
> > Patch 1 is the user level patch to add the SPU events to the user
> > OProfile tool.  
> > 
> > Patch 2 is a kernel patch to do code clean up and restructuring to make
> > it easier to add the new SPU event profiling support.  This patch makes
> > no functional changes.
> > 
> > Patch 3 is a kernel patch to add the SPU event profiling support.
> 
> I applied patch 2 & 3 to:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git cell
> 
> The patches did not apply cleanly. I had to change the path to
> arch/powerpc/include/asm/, fix cell/pr_util.h and did some whitespace
> cleanups. Please run your tests on the cell branch.
> 
> Thanks,
> 
> -Robert
> 

I pulled down the git tree, compiled and installed it.  I tested it
against the OProfile testsuite, which includes SPU event profiling
tests.  Everything passed.  The patch I submitted was against a 2.6.26
tree.  You are now on a 2.6.28 tree so perhaps that is why the patch did
not apply cleanly.  The patch has been out there for some time.  

Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Patch 0/3] Overview, OProfile SPU event profiling support for IBM Cell processor

2009-01-12 Thread Carl Love

On Sun, 2009-01-11 at 10:31 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2009-01-08 at 16:26 -0800, Carl Love wrote:
> > I pulled down the git tree, compiled and installed it.  I tested it
> > against the OProfile testsuite, which includes SPU event profiling
> > tests.  Everything passed.  The patch I submitted was against a 2.6.26
> > tree.  You are now on a 2.6.28 tree so perhaps that is why the patch did
> > not apply cleanly.  The patch has been out there for some time.  
> 
> When you submited it on Dec 2, it should have been against whatever was
> the latest upstream at the time, not 2.6.26.
> 
> Cheers,
> Ben.

Ben, Arnd and Robert:

The patch was against the latest Arnd Cell Kernel tree at the time it
was posted. The OProfile for cell patches have all been submitted,
accepted by Arnd (CELL Kernel maintainer) and pushed up stream by Arnd.
Arnd has waited for the OProfile maintainer to review and approve the
change before Arnd would push it upstream.  

So, at this point, Robert's (OProfile kernel maintainer) and Maynard's
(OProfile user space maintainer) approval for the patches.  

The question now goes to Arnd and Robert, who is going to push the
patches upstream?  Looks like Robert is ready to do it, so Arnd do you
approve the patches?  Would you like to have Robert push the patches
upstream or would you prefer to do it?  I think we need an
answer/agreement on this.  

 Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Patch 0/3] Overview, OProfile SPU event profiling support for IBM Cell processor

2009-01-12 Thread Carl Love

On Mon, 2009-01-12 at 18:59 +0100, Arnd Bergmann wrote:
> On Monday 12 January 2009, Robert Richter wrote:
> > Arnd, I thought the patches were ok for you. If there are still some
> > concerns, we have to make delta patches.
> 
> Yes, that's fine with me, thanks for taking care of the patches!
> 
>   Arnd <><

Thanks guys for all your help on the patches.

   Carl Love


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev