Re: oprofile and ARM A9 hardware counter

2012-05-10 Thread stephane eranian
On Thu, May 10, 2012 at 10:44 AM, Will Deacon  wrote:
> On Wed, May 09, 2012 at 10:45:08PM +0100, Jon Hunter wrote:
>> Hi All,
>
> Hi Jon,
>
>> I have posted my latest series here [1] based upon that from Will [2]
>> which attempts to fix the EMU CD based upon the inputs from this thread.
>> It is working on my omap4460 panda. Hopefully Ming and/or Will can also
>> test. I know that Ming is out this week but said he can test next week.
>
> Many thanks to you (+Kevin, Benoit, Paul and co) for persevering with this.
> If I can get my hands on a Panda, I'll see if I can test something this
> week. Any particular tests you want me to run to exercise the interaction
> with PM?
>
> Cheers,
>
I would like to get the final patch (on top of Ming's) so I can test this on
my Panda board too. Thanks.

> Will
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
On Mon, Jan 30, 2012 at 8:14 PM, Will Deacon  wrote:
> On Mon, Jan 30, 2012 at 05:45:19PM +0000, stephane eranian wrote:
>> There you go, no attachment, not sure the omap list
>> supports this.
>
> Cheers Stephane.
>
>> There is something quite interesting to observe.
>>
>> While I run perf record -e cycles -F 100 noploop 10, I watch
>> /proc/interrupts. The number of interrupts is way lower than
>> expected. Therefore the number of samples is way too low:
>>
>> $ perf record -e cycles -F 100 noploop 10
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:        535
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>           SAMPLE events:        520
>>
>> The delta in /proc/interrupts on CPU1 is 520 interrupts.
>
> Yes, that is about half of what you'd expect. Running on my A9 platform
> (vexpress) I get:
>
> $ perf record -e cycles -F 100 noploop 10
> $ perf report -D | tail -20
> cycles stats:
>           TOTAL events:       1007
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        985
>
>> So looks like the frequency adjustment which is hooked off of the
>> timer tick is either not called at each timer tick, the timer ticks are
>> not at regular interval, or the math is wrong.
>
> My hunch is that that the interval is probably varying, but I don't know much
> about OMAP4 and its clocks.
>
Glad you tested this. At least it seems the generic perf_event code is
all right. I agree with you, something is fishy with the clocks. Just out
of curiosity, what is the HZ value for your board? On my Panda it's 128Hz.

>> If I go with the fixed period mode:
>> $ perf stat -e cycles noploop 10
>> noploop for 10 seconds
>>  Performance counter stats for 'noploop 10':
>>        10079156960 cycles                    #    0.000 GHz
>>       10.004547117 seconds time elapsed
>>
>> That means, if I want 100 samples/sec: = 10079156960/(10*100)=10079157
>> $ perf record -e cycles -c 10079157 noploop 10
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:       1003
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>         THROTTLE events:          1
>>       UNTHROTTLE events:          1
>>           SAMPLE events:        986
>>
>> Now, we're getting the right answer!
>
> Just to confirm, for me:
>
> $ perf stat -e cycles ./noploop 10
> noploop for 10 seconds
>
>  Performance counter stats for './noploop 10':
>
>        4001163930 cycles                    #    0.000 GHz
>
>      10.006534024 seconds time elapsed
>
> $ perf record -e cycles -c 4001163 ./noploop 10
> $ perf report -D | tail -20
>  Aggregated stats:
>           TOTAL events:       1020
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        998
> cycles stats:
>           TOTAL events:       1020
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        998
>
> which is close enough :)
>
>> We need to elucidate what's going on in perf_event_task_tick().
>> I have tried with my throttling fix and it did not help. We are
>> not subject to throttling with such a low rate.
>
> Ok. I would start by looking at the clock ticks if I were you, since this
> seems to be alright on my board.
>
> Will


Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
Will,

There you go, no attachment, not sure the omap list
supports this.

There is something quite interesting to observe.

While I run perf record -e cycles -F 100 noploop 10, I watch
/proc/interrupts. The number of interrupts is way lower than
expected. Therefore the number of samples is way too low:

$ perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
cycles stats:
           TOTAL events:        535
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        520

The delta in /proc/interrupts on CPU1 is 520 interrupts.

So looks like the frequency adjustment which is hooked off of the
timer tick is either not called at each timer tick, the timer ticks are
not at regular interval, or the math is wrong.
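
To illustrate the "timer ticks are not at regular interval" case, here is a
minimal sketch of a tick-driven period adjustment. It is not the kernel's
actual code: it assumes, as a simplification, that the adjustment treats
every tick as lasting exactly 1/HZ seconds. The helper name period_for_freq
and its parameters are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel.
 *
 * count      - events counted since the previous adjustment
 * assumed_ns - length the adjustment assumes for that interval, in ns
 * freq       - requested sampling frequency, in samples per second
 *
 * Returns the sampling period (in events) that would give `freq`
 * samples per second if the interval really lasted `assumed_ns`.
 * Note: count * 1e9 must fit in 64 bits, which holds for per-tick counts.
 */
static uint64_t period_for_freq(uint64_t count, uint64_t assumed_ns,
                                uint64_t freq)
{
    assert(assumed_ns > 0 && freq > 0);
    return (count * 1000000000ULL) / (assumed_ns * freq);
}
```

With HZ=128 (7812500 ns per tick) and a counter running at a hypothetical
1 GHz, a regular tick sees 7812500 cycles and the helper returns a period of
10000000, i.e. 100 samples/sec. If a tick only arrives after two intervals,
the count doubles to 15625000 while the assumed length stays the same, so the
period doubles to 20000000 and the sample rate is halved.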

If I go with the fixed period mode:
$ perf stat -e cycles noploop 10
noploop for 10 seconds
 Performance counter stats for 'noploop 10':
       10079156960 cycles                    #    0.000 GHz
      10.004547117 seconds time elapsed

That means, if I want 100 samples/sec, the period is 10079156960/(10*100) = 10079157:
$ perf record -e cycles -c 10079157 noploop 10
$ perf report -D | tail -20
cycles stats:
           TOTAL events:       1003
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
        THROTTLE events:          1
      UNTHROTTLE events:          1
          SAMPLE events:        986

Now, we're getting the right answer!

So with the right sampling period, everything works fine.
We need to elucidate what's going on in perf_event_task_tick().
I have tried with my throttling fix and it did not help. We are
not subject to throttling with such a low rate.
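
The fixed-period arithmetic above can be written down as a small helper.
This is only a sketch: fixed_period is a hypothetical name, and rounding to
the nearest integer is an assumption (plain integer division would give
10079156 instead).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: sampling period (in events) that gives roughly
 * `rate` samples per second, derived from a total event count measured
 * over `seconds` seconds (e.g. with perf stat).
 */
static uint64_t fixed_period(uint64_t total_events, uint64_t seconds,
                             uint64_t rate)
{
    uint64_t samples = seconds * rate;   /* samples wanted over the run */

    assert(samples > 0);
    return (total_events + samples / 2) / samples;   /* round to nearest */
}
```

fixed_period(10079156960ULL, 10, 100) yields 10079157, the value passed to
-c above.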

noploop.c:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGALRM handler: terminate once the requested delay has elapsed. */
void handler(int sig)
{
    (void)sig;
    exit(0);
}

/* Spin forever in user space. */
void
noploop(void)
{
    for(;;);
}

int
main(int argc, char **argv)
{
    unsigned int delay;
    delay = argc > 1 ? atoi(argv[1]) : 1;
    signal(SIGALRM, handler);
    printf("noploop for %u seconds\n", delay);
    alarm(delay);
    noploop();
    return 0;
}

On Mon, Jan 30, 2012 at 6:24 PM, Will Deacon  wrote:
> On Mon, Jan 30, 2012 at 05:15:53PM +0000, stephane eranian wrote:
>> Still need to investigate why the frequency mode does
>> not yield the correct number of samples even with low frequency.
>>
>>
>> $ taskset -c 1 perf record -e cycles -F 100 noploop 10
>> $ perf report -D | tail -20
>> Aggregated stats:
>>            TOTAL events:        475
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>           SAMPLE events:        460
>> cycles stats:
>>            TOTAL events:        475
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>           SAMPLE events:        460
>>
>> 460 samples is way too low. Should be 100x10 = 1000 samples or close to it.
>
> Can you stick noploop.c somewhere (I'm lazy :) and I'll try it on one of my
> A9 boards?
>
> Thanks,
>
> Will


Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
On Mon, Jan 30, 2012 at 5:08 PM, Måns Rullgård  wrote:
> stephane eranian  writes:
>
>> Same result for me on CPU1:
>>
>> top - 16:20:24 up  1:45,  1 user,  load average: 0.29, 0.08, 0.07
>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 30.7%us,  2.7%sy,  0.0%ni, 66.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:    940232k total,   228984k used,   711248k free,    82244k buffers
>> Swap:   524240k total,        0k used,   524240k free,    91400k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>>  3968 eranian   20   0   644  160  128 R  100  0.0   0:21.98 1 noploop
>>  3969 eranian   20   0  2184 1056  804 R    3  0.1   0:00.53 0 top
>>    82 root      20   0     0    0    0 S    1  0.0   0:01.35 0 kworker/0:1
>>
>> With 3.3.0-rc1, if I revert the clockdomain patch, I get the same result.
>> So it must be coming from somewhere else, as you suggested.
>>
>> If the processor was spending time processing interrupts, then this would be
>> accounted for in as sys time. But that's not what I observe here. It's either
>> idle or user. That line, leads me to believe that the processor can only run
>> my program for 30% of the time. The rest is spent idling even though my
>> program is non-blocking. How could that be possible? Power-saving?
>
> In top, press 1 to see the statistics for the CPUs separately.
>
Ok, when I pin my program to CPU1, and press 1 in top I get:
Tasks:  69 total,   2 running,  67 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.9%us,  3.8%sy,  0.0%ni, 94.3%id,  0.0%wa,  0.0%hi,  0.9%si,  0.0%st
Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    940232k total,    75480k used,   864752k free,     8148k buffers
Swap:   524240k total,        0k used,   524240k free,    37568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3788 eranian   20   0   644  160  128 R  100  0.0   0:47.93 noploop
 3758 eranian   20   0  9900 1512  712 S    2  0.2   0:01.17 sshd
 3789 eranian   20   0  2184 1056  804 R    2  0.1   0:01.22 top

Which gives me the right answer. But in 'collapsed mode' (press 1 again),
the aggregate value is bogus. Could be wrong math in top. Ok, that was
a false alarm then. Thanks for the help.
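
For reference, the figure one would expect on the collapsed Cpu(s) line can
be estimated from the per-CPU lines. A rough sketch, assuming both CPUs
account the same amount of time over the refresh interval, so that the
aggregate is the plain mean. mean_percent is a hypothetical helper, not
something top exposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of per-CPU percentages (e.g. the %us column). */
static double mean_percent(const double *per_cpu, size_t ncpus)
{
    double sum = 0.0;
    size_t i;

    assert(per_cpu != NULL && ncpus > 0);
    for (i = 0; i < ncpus; i++)
        sum += per_cpu[i];
    return sum / (double)ncpus;
}
```

With the values above (0.9%us on Cpu0, 100.0%us on Cpu1) this gives about
50.45%us, rather than the roughly 31% reported in collapsed mode.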

Still need to investigate why the frequency mode does
not yield the correct number of samples even with low frequency.


$ taskset -c 1 perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
Aggregated stats:
           TOTAL events:        475
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        460
cycles stats:
           TOTAL events:        475
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        460

460 samples is way too low. Should be 100x10 = 1000 samples or close to it.


Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
Same result for me on CPU1:

top - 16:20:24 up  1:45,  1 user,  load average: 0.29, 0.08, 0.07
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.7%us,  2.7%sy,  0.0%ni, 66.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    940232k total,   228984k used,   711248k free,    82244k buffers
Swap:   524240k total,        0k used,   524240k free,    91400k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 3968 eranian   20   0   644  160  128 R  100  0.0   0:21.98 1 noploop
 3969 eranian   20   0  2184 1056  804 R    3  0.1   0:00.53 0 top
   82 root      20   0     0    0    0 S    1  0.0   0:01.35 0 kworker/0:1

With 3.3.0-rc1, if I revert the clockdomain patch, I get the same result.
So it must be coming from somewhere else, as you suggested.

If the processor were spending time processing interrupts, then this would be
accounted for as sys time. But that's not what I observe here. It's either
idle or user. That line leads me to believe that the processor can only run
my program for 30% of the time. The rest is spent idling even though my
program is non-blocking. How could that be possible? Power-saving?


On Mon, Jan 30, 2012 at 3:49 PM, Ming Lei  wrote:
> On Mon, Jan 30, 2012 at 9:43 PM, stephane eranian
>  wrote:
>> Same results for me with 3.3.0-rc1 + 5 patches.
>
> In fact, I think the only effect of the patch is to enable pmu
> interrupt handling,
> which may cause so much difference?
>
> Also maybe you should put 'noploop' to run on CPU1 and you may observe
> a more accurate result of 'top'.
>
> On ARM, handling of almost all IRQs from the GIC runs on CPU0 by default,
> which may cause your issue.
>
>>
>>
>> top - 14:42:34 up 8 min,  1 user,  load average: 0.70, 0.29, 0.15
>> Tasks:  75 total,   2 running,  73 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 32.9%us,  1.3%sy,  0.0%ni, 65.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:    940232k total,   118520k used,   821712k free,     8080k buffers
>> Swap:   524240k total,        0k used,   524240k free,    79432k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  3868 eranian   20   0   644  160  128 R   99  0.0   0:53.34 noploop
>>  3870 eranian   20   0  2284 1060  804 R    3  0.1   0:00.63 top
>>     1 root      20   0  2564 1532  952 S    0  0.2   0:01.26 init
>>
>> I am connecting to the board via ssh.
>> But the results don't look correct to me.
>
> thanks,
> --
> Ming Lei


Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
Same results for me with 3.3.0-rc1 + 5 patches.


top - 14:42:34 up 8 min,  1 user,  load average: 0.70, 0.29, 0.15
Tasks:  75 total,   2 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s): 32.9%us,  1.3%sy,  0.0%ni, 65.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    940232k total,   118520k used,   821712k free,     8080k buffers
Swap:   524240k total,        0k used,   524240k free,    79432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3868 eranian   20   0   644  160  128 R   99  0.0   0:53.34 noploop
 3870 eranian   20   0  2284 1060  804 R    3  0.1   0:00.63 top
    1 root      20   0  2564 1532  952 S    0  0.2   0:01.26 init

I am connecting to the board via ssh.
But the results don't look correct to me.

On Mon, Jan 30, 2012 at 11:24 AM, stephane eranian
 wrote:
> Ok, let me try again with 3.3.0-rc1, that was with 3.2.0.
> The only thing that changed was that one line and it made
> a big difference.
>
>
> On Mon, Jan 30, 2012 at 10:40 AM, Ming Lei  wrote:
>> Hi,
>>
>> On Mon, Jan 30, 2012 at 1:36 AM, stephane eranian
>>  wrote:
>>> Hi,
>>>
>>> Ok, so I did a few more tests and there is a serious issue when sampling
>>> in frequency mode (the default). I noticed wrong number of samples, so
>>> I investigated this some more and instrumented the perf_event kernel code.
>>> I found some erratic timer ticks causing broken period adjustments.
>>>
>>> In fact, the problem is visible using top.
>>> I am running a noploop program on CPU0 and nothing else besides top.
>>> The noploop program  does: for(;;);. That is 100% user. On a 2-way
>>
>> Sometimes it is not 100% user, for example irq/exception handling...
>>
>>> system otherwise idle, I expect top to return 50% user 50% idle.
>>>
>>> Top with the commit:
>>>
>>> top - 16:19:21 up 5 min,  1 user,  load average: 0.23, 0.15, 0.07
>>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>>> Cpu(s): 31.1%us,  2.0%sy,  0.0%ni, 66.2%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
>>>             That's WRONG
>>
>> Did you reproduce the issue each time or just occasionally?
>>
>> Looks no such issue on my board with 3.3-rc1 plus the 5 extra pmu/emu 
>> patches.
>>
>> top - 00:59:15 up 7 min,  1 user,  load average: 1.00, 0.73, 0.35
>> Tasks:  56 total,   2 running,  54 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 42.6%us,  0.2%sy,  0.0%ni, 56.8%id,  0.0%wa,  0.0%hi,  0.4%si,  0.0%st
>> Mem:   1013560k total,    50960k used,   962600k free,     6272k buffers
>> Swap:        0k total,        0k used,        0k free,    29036k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  1355 root      20   0  1460  260  216 R   99  0.0   5:07.38 busy
>>  532 root      20   0     0    0    0 S    0  0.0   0:00.23 kworker/1:1
>>  1356 root      20   0  2552 1120  916 R    0  0.1   0:01.93 top
>>
>>>
>>> Mem:    940292k total,    74984k used,   865308k free,     8020k buffers
>>> Swap:   524240k total,        0k used,   524240k free,    37420k cached
>>>
>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  3770 eranian   20   0 644 160 128 R   99  0.0   0:14.21 noploop
>>>  3771 eranian   20   0  2184 1052  804 R    2  0.1   0:00.32 top
>>>    1 root      20   0  2564 1528  952 S    0  0.2   0:01.26 init
>>>
>>>
>>> I removed that one liner patch from Ming. The one fiddling with the
>>> clockdomains:
>>>
>>> --- a/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> +++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> @@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
>>>        .prcm_partition   = OMAP4430_PRM_PARTITION,
>>>        .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
>>>        .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
>>> -       .flags            = CLKDM_CAN_HWSUP,
>>> +       .flags            = CLKDM_CAN_SWSUP,
>>
>> The patch should not affect timer tick logic, and what the patch does is
>> just to revert the commit [1]  wrt. emu clock domain.
>>
>>>
>>> When I rerun, the test, it now work:
>>>
>>> top - 16:02:51 up 15 min,  1 user,  load average: 1.02, 0.46, 0.21
>>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>>> Cpu(s): 47.2%us,  1.0%sy,  0.0%ni, 50.8%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
>>>            close enough (in it stabilize somehow ar

Re: oprofile and ARM A9 hardware counter

2012-01-30 Thread stephane eranian
Ok, let me try again with 3.3.0-rc1, that was with 3.2.0.
The only thing that changed was that one line and it made
a big difference.


On Mon, Jan 30, 2012 at 10:40 AM, Ming Lei  wrote:
> Hi,
>
> On Mon, Jan 30, 2012 at 1:36 AM, stephane eranian
>  wrote:
>> Hi,
>>
>> Ok, so I did a few more tests and there is a serious issue when sampling
>> in frequency mode (the default). I noticed wrong number of samples, so
>> I investigated this some more and instrumented the perf_event kernel code.
>> I found some erratic timer ticks causing broken period adjustments.
>>
>> In fact, the problem is visible using top.
>> I am running a noploop program on CPU0 and nothing else besides top.
>> The noploop program  does: for(;;);. That is 100% user. On a 2-way
>
> Sometimes it is not 100% user, for example irq/exception handling...
>
>> system otherwise idle, I expect top to return 50% user 50% idle.
>>
>> Top with the commit:
>>
>> top - 16:19:21 up 5 min,  1 user,  load average: 0.23, 0.15, 0.07
>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 31.1%us,  2.0%sy,  0.0%ni, 66.2%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
>>             That's WRONG
>
> Did you reproduce the issue each time or just occasionally?
>
> Looks no such issue on my board with 3.3-rc1 plus the 5 extra pmu/emu patches.
>
> top - 00:59:15 up 7 min,  1 user,  load average: 1.00, 0.73, 0.35
> Tasks:  56 total,   2 running,  54 sleeping,   0 stopped,   0 zombie
> Cpu(s): 42.6%us,  0.2%sy,  0.0%ni, 56.8%id,  0.0%wa,  0.0%hi,  0.4%si,  0.0%st
> Mem:   1013560k total,    50960k used,   962600k free,     6272k buffers
> Swap:        0k total,        0k used,        0k free,    29036k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1355 root      20   0  1460  260  216 R   99  0.0   5:07.38 busy
>  532 root      20   0     0    0    0 S    0  0.0   0:00.23 kworker/1:1
>  1356 root      20   0  2552 1120  916 R    0  0.1   0:01.93 top
>
>>
>> Mem:    940292k total,    74984k used,   865308k free,     8020k buffers
>> Swap:   524240k total,        0k used,   524240k free,    37420k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  3770 eranian   20   0 644 160 128 R   99  0.0   0:14.21 noploop
>>  3771 eranian   20   0  2184 1052  804 R    2  0.1   0:00.32 top
>>    1 root      20   0  2564 1528  952 S    0  0.2   0:01.26 init
>>
>>
>> I removed that one liner patch from Ming. The one fiddling with the
>> clockdomains:
>>
>> --- a/arch/arm/mach-omap2/clockdomains44xx_data.c
>> +++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
>> @@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
>>        .prcm_partition   = OMAP4430_PRM_PARTITION,
>>        .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
>>        .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
>> -       .flags            = CLKDM_CAN_HWSUP,
>> +       .flags            = CLKDM_CAN_SWSUP,
>
> The patch should not affect timer tick logic, and what the patch does is
> just to revert the commit [1]  wrt. emu clock domain.
>
>>
>> When I rerun, the test, it now work:
>>
>> top - 16:02:51 up 15 min,  1 user,  load average: 1.02, 0.46, 0.21
>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 47.2%us,  1.0%sy,  0.0%ni, 50.8%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
>>            close enough (in it stabilize somehow around 49%
>> which is good)
>>
>> Mem:    940292k total,    75288k used,   865004k free,     8004k buffers
>> Swap:   524240k total,        0k used,   524240k free,    37408k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  3771 eranian   20   0 644 160 128 R  100  0.0   0:34.44 noploop
>>
>> Although the patch fixes PMU interrupts, it breaks the timer tick logic 
>> somehow.
>> The perf problem is related to timer tick.
>>
>> I am hoping that the tradeoff is not:
>>     PMU interrupts but broken timer ticks
>> vs.
>>    No PMU interrupts but working timer ticks
>
>
>
> [1], 3c50729b3fa1cd8ca1f347e6caf1081204cf1a7c
> ARM: OMAP4: PM: Initialise all the clockdomains to supported states
>
> thanks
> --
> Ming Lei


Re: oprofile and ARM A9 hardware counter

2012-01-29 Thread stephane eranian
Hi,

Ok, so I did a few more tests and there is a serious issue when sampling
in frequency mode (the default). I noticed the wrong number of samples, so
I investigated this some more and instrumented the perf_event kernel code.
I found some erratic timer ticks causing broken period adjustments.

In fact, the problem is visible using top.
I am running a noploop program on CPU0 and nothing else besides top.
The noploop program  does: for(;;);. That is 100% user. On a 2-way
system otherwise idle, I expect top to return 50% user 50% idle.

Top with the commit:

top - 16:19:21 up 5 min,  1 user,  load average: 0.23, 0.15, 0.07
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.1%us,  2.0%sy,  0.0%ni, 66.2%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
 That's WRONG

Mem:    940292k total,    74984k used,   865308k free,     8020k buffers
Swap:   524240k total,        0k used,   524240k free,    37420k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3770 eranian   20   0   644  160  128 R   99  0.0   0:14.21 noploop
 3771 eranian   20   0  2184 1052  804 R    2  0.1   0:00.32 top
    1 root      20   0  2564 1528  952 S    0  0.2   0:01.26 init


I removed that one liner patch from Ming. The one fiddling with the
clockdomains:

--- a/arch/arm/mach-omap2/clockdomains44xx_data.c
+++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
@@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
        .prcm_partition   = OMAP4430_PRM_PARTITION,
        .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
        .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
-       .flags            = CLKDM_CAN_HWSUP,
+       .flags            = CLKDM_CAN_SWSUP,


When I rerun the test, it now works:

top - 16:02:51 up 15 min,  1 user,  load average: 1.02, 0.46, 0.21
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
Cpu(s): 47.2%us,  1.0%sy,  0.0%ni, 50.8%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
    close enough (it stabilizes at around 49%, which is good)

Mem:    940292k total,    75288k used,   865004k free,     8004k buffers
Swap:   524240k total,        0k used,   524240k free,    37408k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3771 eranian   20   0   644  160  128 R  100  0.0   0:34.44 noploop

Although the patch fixes PMU interrupts, it breaks the timer tick logic somehow.
The perf problem is related to timer tick.

I am hoping that the tradeoff is not:
 PMU interrupts but broken timer ticks
vs.
No PMU interrupts but working timer ticks


On Fri, Jan 27, 2012 at 6:16 PM, stephane eranian
 wrote:
> On Fri, Jan 27, 2012 at 6:10 PM, Will Deacon  wrote:
>> On Fri, Jan 27, 2012 at 05:03:28PM +0000, stephane eranian wrote:
>>> On Fri, Jan 27, 2012 at 5:59 PM, Will Deacon  wrote:
>>> > That said, if you see any bugs in the code please do shout!
>>> >
>>> I suspect there is something wrong, we shouldn't hit the max_rate_limit.
>>> You may have bursts of interrupts (samples). I'll check on that this 
>>> week-end.
>>
>> Ok, thanks. Keep in mind that you probably have variable rate clocks, which
>> will affect the cycle counter frequency.
>>
> I assume it does not vary the clock if the workload is steady and just burning
> cycles, e.g.: for(;;);
>
>>> >> > A7 and A15 have the ability to filter counters based on privilege 
>>> >> > level, so
>>> >> > you can get more accurate userspace counts there.
>>> >>
>>> >> Ok, that's better. Need to update libpfm4 for A15 with priv levels then!
>>> >
>>> > How do you handle that in libpfm4? On ARM, the event encodings remain the 
>>> > same,
>>> > you just need to set some extra bits to determine which levels are 
>>> > included or
>>> > excluded (you can do this with the perf tool by using the :{u,k,h} suffix 
>>> > on an
>>> > event description).
>>> >
>>> It depends what you call the encoding? If the priv level can be encoded in 
>>> the
>>> attr->config field, then that's easy. If it needs to be set somewhere else, 
>>> then
>>> we need to figure out how you encode it in the attr struct. Either in some 
>>> other
>>> bits in attr->config or use attr->config1, for instance. You tell me.
>>
>> The way it's done with perf is to set the exclude{user,kernel,hv} fields in
>> the attr. The ARM perf backend then translates these into the relevant bits
>> which get orred into the config_base before hitting the hardware.
>>
> Well, that's also how we do it with libpfm4 on X86. This is because
> with perf_events,
> the exclude_* fields have priority over what you set in the attr->config 
> field.
>
>> Will


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
On Fri, Jan 27, 2012 at 6:10 PM, Will Deacon  wrote:
> On Fri, Jan 27, 2012 at 05:03:28PM +0000, stephane eranian wrote:
>> On Fri, Jan 27, 2012 at 5:59 PM, Will Deacon  wrote:
>> > That said, if you see any bugs in the code please do shout!
>> >
>> I suspect there is something wrong, we shouldn't hit the max_rate_limit.
>> You may have bursts of interrupts (samples). I'll check on that this 
>> week-end.
>
> Ok, thanks. Keep in mind that you probably have variable rate clocks, which
> will affect the cycle counter frequency.
>
I assume it does not vary the clock if the workload is steady and just burning
cycles, e.g.: for(;;);

>> >> > A7 and A15 have the ability to filter counters based on privilege 
>> >> > level, so
>> >> > you can get more accurate userspace counts there.
>> >>
>> >> Ok, that's better. Need to update libpfm4 for A15 with priv levels then!
>> >
>> > How do you handle that in libpfm4? On ARM, the event encodings remain the 
>> > same,
>> > you just need to set some extra bits to determine which levels are 
>> > included or
>> > excluded (you can do this with the perf tool by using the :{u,k,h} suffix 
>> > on an
>> > event description).
>> >
>> It depends what you call the encoding? If the priv level can be encoded in 
>> the
>> attr->config field, then that's easy. If it needs to be set somewhere else, 
>> then
>> we need to figure out how you encode it in the attr struct. Either in some 
>> other
>> bits in attr->config or use attr->config1, for instance. You tell me.
>
> The way it's done with perf is to set the exclude{user,kernel,hv} fields in
> the attr. The ARM perf backend then translates these into the relevant bits
> which get orred into the config_base before hitting the hardware.
>
Well, that's also how we do it with libpfm4 on X86. This is because, with
perf_events, the exclude_* fields have priority over what you set in the
attr->config field.
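
As a concrete illustration of the exclude_* approach, here is a minimal
sketch that fills in a perf_event_attr for a user-space-only cycles event.
The struct, its fields and the PERF_* constants come from
<linux/perf_event.h>; the helper name init_user_cycles is hypothetical, and
the sketch only prepares the attribute without opening the event. Whether
the hardware honours the filter depends on the PMU (per the discussion above,
A7 and A15 can filter by privilege level).

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a cycles event that counts user space only. */
static void init_user_cycles(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;  /* event encoding is unchanged */
    attr->exclude_kernel = 1;                 /* do not count kernel level */
    attr->exclude_hv = 1;                     /* do not count hypervisor level */
}
```

The privilege filter lives entirely in the exclude_* bits, so attr->config
carries only the event itself; this corresponds to requesting cycles:u with
the perf tool.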

> Will


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
On Fri, Jan 27, 2012 at 5:59 PM, Will Deacon  wrote:
> On Fri, Jan 27, 2012 at 03:57:25PM +0000, stephane eranian wrote:
>> On Fri, Jan 27, 2012 at 4:54 PM, Will Deacon  wrote:
>> >
>> > Ok. Note that on ARM the PMU generates a standard IRQ (i.e. not an NMI) so
>> > you may miss samples if they occur during critical kernel sections (and if
>> > you look at a profile, spin_unlock_irqrestore will be quite high).
>> >
>> But I am only running a user space noploop. So it spends 99% in user space, 
>> no
>> critical section.
>
> and your result is almost 99% of the way there :)
>
> There are also potential overheads from the PMU interrupts themselves, since
> there is a latency between overflow and taking the interrupt and then
> between there are actually reading the counter (they continue to count after
> overflow).
>
> That said, if you see any bugs in the code please do shout!
>
I suspect there is something wrong, we shouldn't hit the max_rate_limit.
You may have bursts of interrupts (samples). I'll check on that this week-end.

>> > A7 and A15 have the ability to filter counters based on privilege level, so
>> > you can get more accurate userspace counts there.
>>
>> Ok, that's better. Need to update libpfm4 for A15 with priv levels then!
>
> How do you handle that in libpfm4? On ARM, the event encodings remain the 
> same,
> you just need to set some extra bits to determine which levels are included or
> excluded (you can do this with the perf tool by using the :{u,k,h} suffix on 
> an
> event description).
>
It depends on what you call the encoding. If the priv level can be encoded in the
attr->config field, then that's easy. If it needs to be set somewhere else, then
we need to figure out how you encode it in the attr struct. Either in some other
bits in attr->config or use attr->config1, for instance. You tell me.

> Will


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
On Fri, Jan 27, 2012 at 4:54 PM, Will Deacon  wrote:
> On Fri, Jan 27, 2012 at 03:45:53PM +0000, stephane eranian wrote:
>> Hi,
>
> Hi Stephane,
>
>> Ok, with the one-line patch [1], this works much better now.
>> No more wrap around a 4 billion cycles.
>
> Hurrah! Thanks Mans and Ming Lei for helping with this. Unfortunately, I
> remember Santosh had objections to this patch so that needs to be resolved.
>
Yes, this needs to be resolved ASAP.

>> Sampling is okay, though I noticed it tends to not get the
>> correct number of samples for a controlled run:
>>
>> $ perf record -e cycles -c 1009213 noploop 10
>> noploop for 10 seconds
>>
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:       9938
>>             MMAP events:         13
>>             COMM events:          2
>>             EXIT events:          2
>>         THROTTLE events:         12
>>       UNTHROTTLE events:         12
>>           SAMPLE events:       9897
>>
>> We should not get throttled samples. We should get about 10k samples
>> but I am only seeing 9897. The max_rate limit is way higher
>> than the rate my period implies (about 1000 samples/sec). But then,
>> throttling is broken in 3.2.0. I posted a patch to fix that
>> yesterday. I will try with my patch applied as well.
>
> Ok. Note that on ARM the PMU generates a standard IRQ (i.e. not an NMI) so
> you may miss samples if they occur during critical kernel sections (and if
> you look at a profile, spin_unlock_irqrestore will be quite high).
>
But I am only running a user space noploop. So it spends 99% in user space, no
critical section.

> A7 and A15 have the ability to filter counters based on privilege level, so
> you can get more accurate userspace counts there.

Ok, that's better. Need to update libpfm4 for A15 with priv levels then!

>
> Will


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
Hi,

Ok, with the one-line patch [1], this works much better now.
No more wraparound at 4 billion cycles.

Sampling is okay, though I noticed it tends to not get the
correct number of samples for a controlled run:

$ perf record -e cycles -c 1009213 noploop 10
noploop for 10 seconds

$ perf report -D | tail -20
cycles stats:
           TOTAL events:       9938
            MMAP events:         13
            COMM events:          2
            EXIT events:          2
        THROTTLE events:         12
      UNTHROTTLE events:         12
          SAMPLE events:       9897

We should not get throttled samples. We should get about 10k samples
but I am only seeing 9897. The max_rate limit is way higher
than the rate my period implies (about 1000 samples/sec). But then,
throttling is broken in 3.2.0. I posted a patch to fix that
yesterday. I will try with my patch applied as well.


Will do some more testing and update to the latest 3.3.0-rc1.
For now this is with 3.2.0 Linus tree.

$ sudo  ./syst_count -c 1 -p -e cpu_cycles

# 1s -
CPU1   G0  1008360963   cpu_cycles (scaling 0.00%, ena=1000427246, run=1000427246)
# 2s -
CPU1   G0  2016503406   cpu_cycles (scaling 0.00%, ena=2000610351, run=2000610351)
# 3s -
CPU1   G0  3024622201   cpu_cycles (scaling 0.00%, ena=3000701904, run=3000701904)
# 4s -
CPU1   G0  4032753756   cpu_cycles (scaling 0.00%, ena=4000823974, run=4000823974)
# 5s -
CPU1   G0  5041040463   cpu_cycles (scaling 0.00%, ena=5001098633, run=5001098633)
# 6s -
CPU1   G0  6049184665   cpu_cycles (scaling 0.00%, ena=6001220703, run=6001220703)
# 7s -
CPU1   G0  7057336298   cpu_cycles (scaling 0.00%, ena=7001403808, run=7001403808)
# 8s -
CPU1   G0  8065459152   cpu_cycles (scaling 0.00%, ena=8001556395, run=8001556395)
# 9s -
CPU1   G0  9074297578   cpu_cycles (scaling 0.00%, ena=9002380370, run=9002380370)
# 10s -
CPU1   G0  10082619086  cpu_cycles (scaling 0.00%, ena=10003540038, run=10003540038)
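
As a rough cross-check of the sample count, the sketch below (Python,
illustration only) divides the cycle count from the syst_count run by the
sampling period used with perf record. It assumes that the two runs burn about
the same number of cycles, which is not guaranteed since they are separate
runs, and that ena= is the enabled time in nanoseconds. The function names are
made up for this example.

```python
# Figures copied from the perf and syst_count output in this message.
SAMPLE_PERIOD = 1009213        # cycles per sample (perf record -c)
TOTAL_CYCLES = 10082619086     # cpu_cycles on CPU1 at the 10s mark
ENABLED_NS = 10003540038       # ena= at the 10s mark, assumed nanoseconds
OBSERVED_SAMPLES = 9897        # SAMPLE events from perf report -D


def expected_samples(total_cycles, period):
    """One sample is expected each time `period` cycles have elapsed."""
    return total_cycles / period


def samples_per_second(total_cycles, enabled_ns, period):
    """Average sampling rate implied by the measured cycle rate."""
    cycles_per_second = total_cycles / (enabled_ns / 1e9)
    return cycles_per_second / period


if __name__ == "__main__":
    expected = expected_samples(TOTAL_CYCLES, SAMPLE_PERIOD)
    rate = samples_per_second(TOTAL_CYCLES, ENABLED_NS, SAMPLE_PERIOD)
    print("expected samples : %.0f" % expected)
    print("observed samples : %d" % OBSERVED_SAMPLES)
    print("observed/expected: %.2f%%" % (100.0 * OBSERVED_SAMPLES / expected))
    print("samples per sec  : %.0f" % rate)
```

Under those assumptions the shortfall is on the order of one percent, and the
implied rate stays close to 1000 samples/sec.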

On Fri, Jan 27, 2012 at 3:09 PM, Ming Lei  wrote:
> Hi,
>
> 2012/1/27 Will Deacon :
>> Mans,
>>
>> On Fri, Jan 27, 2012 at 12:56:35PM +, Måns Rullgård wrote:
>>> Will Deacon  writes:
>>> > Did this lead anywhere in the end? It seems as though Ming Lei has a
>>> > working setup but Stephane is unable to replicate it, despite applying
>>> > the necessary patches and trying an updated bootloader.
>>>
>>> With the patches listed above plus the one in [1], I get PMU interrupts.
>>> However, unless I restrict the profiled process to one CPU
>>> (taskset 1 perf record ...), I get a panic in armpmu_event_update() with
>>> the 'event' argument being null when called from armv7pmu_handle_irq().
>>>
>>> [1] http://article.gmane.org/gmane.linux.ports.arm.omap/69696
>>
>> Great, thanks for trying this out. Which version of the kernel were you
>> using?
>
> The patch is required on 3.3-rc1 for the omap4 pmu to work well.
>
> thanks,
> --
> Ming Lei


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
2012/1/27 Will Deacon :
> Mans,
>
> On Fri, Jan 27, 2012 at 12:56:35PM +, Måns Rullgård wrote:
>> Will Deacon  writes:
>> > Did this lead anywhere in the end? It seems as though Ming Lei has a
>> > working setup but Stephane is unable to replicate it, despite applying
>> > the necessary patches and trying an updated bootloader.
>>
>> With the patches listed above plus the one in [1], I get PMU interrupts.
>> However, unless I restrict the profiled process to one CPU
>> (taskset 1 perf record ...), I get a panic in armpmu_event_update() with
>> the 'event' argument being null when called from armv7pmu_handle_irq().
>>
>> [1] http://article.gmane.org/gmane.linux.ports.arm.omap/69696
>
Ok, I am recompiling the kernel for this one line fix:

--- a/arch/arm/mach-omap2/clockdomains44xx_data.c
+++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
@@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
        .prcm_partition   = OMAP4430_PRM_PARTITION,
        .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
        .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
-       .flags            = CLKDM_CAN_HWSUP,
+       .flags            = CLKDM_CAN_SWSUP,
 };


Re: oprofile and ARM A9 hardware counter

2012-01-27 Thread stephane eranian
On Fri, Jan 27, 2012 at 1:13 PM, Will Deacon  wrote:
> Hi guys,
>
> On Sat, Jan 21, 2012 at 09:16:57AM +, stephane eranian wrote:
>> On Sat, Jan 21, 2012 at 4:25 AM, Ming Lei  wrote:
>> > On Fri, Jan 20, 2012 at 9:47 PM, stephane eranian
>> >  wrote:
>> >> Started afresh from:
>> >>
>> >> 90a4c0f uml: fix compile for x86-64
>> >>
>> >> And added 3, 4, 5, 6:
>> >> 603c316 arm: omap4: pmu: support runtime pm
>> >> 4899fbd arm: omap4: support pmu
>> >> d737bb1 arm: omap4: create pmu device via hwmod
>> >> 4e0259e arm: omap4: hwmod: introduce emu hwmod
>> >>
>> >> Still no interrupts firing. I am using your .config file.
>> >>
>> >> My HW:
>> >> CPU implementer : 0x41
>> >> CPU architecture: 7
>> >> CPU variant     : 0x1
>> >> CPU part        : 0xc09
>> >> CPU revision    : 2
>> >>
>> >> Hardware        : OMAP4 Panda board
>> >> Revision        : 0020
>> >>
>> >> There must be something I am missing here.
>
> Did this lead anywhere in the end? It seems as though Ming Lei has a working
> setup but Stephane is unable to replicate it, despite applying the necessary
> patches and trying an updated bootloader.
>
> Drastic suggestion: Stephane, could you try a kernel *binary* from Ming Lei?
> If that works then you're probably just missing a patch. If it doesn't, then
> there must be something different between your boards.
>
Sure, send me the binary+initrd and I can try it out.

> Will


Re: oprofile and ARM A9 hardware counter

2012-01-21 Thread stephane eranian
On Sat, Jan 21, 2012 at 4:25 AM, Ming Lei  wrote:
> On Fri, Jan 20, 2012 at 9:47 PM, stephane eranian
>  wrote:
>> Started afresh from:
>>
>> 90a4c0f uml: fix compile for x86-64
>>
>> And added 3, 4, 5, 6:
>> 603c316 arm: omap4: pmu: support runtime pm
>> 4899fbd arm: omap4: support pmu
>> d737bb1 arm: omap4: create pmu device via hwmod
>> 4e0259e arm: omap4: hwmod: introduce emu hwmod
>>
>> Still no interrupts firing. I am using your .config file.
>>
>> My HW:
>> CPU implementer : 0x41
>> CPU architecture: 7
>> CPU variant     : 0x1
>> CPU part        : 0xc09
>> CPU revision    : 2
>>
>> Hardware        : OMAP4 Panda board
>> Revision        : 0020
>>
>> There must be something I am missing here.
>
> Have you applied the patch in link[1]?
>
You mean this:
> [1], http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary

It does not point to a patch but to the entire tree.

>
> thanks,
> --
> Ming Lei
>
> [1], http://marc.info/?l=linux-arm-kernel&m=132697975416659&w=2


Re: oprofile and ARM A9 hardware counter

2012-01-20 Thread stephane eranian
Started afresh from:

90a4c0f uml: fix compile for x86-64

And added 3, 4, 5, 6:
603c316 arm: omap4: pmu: support runtime pm
4899fbd arm: omap4: support pmu
d737bb1 arm: omap4: create pmu device via hwmod
4e0259e arm: omap4: hwmod: introduce emu hwmod

Still no interrupts firing. I am using your .config file.

My HW:
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc09
CPU revision    : 2

Hardware        : OMAP4 Panda board
Revision        : 0020

There must be something I am missing here.
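
For anyone comparing boards, a small sketch (Python, illustration only) that
turns the cpuinfo fields above into a core name and rNpM revision. The lookup
tables are deliberately tiny and the function names are made up for this
example; this is not a general /proc/cpuinfo parser.

```python
# Fields copied from the /proc/cpuinfo excerpt in this message.
CPUINFO = """\
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc09
CPU revision    : 2
"""

# Deliberately small lookup tables; extend as needed.
IMPLEMENTERS = {0x41: "ARM"}
ARM_PARTS = {
    0xC05: "Cortex-A5",
    0xC07: "Cortex-A7",
    0xC08: "Cortex-A8",
    0xC09: "Cortex-A9",
    0xC0F: "Cortex-A15",
}


def parse_cpuinfo(text):
    """Parse 'key : value' lines whose values are decimal or 0x-prefixed hex."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value.strip(), 0)
    return fields


def describe(fields):
    """Format the core as '<vendor> <part> r<variant>p<revision>'."""
    vendor = IMPLEMENTERS.get(fields["CPU implementer"], "unknown")
    part = ARM_PARTS.get(fields["CPU part"], "unknown")
    return "%s %s r%dp%d" % (
        vendor, part, fields["CPU variant"], fields["CPU revision"])


if __name__ == "__main__":
    print(describe(parse_cpuinfo(CPUINFO)))
```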



On Thu, Jan 19, 2012 at 6:07 PM, stephane eranian
 wrote:
> Just did a fresh clone of Linus' tree:
>
> $ git log --oneline | fgrep 'allow platform specific'
> e0516a6 arm: pmu: allow platform specific irq enable/disable handling
>
> $ git log --oneline | fgrep 'cross trigger'
> 14eec97 arm: introduce cross trigger interface helpers
>
> Unless you were referring to a different pair of patches.
>
>
> On Thu, Jan 19, 2012 at 2:51 PM, Ming Lei  wrote:
>> Hi,
>>
>> On Thu, Jan 19, 2012 at 9:32 PM, stephane eranian
>>  wrote:
>>> On Thu, Jan 19, 2012 at 2:26 PM, Ming Lei  wrote:
>>>> Hi,
>>>>
>>>> On Thu, Jan 19, 2012 at 9:14 PM, Ming Lei  wrote:
>>>>> On Thu, Jan 19, 2012 at 8:51 PM, stephane eranian
>>>>>  wrote:
>>>>>> On Thu, Jan 19, 2012 at 1:45 PM, Ming Lei  wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Thu, Jan 19, 2012 at 7:34 PM, stephane eranian
>>>>>>>  wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Ok some update on this.
>>>>>>>> With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
>>>>>>>
>>>>>>> You forget patch 1 and patch 2?
>>>>>>>
>>>>>> They are already in 3.2.0. Unless I am mistaken.
>>>>>
>>>>> Sorry, I just found that they have been merged to 3.2.
>>>>
>>>> After a double check, the two patches are not merged to 3.2, but have
>>>> been merged to the latest linus tree and can be seen in 3.3-rc1.
>>>>
>>>> Also the commit 3c50729b(ARM: OMAP4: PM: Initialise all the clockdomains
>>>> to supported states) has been merged to linus tree too.
>>>>
>>>> So if you are simply testing the latest linus tree, you need to apply
>>>> the patch[1] (I have mentioned the problem in the thread).
>>>>
>>>
>>> Changing MLO and u-boot.bin did not help. Even with perf top I get no
>>> interrupts.
>>>
>>> My Linus tree is at commit fa1952b:
>>>
>>> [6] 11891e1 arm: omap4: pmu: support runtime pm
>>> [5] 25fab8a arm: omap4: support pmu
>>> [4] fddef77 arm: omap4: create pmu device via hwmod
>>>
>>> fa1952b ARM: OMAP4: hwmod data: Add support for the debug modules
>>
>> Sorry, there is no commit fa1952b in linus[1] tree at all, so you are
>> not testing linus tree...
>>
>> If you'd like to follow my instructions, I can help you further.
>>
>>> ccb19d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>>> c3b5003 tg3: Fix single-vector MSI-X code
>>>
>>> I think [1] had conflicts when applying it to the tree.
>>
>> It is only one line(one character) change, you can do it manually.
>>
>>>
>>>> thanks,
>>>> --
>>>> Ming Lei
>>>>
>>>> [1],
>>>> diff --git a/arch/arm/mach-omap2/clockdomains44xx_data.c
>>>> b/arch/arm/mach-omap2/clockdomains44xx_data.c
>>>> index 9299ac2..41d2260 100644
>>>> --- a/arch/arm/mach-omap2/clockdomains44xx_data.c
>>>> +++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
>>>> @@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
>>>>       .prcm_partition   = OMAP4430_PRM_PARTITION,
>>>>       .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
>>>>       .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
>>>> -       .flags            = CLKDM_CAN_HWSUP,
>>>> +       .flags            = CLKDM_CAN_SWSUP,
>>>>  };
>>>>
>>
>> [1], http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary
>>
>> thanks,
>> --
>> Ming Lei


Re: oprofile and ARM A9 hardware counter

2012-01-19 Thread stephane eranian
Just did a fresh clone of Linus' tree:

$ git log --oneline | fgrep 'allow platform specific'
e0516a6 arm: pmu: allow platform specific irq enable/disable handling

$ git log --oneline | fgrep 'cross trigger'
14eec97 arm: introduce cross trigger interface helpers

Unless you were referring to a different pair of patches.


On Thu, Jan 19, 2012 at 2:51 PM, Ming Lei  wrote:
> Hi,
>
> On Thu, Jan 19, 2012 at 9:32 PM, stephane eranian
>  wrote:
>> On Thu, Jan 19, 2012 at 2:26 PM, Ming Lei  wrote:
>>> Hi,
>>>
>>> On Thu, Jan 19, 2012 at 9:14 PM, Ming Lei  wrote:
>>>> On Thu, Jan 19, 2012 at 8:51 PM, stephane eranian
>>>>  wrote:
>>>>> On Thu, Jan 19, 2012 at 1:45 PM, Ming Lei  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, Jan 19, 2012 at 7:34 PM, stephane eranian
>>>>>>  wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ok some update on this.
>>>>>>> With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
>>>>>>
>>>>>> You forget patch 1 and patch 2?
>>>>>>
>>>>> They are already in 3.2.0. Unless I am mistaken.
>>>>
>>>> Sorry, I just found that they have been merged to 3.2.
>>>
>>> After a double check, the two patches are not merged to 3.2, but have
>>> been merged to the latest linus tree and can be seen in 3.3-rc1.
>>>
>>> Also the commit 3c50729b(ARM: OMAP4: PM: Initialise all the clockdomains
>>> to supported states) has been merged to linus tree too.
>>>
>>> So if you are simply testing the latest linus tree, you need to apply
>>> the patch[1] (I have mentioned the problem in the thread).
>>>
>>
>> Changing MLO and u-boot.bin did not help. Even with perf top I get no
>> interrupts.
>>
>> My Linus tree is at commit fa1952b:
>>
>> [6] 11891e1 arm: omap4: pmu: support runtime pm
>> [5] 25fab8a arm: omap4: support pmu
>> [4] fddef77 arm: omap4: create pmu device via hwmod
>>
>> fa1952b ARM: OMAP4: hwmod data: Add support for the debug modules
>
> Sorry, there is no commit fa1952b in linus[1] tree at all, so you are
> not testing linus tree...
>
> If you'd like to follow my instructions, I can help you further.
>
>> ccb19d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>> c3b5003 tg3: Fix single-vector MSI-X code
>>
>> I think [1] had conflicts when applying it to the tree.
>
> It is only one line(one character) change, you can do it manually.
>
>>
>>> thanks,
>>> --
>>> Ming Lei
>>>
>>> [1],
>>> diff --git a/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> b/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> index 9299ac2..41d2260 100644
>>> --- a/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> +++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
>>> @@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
>>>       .prcm_partition   = OMAP4430_PRM_PARTITION,
>>>       .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
>>>       .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
>>> -       .flags            = CLKDM_CAN_HWSUP,
>>> +       .flags            = CLKDM_CAN_SWSUP,
>>>  };
>>>
>
> [1], http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary
>
> thanks,
> --
> Ming Lei


Re: oprofile and ARM A9 hardware counter

2012-01-19 Thread stephane eranian
On Thu, Jan 19, 2012 at 2:26 PM, Ming Lei  wrote:
> Hi,
>
> On Thu, Jan 19, 2012 at 9:14 PM, Ming Lei  wrote:
>> On Thu, Jan 19, 2012 at 8:51 PM, stephane eranian
>>  wrote:
>>> On Thu, Jan 19, 2012 at 1:45 PM, Ming Lei  wrote:
>>>> Hi,
>>>>
>>>> On Thu, Jan 19, 2012 at 7:34 PM, stephane eranian
>>>>  wrote:
>>>>> Hi,
>>>>>
>>>>> Ok some update on this.
>>>>> With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
>>>>
>>>> You forget patch 1 and patch 2?
>>>>
>>> They are already in 3.2.0. Unless I am mistaken.
>>
>> Sorry, I just found that they have been merged to 3.2.
>
> After a double check, the two patches are not merged to 3.2, but have
> been merged to the latest linus tree and can be seen in 3.3-rc1.
>
> Also the commit 3c50729b(ARM: OMAP4: PM: Initialise all the clockdomains
> to supported states) has been merged to linus tree too.
>
> So if you are simply testing the latest linus tree, you need to apply
> the patch[1] (I have mentioned the problem in the thread).
>

Changing MLO and u-boot.bin did not help. Even with perf top I get no
interrupts.

My Linus tree is at commit fa1952b:

[6] 11891e1 arm: omap4: pmu: support runtime pm
[5] 25fab8a arm: omap4: support pmu
[4] fddef77 arm: omap4: create pmu device via hwmod

fa1952b ARM: OMAP4: hwmod data: Add support for the debug modules
ccb19d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
c3b5003 tg3: Fix single-vector MSI-X code

I think [1] had conflicts when applying it to the tree.

> thanks,
> --
> Ming Lei
>
> [1],
> diff --git a/arch/arm/mach-omap2/clockdomains44xx_data.c
> b/arch/arm/mach-omap2/clockdomains44xx_data.c
> index 9299ac2..41d2260 100644
> --- a/arch/arm/mach-omap2/clockdomains44xx_data.c
> +++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
> @@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
>       .prcm_partition   = OMAP4430_PRM_PARTITION,
>       .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
>       .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
> -       .flags            = CLKDM_CAN_HWSUP,
> +       .flags            = CLKDM_CAN_SWSUP,
>  };
>
>  static struct clockdomain l3_dma_44xx_clkdm = {


Re: oprofile and ARM A9 hardware counter

2012-01-19 Thread stephane eranian
On Thu, Jan 19, 2012 at 1:51 PM, stephane eranian
 wrote:
> On Thu, Jan 19, 2012 at 1:45 PM, Ming Lei  wrote:
>> Hi,
>>
>> On Thu, Jan 19, 2012 at 7:34 PM, stephane eranian
>>  wrote:
>>> Hi,
>>>
>>> Ok some update on this.
>>> With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
>>
>> You forget patch 1 and patch 2?
>>
> They are already in 3.2.0. Unless I am mistaken.
>
e0516a6 arm: pmu: allow platform specific irq enable/disable handling
14eec97 arm: introduce cross trigger interface helpers

> Are you sure you don't have anything else applied?
>
>>> boots. It does recognize the PMU. However, it still does not count correctly
>>> and I believe for the same reason: no interrupts are delivered.
>>>
>>> I run a cycle burner program on CPU0 and watch /proc/interrupts.
>>> Then I run a libpfm4 program that does per-cpu monitoring on CPU0 and
>>> prints the counts every second:
>>
>> I just run 'perf top', then watch output of '/proc/interrupts' in
>> another terminal. I am sure I can see perf is OK and interrupts are
>> generated on my pandaboard.
>>
>>>
>>> $ sudo ./syst_count -d 10 -p -c 0 -e cpu_cycles
>>> 
>>> # 1s -
>>> CPU0   G0  1008129147           cpu_cycles (scaling 0.00%, ena=1000152588, run=1000152588)
>>> # 2s -
>>> CPU0   G0  2016240766           cpu_cycles (scaling 0.00%, ena=2000335693, run=2000335693)
>>> # 3s -
>>> CPU0   G0  3024249265           cpu_cycles (scaling 0.00%, ena=3000427245, run=3000427245)
>>> # 4s -
>>> CPU0   G0  4072779364           cpu_cycles (scaling 0.00%, ena=4040710449, run=4040710449)
>>> # 5s -
>>> CPU0   G0  785954705            cpu_cycles (scaling 0.00%, ena=5040954589, run=5040954589)
>>> # 6s -
>>> CPU0   G0  1803397848           cpu_cycles (scaling 0.00%, ena=6050384520, run=6050384520)
>>> # 7s -
>>>
>>> You clearly see that after 4s you've reached the 32-bit limit of the
>>> counter and then you wrap around. It should show 5 billion or so cycles.
>>> Over the entire run, no arm-pmu interrupt was delivered according to
>>> /proc/interrupts.
>>>
>>> I guess you can test the same condition using perf directly: use a
>>> program that burns cycles for a known duration. Try < 4s and then > 4s.
>>> I use 1s vs. 10s and I expect the count to be 10x larger in the latter
>>> test case. If it's not, then interrupts are not coming in.
>>>
>>>
>>> On Thu, Jan 19, 2012 at 2:21 AM, Ming Lei  wrote:
>>>> Hi,
>>>>
>>>> On Thu, Jan 19, 2012 at 5:58 AM, stephane eranian
>>>>  wrote:
>>>>> Ming,
>>>>>
>>>>> Ok, so I used Linus' tree @
>>>>>
>>>>> It already includes patches #1 and #2. I applied 4-6.
>>>>
>>>> The patch #3 is missed?
>>>>
>>>>> Recompiled but my kernel does not boot, I don't see
>>>>> anything on the serial console. Could be a broken
>>>>
>>>> I don't think that the patches can cause your non boot, you
>>>> can try the linus tree kernel first, then try the patches.
>>>>
>>>>> .config file. Could you send me your .config for Panda?
>>>>
>>>> See the attachment.
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Wed, Jan 18, 2012 at 11:07 AM, Ming Lei  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, Jan 18, 2012 at 5:54 PM, stephane eranian 
>>>>>> 
>>>>>>> Should I use Will's -next tree as the base instead of Linus'?
>>>>>>
>>>>>> Either one is OK. If you use linus tree as base, you need to apply the
>>>>>> #1 and #2 patch manually.
>>>>>>
>>>>>>> Given that MARC is shutdown today, would you mind packing those patches
>>>>>>> into a tarball and sending them to me directly?
>>>>>>
>>>>>> See attachment, which includes the patches from #3 to #6.
>>>>>>
>>>>>>>
>>>>>>> When you mention Will's -next tree, are you talking about:
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
>>>>>>
>>>>>> It is the perf/omap4 branch; you can pick up the two patches[1][2]
>>>>>> directly from the branch.
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> --
>>>>>> Ming Lei
>>>>>>
>>>>>> [1], 
>>>>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=7924a3eba0766348d6d6a56cbb9873cdbcab0d8c
>>>>>>
>>>>>> [2], 
>>>>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=bde071f005e2dc71378aff69e86b961d8cd7922f


Re: oprofile and ARM A9 hardware counter

2012-01-19 Thread stephane eranian
On Thu, Jan 19, 2012 at 1:45 PM, Ming Lei  wrote:
> Hi,
>
> On Thu, Jan 19, 2012 at 7:34 PM, stephane eranian
>  wrote:
>> Hi,
>>
>> Ok some update on this.
>> With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
>
> You forget patch 1 and patch 2?
>
They are already in 3.2.0. Unless I am mistaken.

Are you sure you don't have anything else applied?

>> boots. It does recognize the PMU. However, it still does not count correctly
>> and I believe for the same reason: no interrupts are delivered.
>>
>> I run a cycle burner program on CPU0 and watch /proc/interrupts.
>> Then I run a libpfm4 program that does per-cpu monitoring on CPU0 and
>> prints the counts every second:
>
> I just run 'perf top', then watch output of '/proc/interrupts' in
> another terminal. I am sure I can see perf is OK and interrupts are
> generated on my pandaboard.
>
>>
>> $ sudo ./syst_count -d 10 -p -c 0 -e cpu_cycles
>> 
>> # 1s -
>> CPU0   G0  1008129147           cpu_cycles (scaling 0.00%, ena=1000152588, run=1000152588)
>> # 2s -
>> CPU0   G0  2016240766           cpu_cycles (scaling 0.00%, ena=2000335693, run=2000335693)
>> # 3s -
>> CPU0   G0  3024249265           cpu_cycles (scaling 0.00%, ena=3000427245, run=3000427245)
>> # 4s -
>> CPU0   G0  4072779364           cpu_cycles (scaling 0.00%, ena=4040710449, run=4040710449)
>> # 5s -
>> CPU0   G0  785954705            cpu_cycles (scaling 0.00%, ena=5040954589, run=5040954589)
>> # 6s -
>> CPU0   G0  1803397848           cpu_cycles (scaling 0.00%, ena=6050384520, run=6050384520)
>> # 7s -
>>
>> You clearly see that after 4s you've reached the 32-bit limit of the
>> counter and then you wrap around. It should show 5 billion or so cycles.
>> Over the entire run, no arm-pmu interrupt was delivered according to
>> /proc/interrupts.
>>
>> I guess you can test the same condition using perf directly: use a
>> program that burns cycles for a known duration. Try < 4s and then > 4s.
>> I use 1s vs. 10s and I expect the count to be 10x larger in the latter
>> test case. If it's not, then interrupts are not coming in.
>>
>>
>> On Thu, Jan 19, 2012 at 2:21 AM, Ming Lei  wrote:
>>> Hi,
>>>
>>> On Thu, Jan 19, 2012 at 5:58 AM, stephane eranian
>>>  wrote:
>>>> Ming,
>>>>
>>>> Ok, so I used Linus' tree @
>>>>
>>>> It already includes patches #1 and #2. I applied 4-6.
>>>
>>> The patch #3 is missed?
>>>
>>>> Recompiled but my kernel does not boot, I don't see
>>>> anything on the serial console. Could be a broken
>>>
>>> I don't think that the patches can cause your non boot, you
>>> can try the linus tree kernel first, then try the patches.
>>>
>>>> .config file. Could you send me your .config for Panda?
>>>
>>> See the attachment.
>>>
>>>>
>>>> Thanks.
>>>>
>>>> On Wed, Jan 18, 2012 at 11:07 AM, Ming Lei  wrote:
>>>>> Hi,
>>>>>
>>>>> On Wed, Jan 18, 2012 at 5:54 PM, stephane eranian 
>>>>>> Should I use Will's -next tree as the base instead of Linus'?
>>>>>
>>>>> Either one is OK. If you use linus tree as base, you need to apply the
>>>>> #1 and #2 patch manually.
>>>>>
>>>>>> Given that MARC is shutdown today, would you mind packing those patches
>>>>>> into a tarball and sending them to me directly?
>>>>>
>>>>> See attachment, which includes the patches from #3 to #6.
>>>>>
>>>>>>
>>>>>> When you mention Will's -next tree, are you talking about:
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
>>>>>
>>>>> It is the perf/omap4 branch; you can pick up the two patches[1][2]
>>>>> directly from the branch.
>>>>>
>>>>>
>>>>> thanks,
>>>>> --
>>>>> Ming Lei
>>>>>
>>>>> [1], 
>>>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=7924a3eba0766348d6d6a56cbb9873cdbcab0d8c
>>>>>
>>>>> [2], 
>>>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=bde071f005e2dc71378aff69e86b961d8cd7922f


Re: oprofile and ARM A9 hardware counter

2012-01-19 Thread stephane eranian
Hi,

Ok, some update on this.
With your .config file + 3.2.0 (Linus) + patch 3, 4, 5, 6, I get a kernel that
boots. It does recognize the PMU. However, it still does not count correctly
and I believe for the same reason: no interrupts are delivered.

I run a cycle burner program on CPU0 and watch /proc/interrupts.
Then I run a libpfm4 program that does per-cpu monitoring on CPU0 and
prints the counts every second:

$ sudo ./syst_count -d 10 -p -c 0 -e cpu_cycles

# 1s -
CPU0   G0  1008129147   cpu_cycles (scaling 0.00%, ena=1000152588, run=1000152588)
# 2s -
CPU0   G0  2016240766   cpu_cycles (scaling 0.00%, ena=2000335693, run=2000335693)
# 3s -
CPU0   G0  3024249265   cpu_cycles (scaling 0.00%, ena=3000427245, run=3000427245)
# 4s -
CPU0   G0  4072779364   cpu_cycles (scaling 0.00%, ena=4040710449, run=4040710449)
# 5s -
CPU0   G0  785954705    cpu_cycles (scaling 0.00%, ena=5040954589, run=5040954589)
# 6s -
CPU0   G0  1803397848   cpu_cycles (scaling 0.00%, ena=6050384520, run=6050384520)
# 7s -

You clearly see that after 4s you've reached the 32-bit limit of the
counter and then you wrap around. It should show 5 billion or so cycles.
Over the entire run, no arm-pmu interrupt was delivered according to
/proc/interrupts.

I guess you can test the same condition using perf directly: use a
program that burns cycles for a known duration. Try < 4s and then > 4s.
I use 1s vs. 10s and I expect the count to be 10x larger in the latter
test case. If it's not, then interrupts are not coming in.
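
To put numbers on the wraparound, a minimal sketch (Python, illustration only)
using the readings above. It assumes the reported value behaves like a plain
32-bit counter with at most one wrap between two readings, and that ena= is in
nanoseconds. The function names are made up for this example.

```python
COUNTER_BITS = 32
WRAP = 1 << COUNTER_BITS  # 4294967296

# cpu_cycles values from the syst_count output above, 1s through 6s.
RAW_READINGS = [
    1008129147, 2016240766, 3024249265,
    4072779364, 785954705, 1803397848,
]


def unwrap(readings, wrap=WRAP):
    """Rebuild a monotonic count, assuming at most one wrap per interval."""
    totals = []
    offset = 0
    previous = None
    for value in readings:
        if previous is not None and value < previous:
            # The reading went backwards: the counter wrapped once.
            offset += wrap
        totals.append(value + offset)
        previous = value
    return totals


def seconds_until_wrap(cycles, elapsed_ns, wrap=WRAP):
    """Time for the counter to wrap at the measured cycle rate."""
    cycles_per_second = cycles / (elapsed_ns / 1e9)
    return wrap / cycles_per_second


if __name__ == "__main__":
    for second, total in enumerate(unwrap(RAW_READINGS), start=1):
        print("%ds: %d" % (second, total))
    print("wraps after about %.2f s" % seconds_until_wrap(1008129147, 1000152588))
```

With the wrap added back, the 5s and 6s readings line up with the roughly
1.008 billion cycles per second seen in the first three seconds.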


On Thu, Jan 19, 2012 at 2:21 AM, Ming Lei  wrote:
> Hi,
>
> On Thu, Jan 19, 2012 at 5:58 AM, stephane eranian
>  wrote:
>> Ming,
>>
>> Ok, so I used Linus' tree @
>>
>> It already includes patches #1 and #2. I applied 4-6.
>
> The patch #3 is missed?
>
>> Recompiled but my kernel does not boot, I don't see
>> anything on the serial console. Could be a broken
>
> I don't think that the patches can cause your non boot, you
> can try the linus tree kernel first, then try the patches.
>
>> .config file. Could you send me your .config for Panda?
>
> See the attachment.
>
>>
>> Thanks.
>>
>> On Wed, Jan 18, 2012 at 11:07 AM, Ming Lei  wrote:
>>> Hi,
>>>
>>> On Wed, Jan 18, 2012 at 5:54 PM, stephane eranian 
>>>> Should I use Will's -next tree as the base instead of Linus'?
>>>
>>> Either one is OK. If you use linus tree as base, you need to apply the
>>> #1 and #2 patch manually.
>>>
>>>> Given that MARC is shutdown today, would you mind packing those patches
>>>> into a tarball and sending them to me directly?
>>>
>>> See attachment, which includes the patches from #3 to #6.
>>>
>>>>
>>>> When you mention Will's -next tree, are you talking about:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
>>>
>>> It is the perf/omap4 branch; you can pick up the two patches[1][2]
>>> directly from the branch.
>>>
>>>
>>> thanks,
>>> --
>>> Ming Lei
>>>
>>> [1], 
>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=7924a3eba0766348d6d6a56cbb9873cdbcab0d8c
>>>
>>> [2], 
>>> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=bde071f005e2dc71378aff69e86b961d8cd7922f


Re: oprofile and ARM A9 hardware counter

2012-01-18 Thread stephane eranian
Ming,

Ok, so I used Linus' tree @

It already includes patches #1 and #2. I applied 4-6.
Recompiled but my kernel does not boot, I don't see
anything on the serial console. Could be a broken
.config file. Could you send me your .config for Panda?

Thanks.

On Wed, Jan 18, 2012 at 11:07 AM, Ming Lei  wrote:
> Hi,
>
> On Wed, Jan 18, 2012 at 5:54 PM, stephane eranian 
>> Should I use Will's -next tree as the base instead of Linus'?
>
> Either one is OK. If you use Linus' tree as the base, you need to apply
> patches #1 and #2 manually.
>
>> Given that MARC is shut down today, would you mind packing those patches
>> into a tarball and sending them to me directly?
>
> See attachment, which includes the patches from #3 to #6.
>
>>
>> When you mention Will's -next tree, are you talking about:
>> git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
>
> It is the perf/omap4 branch; you can pick up the two patches [1][2] directly
> from the branch.
>
>
> thanks,
> --
> Ming Lei
>
> [1], 
> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=7924a3eba0766348d6d6a56cbb9873cdbcab0d8c
>
> [2], 
> http://git.kernel.org/?p=linux/kernel/git/will/linux.git;a=commit;h=bde071f005e2dc71378aff69e86b961d8cd7922f


Re: oprofile and ARM A9 hardware counter

2012-01-18 Thread stephane eranian
On Wed, Jan 18, 2012 at 5:18 AM, Ming Lei  wrote:
> Hi stephane & Will,
>
> On Tue, Jan 10, 2012 at 8:46 AM, stephane eranian
>  wrote:
>> See the dmesg from my 3.2 kernel:
>>
>>
>> [    0.00] Booting Linux on physical CPU 0[    0.00]
>
> I can't find any obvious failure in your 'dmesg'.
>
> I have run the upstream 3.2 kernel plus the 6 omap4 pmu patches below and
> found that perf works well on my panda board.
>
>        0001-arm-introduce-cross-trigger-interface-helpers.patch
>        0002-arm-pmu-allow-platform-specific-irq-enable-disable-h.patch
>        0003-arm-omap4-hwmod-introduce-emu-hwmod.patch or Benoit's debugss patch[2]
>        0004-arm-omap4-create-pmu-device-via-hwmod.patch[3]
>        0005-arm-omap4-support-pmu.patch[4]
>        0006-arm-omap4-pmu-support-runtime-pm.patch[5]
>
> Could you verify the above patches on 3.2 to see if perf works well?
> If it doesn't, I can share my u-boot and MLO for you to test with, if you'd
> like.
>
> BTW: #1 and #2 have been in Will's -next tree.
>
Should I use Will's -next tree as the base instead of Linus'?
Given that MARC is shut down today, would you mind packing those patches
into a tarball and sending them to me directly?

When you mention Will's -next tree, are you talking about:
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf

Thanks.

> thanks,
> --
> Ming Lei
>
> [1], uname -a & cat /proc/interrupts
> [root@root]#uname -a
> Linux beagleboard 3.2.0+ #480 SMP PREEMPT Wed Jan 18 11:38:33 CST 2012
> armv7l GNU/Linux
> [root@root]#cat /proc/interrupts
>           CPU0       CPU1
>  29:      29014      17353       GIC  twd
>  33:      56231          0       GIC  arm-pmu
>  34:          0      25778       GIC  arm-pmu
>
> [2], http://marc.info/?l=linux-omap&m=132162118104901&w=2
> [3], http://marc.info/?t=13222762152&r=1&w=2
> [4], http://marc.info/?t=13222762172&r=1&w=2
> [5], http://marc.info/?t=13222762173&r=1&w=2
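
One way to check whether the PMU interrupts are actually firing is to snapshot
/proc/interrupts before and after a perf run and compare the arm-pmu rows. The
Python sketch below is only an illustration: `pmu_irq_counts` and `delta` are
hypothetical helper names, not part of perf or the kernel, and SAMPLE is the
table quoted above with the reply markers stripped.

```python
def pmu_irq_counts(text, name="arm-pmu"):
    """Map IRQ number -> per-CPU counts for rows whose last field is `name`."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip short rows (e.g. "Err:") and rows for other devices.
        if len(fields) < ncpus + 2 or fields[-1] != name:
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue
        counts[int(irq)] = [int(n) for n in fields[1:1 + ncpus]]
    return counts


def delta(before, after):
    """Per-IRQ, per-CPU difference between two snapshots."""
    result = {}
    for irq, now in after.items():
        then = before.get(irq, [0] * len(now))
        result[irq] = [a - b for a, b in zip(now, then)]
    return result


# Sample taken from the message above (quote markers removed).
SAMPLE = """\
          CPU0       CPU1
 29:      29014      17353       GIC  twd
 33:      56231          0       GIC  arm-pmu
 34:          0      25778       GIC  arm-pmu
"""

if __name__ == "__main__":
    print(pmu_irq_counts(SAMPLE))
```

For the sample, IRQ 33 has fired only on CPU0 and IRQ 34 only on CPU1, which
is what one would expect when each PMU interrupt is routed to its own core.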


Re: oprofile and ARM A9 hardware counter

2012-01-09 Thread stephane eranian
See the dmesg from my 3.2 kernel:


[    0.00] Booting Linux on physical CPU 0
[    0.00] Initializing cgroup subsys cpuset
[    0.00] Initializing cgroup subsys cpu
[    0.00] Linux version 3.2.0-omap4 (eranian@panda) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #9 SMP PR
[    0.00] CPU: ARMv7 Processor [411fc092] revision 2 (ARMv7), cr=10c5387d
[    0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.00] Machine: OMAP4 Panda board
[    0.00] Reserving 33554432 bytes SDRAM for VRAM
[    0.00] Memory policy: ECC disabled, Data cache writealloc
[    0.00] On node 0 totalpages: 239616
[    0.00] free_area_init_node: node 0, pgdat c077c180, node_mem_map c07f4000
[    0.00]   Normal zone: 1536 pages used for memmap
[    0.00]   Normal zone: 0 pages reserved
[    0.00]   Normal zone: 180736 pages, LIFO batch:31
[    0.00]   HighMem zone: 512 pages used for memmap
[    0.00]   HighMem zone: 56832 pages, LIFO batch:15
[    0.00] OMAP4430 ES2.2
[    0.00] PERCPU: Embedded 8 pages/cpu @c0ffc000 s10240 r8192 d14336 u32768
[    0.00] pcpu-alloc: s10240 r8192 d14336 u32768 alloc=8*4096
[    0.00] pcpu-alloc: [0] 0 [0] 1
[    0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 237568
[    0.00] Kernel command line: ro elevator=noop vram=32M mem=456M@0x8000 mem=512M@0xA000 root=UUID=ec3f7a
[    0.00] PID hash table entries: 4096 (order: 2, 16384 bytes)
[    0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.00] allocated 4194304 bytes of page_cgroup
[    0.00] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.00] Memory: 456MB 480MB = 936MB total
[    0.00] Memory: 934980k/934980k available, 56252k reserved, 229376K highmem
[    0.00] Virtual kernel memory layout:
[    0.00]     vector  : 0x - 0x1000   (   4 kB)
[    0.00]     fixmap  : 0xfff0 - 0xfffe   ( 896 kB)
[    0.00]     vmalloc : 0xf080 - 0xf800   ( 120 MB)
[    0.00]     lowmem  : 0xc000 - 0xf000   ( 768 MB)
[    0.00]     pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[    0.00]     modules : 0xbf00 - 0xbfe0   (  14 MB)
[    0.00]       .text : 0xc0008000 - 0xc06e1134   (7013 kB)
[    0.00]       .init : 0xc06e2000 - 0xc071e800   ( 242 kB)
[    0.00]       .data : 0xc072 - 0xc077ecf0   ( 380 kB)
[    0.00]        .bss : 0xc077ed14 - 0xc07f32ec   ( 466 kB)
[    0.00] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.00] Preemptible hierarchical RCU implementation.
[    0.00] NR_IRQS:410
[    0.00] omap_hwmod: dpll_mpu_m2_ck: missing clockdomain for dpll_mpu_m2_ck.
[    0.00] OMAP clockevent source: GPTIMER1 at 32768 Hz
[    0.00] sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 131071999ms
[    0.00] Console: colour dummy device 80x30
[    0.00] console [tty0] enabled
[    0.000213] Calibrating delay loop... 1576.53 BogoMIPS (lpj=6156288)
[    0.070373] pid_max: default: 32768 minimum: 301
[    0.070617] Security Framework initialized
[    0.070678] Smack:  Initializing.
[    0.070770] Mount-cache hash table entries: 512
[    0.071807] Initializing cgroup subsys cpuacct
[    0.071868] Initializing cgroup subsys memory
[    0.071929] Initializing cgroup subsys devices
[    0.071929] Initializing cgroup subsys freezer
[    0.071960] Initializing cgroup subsys blkio
[    0.071990] Initializing cgroup subsys perf_event
[    0.072143] CPU: Testing write buffer coherency: ok
[    0.072448] CPU0: thread -1, cpu 0, socket 0, mpidr 8000
[    0.072509] Calibrating local timer... 386.32MHz.
[    0.117462] hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available
[    0.117523] L310 cache controller enabled
[    0.117523] l2x0: 16 ways, CACHE_ID 0x41c4, AUX_CTRL 0x7e47, Cache size: 1048576 B
[    0.194000] CPU1: Booted secondary processor
[    0.224121] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
[    0.224151] CPU1: Unknown IPI message 0x1
[    0.224182] Brought up 2 CPUs
[    0.224212] SMP: Total of 2 processors activated (3115.31 BogoMIPS).
[    0.225097] devtmpfs: initialized
[    0.228820] omap_hwmod: l3_div_ck: missing clockdomain for l3_div_ck.
[    0.231903] omap_hwmod: dmm: _wait_target_disable failed
[    0.234497] omap_hwmod: emif_fw: _wait_target_disable failed
[    0.237091] omap_hwmod: l3_main_1: _wait_target_disable failed
[    0.239715] omap_hwmod: l3_main_2: _wait_target_disable failed
[    0.242309] omap_hwmod: l4_abe: _wait_target_disable failed
[    0.244903] omap_hwmod: l4_cfg: _wait_target_disable failed
[    0.247528] omap_hwmod: l4_per: _wait_target_disable failed
[    0.250610] omap_hwmod: l4_wkup: _wait_target_disable failed
[    0.253234] omap_hwmod: dma_system: _wait_target_disable failed
[    0.255889] omap_hwmod: dss_core: _wait_target_disable failed
[    0.258514] omap_hwmod: 

[BUG] perf_event: no PMU interrupt on OMAP4 with 3.2.0 (Pandaboard)

2012-01-06 Thread Stephane Eranian
Hi,

I am trying to get perf_event to work properly on my OMAP4 Pandaboard
running the 3.2.0 kernel. I am the developer of libpfm4 and a regular
contributor to the perf_event subsystem and perf tool. I want to use
a Pandaboard to test libpfm4 ARM support.

I have been talking with Will Deacon and he suggested I post to this list.

I know that the off-the-shelf 3.2.0 does not have working PMU interrupts.
I have integrated additional patches from both Ming Lei and Linaro. It
seems some parts of those patches were integrated into 3.2.0.

I have attached my 3.2.0 changes to this message. With that, I get perf_event
to register PMU interrupts 33 and 34. However, they don't fire when I count,
so the counts are bogus (the counters wrap around).
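
To illustrate why missing overflow interrupts make the counts bogus: the ARMv7
PMU counters on the Cortex-A9 are 32 bits wide, so if no interrupt ever fires
to accumulate overflows in software, every full wrap is lost from whatever is
read back. The Python sketch below is a simplified model of that effect only,
not of the kernel's actual accounting. The 1 GHz clock and 10 second run are
assumed figures for illustration, and `observed_count` and `wraps` are
hypothetical names.

```python
COUNTER_BITS = 32  # ARMv7 PMU event and cycle counters are 32 bits wide


def observed_count(true_events, bits=COUNTER_BITS):
    """Value left in a free-running counter if no overflow is ever handled."""
    return true_events % (1 << bits)


def wraps(true_events, bits=COUNTER_BITS):
    """Number of overflows an interrupt handler would have had to account for."""
    return true_events >> bits


if __name__ == "__main__":
    cycles = 1_000_000_000 * 10  # assumed: 1 GHz clock running for 10 seconds
    print(observed_count(cycles), wraps(cycles))
```

Under those assumed figures the counter wraps twice, so the raw value that
survives is far smaller than the number of cycles that actually elapsed.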

I suspect I am missing something in the patch I put together. But I don't know
what's missing. I am not an ARM platform expert. I am hoping someone on this
list may shed some light on this problem.

It would be really nice to have perf_event running out of the box on Pandaboard
for 3.3. It would go a long way toward making performance monitoring usable on
ARM.

Thanks.
---

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index 0bda22c..b5a5be2 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -27,13 +27,22 @@ enum arm_pmu_type {
 /*
  * struct arm_pmu_platdata - ARM PMU platform data
  *
- * @handle_irq: an optional handler which will be called from the interrupt and
- * passed the address of the low level handler, and can be used to implement
- * any platform specific handling before or after calling it.
+ * @handle_irq: an optional handler which will be called from the
+ * interrupt and passed the address of the low level handler,
+ * and can be used to implement any platform specific handling
+ * before or after calling it.
+ * @enable_irq: an optional handler which will be called after
+ * request_irq and be used to handle some platform specific
+ * irq enablement
+ * @disable_irq: an optional handler which will be called before
+ * free_irq and be used to handle some platform specific
+ * irq disablement
  */
 struct arm_pmu_platdata {
    irqreturn_t (*handle_irq)(int irq, void *dev,
                              irq_handler_t pmu_handler);
+   void (*enable_irq)(int irq);
+   void (*disable_irq)(int irq);
 };
 
 #ifdef CONFIG_CPU_HAS_PMU
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 88b0941..24e001f 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -380,6 +380,8 @@ armpmu_release_hardware(struct arm_pmu *armpmu)
 {
    int i, irq, irqs;
    struct platform_device *pmu_device = armpmu->plat_device;
+   struct arm_pmu_platdata *plat =
+       dev_get_platdata(&pmu_device->dev);
 
    irqs = min(pmu_device->num_resources, num_possible_cpus());
 
@@ -387,8 +389,11 @@ armpmu_release_hardware(struct arm_pmu *armpmu)
        if (!cpumask_test_and_clear_cpu(i, &armpmu->active_irqs))
            continue;
        irq = platform_get_irq(pmu_device, i);
-       if (irq >= 0)
+       if (irq >= 0) {
+           if (plat->disable_irq)
+               plat->disable_irq(irq);
            free_irq(irq, armpmu);
+       }
    }
 
    release_pmu(armpmu->type);
@@ -449,6 +454,8 @@ armpmu_reserve_hardware(struct arm_pmu *armpmu)
            armpmu_release_hardware(armpmu);
            return err;
        }
+       if (plat->enable_irq)
+           plat->enable_irq(irq);
 
        cpumask_set_cpu(i, &armpmu->active_irqs);
    }
diff --git a/arch/arm/mach-omap2/devices.c b/arch/arm/mach-omap2/devices.c
index c15cfad..6caf31a 100644
--- a/arch/arm/mach-omap2/devices.c
+++ b/arch/arm/mach-omap2/devices.c
@@ -17,12 +17,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -404,14 +406,136 @@ static struct platform_device omap_pmu_device = {
    .num_resources  = 1,
 };
 
-static void omap_init_pmu(void)
+static struct arm_pmu_platdata omap4_pmu_data;
+static struct cti omap4_cti[2];
+static struct platform_device *pmu_dev;
+
+static void omap4_enable_cti(int irq)
 {
-   if (cpu_is_omap24xx())
+   pm_runtime_get_sync(&pmu_dev->dev);
+
+   if (irq == OMAP44XX_IRQ_CTI0)
+       cti_enable(&omap4_cti[0]);
+   else if (irq == OMAP44XX_IRQ_CTI1)
+       cti_enable(&omap4_cti[1]);
+}
+
+static void omap4_disable_cti(int irq)
+{
+   if (irq == OMAP44XX_IRQ_CTI0)
+       cti_disable(&omap4_cti[0]);
+   else if (irq == OMAP44XX_IRQ_CTI1)
+       cti_disable(&omap4_cti[1]);
+   pm_runtime_put(&pmu_dev->dev);
+}
+
+static irqreturn_t omap4_pmu_handler(int irq, void *dev, irq_handler_t handler)
+{
+   if (irq