Hi Stephane,

Thanks very much for this explanation. I'm CC'ing the list for other  
implementers. Yes, it looks very much like this optimization can be  
made here and for some other platforms as well.

One thing I've noticed while making this optimization: I need to do the
following in these routines:

mask = used_pmc/pmd & impl_pmc/pmd;   /* and sometimes cnt_pmds */
num = weight(mask);
for_each_bit(mask, num)
        do something;

(Note that the kernel, as of 2.6.18, does not have a for_each_bit
macro; why is that?)
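For illustration, here is one way such a macro might look for a single-word mask (the signature and helper name are just a sketch, not a proposal for the actual kernel interface; a real kernel version would presumably be built on find_first_bit()/find_next_bit() so it handles multi-word bitmaps):

```c
#include <assert.h>

/* Hypothetical for_each_bit sketch for a single unsigned long mask.
 * __builtin_ctzl() gives the index of the lowest set bit; clearing
 * that bit each iteration makes the loop cost scale with the number
 * of set bits, not the register count. */
#define for_each_bit(bit, mask, tmp)                              \
        for ((tmp) = (mask);                                      \
             (tmp) && (((bit) = __builtin_ctzl(tmp)), 1);         \
             (tmp) &= (tmp) - 1)

/* Example user: sum the indices of the set bits in a mask. */
static int sum_set_bits(unsigned long mask)
{
        unsigned long tmp;
        int bit, sum = 0;

        for_each_bit(bit, mask, tmp)
                sum += bit;
        return sum;
}
```

With a sparse mask the loop body runs only weight(mask) times, which is the whole point of the optimization.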

The optimizations I made previously short-circuit on the basis of the
number of bits set; one could take that even further in the above
using bitmap_ffs and the weight. Furthermore, when the set is created
or written to, one could set up the above masks ahead of time, so
that when arch_start/arch_restore_pmd/pmc/arch_stop are called we
would save a lot of instructions and some cache misses on the stack.
With only a handful of counters this stuff may not make a difference,
but with a bunch of counters, and with the bitmaps almost always
being very sparse, it saves hundreds of instructions. Considering
these routines are on the critical path, it might make sense to
precompute the following, for platforms that don't have 'leak' risks:

used_impl_pmcs = used_pmc_mask & impl_pmc_mask
used_impl_pmds = used_pmd_mask & impl_pmd_mask
used_impl_cnt_pmds = used_pmd_mask & impl_pmd_mask & cnt_pmd_mask

along with their corresponding bit weights.
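As a sketch of what I mean (the struct and function names here are hypothetical, not the actual perfmon ones), the intersections would be refreshed once when the set's masks change, so the context-switch paths just pick up a ready-made mask and its weight:

```c
#include <assert.h>

/* Portable popcount helper; in the kernel this would be hweight_long(). */
static int popcountl(unsigned long v)
{
        int n = 0;

        while (v) {
                v &= v - 1;     /* clear lowest set bit */
                n++;
        }
        return n;
}

/* Hypothetical per-set cache of precomputed masks and weights. */
struct pfm_set_cache {
        unsigned long used_impl_pmcs;
        unsigned long used_impl_pmds;
        unsigned long used_impl_cnt_pmds;
        int used_impl_pmcs_num;
        int used_impl_pmds_num;
        int used_impl_cnt_pmds_num;
};

/* Refresh the cache whenever the used_pmc/pmd masks are written. */
static void pfm_update_set_cache(struct pfm_set_cache *c,
                                 unsigned long used_pmcs,
                                 unsigned long used_pmds,
                                 unsigned long impl_pmcs,
                                 unsigned long impl_pmds,
                                 unsigned long cnt_pmds)
{
        c->used_impl_pmcs     = used_pmcs & impl_pmcs;
        c->used_impl_pmds     = used_pmds & impl_pmds;
        c->used_impl_cnt_pmds = used_pmds & impl_pmds & cnt_pmds;

        c->used_impl_pmcs_num     = popcountl(c->used_impl_pmcs);
        c->used_impl_pmds_num     = popcountl(c->used_impl_pmds);
        c->used_impl_cnt_pmds_num = popcountl(c->used_impl_cnt_pmds);
}
```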

Looking at the high-level code (like pfm_save_pmds), I see some of
the same opportunities there. I realize that the priority right now
is simplification, not optimization; this is just something I have
noticed. A nice side effect is that the final code in the
arch-specific routines becomes very readable, something like:

for_each_bit(cntr, used_impl_pmcs, used_impl_pmcs_num)
        pfm_write_pmc(cntr, set->pmc[cntr]);
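Fully spelled out, the restore path could then look something like the following sketch (again, all names are illustrative rather than actual perfmon code, and pfm_write_pmc is stubbed to a counter so the walk can be checked; a real implementation would write the hardware register):

```c
#include <assert.h>

/* Same single-word for_each_bit sketch as above. */
#define for_each_bit(bit, mask, tmp)                              \
        for ((tmp) = (mask);                                      \
             (tmp) && (((bit) = __builtin_ctzl(tmp)), 1);         \
             (tmp) &= (tmp) - 1)

static int writes;      /* records how many PMCs were touched */

/* Stub: a real version would program the hardware PMC register. */
static void pfm_write_pmc(int cnum, unsigned long val)
{
        (void)cnum;
        (void)val;
        writes++;
}

/* Hypothetical slimmed-down arch restore routine: only touch
 * registers that are both used and implemented, using the mask
 * precomputed when the set was written. */
static void pfm_arch_restore_pmcs(const unsigned long *pmc,
                                  unsigned long used_impl_pmcs)
{
        unsigned long tmp;
        int cnum;

        for_each_bit(cnum, used_impl_pmcs, tmp)
                pfm_write_pmc(cnum, pmc[cnum]);
}
```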

Regards from sunny, but still freakin' cold Sweden,

Phil


On Apr 22, 2008, at 10:48 AM, stephane eranian wrote:

> Phil,
>
> On Tue, Apr 22, 2008 at 10:18 AM, Philip Mucci <[EMAIL PROTECTED]>  
> wrote:
>> Hi again,
>>
>> Of course, as soon as a take the time to write this email, I see  
>> the test
>> is correct here, you are not testing bounds against 'num'.
>>
> Yes, num is not checked against the bounds. With this logic, we scan
> only up to the point where we hit all USED registers. That is the
> minimum we can do. Of course, there could be an optimization if
> nused == 1, but many times one counter means PMD0, so there are not
> many iterations for nothing.
>
>> Would there be any reason I could not make this mod to all the  
>> routines in
>> perfmon.c in arch/mips?
>>
> You certainly can when saving PMD registers. On the restore side, you
> have to look at
> risks of leaking and side effects (for PMC). If either risk exists,
> then you have to restore
> ALL registers, otherwise you can afford to restore ONLY what you are
> actually using.
> Take the example of P4, for each counter you have 2 config registers
> (CCCR, ESCR).
> When you restore the config registers, you have to restore all of them
> because you can
> imagine a (buggy) application which only programmed the CCCR. Then on
> restore it could
> pick up the ESCR from another thread.
>
> Perfmon guarantees that unused PMC registers cannot influence the
> measurement, i.e.,
> are loaded with a harmless (quiescent) value. You see clearly that the
> P4 example above
> would violate that guarantee.
>
> Of course, there are ways to avoid the problem without having to
> restore everything at
> each context switch. For instance, it would be possible to install a
> PMC write checker that would catch the CCCR write and systematically
> mark the corresponding ESCR as used. Thus there would be no way of
> getting side effects from a previous thread.
>
> But my understanding of MIPS (at least of the generic counters) is
> that no side effects are possible between config registers.
> Therefore, I think you can make this optimization.
>
> Hope this helps.

