Hi Stephane,
Thanks very much for this explanation. I'm CC'ing the list for other
implementers. Yes, it looks very much like this optimization can be
made here and for some other platforms as well.
One thing I've noticed: as this optimization goes, I need to do the
following in these routines:
mask = used_pmc/pmd & impl_pmc/pmd;   // and sometimes cnt_pmds
num = weight(mask);
for_each_bit(mask, num)
    do something;
(Note: the kernel does not have (in 2.6.18) a for_each_bit macro --
why not?)
The optimizations I made previously short-circuit on the basis of the
number of bits set; one could take that even further in the above
using bitmap_ffs and the weight. Furthermore, when the set is created/
written to, one could set up the above masks ahead of time, so that
when arch_start/arch_restore_pmd/c/arch_stop are called, we would save
a lot of instructions and some cache misses on the stack. Now with
only a handful of counters, perhaps this stuff doesn't make a
difference, but with a bunch, and the bitmaps almost always being very
sparse, this saves hundreds of instructions. Considering these are in
the critical path, it might make a bunch of sense to precompute the
following, for platforms that don't have 'leak' risks:
used_impl_pmcs = used_pmc_mask & impl_pmc_mask
used_impl_pmds = used_pmd_mask & impl_pmd_mask
used_impl_cnt_pmds = used_pmd_mask & impl_pmd_mask & cnt_pmd_mask
along with their corresponding bit weights.
Looking at the high-level code (like pfm_save_pmds), I see some of the
same opportunities there. I realize that the priority right now is
simplification, not optimization, but this is just something I have
noticed. A nice side effect is that the final code in the
arch-specific stuff becomes very readable, something like:
for_each_bit(cntr, used_impl_pmcs, used_impl_pmcs_num)
    pfm_write_pmc(cntr, set->pmc[cntr]);
Regards from sunny, but still freakin' cold Sweden,
Phil
On Apr 22, 2008, at 10:48 AM, stephane eranian wrote:
> Phil,
>
> On Tue, Apr 22, 2008 at 10:18 AM, Philip Mucci <[EMAIL PROTECTED]>
> wrote:
>> Hi again,
>>
>> Of course, as soon as a take the time to write this email, I see
>> the test
>> is correct here, you are not testing bounds against 'num'.
>>
> Yes, num is not checked against the bounds. With this logic, we scan
> only up to the point where we hit all USED registers. That is the
> minimum we can do. Of course, there could be an optimization if
> nused == 1, but many times one counter means PMD0, so not many
> iterations for nothing.
>
>> Would there be any reason I could not make this mod to all the
>> routines in
>> perfmon.c in arch/mips?
>>
> You certainly can when saving PMD registers. On the restore side, you
> have to look at
> risks of leaking and side effects (for PMC). If either risk exists,
> then you have to restore
> ALL registers, otherwise you can afford to restore ONLY what you are
> actually using.
> Take the example of P4, for each counter you have 2 config registers
> (CCCR, ESCR).
> When you restore the config registers, you have to restore all of them
> because you can
> imagine a (buggy) application which only programmed the CCCR. Then on
> restore it could
> pick up the ESCR from another thread.
>
> Perfmon guarantees that unused PMC registers cannot influence the
> measurement, i.e.,
> are loaded with a harmless (quiescent) value. You see clearly that the
> P4 example above
> would violate that guarantee.
>
> Of course, there are ways to avoid the problem without having to
> restore everything at each context switch. For instance, it would be
> possible to install a PMC write checker that would catch the CCCR
> write and systematically mark the corresponding ESCR as used. Thus
> there would be no way of getting side effects from a previous thread.
>
> But my understanding of MIPS (at least the generic counters) is that
> no side effects are possible between config registers. Therefore, I
> think you can make this optimization.
>
> Hope this helps.
_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel