* Josh Poimboeuf <jpoim...@redhat.com> wrote:

> Anyway, I used some linker magic to temporarily move the unwinder code to the
> end of .text, so that unwinder changes don't add unexpected side effects to
> the microbenchmark behavior.  Now I'm getting more consistent results: the
> packed struct is measuring ~2% slower.  The slight slowdown might just be
> explained by the fact that GCC generates some extra instructions for
> extracting the fields out of the packed struct.

Yeah, 16-bit field accesses are more complex than accesses to a zero-extended
32-bit field, even on x86, which has a fair amount of 16-bit legacy.
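
As a rough illustration of that difference, here is a minimal sketch in C. The
struct names, field names and layouts are hypothetical and are not taken from
the actual unwinder patches; they only contrast a packed entry with 16-bit
offsets and bitfields against a wide entry built from 32-bit fields.

```c
#include <stdint.h>

/* Hypothetical packed entry: 16-bit offsets plus small bitfields. */
struct entry_packed {
	int16_t  cfa_offset;
	int16_t  bp_offset;
	unsigned cfa_reg : 4;
	unsigned bp_reg  : 4;
	unsigned type    : 2;
} __attribute__((packed));

/* Hypothetical wide entry: every field is a full 32-bit word. */
struct entry_wide {
	int32_t  cfa_offset;
	int32_t  bp_offset;
	uint32_t cfa_reg;
	uint32_t bp_reg;
	uint32_t type;
};

/* The 16-bit offset has to be sign-extended before the addition. */
unsigned long cfa_from_packed(const struct entry_packed *e, unsigned long base)
{
	return base + e->cfa_offset;
}

/* The 32-bit offset is loaded directly as a full-width value. */
unsigned long cfa_from_wide(const struct entry_wide *e, unsigned long base)
{
	return base + e->cfa_offset;
}

/* Reading a bitfield typically costs a load plus a mask and/or shift. */
unsigned int reg_from_packed(const struct entry_packed *e)
{
	return e->cfa_reg;
}

/* Reading a plain 32-bit field is a single load. */
unsigned int reg_from_wide(const struct entry_wide *e)
{
	return e->cfa_reg;
}
```

Both variants return the same values; the packed one trades a few extra
extraction instructions for a smaller entry.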

> In the meantime, I found a ~10% speedup by making the "fast lookup table"
> block size a power-of-two (256) to get rid of the need for a slow 'div'
> instruction.
>
> I think I'm done performance tweaking for now.  I'll keep the packed struct,
> and add the code for the 'div' removal, and hope to submit v3 soon.

Sounds good to me!
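
For reference, the idea behind the 'div' removal can be sketched roughly as
follows. This is not the code from the patch: the macro and function names are
made up for illustration, and only the block size of 256 comes from the quoted
message. With a divisor that is only known at run time the compiler generally
has to emit a real division, while a power-of-two constant reduces to a shift.

```c
#include <stdint.h>

#define BLOCK_SIZE  256u	/* power of two, as in the quoted message */
#define BLOCK_SHIFT 8u		/* log2(BLOCK_SIZE) */

_Static_assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0,
	       "BLOCK_SIZE must be a power of two");
_Static_assert((1u << BLOCK_SHIFT) == BLOCK_SIZE,
	       "BLOCK_SHIFT must match BLOCK_SIZE");

/* Generic variant: the block size is a run-time value, so this divides. */
static inline uint64_t block_index_div(uint64_t ip, uint64_t text_start,
				       uint64_t block_size)
{
	return (ip - text_start) / block_size;
}

/* Power-of-two variant: the division becomes a right shift. */
static inline uint64_t block_index_shift(uint64_t ip, uint64_t text_start)
{
	return (ip - text_start) >> BLOCK_SHIFT;
}
```

For any address at or above text_start, the two functions agree whenever
block_size is 256.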

A ~2% slowdown in exchange for ~30% RAM savings, on a debug data structure that
is about as large as a typical kernel's total .text, is a decent trade-off.

Thanks,

        Ingo
