Hi, and thanks for the feedback. Indeed the performance will be affected
(greatly), as you demonstrated, that's for sure. I had completely forgotten
the context of my tests back then, and regarding your question about
compilers: I might just have had a CFLAGS environment variable override
that I did not notice. So if you drop this patchset, I won't be affected :-)
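In case it helps anyone reproduce that theory, here is a minimal, generic demo (my own toy Makefile, not HAProxy's actual build system) of how a CFLAGS value inherited from the environment can silently win over a `?=` default, which would explain benchmarks built at the wrong optimization level:

```shell
# Minimal demo (generic Makefile, not HAProxy's build system): a CFLAGS
# value coming from the environment silently overrides a '?=' default,
# because '?=' only assigns when the variable is not already defined.
printf 'CFLAGS ?= -O2\nall:\n\t@echo "building with: $(CFLAGS)"\n' > /tmp/cflags_demo.mk

make -f /tmp/cflags_demo.mk              # default: "building with: -O2"
CFLAGS=-O0 make -f /tmp/cflags_demo.mk   # env wins: "building with: -O0"
```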

Kind regards.

On 16 February 2016 at 16:33, Willy TARREAU <wtarr...@haproxy.com> wrote:

> Hi David,
>
> On Wed, Jan 06, 2016 at 01:23:52PM +0000, David CARLIER wrote:
> > Sure, it is mainly gcc 5.2/5.3, sometimes clang 3.6 depending on the
> > machine I was working on.
>
> I'm back (late) on this patch series.
>
> So at this point I'm seeing that the memmem() and ebmb_lookup() functions
> are significantly affected by the change, even more so on low-register-count
> machines like 32-bit x86, where it adds extra stack manipulation because
> the compiler has no way to know that the pointers are the same.
>
> In the case of memmem(), the extra register usage changed the code to the
> point of adding 4 local variables to the stack and causing pointers to be
> read from the stack, modified, and stored back onto the stack, as you can
> see in the "diff -y" output below showing the beginning of the function
> (the patched one is on the left, the original one on the right). Stack
> offsets 0x40, 0x44, 0x48 and 0x4c, which were previously unused, are now
> used to store copies of general-purpose registers, which will definitely
> impact the function's efficiency:
>
> 000022e0 <my_memmem>:                                          000022e0 <my_memmem>:
>     22e0:       55                      push   %ebp                 22e0:       55                      push   %ebp
>     22e1:       57                      push   %edi                 22e1:       57                      push   %edi
>     22e2:       56                      push   %esi                 22e2:       56                      push   %esi
>     22e3:       53                      push   %ebx                 22e3:       53                      push   %ebx
>     22e4:       83 ec 2c                sub    $0x2c,%esp     |     22e4:       83 ec 1c                sub    $0x1c,%esp
>     22e7:       8b 4c 24 48             mov    0x48(%esp),%ec |     22e7:       8b 7c 24 38             mov    0x38(%esp),%ed
>     22eb:       8b 7c 24 4c             mov    0x4c(%esp),%ed |     22eb:       8b 74 24 3c             mov    0x3c(%esp),%es
>     22ef:       85 c9                   test   %ecx,%ecx      |     22ef:       85 ff                   test   %edi,%edi
>     22f1:       74 68                   je     235b <my_memme |     22f1:       74 75                   je     2368 <my_memme
>     22f3:       8b 54 24 40             mov    0x40(%esp),%ed |     22f3:       8b 4c 24 30             mov    0x30(%esp),%ec
>     22f7:       85 d2                   test   %edx,%edx      |     22f7:       85 c9                   test   %ecx,%ecx
>     22f9:       74 60                   je     235b <my_memme |     22f9:       74 6d                   je     2368 <my_memme
>     22fb:       39 7c 24 44             cmp    %edi,0x44(%esp |     22fb:       39 74 24 34             cmp    %esi,0x34(%esp
>     22ff:       72 5a                   jb     235b <my_memme |     22ff:       72 67                   jb     2368 <my_memme
>     2301:       8b 44 24 48             mov    0x48(%esp),%ea |     2301:       8a 07                   mov    (%edi),%al
>     2305:       31 d2                   xor    %edx,%edx      |     2303:       8b 54 24 30             mov    0x30(%esp),%ed
>     2307:       8b 5c 24 40             mov    0x40(%esp),%eb |     2307:       25 ff 00 00 00          and    $0xff,%eax
>     230b:       31 ed                   xor    %ebp,%ebp      |     230c:       89 c5                   mov    %eax,%ebp
>     230d:       8a 10                   mov    (%eax),%dl     |     230e:       eb 27                   jmp    2337 <my_memme
>     230f:       89 54 24 1c             mov    %edx,0x1c(%esp |     2310:       8b 44 24 30             mov    0x30(%esp),%ea
>     2313:       eb 20                   jmp    2335 <my_memme |     2314:       8b 54 24 34             mov    0x34(%esp),%ed
>     2315:       8d 76 00                lea    0x0(%esi),%esi |     2318:       29 d8                   sub    %ebx,%eax
>
> Regarding ebmb_lookup(), it's even worse, because there's one load and
> one store on the stack in the fast path for each bit visited during the
> tree descent, while it used to rely exclusively on registers even on
> such a machine. I have not run any benchmarks, but it's possible that
> the change is even measurable on geolocation maps and stick-tables.
>
> So I'll see what I can come up with by using only explicit type casts
> where relevant. Normally it should not matter. At worst it would disable
> strict aliasing, but we already disable it anyway.
>
> I'll keep you informed.
>
> Best regards,
> Willy
>
>
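To make the register-pressure argument above concrete, here is a toy sketch of my own (not the actual HAProxy patch, and `scan_copy`/`scan_cast` are hypothetical names): the first version keeps a separately-typed local copy of the pointer, giving the compiler two live values to track, while the second casts at the point of use so a single pointer stays live, roughly the "explicit type casts where relevant" idea.

```c
#include <stddef.h>

/* Toy illustration (not HAProxy code): find the first byte equal to c.
 * Keeping a separately-typed local copy of the pointer means the
 * compiler tracks two live values; on register-starved targets such as
 * 32-bit x86 this encourages spills to the stack. */
static const void *scan_copy(const void *buf, size_t len, int c)
{
    const unsigned char *p = buf;   /* extra typed local alongside buf */

    while (len--) {
        if (*p == (unsigned char)c)
            return p;
        p++;
    }
    return NULL;
}

/* Casting at the point of use keeps a single pointer live instead.
 * Byte-sized accesses are exempt from strict-aliasing rules anyway,
 * so this particular cast is safe even without -fno-strict-aliasing. */
static const void *scan_cast(const void *buf, size_t len, int c)
{
    while (len--) {
        if (*(const unsigned char *)buf == (unsigned char)c)
            return buf;
        buf = (const unsigned char *)buf + 1;
    }
    return NULL;
}
```

Both versions return the same results; the difference only shows up in the generated code, most visibly at -O2 on 32-bit x86.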
