Hi David,

On Wed, Jan 06, 2016 at 01:23:52PM +0000, David CARLIER wrote:
> Sure, it is mainly gcc 5.2/5.3, sometimes clang 3.6 depending the
> machine I was working on.
I'm back (late) on this patch series. At this point I'm seeing that the
memmem() and ebmb_lookup() functions are significantly affected by the
change, even more on low-register-count machines like 32-bit x86, where
it adds extra stack manipulation because the compiler has no way to know
that the pointers are the same.

In the case of memmem(), the extra register usage changed the code to
the point of adding 4 local variables to the stack and causing pointers
to be read from the stack, modified and stored back onto the stack, as
you can see in the "diff -y" output below showing the beginning of the
function, the patched one on the left and the original one on the right.
As you can see, stack offsets 0x40, 0x44, 0x48 and 0x4c, which were
previously not used, are now used to store copies of general purpose
registers, which will definitely impact the function's efficiency :

  000022e0 <my_memmem>:                        000022e0 <my_memmem>:
   22e0: 55             push %ebp               22e0: 55             push %ebp
   22e1: 57             push %edi               22e1: 57             push %edi
   22e2: 56             push %esi               22e2: 56             push %esi
   22e3: 53             push %ebx               22e3: 53             push %ebx
   22e4: 83 ec 2c       sub  $0x2c,%esp      |  22e4: 83 ec 1c       sub  $0x1c,%esp
   22e7: 8b 4c 24 48    mov  0x48(%esp),%ec  |  22e7: 8b 7c 24 38    mov  0x38(%esp),%ed
   22eb: 8b 7c 24 4c    mov  0x4c(%esp),%ed  |  22eb: 8b 74 24 3c    mov  0x3c(%esp),%es
   22ef: 85 c9          test %ecx,%ecx       |  22ef: 85 ff          test %edi,%edi
   22f1: 74 68          je   235b <my_memme  |  22f1: 74 75          je   2368 <my_memme
   22f3: 8b 54 24 40    mov  0x40(%esp),%ed  |  22f3: 8b 4c 24 30    mov  0x30(%esp),%ec
   22f7: 85 d2          test %edx,%edx       |  22f7: 85 c9          test %ecx,%ecx
   22f9: 74 60          je   235b <my_memme  |  22f9: 74 6d          je   2368 <my_memme
   22fb: 39 7c 24 44    cmp  %edi,0x44(%esp  |  22fb: 39 74 24 34    cmp  %esi,0x34(%esp
   22ff: 72 5a          jb   235b <my_memme  |  22ff: 72 67          jb   2368 <my_memme
   2301: 8b 44 24 48    mov  0x48(%esp),%ea  |  2301: 8a 07          mov  (%edi),%al
   2305: 31 d2          xor  %edx,%edx       |  2303: 8b 54 24 30    mov  0x30(%esp),%ed
   2307: 8b 5c 24 40    mov  0x40(%esp),%eb  |  2307: 25 ff 00 00 00 and  $0xff,%eax
   230b: 31 ed          xor  %ebp,%ebp       |  230c: 89 c5          mov  %eax,%ebp
   230d: 8a 10          mov  (%eax),%dl      |  230e: eb 27          jmp  2337 <my_memme
   230f: 89 54 24 1c    mov  %edx,0x1c(%esp  |  2310: 8b 44 24 30    mov  0x30(%esp),%ea
   2313: eb 20          jmp  2335 <my_memme  |  2314: 8b 54 24 34    mov  0x34(%esp),%ed
   2315: 8d 76 00       lea  0x0(%esi),%esi  |  2318: 29 d8          sub  %ebx,%eax

Regarding ebmb_lookup(), it's even worse because there's one load and
one store on the stack in the fast path for each bit in the tree
descent, while it used to rely exclusively on registers even on such a
machine. I have not run any benchmarks, but it's possible that the
change is even measurable on geolocation maps and stick-tables.

So I'll see what I can come up with by using only explicit type casts
where relevant. Normally it should not matter. At worst it would disable
strict aliasing, but we already disable it anyway.

I'll keep you informed.

Best regards,
Willy
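For reference, the kind of explicit cast mentioned above can be sketched as follows. This is only an illustrative example, not the actual haproxy code: the function takes a const pointer and applies a single explicit cast on the return value, instead of keeping a separate non-const local copy of the argument, so the compiler can track one pointer in one register instead of shadowing it on the stack:

```c
#include <stddef.h>

/* Illustrative only: const-correct byte search using a single pointer
 * and one explicit cast on return. No shadow non-const copy of the
 * argument is needed, so nothing extra has to spill to the stack.
 */
static void *find_byte(const void *buf, size_t len, int c)
{
	const unsigned char *p = buf;

	while (len--) {
		if (*p == (unsigned char)c)
			return (void *)p; /* cast away const only at the boundary */
		p++;
	}
	return NULL;
}
```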