Attached is the version of CompareByte that I announced earlier.
BTW: If you would really like to see a PCMPSTR-based implementation, have a
look at Agner Fog's subroutine library asmlib.zip
(http://agner.org/optimize/).
On 16.10.2017 23:08, Markus Beth wrote:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that
>>> might have a slightly higher latency for very small len but a higher
>>> throughput (2 cycles per iteration vs. 3 cycles on an Intel Arrandale
>>> CPU (Westmere microarchitecture)). But this would need some more
>>> testing and benchmarking. I can bring it up here again if this would
>>> be of any interest.
>>
>> Small lengths in terms of matching string or overall lengths?
>
> It is small length in terms of the matching string, as there is some
> setup work before the loop.
>
>> BTW: I would really like to see a PCMPSTR based implementation :)
>
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR is part of
> SSE4.2. How would you deal with Intel Core microarchitecture CPUs that
> don't have it?
Index: trunk/rtl/x86_64/x86_64.inc
===================================================================
--- trunk/rtl/x86_64/x86_64.inc (Revision 37497)
+++ trunk/rtl/x86_64/x86_64.inc (Arbeitskopie)
@@ -640,27 +640,36 @@
mov %rsi, %rdx
mov %rdi, %rcx
{$endif win64}
- testq %r8,%r8
- je .LCmpbyteZero
+ negq %r8
+ jz .LCmpbyteZero
+ subq %r8, %rcx
+ subq %r8, %rdx
+
.balign 8
.LCmpbyteLoop:
- movb (%rcx),%r9b
- cmpb (%rdx),%r9b
- leaq 1(%rcx),%rcx
- leaq 1(%rdx),%rdx
+{$ifdef oldbinutils}
+// for the reason why this alternate coding of movzbl is given here
+// see the comments in FillChar above
+ .byte 0x42,0x0F,0xB6,0x04,0x01
+{$else}
+ movzbl (%rcx,%r8), %eax
+{$endif}
+ cmpb (%rdx,%r8), %al
jne .LCmpbyteExitFast
- decq %r8
+ addq $1, %r8
jne .LCmpbyteLoop
+.LCmpbyteZero:
+ xorl %eax, %eax
+ retq
+
.LCmpbyteExitFast:
- movzbq -1(%rdx),%r8 { Compare last position }
- movzbq %r9b,%rax
- subq %r8,%rax
- ret
-
-.LCmpbyteZero:
- movq $0,%rax
- ret
+{$ifdef oldbinutils}
+ .byte 0x42,0x0F,0xB6,0x0C,0x02
+{$else}
+ movzbl (%rdx,%r8), %ecx { Compare last position }
+{$endif}
+ subq %rcx, %rax
end;
{$endif FPC_SYSTEM_HAS_COMPAREBYTE}
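
For readers who would rather not trace the assembly, here is a rough C model of what the patched loop appears to do. It is a sketch based on my reading of the diff, not part of the patch: the name compare_byte_model and the use of intptr_t for the result are assumptions made for illustration. The setup work is the negation of len and the adjustment of both pointers to one past the end of each buffer, so a single index running from -len up to 0 serves as both the offset and the loop counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of the patched loop (name is illustrative). */
intptr_t compare_byte_model(const void *buf1, const void *buf2, size_t len)
{
    /* The negated length must fit in a signed index. */
    assert(len <= (size_t)PTRDIFF_MAX);

    if (len == 0)
        return 0;                       /* negq %r8 ; jz .LCmpbyteZero */

    /* subq %r8,%rcx / subq %r8,%rdx: point both at one past the end. */
    const unsigned char *end1 = (const unsigned char *)buf1 + len;
    const unsigned char *end2 = (const unsigned char *)buf2 + len;

    /* One register is both the (negative) offset and the loop counter. */
    ptrdiff_t i = -(ptrdiff_t)len;
    do {
        unsigned char a = end1[i];      /* movzbl (%rcx,%r8), %eax */
        unsigned char b = end2[i];      /* cmpb   (%rdx,%r8), %al  */
        if (a != b)
            return (intptr_t)a - (intptr_t)b;   /* .LCmpbyteExitFast */
    } while (++i != 0);                 /* addq $1,%r8 ; jne .LCmpbyteLoop */

    return 0;                           /* all bytes equal */
}
```

Compared with the old loop, this form needs no separate pointer increments (the two leaq instructions) and no separate counter decrement, at the cost of the three setup instructions before the loop.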