Please find attached the version of CompareByte that I announced earlier.

BTW: If you would really like to see a PCMPSTR-based implementation, have
a look at Agner Fog's subroutine library asmlib.zip
(http://agner.org/optimize/).


On 16.10.2017 23:08, Markus Beth wrote:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that
>>> might have a slightly higher latency for very small len but a higher
>>> throughput (2 cycles per iteration vs. 3 cycles on an Intel Arrandale
>>> CPU (Westmere microarchitecture)). But this would need some more
>>> testing and benchmarking. I can bring it up here again if this is of
>>> any interest.
>>
>> Small lengths in terms of matching string or overall lengths?
>
> Small length in terms of the matching string, as there is some setup
> work before the loop.
>
>> BTW: I would really like to see a PCMPSTR based implementation :)
>
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR is part of
> SSE4.2. How would you deal with Intel Core microarchitecture CPUs that
> don't have it?

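
One common answer to the question above is to test for SSE4.2 at run time and fall back to the scalar loop on older CPUs. The following C sketch only illustrates that idea and is not how the FPC RTL does it: all function names are hypothetical, `__builtin_cpu_supports` is a GCC/Clang builtin on x86, and the "SSE4.2" routine is a placeholder that simply forwards to the scalar code.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t (*compare_fn)(const void *, const void *, ptrdiff_t);

/* Plain byte-by-byte compare; works on every CPU. Assumes len >= 0. */
static ptrdiff_t compare_scalar(const void *buf1, const void *buf2,
                                ptrdiff_t len)
{
    const unsigned char *a = buf1;
    const unsigned char *b = buf2;
    for (ptrdiff_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (ptrdiff_t)a[i] - (ptrdiff_t)b[i];
    }
    return 0;
}

/* Placeholder (assumption): a real PCMPxSTRx routine would go here. */
static ptrdiff_t compare_sse42_placeholder(const void *buf1,
                                           const void *buf2, ptrdiff_t len)
{
    return compare_scalar(buf1, buf2, len);
}

/* Returns nonzero if the running CPU reports SSE4.2 support. */
static int cpu_has_sse42(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2");
#else
    return 0; /* unknown compiler or non-x86 target: stay on scalar path */
#endif
}

/* Pick the implementation once; callers keep the returned pointer. */
compare_fn select_compare(void)
{
    return cpu_has_sse42() ? compare_sse42_placeholder : compare_scalar;
}

/* Usage example: whichever routine is chosen must give the same answers. */
void select_compare_demo(void)
{
    compare_fn cmp = select_compare();
    assert(cmp("abc", "abc", 3) == 0);
    assert(cmp("abc", "abd", 3) < 0);
    assert(cmp("abd", "abc", 3) > 0);
}
```

On a Core microarchitecture CPU (no SSE4.2) the selector would return the scalar routine, so the SSE4.2 code is never executed there.
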
Index: trunk/rtl/x86_64/x86_64.inc
===================================================================
--- trunk/rtl/x86_64/x86_64.inc	(Revision 37497)
+++ trunk/rtl/x86_64/x86_64.inc	(Arbeitskopie)
@@ -640,27 +640,36 @@
     mov    %rsi, %rdx
     mov    %rdi, %rcx
 {$endif win64}
-    testq   %r8,%r8
-    je      .LCmpbyteZero
+    negq    %r8
+    jz      .LCmpbyteZero
 
+    subq    %r8, %rcx
+    subq    %r8, %rdx
+
     .balign 8
 .LCmpbyteLoop:
-    movb    (%rcx),%r9b
-    cmpb    (%rdx),%r9b
-    leaq    1(%rcx),%rcx
-    leaq    1(%rdx),%rdx
+{$ifdef oldbinutils}
+// for the reason why this alternate coding of movzbl is given here
+// see the comments in FillChar above
+    .byte 0x42,0x0F,0xB6,0x04,0x01
+{$else}
+    movzbl  (%rcx,%r8), %eax
+{$endif}
+    cmpb    (%rdx,%r8), %al
     jne     .LCmpbyteExitFast
-    decq    %r8
+    addq    $1, %r8
     jne     .LCmpbyteLoop
+.LCmpbyteZero:
+     xorl    %eax, %eax
+     retq
+
 .LCmpbyteExitFast:
-     movzbq  -1(%rdx),%r8     { Compare last position }
-     movzbq  %r9b,%rax
-     subq    %r8,%rax
-     ret
-
-.LCmpbyteZero:
-     movq    $0,%rax
-     ret
+{$ifdef oldbinutils}
+    .byte 0x42,0x0F,0xB6,0x0C,0x02
+{$else}
+     movzbl  (%rdx,%r8), %ecx    { Compare last position }
+{$endif}
+     subq    %rcx, %rax
 end;
 {$endif FPC_SYSTEM_HAS_COMPAREBYTE}
 
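For readers who do not read AT&T assembly, the idea behind the patched loop can be modelled in C. This is only a sketch under assumptions: `compare_byte_model` and `compare_byte_model_demo` are made-up names, not RTL routines, and len is assumed to be non-negative. Both pointers are advanced to the end of their buffers once (the setup work), and a single index then counts from -len up to zero, so each iteration needs only one add whose zero flag also ends the loop.

```c
#include <assert.h>
#include <stddef.h>

/* C model of the patched loop (illustrative only, assumes len >= 0). */
ptrdiff_t compare_byte_model(const void *buf1, const void *buf2,
                             ptrdiff_t len)
{
    /* Setup: point one past the last byte of each buffer.
       Corresponds to negq %r8 followed by the two subq instructions. */
    const unsigned char *end1 = (const unsigned char *)buf1 + len;
    const unsigned char *end2 = (const unsigned char *)buf2 + len;

    /* The index runs from -len up to 0; reaching 0 means all bytes matched. */
    for (ptrdiff_t i = -len; i != 0; i++) {
        if (end1[i] != end2[i]) {
            /* Bytes are zero-extended before subtracting, like movzbl. */
            return (ptrdiff_t)end1[i] - (ptrdiff_t)end2[i];
        }
    }
    return 0;
}

/* Usage example: a few checks of the model. */
void compare_byte_model_demo(void)
{
    assert(compare_byte_model("abc", "abc", 3) == 0);
    assert(compare_byte_model("abc", "abd", 3) < 0);
    assert(compare_byte_model("abd", "abc", 3) > 0);
    assert(compare_byte_model("abc", "xyz", 0) == 0);
    assert(compare_byte_model("\xff", "\x01", 1) == 254);
}
```

Compared with the old loop, which incremented both pointers and decremented the counter separately, this form keeps a single induction variable.
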
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
