@Marco: I haven't played with popcnt yet, but it could probably benefit from the "const to var" change too.

So I played around a bit...

Of course, all of this is Intel only.

1)
var
  Mask8, Mask1: qword;
....
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;

With that, the constants are no longer assigned inside the loop.
That also makes the loop shorter.

=> improves speed
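
To make 1) concrete, here is a minimal sketch of the "const to var" idea. The mask values and the helper CountHighBitBytes are my assumptions for illustration only (the snippet above just names EIGHTYMASK and ONEMASK), and the loop body is not the real routine.

```pascal
program HoistMasks;
{$mode objfpc}
{$Q-}{$R-}  // the multiply below relies on wrap-around arithmetic

const
  // Assumed values; the post only names these constants.
  EIGHTYMASK = QWord($8080808080808080);
  ONEMASK    = QWord($0101010101010101);

// Hypothetical helper: counts the bytes that have their high bit set.
function CountHighBitBytes(const Words: array of QWord): QWord;
var
  Mask8, Mask1: QWord;
  i: PtrInt;
begin
  // Copy the constants into locals once, before the loop, so the
  // compiler can keep them in registers for the whole loop.
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;
  Result := 0;
  for i := 0 to High(Words) do
    // One 0/1 flag per byte, then sum the flags into the top byte.
    Result := Result + ((((Words[i] and Mask8) shr 7) * Mask1) shr 56);
end;

var
  Data: array[0..1] of QWord;
begin
  Data[0] := QWord($8000000000000080);  // two bytes with the high bit set
  Data[1] := 0;
  WriteLn(CountHighBitBytes(Data));
end.
```

Whether this helps will depend on the compiler version and target, so it is worth checking the generated assembler (-al) rather than assuming.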

2)
  //for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
  //for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
  i := (ByteCount-cnt) div sizeof(PtrInt) ;
  repeat
    ....
  dec(i);
  until i = 0;


The original:
  for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
// r9 is reserved to hold the upper bound
.Lj26:
    addq    $1,%r10
...
    cmpq    %r10,%r9
    jnle    .Lj26

Since the counter variable "i" is not accessed inside the loop, its value does not matter.
So
  for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
// no longer needs a register for the upper bound, but still has an extra "testq", since the "subq" is at the start of the loop
.Lj26:
    subq    $1,%r10
...
    testq    %r10,%r10
    jnle    .Lj26

// "repeat": the "dec" is now at the end of the loop, and there is no longer a "testq"
That again reduced the loop size.
And apparently to just below a critical point, as the time gets a little better again.
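
One caveat with the "repeat" form: unlike "for", the body always runs at least once, so a count of zero needs a guard. A minimal sketch, assuming a made-up helper SumWords (not the routine discussed above):

```pascal
program CountdownLoop;
{$mode objfpc}

// Hypothetical helper: sums Count qwords starting at P.
function SumWords(P: PQWord; Count: PtrInt): QWord;
var
  i: PtrInt;
begin
  Result := 0;
  i := Count;
  // repeat..until executes its body before testing the condition,
  // so bail out early when there is nothing to process.
  if i <= 0 then
    Exit;
  repeat
    Result := Result + P^;
    Inc(P);   // advances by SizeOf(QWord)
    Dec(i);   // counter is only used for counting, never read in the body
  until i = 0;
end;

var
  Data: array[0..3] of QWord = (1, 2, 3, 4);
begin
  WriteLn(SumWords(@Data[0], Length(Data)));
end.
```

The guard costs one compare outside the loop, so it should not affect the loop size measured above.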

-------------
With the constants moved to var:
orig for: 547
downto:   547
repeat:   516

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
