Sorry, ignore last attachment - I forgot to change a line of assembly (it was correct for x86_64-win64!!). Here is the corrected version.

Kit

On 08/10/2023 12:38, J. Gareth Moreton via fpc-devel wrote:
Sorry, I got careless and was in a rush, as both the Pascal code is wrong and I didn't store the result of the benchmark test, hence the error check at the end returned a false negative.

The benchmark code was from Rika's SHA-1 test code, which I didn't properly check, although I assumed the logic was to avoid counting the time of the internal loop as much as possible.  I should have gone with my gut instinct and realised that wasn't the best method.

I've attached the updated test (now called "blea" as it's a benchmark test) with your suggestions implemented, and an improved benchmarking system.  I'm not used to specifying parameters in place of registers - I'm too used to needing total control!

Your results from experiments with adding additional ADD instructions is expected, as LEA uses an AGU for computation, leaving the ALUs free for other tasks (like ADD), so LEA is better even if speed is equal.

Kit

On 08/10/2023 11:06, Marģers . via fpc-devel wrote:
1. why you leave "time:=..." in benchmark loop? It does add 50% of execution time per call.
2. Pascal version does not match assembler version. Had to fix it.
      //Result := X + Counter + $87654321;
      Result:=Result + X + $87654321;
      Result:=Result xor y;
3. Assembler functions can be unified to work under win64,win32, linux 64, linux 32 function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, y
  DEC y
  JNZ @Loop2
  MOV EAX, Input
end;

4. My results. Ryzen 2700x

   Pascal control case: 0.7 ns/call  0.0710
 Using LEA instruction: 0.7 ns/call  0.0700
Using ADD instructions: 0.7 ns/call  0.0710

Even thou results are equal, i was able to add 4 independent ADD instructions around LEA while results didn't chance, but only 2 around ADD.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
{ %CPU=i386,x86_64 }
program blea;

{$IF not defined(CPUX86) and not defined(CPUX86_64)}
  {$FATAL This test program requires an Intel x86 or x64 processor }
{$ENDIF}

{$MODE OBJFPC}
{$ASMMODE Intel}

uses
  SysUtils;
  
type
  TBenchmarkProc = function(const Input, X, Y: LongWord): LongWord;
 

function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
  Counter: LongWord;
begin
  Result := Input;
  Counter := Y;
  while (Counter > 0) do
    begin
      Result := Result + X + $87654321;
      Result := Result xor Counter;
      Dec(Counter);
    end;
end;

function Checksum_ADD(const Input, X, Y: LongWord): LongWord; assembler; 
nostackframe;
asm
@Loop1:
  ADD Input, $87654321
  ADD Input, X
  XOR Input, Y
  DEC Y
  JNZ @Loop1
  MOV Result, Input
end;

function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; 
nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, Y
  DEC Y
  JNZ @Loop2
  MOV Result, Input
end;

function Benchmark(const name: string; proc: TBenchmarkProc; Z, X: LongWord): 
LongWord;
const
  internal_reps = 1000;
var
  start: TDateTime;
  time: double;
  reps: cardinal;
begin
  Result := Z;
  reps := 0;
  start := Now;
  repeat
    inc(reps);
    Result := proc(Result, X, internal_reps);
  until (reps >= 10000);
  time := ((Now - start) * SecsPerDay) / reps / internal_reps * 1e9;
  writeln(name, ': ', time:0:ord(time < 10), ' ns/call');
end;

var
  Results: array[0..2] of LongWord;
  FailureCode: Integer;
begin
  Results[0] := Benchmark('   Pascal control case', @Checksum_PAS, 5000000, 
1000);
  Results[1] := Benchmark(' Using LEA instruction', @Checksum_LEA, 5000000, 
1000);
  Results[2] := Benchmark('Using ADD instructions', @Checksum_ADD, 5000000, 
1000);
  
  FailureCode := 0;

  if (Results[0] <> Results[1]) then
    begin
      WriteLn('ERROR: Checksum_LEA doesn''t match control case');
      FailureCode := FailureCode or 1;
    end;
  if (Results[0] <> Results[2]) then
    begin
      WriteLn('ERROR: Checksum_ADD doesn''t match control case');
      FailureCode := FailureCode or 2
    end;
    
  if FailureCode <> 0 then
    Halt(FailureCode);
end.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Reply via email to