Sorry, I got careless and was in a rush: the Pascal code was wrong, and I
didn't store the result of the benchmark test, so the error check at the
end returned a false negative.
The benchmark code came from Rika's SHA-1 test code, which I didn't check
properly, although I assumed its logic was to avoid counting the time of
the internal loop as much as possible. I should have gone with my gut
instinct and realised that it wasn't the best method.
I've attached the updated test (now called "blea", as it's a benchmark
test) with your suggestions implemented and an improved benchmarking
system. I'm not used to specifying parameters in place of registers; I'm
too used to needing total control!
The results of your experiments with adding extra ADD instructions are
expected: LEA folds both additions into a single instruction and uses an
AGU for the computation, leaving the ALUs free for other work (such as the
extra ADDs), so LEA is the better choice even when the measured speed is
equal.
Kit
On 08/10/2023 11:06, Marģers . via fpc-devel wrote:
1. Why did you leave "time:=..." in the benchmark loop? It adds 50% to the
execution time per call.
2. The Pascal version does not match the assembler version; I had to fix it:
//Result := X + Counter + $87654321;
Result:=Result + X + $87654321;
Result:=Result xor y;
3. The assembler functions can be unified to work under Win64, Win32,
Linux 64-bit and Linux 32-bit by using the parameter names instead of
hard-coded registers:
function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, Y
  DEC Y
  JNZ @Loop2
  MOV EAX, Input
end;
4. My results (Ryzen 7 2700X):
Pascal control case: 0.7 ns/call 0.0710
Using LEA instruction: 0.7 ns/call 0.0700
Using ADD instructions: 0.7 ns/call 0.0710
Even though the results are equal, I was able to add 4 independent ADD
instructions around LEA without the results changing, but only 2 around
ADD.
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
{ %CPU=i386,x86_64 }
program blea;
{$IF not defined(CPUX86) and not defined(CPUX86_64)}
{$FATAL This test program requires an Intel x86 or x64 processor }
{$ENDIF}
{$MODE OBJFPC}
{$ASMMODE Intel}
uses
  SysUtils;
type
  TBenchmarkProc = function(const Input, X, Y: LongWord): LongWord;
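{ Pure Pascal control case: must compute exactly what the assembler
  versions compute, so their results can be compared }
function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
  Counter: LongWord;
begin
  Result := Input;
  Counter := Y;
  while (Counter > 0) do
  begin
    Result := Result + X + $87654321;
    Result := Result xor Counter;
    Dec(Counter);
  end;
end;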
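{ LongWord results are returned in EAX on every supported target,
  so the result is moved there explicitly }
function Checksum_ADD(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop1:
  ADD Input, $87654321
  ADD Input, X
  XOR Input, Y
  DEC Y
  JNZ @Loop1
  MOV EAX, Input
end;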
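{ Use the parameter name rather than a hard-coded register: Input lives in
  ECX only under Win64 (EDI on Linux x86_64, EAX on i386) }
function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, Y
  DEC Y
  JNZ @Loop2
  MOV EAX, Input
end;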
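function Benchmark(const name: string; proc: TBenchmarkProc; Z, X: LongWord): LongWord;
const
  internal_reps = 1000;
var
  start: TDateTime;
  time: double;
  reps: cardinal;
begin
  Result := Z;
  reps := 0;
  { The clock is read only before and after the loop, so timing
    overhead is not added to every call }
  start := Now;
  repeat
    inc(reps);
    { Feed each result into the next call so the final value can be
      checked against the other implementations }
    Result := proc(Result, X, internal_reps);
  until (reps >= 10000);
  { Elapsed days -> seconds -> nanoseconds per inner-loop iteration }
  time := ((Now - start) * SecsPerDay) / reps / internal_reps * 1e9;
  { One decimal place for times under 10 ns, none otherwise }
  writeln(name, ': ', time:0:ord(time < 10), ' ns/call');
end;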
var
  Results: array[0..2] of LongWord;
  FailureCode: Integer;
begin
  { Store every checksum so the implementations can be compared below }
  Results[0] := Benchmark(' Pascal control case', @Checksum_PAS, 5000000, 1000);
  Results[1] := Benchmark(' Using LEA instruction', @Checksum_LEA, 5000000, 1000);
  Results[2] := Benchmark('Using ADD instructions', @Checksum_ADD, 5000000, 1000);
  FailureCode := 0;
  if (Results[0] <> Results[1]) then
  begin
    WriteLn('ERROR: Checksum_LEA doesn''t match control case');
    FailureCode := FailureCode or 1;
  end;
  if (Results[0] <> Results[2]) then
  begin
    WriteLn('ERROR: Checksum_ADD doesn''t match control case');
    FailureCode := FailureCode or 2;
  end;
  if FailureCode <> 0 then
    Halt(FailureCode);
end.