On 24/01/2014 00:17, August Oktobar wrote:
"2) Reporter's assumption about fstp is wrong: the first fstp
instruction removes the value from the FPU stack, so it cannot be used
a second time without first reloading the value onto the stack."
The compiler should reuse the loaded value (a[i]): store it to a[i]
using fstl, then to a[i+1] using fstpl.
That is why Sergei wrote "typical common subexpression elimination". I
am sure it is on the FPC team's to-do list.
Also, in this case, reusing the loaded value is a small (smaller)
gain. The loop still contains plenty of statements that recalculate the
address of an array element using multiplication.
Introducing a temporary pointer to a[i], and advancing it by addition on
each iteration of the loop, will gain a lot more. (Again, I am sure it
is on the to-do list.)
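The addressing difference can be illustrated with a small Free Pascal
sketch, assuming a dynamic array of Double. The indexed form computes
each element address as base + i * SizeOf(Double); the pointer form only
adds SizeOf(Double) to the previous address (inc on a PDouble). The
program just checks that both give the same addresses; it does not show
what code FPC actually emits, and the program name is hypothetical.

  program StrengthReductionSketch;
  {$mode objfpc}

  var
    a: array of Double;
    p: PDouble;
    i: Integer;
  begin
    SetLength(a, 8);
    p := @a[0];
    for i := 0 to High(a) do
    begin
      // indexed form: base address + i * SizeOf(Double), a multiply per access
      if PtrUInt(@a[i]) <> PtrUInt(@a[0]) + PtrUInt(i) * PtrUInt(SizeOf(Double)) then
        Halt(1);
      // pointer form: previous address + SizeOf(Double), an add per iteration
      if p <> PDouble(@a[i]) then
        Halt(2);
      inc(p); // typed pointer: advances by SizeOf(Double) bytes
    end;
    WriteLn('addresses agree');
  end.
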
Until that is done, your best choice, if you need the speed, is to do
this by hand:
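  // tmpptrA, tmpptrA2, tmpptrB: PDouble (typed pointers);
  // a must hold cnt + 1 elements, because a[i+1] is written
  if cnt = 0 then exit;
  tmpptrA := @a[0];
  tmpptrB := @b[0];
  for i := 0 to cnt - 1 do
  begin
    tmpptrA^ := tmpptrA^ + tmpptrB^;   // a[i] := a[i] + b[i]
    tmpptrA2 := tmpptrA;               // remember the address of a[i]
    inc(tmpptrA);                      // now points to a[i+1]; assuming a typed pointer
    tmpptrA^ := tmpptrA2^;             // a[i+1] := a[i]
    inc(tmpptrB);                      // assuming a typed pointer
  end;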
or, better, compute the value once into a local variable:
  if cnt = 0 then exit;
  tmpptrA := @a[0];
  tmpptrB := @b[0];
  for i := 0 to cnt - 1 do
  begin
    tmpValue := tmpptrA^ + tmpptrB^;   // loaded and added only once
    tmpptrA^ := tmpValue;              // a[i]
    inc(tmpptrA);                      // assuming a typed pointer
    tmpptrA^ := tmpValue;              // a[i+1]
    inc(tmpptrB);                      // assuming a typed pointer
  end;
It loses readability, so keep the original, readable code as a comment.
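A self-contained sketch of both forms, assuming the loop from the bug
report has the shape "a[i] := a[i] + b[i]; a[i+1] := a[i]" (inferred
from the fstl/fstpl discussion, not quoted from the report) and that a
and b are dynamic arrays of Double, with a holding cnt + 1 elements.
AddIndexed and AddPointer are hypothetical names. The readable loop
lives in AddIndexed and is kept as a comment in AddPointer; the program
checks that both produce the same array.

  program PtrLoopSketch;
  {$mode objfpc}

  type
    TDoubleArray = array of Double;

  // Readable version (assumed shape of the original loop).
  procedure AddIndexed(var a: TDoubleArray; b: TDoubleArray; cnt: Integer);
  var
    i: Integer;
  begin
    for i := 0 to cnt - 1 do
    begin
      a[i] := a[i] + b[i];
      a[i + 1] := a[i];
    end;
  end;

  // Hand-optimized version: one load and add per iteration,
  // pointers advanced with inc instead of recomputing a[i]'s address.
  procedure AddPointer(var a: TDoubleArray; b: TDoubleArray; cnt: Integer);
  var
    i: Integer;
    tmpptrA, tmpptrB: PDouble;
    tmpValue: Double;
  begin
    // Original code:
    //   for i := 0 to cnt - 1 do
    //   begin a[i] := a[i] + b[i]; a[i + 1] := a[i]; end;
    if cnt = 0 then exit;
    tmpptrA := @a[0];
    tmpptrB := @b[0];
    for i := 0 to cnt - 1 do
    begin
      tmpValue := tmpptrA^ + tmpptrB^;
      tmpptrA^ := tmpValue;
      inc(tmpptrA);          // PDouble: advances by SizeOf(Double)
      tmpptrA^ := tmpValue;
      inc(tmpptrB);
    end;
  end;

  var
    a1, a2, b: TDoubleArray;
    i, n: Integer;
  begin
    n := 4;
    SetLength(a1, n + 1);
    SetLength(a2, n + 1);
    SetLength(b, n);
    for i := 0 to n do
    begin
      a1[i] := i;
      a2[i] := i;
    end;
    for i := 0 to n - 1 do
      b[i] := 0.5;
    AddIndexed(a1, b, n);
    AddPointer(a2, b, n);
    for i := 0 to n do
      if a1[i] <> a2[i] then
        Halt(1);
    WriteLn('results match');
  end.

With these inputs both procedures leave the array as 0.5, 1.0, 1.5,
2.0, 2.0, all exactly representable, so comparing with <> is safe here.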
There is a bigger example where exactly that was done, because FPC's
optimization was not sufficient for what the author wanted:
http://bugs.freepascal.org/view.php?id=10275
On Fri, Jan 24, 2014 at 12:26 AM, Sergei Gorelkin
<sergei_gorel...@mail.ru> wrote:
1) You are right that it's not a job for the peephole analyzer; it
is typical common subexpression elimination.
2) Reporter's assumption about fstp is wrong: the first fstp
instruction removes the value from the FPU stack, so it cannot be used
a second time without first reloading the value onto the stack.
3) Assignments of floating-point values are currently generated
using integer instructions, hence the subsequent code.
This way, code generation does not depend on the number of available
FPU registers, which is hard to know at any given point.
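As a rough source-level analogy for point 3 (not the code FPC actually
generates), assigning a Double is just an 8-byte copy, so it can be done
with integer moves without touching the FPU stack. The hypothetical
sketch below copies the bit pattern through a PInt64 cast; on a 32-bit
x86 target such a copy would typically become two 32-bit integer moves.

  program BitCopySketch;
  {$mode objfpc}

  var
    src, dst: Double;
  begin
    src := 3.25;
    dst := 0;
    // copy the 8 bytes as a 64-bit integer; no FPU load/store needed
    PInt64(@dst)^ := PInt64(@src)^;
    if dst <> src then
      Halt(1);
    WriteLn(dst:0:2);
  end.
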
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel