On 24/01/2014 00:17, August Oktobar wrote:
"2) Reporter's assumption about fstp is wrong: the first fstp instruction removes value from fpu stack, so it cannot be used for the second time without first reloading value onto stack."

The compiler should reuse the loaded value (a[i]): store it to a[i] using fstl, then to a[i+1] using fstpl.

That is why Sergei called it "typical common subexpression elimination". I am sure it is on the FPC team's todo list.

Also, in this case, reusing the loaded value is only a small gain. The loop still has plenty of statements that recalculate the address of an array element (base + i * element size) using a multiplication.

Introducing a temporary pointer to a[i], and incrementing it by addition on each pass through the loop, will gain a lot more (again, I am sure this is on the todo list).

Until that is done, if you need the speed, your best choice is to do this by hand:

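// tmpptrA, tmpptrA2, tmpptrB: PDouble (typed pointers; assuming the arrays hold Double)
// Note: a must hold at least cnt + 1 elements, since a[i+1] is written.
if cnt = 0 then exit;
tmpptrA := @a[0];
tmpptrB := @b[0];
for i := 0 to cnt - 1 do
    begin
      tmpptrA^ := tmpptrA^ + tmpptrB^;  // a[i] := a[i] + b[i]
      tmpptrA2 := tmpptrA;              // remember the address of a[i]
      inc(tmpptrA);                     // typed pointer: now points to a[i+1]
      tmpptrA^ := tmpptrA2^;            // a[i+1] := a[i]
      inc(tmpptrB);                     // typed pointer: now points to b[i+1]
    end;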

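Or, better, compute the sum once into a local variable:
// tmpValue: Double (assuming the arrays hold Double); pointers as above
if cnt = 0 then exit;
tmpptrA := @a[0];
tmpptrB := @b[0];
for i := 0 to cnt - 1 do
    begin
      tmpValue := tmpptrA^ + tmpptrB^;  // a[i] + b[i], loaded only once
      tmpptrA^ := tmpValue;             // a[i] := tmpValue
      inc(tmpptrA);                     // typed pointer: now points to a[i+1]
      tmpptrA^ := tmpValue;             // a[i+1] := tmpValue
      inc(tmpptrB);                     // typed pointer: now points to b[i+1]
    end;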

It loses readability, so keep the original, readable code as a comment.
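As a minimal sketch of that advice, the self-contained Free Pascal program below keeps a readable indexed loop next to the pointer version and checks that both produce identical results. The loop body (a[i] := a[i] + b[i]; a[i+1] := a[i]) is inferred from the discussion above, and the Double element type, the array size N, the test values and the names IndexedLoop/PointerLoop are assumptions for illustration only.

```pascal
program PtrLoopSketch;
{$mode objfpc}

// Hypothetical sketch: names, sizes and the Double element type are assumptions.
const
  N = 5;

type
  TDoubleArray = array of Double;

// Readable reference version (the code to keep as a comment next to the fast one).
// a must hold at least cnt + 1 elements, because a[i + 1] is written.
procedure IndexedLoop(var a: TDoubleArray; b: TDoubleArray; cnt: Integer);
var
  i: Integer;
begin
  for i := 0 to cnt - 1 do
  begin
    a[i] := a[i] + b[i];
    a[i + 1] := a[i];
  end;
end;

// Hand-optimized version: typed pointers are advanced with Inc instead of
// recomputing each element address from the index.
procedure PointerLoop(var a: TDoubleArray; b: TDoubleArray; cnt: Integer);
var
  i: Integer;
  pA, pB: PDouble;
  v: Double;
begin
  if cnt = 0 then Exit;
  pA := @a[0];
  pB := @b[0];
  for i := 0 to cnt - 1 do
  begin
    v := pA^ + pB^;  // a[i] + b[i], loaded once
    pA^ := v;        // a[i] := v
    Inc(pA);         // advance by SizeOf(Double) to a[i + 1]
    pA^ := v;        // a[i + 1] := v
    Inc(pB);         // advance to b[i + 1]
  end;
end;

var
  a1, a2, b: TDoubleArray;
  i: Integer;
begin
  SetLength(a1, N + 1);
  SetLength(a2, N + 1);
  SetLength(b, N);
  for i := 0 to N - 1 do
    b[i] := i + 1;
  a1[0] := 0.5;
  a2[0] := 0.5;
  IndexedLoop(a1, b, N);
  PointerLoop(a2, b, N);
  // Both versions must leave every element of a identical.
  for i := 0 to N do
    if a1[i] <> a2[i] then
    begin
      WriteLn('mismatch at ', i);
      Halt(1);
    end;
  WriteLn('ok');
end.
```

The program prints ok when both loops agree. It does not measure speed; whether the pointer version is actually faster depends on the FPC version, target and optimization level.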

There is a bigger example where exactly that was done, because FPC's optimization was not sufficient for what the author wanted:
http://bugs.freepascal.org/view.php?id=10275




On Fri, Jan 24, 2014 at 12:26 AM, Sergei Gorelkin <sergei_gorel...@mail.ru> wrote:


    1) You are right that it's not a job for the peephole analyzer;
    it is typical common subexpression elimination.
    2) Reporter's assumption about fstp is wrong: the first fstp
    instruction removes the value from the FPU stack, so it cannot be
    used a second time without first reloading it onto the stack.
    3) Assignments of floating-point values are currently generated
    using integer instructions, hence the subsequent code. This way
    the code does not depend on the number of available FPU registers,
    which is hard to know at any given point.


_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
