Nicolas Neuss <[EMAIL PROTECTED]> writes:
> Raymond Toy <[EMAIL PROTECTED]> writes:
>
>> Recompile ir1opt.lisp with :block-compile nil and reload ir1opt. You
>> will be able to see ir1-optimize-block now. (It was block-compiled
>> away.)
>>
>> Ray
>
> OK, did that. Profiling yields that most of the time is spent in
> C::IR1-OPTIMIZE-COMBINATION and (called from there) C::IR1-TRANSFORM.
> However, O(n^2) in these routines is not obvious to me.
>
> For now, I have to stop here. Maybe I'll look at it again later this week.
>
> Yours, Nicolas.
I did some further tests now. As much as I see the problem is that both
C::IR1-OPTIMIZE and C::IR1-OPTIMIZE-BLOCK contain a loop over all
successive blocks which results in O(N^2).
One can test this easily with the following simple setup:
;;; Compile cmucl/compiler/ir1opt.lisp with :block-compile nil and load it.
(defun test-code (n)
`(lambda (x y)
,@(loop for i below n collect
`(setf (aref x ,i) (aref y ,i)))))
(profile:unprofile)
(profile:profile c::ir1-phases c::ir1-optimize-until-done c::ir1-optimize
c::ir1-optimize-block c::ir1-optimize-combination)
(loop repeat 4
for n = 50 then (* 2 n) do
(profile:reset-time)
(compile nil (test-code n))
(profile:report-time))
I have attached below the output for my machine. Is there some compiler
expert who could elaborate a bit on this and propose a remedy?
Thanks (and have a nice weekend),
Nicolas.
; Compiling LAMBDA (X Y):
; Compiling Top-Level Form:
Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
-----------------------------------------------------------------------
7,210,728 | 10,168 | 0.310 | 0.00003 | 709 |
C::IR1-OPTIMIZE-COMBINATION
82,000 | 453 | 0.089 | 0.00020 | 181 |
C::IR1-OPTIMIZE-BLOCK
10,328 | 27 | 0.040 | 0.00148 | 383 | C::IR1-OPTIMIZE
544 | 5 | 0.000 | 0.00000 | 109 |
C::IR1-OPTIMIZE-UNTIL-DONE
168,840 | 2 | 0.000 | 0.00000 | 84,420 | C::IR1-PHASES
-------------------------------------------------------------------
7,472,440 | 10,655 | 0.439 | | | Total
; Compiling LAMBDA (X Y):
; Compiling Top-Level Form:
Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
-----------------------------------------------------------------------
23,305,488 | 35,318 | 1.479 | 0.00004 | 660 |
C::IR1-OPTIMIZE-COMBINATION
283,200 | 853 | 0.168 | 0.00020 | 332 |
C::IR1-OPTIMIZE-BLOCK
19,928 | 27 | 0.090 | 0.00333 | 738 | C::IR1-OPTIMIZE
314,936 | 2 | 0.040 | 0.02000 | 157,468 | C::IR1-PHASES
544 | 5 | 0.000 | 0.00000 | 109 |
C::IR1-OPTIMIZE-UNTIL-DONE
-------------------------------------------------------------------
23,924,096 | 36,205 | 1.778 | | | Total
; Compiling LAMBDA (X Y):
; Compiling Top-Level Form:
Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
-----------------------------------------------------------------------
82,135,232 | 130,618 | 5.069 | 0.00004 | 629 |
C::IR1-OPTIMIZE-COMBINATION
1,045,600 | 1,653 | 1.117 | 0.00068 | 633 |
C::IR1-OPTIMIZE-BLOCK
39,128 | 27 | 0.800 | 0.02963 | 1,449 | C::IR1-OPTIMIZE
613,288 | 2 | 0.040 | 0.02000 | 306,644 | C::IR1-PHASES
544 | 5 | 0.000 | 0.00000 | 109 |
C::IR1-OPTIMIZE-UNTIL-DONE
-------------------------------------------------------------------
83,833,792 | 132,305 | 7.025 | | | Total
; Compiling LAMBDA (X Y):
; Compiling Top-Level Form:
Consed | Calls | Secs | Sec/Call | Bytes/C. | Name:
-----------------------------------------------------------------------
306,327,000 | 501,218 | 18.998 | 0.00004 | 611 |
C::IR1-OPTIMIZE-COMBINATION
77,528 | 27 | 5.800 | 0.21481 | 2,871 | C::IR1-OPTIMIZE
4,010,400 | 3,253 | 4.593 | 0.00141 | 1,233 |
C::IR1-OPTIMIZE-BLOCK
1,190,952 | 2 | 0.070 | 0.03500 | 595,476 | C::IR1-PHASES
544 | 5 | 0.000 | 0.00000 | 109 |
C::IR1-OPTIMIZE-UNTIL-DONE
-------------------------------------------------------------------
311,606,424 | 504,505 | 29.461 | | | Total