Re: Generating optimized PIR?
Dan Sugalski wrote: Oh, yeah, lots of spilling. In a hacked up version of the PIR I see 50 spills in the main routine. Brrr. It would definitely help, if you can use lexicals or gloabals, so that you don't have that much long-ranged variables - and of course splitting the beast if possible. leo
Re: Generating optimized PIR?
At 12:30 PM +0100 1/6/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: Optimized for speed, at least. I'm finding that large subs seem to give imcc headaches. I'm not sure if it's O(n^2) or O(2^n) headaches, but definitely issues. Live analysis and register allocation are the main problems. The former is O(n_lines * n_vars). The latter goes horribly slow in the case of spilling. Please run your program with the "-v" switch, you should see some statistics including spilled variables count. Oh, yeah, lots of spilling. In a hacked up version of the PIR I see 50 spills in the main routine. Not a good thing, I think, and it takes forever even with Parrot built with optimizations. (Which does make quite a difference, though not enough) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Generating optimized PIR?
Dan Sugalski <[EMAIL PROTECTED]> wrote: > Optimized for speed, at least. > I'm finding that large subs seem to give imcc headaches. I'm not sure > if it's O(n^2) or O(2^n) headaches, but definitely issues. Live analysis and register allocation are the main problems. The former is O(n_lines * n_vars). The latter goes horribly slow in the case of spilling. Please run your program with the "-v" switch, you should see some statistics including spilled variables count. leo
Re: Generating optimized PIR?
At 10:59 AM 1/5/2004 -0500, Dan Sugalski wrote: Optimized for speed, at least. I'm finding that large subs seem to give imcc headaches. I'm not sure if it's O(n^2) or O(2^n) headaches, but definitely issues. At the moment I've got parrot churning on some pir code and it's taking quite a while. Final time tally: real41m46.978s user21m24.300s sys 0m41.080s Ack. I expected this would happen. Most probably it is register-coloring/spilling. I'm a little rusty on the register colorer; I do know the first version I wrote was not very fast for large numbers of registers, but I believe Leo had improved on it a bit. I'd really like to see the piece of code so I can do some profiling. Ended with a missing symbol error, of course, just to rub it in a bit. vm_stat reports a lot of zero-fill page allocation (about 1600 4K pages/sec) though the memory footprint isn't growing to match, so that might indicate at least something of the problem. (For all I know there's some sort of degenerate GC issue somewhere, depending on how imcc does its memory allocation and management) That could be IMCC repeatedly allocating/freeing its interference graphs for each basic block, but I'm not positive. I know IMCC's being redone, and we're nowhere near close to optimized, but I think it'd be worth it to get a handle on what sorts of things are likely to trigger off exponential time compiles when fed to IMCC. I'm assuming that it's due to a combination of sub size (about 61K of source in one sub) and locals needing coloring (132) but it'd be nice to know for sure. There are several tests I can think of that we should include in IMCC. 1) A large number of locals with a very long code segment, without branches or labels. This would tests large graph coloring without lots of basic blocks. 2) A large number of locals _with_ the normal amount of branches and labels. This would test the life analysis on a large number of basic blocks. 3) A small number of locals with variants of 1 & 2 above for branching/labels. Any chance of getting the code sample? -Melvin
Re: Generating optimized PIR?
On Mon, 5 Jan 2004 10:59:18 -0500, Dan Sugalski <[EMAIL PROTECTED]> wrote: >I know IMCC's being redone, and we're nowhere near close to >optimized, That was my guess > but I think it'd be worth it to get a handle on what sorts >of things are likely to trigger off exponential time compiles when >fed to IMCC. I'm assuming that it's due to a combination of sub size >(about 61K of source in one sub) and locals needing coloring (132) >but it'd be nice to know for sure. My experience was as follows (don't take these times too literally since this is a very old, very slow box; the relative times are what count). This was creating a single .sub: compile source and write a 2,500 line imc file: 0.2 seconds The first line of the imc file included a hand-crafted 1000 line file. parrot -o t.pasm t.imc: 7 seconds load the final pasm file (now 3,500 lines) and run it: 0.3 seconds Editing the hand-crafted 1000-line include file to replace symbolic registers (ie $I1 etc) with real registers (I1) cut imc time down to 3.5 seconds. Changing the code emitter to re-use symbolic registers no longer in use (ie not local variables, and not still on the parse stack) cut the time down to 1.39 seconds. Just letting you know what I found, I shall let you draw your own conclusions. Lastly, while I know that -O2 is not complete, I don't know by how much it is incomplete. You may want to check the times on that too. Regards, Pete
Generating optimized PIR?
Optimized for speed, at least. I'm finding that large subs seem to give imcc headaches. I'm not sure if it's O(n^2) or O(2^n) headaches, but definitely issues. At the moment I've got parrot churning on some pir code and it's taking quite a while. Final time tally: real41m46.978s user21m24.300s sys 0m41.080s Ended with a missing symbol error, of course, just to rub it in a bit. vm_stat reports a lot of zero-fill page allocation (about 1600 4K pages/sec) though the memory footprint isn't growing to match, so that might indicate at least something of the problem. (For all I know there's some sort of degenerate GC issue somewhere, depending on how imcc does its memory allocation and management) I know IMCC's being redone, and we're nowhere near close to optimized, but I think it'd be worth it to get a handle on what sorts of things are likely to trigger off exponential time compiles when fed to IMCC. I'm assuming that it's due to a combination of sub size (about 61K of source in one sub) and locals needing coloring (132) but it'd be nice to know for sure. For me, reducing the size of the sub's untenable, but I can switch from using locals to using pad or global variables that get fetched as need be if that means less compile time. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk