Re: Pathological Register Allocation Test Generator
At 9:24 AM -0700 10/21/04, Jeff Clites wrote: I think there'll be two types of tie--tied variables (like Perl has already), and tied namespaces (as supposedly some people really need, though I don't fully know why). But even without the above pathological case: with tied namespaces, a namespace fetch potentially has unknown overhead, and a compiler can't know if re-fetching is better than spilling. But on the other hand, maybe that's just part of the deal with tied namespaces--they may be fetched from more often than the code would imply, so the tied namespace needs to be prepared for that. I'm OK with going on record as saying that the pir code generator and optimizer make no guarantees on the number of times something is fetched out of a global namespace or lexical pad. If a language wants guarantees, it can emit absolute code and not leave it up to the register spilling algorithm to decide. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Pathological Register Allocation Test Generator
On Oct 21, 2004, at 4:13 AM, Leopold Toetsch wrote: Jeff Clites <[EMAIL PROTECTED]> wrote: On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote: And of course, lexicals and globals already have a storage, you don't need to spill them. I'm not sure that's true. It should read: if there are lexical or global opcodes, lexicals and globals have a storage. Ah yes, true. tied namespaces and such, it may not be legitimate to re-fetch a global (ie, to fetch it multiple times, if the code appears to only fetch it once) -- one could pathologically have a global whose value appears to increase each time it's fetched, Then there's something horribly wrong with that usage of tie: not the value is refetched from the var - the var is refetched from the namespace. I think there'll be two types of tie--tied variables (like Perl has already), and tied namespaces (as supposedly some people really need, though I don't fully know why). But even without the above pathological case: with tied namespaces, a namespace fetch potentially has unknown overhead, and a compiler can't know if re-fetching is better than spilling. But on the other hand, maybe that's just part of the deal with tied namespaces--they may be fetched from more often than the code would imply, so the tied namespace needs to be prepared for that. JEff
Re: Pathological Register Allocation Test Generator
Jeff Clites <[EMAIL PROTECTED]> wrote: > On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote: >> And of course, lexicals and globals already have a storage, you don't >> need to spill them. > I'm not sure that's true. It should read: if there are lexical or global opcodes, lexicals and globals have a storage. > ... If there's no 'eval' in scope, lexicals don't > have to live in pads--they could purely exist in registers. And if there is no introspecition and what not. > tied namespaces and such, it may not be legitimate to re-fetch a global > (ie, to fetch it multiple times, if the code appears to only fetch it > once) -- one could pathologically have a global whose value appears to > increase each time it's fetched, Then there's something horribly wrong with that usage of tie: not the value is refetched from the var - the var is refetched from the namespace. > JEff leo
Re: Pathological Register Allocation Test Generator
On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote: Bill Coffman <[EMAIL PROTECTED]> wrote: And of course, lexicals and globals already have a storage, you don't need to spill them. I'm not sure that's true. If there's no 'eval' in scope, lexicals don't have to live in pads--they could purely exist in registers. And with tied namespaces and such, it may not be legitimate to re-fetch a global (ie, to fetch it multiple times, if the code appears to only fetch it once) -- one could pathologically have a global whose value appears to increase each time it's fetched, for instance, or you could end up with multiple round-trips to a database. ... Another thing that might be worth checking, after parrot gets out of "alpha", is if reducing or increasing the number of registers will help performance. Just a thought. The 4 x 32 is pretty good. It matches recent hardware too. But if a good register algorithm shows that 4 x 16 is enough, we can of course decrease the register number. Increasing shouldn't be necessary. Matching hardware is probably not too significant--even though the PPC has 32 int registers, we can't map to all of them in JIT (some are dedicated to holding the interpreter pointer, etc.), and we'd really need 3 x 32 hardware int registers to accommodate all we'd like (I, S, and P registers). So even currently it's a loose match. JEff
Re: Pathological Register Allocation Test Generator
Bill Coffman <[EMAIL PROTECTED]> wrote: > Leo, > Thanks for your suggestions and comments. Welcome and thanks to you for looking at that nasty piece of code ;) > On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch <[EMAIL PROTECTED]> wrote: >> Some remargs WRT gen{3,4}.pl: >> 1) While these programs exhibit some worst case register layout it's >>probably not a very typical layout. > Agreed. The idea was to automate and compare to gcc. There are > real-world tests already in the parrot test suite, but because I can > generate millions of such cases, one hope is to detect errors, and to > get some kind of performance metric, even if these programs are > artificial. Ok, that's of course very reasonable. One of the problems Dan encountered is memory usage, though. Currently the interference graph is built for all four kinds of registers in one piece. By splitting the register allocation into four passes, much memory can be saved. But the memory usage can be estimated w/o using four register kinds too. >> I'd change the simulation program to use PMCs to allow for 2). Now when >> it comes to spilling, these lexicals or globals don't need a new >> storage, their live range can just get discarded and at the next usage >> of this lexical or global it just can be refetched[1]. Implementing this >> should already vastly improve the register allocation. >> >> [1] The refetching can of course be into a different Parrot register. >> Thus the usage of globals or lexicals wouldn't interfer and generate >> huge live ranges over the whole function as it currently does. > I don't quite understand this. You think I should create PMC > variables, instead of integers? Yes. As said, with PMC lexicals or globals the register allocation can be simplified. But it depends. E.g. .local pmc foo ... find_global foo, "foo" ... find_global foo, "foo" The C is an C argument. If your algorithm does register renaming, all is fine. If it doesn't, like now, all the usage of the Cs take the same register over the whole unit, which is the major PITA of the current register allocation. And of course, lexicals and globals already have a storage, you don't need to spill them. One more difference between {I,N,S} and {P} registers is the value behavior. E.g. =item B(out INT, in INT, in INT) =item B(out NUM, in NUM, in NUM) =item B(in PMC, in PMC, in PMC) ^^ That is, with {I,N,S} register ranges are cut early at each binary operation. You don't have that with PMCs. > ... It's probably a good idea to have a > mix of the four types I suppose. If you want to compare with gcc too, you could use three types: I ... int N ... double P ... V4SI (vector for sse, altivec, ...) > Once thing I can say is that the interference graph is pretty > conservatively generated. When a variable is reassigned, it can be > treated as a new variable which can reduce register pressure, but I > don't think the code is doing that yet. Yep, register renaming. That would already improve things vastly. [ metrics ] > could be helpful. That could be incorporated into the register > allocator itself (which already is, if you run with -d 0008, then grep > Spill). With much less output "-v". The "spill" number is ok, the usage count is broken, though. >> I don't know if we need some syntax to easily support lexicals or >> globals or if the register allocation can deduce that itself. But if a >> new syntax simplifies that, we put it in. > My preference is to not distinguish between various types of > variables, but to have the algorithms deal best with nodes based on > the structure of the graph(s). That's good. But as said, you don't have to spill lexicals or globals. >> For 3) there is already a separate pass in >> imcc/reg_alloc.c:allocate_non_interfering(). > Yes. This function seems to find variables whose live ranges (life > ranges) are restricted to a single basic block, and assign them reg > 28,29, or 30. > It's > usually best to color the easy stuff last. I did assign them first to reduce the size of the interference graph, which was one of the problems - "out of memory". But that is better solved by doing four passes. > ... Another > optimization to the algorithm is to incorporate a score that causes > the most frequently accessed nodes not to get spilled. ... and stuff inside inner loops. > scoring mechanism is pretty primitive right now Yep. A broad field for experiments. > ... I think it's important to > look at experimental results as these decisions are made. Yep. > With so many registers, it may be more important to focus on swapping > registers in and out between subroutine calls, rather than the actual > register allocation optimization. That's currently being addressed. Each subroutine gets a fresh set of registers. > ... Another thing that might be worth > checking, after parrot gets out of "alpha", is if reducing or > increasing the number of registers will help
Re: Pathological Register Allocation Test Generator
Leo, Thanks for your suggestions and comments. On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Some remargs WRT gen{3,4}.pl: > 1) While these programs exhibit some worst case register layout it's >probably not a very typical layout. Agreed. The idea was to automate and compare to gcc. There are real-world tests already in the parrot test suite, but because I can generate millions of such cases, one hope is to detect errors, and to get some kind of performance metric, even if these programs are artificial. To get real metrics that people would pay attention to, we might implement SPEC99, etc. > 2) RL programs have lexicals and globals access spread over the code like >in the generated gen.imc > 3) and that's intersparsed with highly local access to temporaries coming >from expression evaluations. I think I could modify the tests so that the variables have a Poisson distribution, as far as their usage distance goes (max number of lines away from first use). This would cause some of them to look more like temporary variables, and be a somewhat better simulation of real programs. > I'd change the simulation program to use PMCs to allow for 2). Now when > it comes to spilling, these lexicals or globals don't need a new > storage, their live range can just get discarded and at the next usage > of this lexical or global it just can be refetched[1]. Implementing this > should already vastly improve the register allocation. > > [1] The refetching can of course be into a different Parrot register. > Thus the usage of globals or lexicals wouldn't interfer and generate > huge live ranges over the whole function as it currently does. > I don't quite understand this. You think I should create PMC variables, instead of integers? It's probably a good idea to have a mix of the four types I suppose. Once thing I can say is that the interference graph is pretty conservatively generated. When a variable is reassigned, it can be treated as a new variable which can reduce register pressure, but I don't think the code is doing that yet. It sounds like you're talking about implementing the interference graph better. I may get into that too, but for now, I want to get the rester coloring (mapping vars to registers) working better. Also, I'd like to be able to capture the effects of changes in metrics (via scripts like those I posted) so we can see if different ideas will actually improve the situation or not. I don't even have any runtime checks yet, but creating a score to see how much was spilled, could be helpful. That could be incorporated into the register allocator itself (which already is, if you run with -d 0008, then grep Spill). > I don't know if we need some syntax to easily support lexicals or > globals or if the register allocation can deduce that itself. But if a > new syntax simplifies that, we put it in. My preference is to not distinguish between various types of variables, but to have the algorithms deal best with nodes based on the structure of the graph(s). > > For 3) there is already a separate pass in > imcc/reg_alloc.c:allocate_non_interfering(). Yes. This function seems to find variables whose live ranges (life ranges) are restricted to a single basic block, and assign them reg 28,29, or 30. My preference is that low hanging fruit like temporary vars should be assigned by the algorithm. Such variables will tend not to have much interference with others, and are thus probably easier to color. My experience with coloring, is that assigning low degree (aka arity, #neighbors) nodes to fixed colors first, will actually reduce the performance of the graph coloring algorithm. It's usually best to color the easy stuff last. The algorithm I'd like to use is one that happens to work very well for interference graphs, and it is very simple. This algorithm selects a node that interferes with the fewest other variables (lowest degree), and removes it from the interference graph, and puts it into a stack. It then recalculates the lower degrees of each neighbor, and iteratively removes all nodes from the graph in this way, pushing them onto the stack. It then, pops off the nodes from the stack and colors them in lifo order, using the first available color. Nodes that need a color higher that 30, or whatever the max. turns out to be, are spilled. The spilled nodes are always in the densest part of the graph, where something has to spill. I believe this is essentially the algorithm Briggs/Chaitin used, and some variation of it has been used in most compilers since the 70's. One exception is that certain variables want certain colors (want_regno) -- those should be colored first, as the current code does. A disadvantage of the above algorithm is that it tends to randomly select which nodes get spilled, among the densest parts. Another optimization to the algorithm is to incorporate a score that causes the most frequently accessed nodes not
Re: Pathological Register Allocation Test Generator
Bill Coffman <[EMAIL PROTECTED]> wrote: > I am currently working on a fix to the large subroutine register > allocation bug, aka, "massive spilling not yet implemented". The > problem, is that the register allocation code is complex, and I'm not > all that familiar with it, or even with working with compilers at the > coding level. If you encounter any problems, please just ask. > I am comming at the problem as a former graph theorist, That's good. > ... The first attached program (gen3.pl), Some remargs WRT gen{3,4}.pl: 1) While these programs exhibit some worst case register layout it's probably not a very typical layout. 2) RL programs have lexicals and globals access spread over the code like in the generated gen.imc 3) and that's intersparsed with highly local access to temporaries coming from expression evaluations. I'd change the simulation program to use PMCs to allow for 2). Now when it comes to spilling, these lexicals or globals don't need a new storage, their live range can just get discarded and at the next usage of this lexical or global it just can be refetched[1]. Implementing this should already vastly improve the register allocation. [1] The refetching can of course be into a different Parrot register. Thus the usage of globals or lexicals wouldn't interfer and generate huge live ranges over the whole function as it currently does. I don't know if we need some syntax to easily support lexicals or globals or if the register allocation can deduce that itself. But if a new syntax simplifies that, we put it in. For 3) there is already a separate pass in imcc/reg_alloc.c:allocate_non_interfering(). leo
Re: Pathological Register Allocation Test Generator
Hello All, This is my first post to the parrot list, but I hope that many will follow. Thanks to all of you for working so dilligently on building this wonderful new toy for all us geeks to play with! I am currently working on a fix to the large subroutine register allocation bug, aka, "massive spilling not yet implemented". The problem, is that the register allocation code is complex, and I'm not all that familiar with it, or even with working with compilers at the coding level. I am comming at the problem as a former graph theorist, who was a consultant to a compiler project (at UC Davis, in the late 90's where some of the work was applied toward what is the current Intel compiler). I talked a lot, and learned a lot, but didn't actually contribute any coding to the project. I also spent a lot of time coding graph coloring heuristics with C++. I expect to produce a patch soon, that will make register allocation better. To that end, I felt it was important to come up with some metrics. The first attached program (gen3.pl), which is based on Greg Purdy's gen-pra.pl, gathers data on compilation performance for varying numbers of variables and compares this to GCC. Mine also uses gnuplot to make pretty pictures. If everything is installed okay on your system, a graph will pop up displaying runtime results. I'm using Debian Linux, so it was pretty easy to set up. The second program, gen4.pl, is a randomized, automated, blackbox tester. It works similar to the above, but the corresponding parrot and GCC outputs are compared. And now, for the philosophical note ... One of the biggest wins of RISC over CISC was not that RISC was smaller, but that architects looked at actual code that would run on their machines. They removed an instruction from the instruction set if it wasn't used that frequently. They considered wheather to implement features in the hardware, or the compiler, and they tested each, to find the optimum balance. This approach may be helpful in building parrot as well, and it's what I've tried to do, in a very modest way, in my scripts. Thanks to Gregg Purdy for getting the ball rolling in this general direction. Thanks, Bill Coffman On Mon, 11 Oct 2004 09:54:31 -0400, Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 4:58 PM -0700 10/2/04, Gregor N. Purdy wrote: > >Dan et al. -- > > > >I made a new version of the script that creates gen.cpp and gen.imc > >(attached). You can run it like this: > > > > perl gen-pra.pl 1000 1 > > > >(for 1000 labels and 1 variables) and it will create equivalent > >gen.imc and gen.cpp files. You can test-compile them with these > >commands: > > > >g++ -c gen.cpp > > > >and > > > > imcc -o x.x gen.imc > > > >on my system, the g++ compiler does eventually finish, but the imcc > >compiler is eventually killed. > > > >Maybe this could be used to drive out the underlying problems that > >are keeping parrot from compiling Dan's really large subs? > > Mmmm, degenerate behaviour! Cool, and thanks. Should help judge how > we're doing with the nastier code. > -- > Dan > > --it's like this--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even >teddy bears get drunk > #!/usr/bin/perl -w use strict; die "Usage: $0 \n" unless @ARGV == 2; my ($start,$stop) = @ARGV; open COMP, ">compile.dat" or die "$!"; print COMP "#vars gcc imc\n"; for (my $vars=$start;$vars<=$stop;$vars++) { my ($total_locals) = $vars; my $total_labels = int ($total_locals / 2); my $labels_so_far = 1; my $locals_so_far = 0; open IMC, ">gen.imc"; open CPP, ">gen.cpp"; print IMC ".sub __MAIN\n\n"; printf IMC "\n_L_%d:\n", 1; print CPP "#include \n\n"; print CPP "int main(int argc, char * arg[])\n{\n"; printf CPP "\n_L_%d:\n", 1; my %action_table = qw(l label v variable a arithmetic c control p print); my @actions = (('l')x2, ('v')x4, ('a')x9, ('c')x3, ('p')x0); while ($labels_so_far < $total_labels or $locals_so_far < $total_locals) { my $action = $action_table{$actions[rand @actions]}; if ($action eq 'label' and $labels_so_far < $total_labels) { my $this_label = ++$labels_so_far; printf IMC "\n_L_%d:\n", $this_label; printf CPP "\n_L_%d:\n", $this_label; } elsif ($action eq 'variable' and $locals_so_far < $total_locals) { my $this_local = ++$locals_so_far; my $this_value = int(rand(99)) + 1; printf IMC " .local int V_%d\n", $this_local; printf IMC " V_%d = %d\n", $this_local, $this_value; printf CPP " int V_%d;\n", $this_local; printf CPP " V_%d = %d;\n", $this_local, $this_value; } elsif ($action eq 'arithmetic' and $locals_so_far > 0) { my $result = 1 + int(rand($locals_so_far)); my $arg1 = 1 + int(rand($locals_so_far)); my $arg2 = 1 + int(rand($locals_s
Re: Pathological Register Allocation Test Generator
At 4:58 PM -0700 10/2/04, Gregor N. Purdy wrote: Dan et al. -- I made a new version of the script that creates gen.cpp and gen.imc (attached). You can run it like this: perl gen-pra.pl 1000 1 (for 1000 labels and 1 variables) and it will create equivalent gen.imc and gen.cpp files. You can test-compile them with these commands: g++ -c gen.cpp and imcc -o x.x gen.imc on my system, the g++ compiler does eventually finish, but the imcc compiler is eventually killed. Maybe this could be used to drive out the underlying problems that are keeping parrot from compiling Dan's really large subs? Mmmm, degenerate behaviour! Cool, and thanks. Should help judge how we're doing with the nastier code. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Pathological Register Allocation Test Generator
Dan et al. -- I made a new version of the script that creates gen.cpp and gen.imc (attached). You can run it like this: perl gen-pra.pl 1000 1 (for 1000 labels and 1 variables) and it will create equivalent gen.imc and gen.cpp files. You can test-compile them with these commands: g++ -c gen.cpp and imcc -o x.x gen.imc on my system, the g++ compiler does eventually finish, but the imcc compiler is eventually killed. Maybe this could be used to drive out the underlying problems that are keeping parrot from compiling Dan's really large subs? Regards, -- Gregor Gregor N. Purdy wrote: Dan -- Something like this for the .imc generation? Regards, -- Gregor --- snip: gen-imc.pl -- #!/usr/bin/perl -w use strict; die "Usage: $0 \n" unless @ARGV == 2; my ($total_labels, $total_locals) = @ARGV; my $labels_so_far = 0; my $locals_so_far = 0; print ".sub __MAIN\n\n"; while ($labels_so_far < $total_labels or $locals_so_far < $total_locals) { my $action = qw(label local arithmetic control)[rand 4]; if ($action eq 'label' and $labels_so_far < $total_labels) { printf "\n_L_%d:\n", ++$labels_so_far; } elsif ($action eq 'local' and $locals_so_far < $total_locals) { printf " .local int V_%d\n", ++$locals_so_far; printf " V_%d = %d\n", $locals_so_far, int(rand()) + 1; } elsif ($action eq 'arithmetic' and $locals_so_far > 0) { my $result = 1 + rand $locals_so_far; my $arg1 = 1 + rand $locals_so_far; my $arg2 = 1 + rand $locals_so_far; my $op = qw( + - * )[rand 3]; printf " V_%d = V_%d %s V_%d\n", $result, $arg1, $op, $arg2; } elsif ($action eq 'control' and $locals_so_far > 0) { my $dest = 1 + rand $total_labels; my $arg1 = 1 + rand $locals_so_far; my $arg2 = 1 + rand $locals_so_far; my $op = qw( < <= == != >= > )[rand 6]; printf " if V_%d %s V_%d goto _L_%d\n", $arg1, $op, $arg2, $dest; } } print "\n"; print ".end\n"; print "\n"; exit 0; - snip Dan Sugalski wrote: At 6:50 AM -0700 8/30/04, Gregor N. Purdy wrote: Dan -- I figured. We could probably write a code-generator that used a PRNG to generate massive amounts of stereotypical code, spitting out C and PIR... That would be easy if it was a bunch of x = x + y z = 3 * x lines that would approximate the structure of your large stuff. Is it much more complicated than that? Oh, yeah. :) Or, rather, no, if you factor in the second simple bit, the massive number of labels and comparisons. Add those into the mix and I think you'd have it. -- Gregor Dan Sugalski wrote: At 6:46 AM -0700 8/27/04, Gregor N. Purdy wrote: Dan -- I think it would be interesting to find out how, say, gcc behaves on the pathological code structures you've run into. Could your compiler spit out a structurally (although not semantically! :) equivalent piece of C code that could be used with a C compiler to see how we do vs. C compilers in these cases? That's a good question. While it's not impossible, I'd need to do a bunch of work on the compiler to get it to emit compilable C instead of PIR, and I don't think it's something I'll have time for anytime soon. :( #!/usr/bin/perl -w use strict; die "Usage: $0 \n" unless @ARGV == 2; my ($total_labels, $total_locals) = @ARGV; my $labels_so_far = 0; my $locals_so_far = 0; open IMC, ">gen.imc"; open CPP, ">gen.cpp"; print IMC ".sub __MAIN\n\n"; print CPP "int main(int argc, char * arg[])\n{\n"; while ($labels_so_far < $total_labels or $locals_so_far < $total_locals) { my $action = qw(label local arithmetic control)[int(rand(4))]; if ($action eq 'label' and $labels_so_far < $total_labels) { my $this_label = ++$labels_so_far; printf IMC "\n_L_%d:\n", $this_label; printf CPP "\n_L_%d:\n", $this_label; } elsif ($action eq 'local' and $locals_so_far < $total_locals) { my $this_local = ++$locals_so_far; my $this_value = int(rand(99)) + 1; printf IMC " .local int V_%d\n", $this_local; printf IMC " V_%d = %d\n", $this_local, $this_value; printf CPP " int V_%d;\n", $this_local; printf CPP " V_%d = %d;\n", $this_local, $this_value; } elsif ($action eq 'arithmetic' and $locals_so_far > 0) { my $result = 1 + int(rand($locals_so_far)); my $arg1 = 1 + int(rand($locals_so_far)); my $arg2 = 1 + int(rand($locals_so_far)); my $op = qw( + - * )[rand 3]; printf IMC " V_%d = V_%d %s V_%d\n", $result, $arg1, $op, $arg2; printf CPP " V_%d = V_%d %s V_%d;\n", $result, $arg1, $op, $arg2; } elsif ($action eq 'control' and $locals_so_far > 0) { my $dest = 1 + int(rand($total_labels)); my $arg1 = 1 + int(rand($locals_so_far)); my $arg2 = 1 + int(rand($locals_so_far)); my $op = qw( < <= == != >= > )[rand 6]; printf IMC " if V_%d %s V_%d goto _L_%d\n", $arg1, $op, $arg2, $dest; printf CPP " if (V_%d %s V_%d) goto _L_%d;\n", $arg1, $op, $arg2, $dest; } } prin