I have hacked the compiler to the point where I can now either load a cached symbol table, or compute and store it, one table per file. The idea is to replace "*.par" files with "*.symtab" files since these represent a point further down the tool chain and caching would save more time.
I'm not yet *using* these tables. I can't look into them (because Erick made the data structures private and abstract :) The current code makes too many assumptions: it isn't functional enough. A huge global structure is passed around and modified by various routines. The current sticking point is bind_asms ... which produces a bound symbol table from a list of assembly instructions. This function is the only public interface and it is doing vastly too much work.. it can't use my unbound symbol tables as input. So: the symbol tables have integer indexes assigned to each symbol, these are obtained from a counter. In normal processing of everything, the counter is incremented sequentially so there's no clash. However, this invariant holds when loading disk images of the tables if the entire processing sequence is unchanged. If one library file is changed to include more functions, then more integers get used up producing its symbol table, and then there will be an overlap with the old version of the next symbol table. To prevent a clash, we have to save the counter range with the symbol table, and when we load it, check its min is greater than or equal to the current counter, then we have to set the current counter to the max. If the check fails, we have to regenerate the whole table. [NOTE: there is a separate problem with counters used up during desugaring when constructing some functions we refer to by index instead of name .. we have to avoid a clash there too, but that's a different issue :] The problem could be ameliorated by a assigning "address space" to symbol tables more cleverly. For example, assign on some 4K boundary, with at least 4K free space, then small modifications won't require rebuilding the symbol tables. [This would play havoc with the algorithm that runs through ALL the integers up to the max used value .. :[ Another option is to base ALL the tables on address 0, and relocate them on loading. In any case: the main point of doing all this is to support the "next step". We actually want to save the *bound* symbol table. This has all the nasty overloading and lookup done. At present, what happens is: the whole program symbol table is constructed, then it is bound (all of it). Then, the optimiser garbage collects relative to the roots, which is the main entry points (top module _init_ routine and flx_main routine) any exports, and all typeclasses and typeclass instances [Can't throw typeclass instances out, because they're bound to their calls later, so we have to keep any of them that might be called: we could throw out unused typeclasses, but it is hard to calculate who they are so we don't bother] What is discarded is: all those standard library functions you didn't actually use. What this means is that you're paying a huge price, binding the whole standard library every compilation. Now: it is NOT possible to bind individual files, because they can cross reference each other. However, we certainly expect the standard library to be complete, and it is passed to flxg as an argument. So there is no reason we can't bind the standard library and store the bound table, then reload it and incrementally extend it with the users program code. We still need the unbound symbol tables though. That's because the lookup routine is very clever and can bind anything based entirely on unbound tables. To do this it has to recursively bind things "up to the type", but the result of these inner bindings is thrown away (barring the ticache, which caches function types). There's a reason for this: binding the whole body of a function may introduce a requirement for type of another function to be known. If that is found by completely binding that function, recursive functions wouldn't work. What we actually do is analyse the code to discover the return type, and bind that, but we don't bind all of the code. In fact, since recursions always occur with branches, we actually allow a binding to fail provided at least one branch can be bound. It would actually be nice if we could get rid of the unbound symbol tables once he corresponding bound tables exist, but this would requires extensive changes to the lookup code (to first try the bound table, then try the unbound one if that fails). -- john skaller skal...@users.sourceforge.net ------------------------------------------------------------------------------ Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today http://p.sf.net/sfu/msIE9-sfdev2dev _______________________________________________ Felix-language mailing list Felix-language@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/felix-language