[Felix-language] caching symbol tables

john skaller Mon, 15 Nov 2010 19:13:37 -0800

I have hacked the compiler to the point where I can now either
load a cached symbol table, or compute and store it, one table
per file. The idea is to replace "*.par" files  with "*.symtab" files
since these represent a point further down the tool chain
and caching would save more time.


I'm not yet *using* these tables. I can't look into them
(because Erick made the data structures private and
abstract :)

The current code makes too many assumptions: it isn't
functional enough. A huge global structure is passed around
and modified by various routines. The current sticking point is

bind_asms ...

which produces a bound symbol table from a list of assembly 
instructions. This function is the only public interface and it is doing
vastly too much work.. it can't use my unbound symbol tables as input.

So: the symbol tables have integer indexes assigned to each
symbol, these are obtained from a counter. In normal processing
of everything, the counter is incremented sequentially so there's
no clash.

However, this invariant holds when loading disk images of
the tables if the entire processing sequence is unchanged.
If one library file is changed to include more functions,
then more integers get used up producing its symbol table,
and then there will be an overlap with the old version of
the next symbol table.

To prevent a clash, we have to save the counter range with
the symbol table, and when we load it, check its min is
greater than or equal to the current counter, then we have
to set the current counter to the max. If the check fails,
we have to regenerate the whole table.

[NOTE: there is a separate problem  with counters used up
during desugaring when constructing some functions we
refer to by index instead of name .. we have to avoid a clash
there too, but that's a different issue :]

The problem could be ameliorated by a assigning
"address space" to symbol tables more cleverly.
For example,  assign on some 4K boundary,
with at least 4K free space, then small modifications
won't require rebuilding the symbol tables.

[This would play havoc with the algorithm that runs
through ALL the integers up to the max used value .. :[

Another option is to base ALL the tables on address 0,
and relocate them on loading.

In any case: the main point of doing all this is to support
the "next step". We actually want to save the *bound* symbol table.
This has all the nasty overloading and lookup done.

At present, what happens is: the whole program symbol table
is constructed, then it is bound (all of it). Then, the optimiser
garbage collects relative to the roots, which is the main entry
points (top module _init_ routine and flx_main routine)
any exports, and all typeclasses and typeclass instances

[Can't throw typeclass instances out, because they're bound
to their calls later, so we have to keep any of them that might
be called: we could throw out unused typeclasses, but it is
hard to calculate who they are so we don't bother]

What is discarded is: all those standard library functions you didn't
actually use.

What this means is that you're paying a huge price, binding
the whole standard library every compilation.

Now: it is NOT possible to bind individual files, because they can
cross reference each other. However, we certainly expect the
standard library to be complete, and it is passed to flxg as an
argument. So there is no reason we can't bind the standard library
and store the bound table, then reload it and incrementally extend
it with the users program code.

We still need the unbound symbol tables though. That's because the 
lookup routine is very clever and can bind anything based entirely
on unbound tables. To do this it has to recursively bind things
"up to the type", but the result of these inner bindings is thrown
away (barring the ticache, which caches function types).

There's a reason for this: binding the whole body of a function
may introduce a requirement for type of another function
to be known. If that is found by completely binding that function,
recursive functions wouldn't work. What we actually do
is analyse the code to discover the return type, and bind that,
but we don't bind all of the code. In fact, since recursions always
occur with branches, we actually allow a binding to fail provided
at least one branch can be bound.

It would actually be nice if we could get rid of the unbound symbol tables
once he corresponding bound tables exist, but this would requires
extensive changes to the lookup code (to first try the bound table,
then try the unbound one if that fails).



--
john skaller
skal...@users.sourceforge.net





------------------------------------------------------------------------------
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2 & L3.
Spend less time writing and  rewriting code and more time creating great
experiences on the web. Be a part of the beta today
http://p.sf.net/sfu/msIE9-sfdev2dev
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

[Felix-language] caching symbol tables

Reply via email to