After I changed the microcode to hold only weak references to interned
symbols without global bindings, I noticed a number of problems with
weak hash tables keyed by symbols. The procedure MAKE-EQ-HASH-TABLE,
counterintuitively to me, yields hash tables with only weak references
to their keys. Many uses of MAKE-EQ-HASH-TABLE in MIT Scheme actually
need key-strong hash tables, not key-weak hash tables, and after going
through all the uses, I see that the problem is worse than I expected.
Attached to this message is a summary of all sixty-two uses; here are
two noteworthy statistics:
- Only eleven, i.e. 17%, of the uses are obviously legitimate uses for
key-weak tables.
- Thirteen of the uses are obviously incorrect and require key-strong
hash tables. That is, more than 20% of the uses are bugs.
At the very least, we should make sure that every use of hash tables
correctly uses key-strong or key-weak hash tables. However, I'd like
to go a step further and change the semantics of MAKE-EQ-HASH-TABLE to
yield strong hash tables, and add a separate MAKE-WEAK-EQ-HASH-TABLE
which would behave as MAKE-EQ-HASH-TABLE does currently.
I realize that MAKE-EQ-HASH-TABLE has yielded weak hash tables since
1993, and that its predecessor MAKE-OBJECT-HASH-TABLE has done so for
even longer than that, but since even in MIT Scheme itself there are
more incorrect uses of MAKE-EQ-HASH-TABLE than correct uses (and many
more ambiguous uses where the intent is not clear), I think that it is
dangerous to use the name MAKE-EQ-HASH-TABLE, or any name without the
word `weak' in it, for a constructor of weak hash tables.
If we made this change, existing code using MAKE-EQ-HASH-TABLE
legitimately would exhibit space leaks only on new versions of Scheme.
On the other hand, if we don't make this change, I think that more and
more code will be written that accidentally uses weak hash tables
where strong ones are at best intended or at worst needed, with subtle
heisenbugs that show up depending on when garbage collection happens.
Comments?
* Use of MAKE-EQ-HASH-TABLE in MIT Scheme, 2009-12-19 -*- outline -*-
In MIT Scheme, MAKE-EQ-HASH-TABLE yields a hash table whose keys are
held only weakly, so that they will be garbage-collected if there are
no strong references to them. To make a similar hash table whose keys
are held strongly, one must use MAKE-STRONG-EQ-HASH-TABLE explicitly.
Of the sixty-two uses of MAKE-EQ-HASH-TABLE throughout the MIT Scheme
source code, only seven appear to need weak references to their keys,
and for only four more is there an obvious reason to use weak
references. This list categorizes most uses of MAKE-EQ-HASH-TABLE. I
have not analyzed the complicated uses of MAKE-EQ-HASH-TABLE in
xdoc/xdoc.scm and xml/xml-names.scm.
** Hash table must be key-weak
*** edwin/comman.scm, permanent-local-variables
*** edwin/curren.scm, screen-buffer-layouts
*** edwin/eystep.scm, stepper-buffers
*** edwin/xterm.scm, display/cached-atoms-table
*** edwin/xterm.scm, display/selection-records
*** imail/imail-core.scm, memoized-resources
*** runtime/sfile.scm, interned-mime-types
** Hash table should probably be key-weak
*** edwin/prompt.scm, prompt-histories
*** edwin/xterm.scm, symbol->atom table in display/cached-atom-tables
*** imail/imail-file.scm, file-folder-types
*** sos/class.scm, built-in-class-table
** Hash table could be key-strong; not sure whether it should be
*** compiler/machines/i386/lapopt.scm, *rules*
If the compiler never generates instructions with certain symbols in
them, and hence the compiler's code has no strong references to those
symbols, then the rules for those symbols may as well be discarded.
But that's pretty sketchy.
*** compiler/machines/svm/assembler-runtime.scm, symbol tables
*** compiler/machines/svm/assembler-runtime.scm, symbolic-operators
*** compiler/machines/svm/assembler-runtime.scm, pvar-type-table
*** compiler/machines/svm/lapopt.scm, *rules* (not really used)
*** compiler/machines/x86-64/lapopt.scm, *rules* (not really used)
*** edwin/nntp.scm, tables in convert-header-graphs-to-trees
In the only caller of CONVERT-HEADER-GRAPHS-TO-TREES, the (strong)
list of headers is still strongly referenced, so the keys of the two
hash tables in TABLES will not be garbage-collected. But this is
still sketchy (and see below on BUILD-EQUIVALENCE-CLASSES in the same
file).
*** edwin/xterm.scm, built-in-atoms-table
If the binding for BUILT-IN-ATOMS were collected while that for
BUILT-IN-ATOMS-TABLE were not, then the latter would be in trouble.
But this doesn't happen currently.
*** microcode/os2pm.scm, type-abbreviations
*** microcode/os2pm.scm, id-external-roots
*** microcode/os2pm.scm, pm-procedures
This code is probably defunct, but if it weren't, and if the
presentation manager procedure abstraction were used outside this
file, it would probably be necessary to make these three hash tables
key-strong. It's not clear to me that PM-PROCEDURES is safe as is,
too.
*** runtime/genio.scm, {en,de}coder/sizer/{,de}normalizer maps
Since there are maps in both directions, each hash table's keys also
have strong references in the data positions of the other hash table.
But this is pretty fragile, and in any case there is no need to use
key-weak hash tables.
*** runtime/syntax-output.scm, unmappings
*** runtime/syntax-output.scm, rename-databases' mapping-tables
*** runtime/syntax-output.scm, rename-databases' id-tables
*** ssp/xmlrpc.scm, methods
Since the hash table is used only in one place, and only one key is
fetched out of it, that key will be strongly referenced until it is
fetched, and the other keys don't matter.
*** xml/turtle.scm, table in write-prefixes
** Hash table must be key-strong
*** edwin/nntp.scm, equivalences in build-equivalence-classes
In the expression for the call to MAP at the end of
BUILD-EQUIVALENCE-CLASSES, there are no extant references to the
threads except in the keys of the hash table. (SUBJECT-ALIST, and the
caller's THREADS in ORGANIZE-HEADERS-INTO-THREADS, are no longer
referenced at this point.) Hence the garbage collector may reclaim
some (or in fact all) of the hash table's associations.
*** imail/imail-mime.scm, mime-encodings
*** runtime/http-syntax.scm, header-value-defns
*** ssp/mod-lisp.scm, mime-handlers
*** ssp/xhtml-expander.scm, *sabbr-table*
The processing instructions are processed incrementally as the file is
parsed, so keys in the sabbr table may be garbage-collected and then
re-interned, between which times the associations will be destroyed.
*** star-parser/matcher.scm, matcher-preprocessors
*** star-parser/matcher.scm, matcher-compilers
*** star-parser/parser.scm, parser-preprocessors
*** star-parser/parser.scm, parser-compilers
*** star-parser/shared.scm, make-parser-macros
*** star-parser/shared.scm, *global-parser-macros*
*** xdoc/validate-xdoc.scm, element-checkers
*** xml/xhtml.scm, element-context-map
** Unsure
*** compiler/machines/C/stackify.scm, stackify count tables
*** compiler/rtlbase/rtlobj.scm, label->object maps
It's not immediately obvious to me whether LABEL->OBJECT will ever be
used after the last references to the keys of the hash tables involved
(which cause strong references to those keys to be dropped, if
COMPILER:PRESERVE-DATA-STRUCTURES? is false). A little further
analysis is required.
*** edwin/eystep.scm, ynode-regions
What are the keys to these hash tables? I don't know how long they
persist.
*** edwin/win32.scm, event-handlers
The keys in EVENT-HANDLERS are integers. Should the table be a strong
eqv hash table rather than a weak eq hash table?
*** edwin/xterm.scm, selection->record table in display/selection-records
I'm not sure what the domain of possible keys to this hash table is --
it may be just the symbols PRIMARY and CLIPBOARD, which will probably
be strongly referenced by the rest of the Edwin code, but on the other
hand I think this should probably be a key-strong hash table.
*** xml/xhtml-entities.scm, table in html-char->name-map
The keys in the table inside HTML-CHAR->NAME-MAP is keyed by
characters. Should this be a strong eqv hash table rather than a weak
eq hash table?
** Complicated
*** xdoc/xdoc.scm
*** xml/xml-names.scm
_______________________________________________
MIT-Scheme-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/mit-scheme-devel