[MIT-Scheme-devel] specifying hash table weakness

2010-08-14 Thread Taylor R Campbell
Bike shed time!

There are many different options in hash tables for weakly referenced
keys and data.  When I implement these, we could multiply the various
classes of procedures for constructing hash table constructors and
types and instances by all the possible weakness options:

x-HASH-TABLE/CONSTRUCTOR
x-EQ-HASH-TABLE-TYPEstrong
x-EQV-HASH-TABLE-TYPE   key-weak (formerly `weak')
x-EQUAL-HASH-TABLE-TYPE\ /  datum-weak
x-STRING-HASH-TABLE-TYPEX   key-or-datum-weak
MAKE-x-HASH-TABLE-TYPE / \  key-ephemeral
MAKE-x-EQ-HASH-TABLEdatum-ephemeral
MAKE-x-EQV-HASH-TABLE   key-and-datum-ephemeral
MAKE-x-EQUAL-HASH-TABLE
MAKE-x-STRING-HASH-TABLE

This leads to 70 different bindings.  Some combinations make little
sense together (e.g., key-weak string hash tables), but even if those
combinations are pruned, the result would be a *trifle* excessive.  So
what should we do instead?

Instead of a myriad of different hash table type and constructor
constructors, we could have just two procedures that take an extra
argument to specify the weakness option:

(HASH-TABLE-CONSTRUCTOR key-hash key=? rehash-after-gc? weakness)
(MAKE-HASH-TABLE-TYPE key-hash key=? rehash-after-gc? weakness)

Instead of a myriad of different hash table constructors for
particular hash functions, we could have one procedure per hash
function that takes an extra argument to specify the weakness option:

(MAKE-x-HASH-TABLE* weakness #!OPTIONAL initial-size),
  x \in {EQ, EQV, EQUAL, STRING}

None of these names conflict with existing names.  This leads to six
bindings, plus all the old ones, and perhaps seven bindings for the
weakness types (e.g., WEAKNESS:STRONG, WEAKNESS:KEY-WEAK, c.).

Questions?  Comments?  Flames?  Other ideas?

Here are all the weakness options:

- strong: Entries are never removed but with HASH-TABLE/REMOVE!.

- key-weak: The keys are referenced weakly, and entries whose keys
  have been dropped are removed when the table is rehashed or cleaned.
  Note that this can leave references to effectively unreachable data
  in the hash table for arbitrarily long times, particularly for non-
  address-hashed tables.  These hash tables are currently just called
  `weak' in MIT Scheme.

- datum-weak: The data are referenced weakly, and entries whose data
  have been dropped are removed when the table is rehashed or cleaned.
  Note that this can leave references to effectively unreachable keys
  in the hash table for arbitrarily long times, particularly for non-
  address-hashed tables.

- key-or-datum-weak: The keys and data are referenced weakly, and any
  entry whose key or datum has been dropped is removed when the table
  is rehashed or cleaned.  Broken entries can stay dormant in the
  table for arbitrarily long times, but not the references to their
  keys and data.

- key-ephemeral: The keys and data are referenced weakly, and for any
  entry whose key has been dropped, its datum is dropped too; any
  entry whose key (and datum) has been dropped is removed when the
  table is rehashed or cleaned.  Broken entries can stay dormant in
  the table for arbitrarily long times, but not the references to
  their keys or data.  References to the key through the datum don't
  count if the only reference to the datum is through an ephemeron.

- datum-ephemeral: The keys and data are referenced weakly, and for
  any entry whose datum has been dropped, its key is dropped too; any
  entry whose datum (and key) has been dropped is removed when the
  table is rehashed or cleaned.  Broken entries can stay dormant in
  the table for arbitrarily long times, but not the references to
  their keys or data.  References to the datum through the key don't
  count if the only reference to the key is through an ephemeron.

- key-and-datum-ephemeral: The keys and data are referenced weakly,
  and an entry is dropped if and only if neither its key nor its datum
  has any strong reference; any entry whose key and datum have been
  dropped is removed when the table is rehashed or cleaned.  Broken
  entries can stay dormant for arbitrarily long times, but not the
  references to their keys or data.

Weak hash tables are lighter-weight than ephemeral hash tables,
requiring approximately half the storage, but there are space leaks
that ephemeral hash tables can solve which weak hash tables cannot.
Most Lisp systems do not provide both -- either they provide what I
have called weak hash tables here, or what they call weak hash tables
are what I have called ephemeral hash tables here.  You should use
ephemeral hash tables unless you are sure that weak hash tables are
safe.  For example, if you want to hang a small number property on
arbitrary objects, a weak hash table may be appropriate.

It may turn out that there's a nice way to make ephemeral hash tables
just as cheap as weak hash tables, but I haven't persuaded myself that
teaching the garbage 

Re: [MIT-Scheme-devel] specifying hash table weakness

2010-08-14 Thread Arthur A. Gleckler
 (MAKE-x-HASH-TABLE* weakness #!OPTIONAL initial-size),
  x \in {EQ, EQV, EQUAL, STRING}

I wonder whether it makes sense to make X a parameter, too.  That
wouldn't prevent special treatment of EQ.

The other obvious possibility is to make a hash table constructor
maker, i.e. to curry the construction of hash tables, e.g.:

  ((make-hash-table-constructor comparator weakness) initial-size)

That requires the addition of the fewest new identifiers.  We can
watch to see which combinations are actually used and then make
shortcut identifiers for them.

Keep painting!

___
MIT-Scheme-devel mailing list
MIT-Scheme-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/mit-scheme-devel


Re: [MIT-Scheme-devel] specifying hash table weakness

2010-08-14 Thread Taylor R Campbell
   Date: Sat, 14 Aug 2010 19:20:06 -0700
   From: Arthur A. Gleckler a...@speechcode.com

(MAKE-x-HASH-TABLE* weakness #!OPTIONAL initial-size),
 x \in {EQ, EQV, EQUAL, STRING}

   I wonder whether it makes sense to make X a parameter, too.  That
   wouldn't prevent special treatment of EQ.

That would require adding some more parameters, unless there is only a
fixed set of valid arguments for x so that the appropriate values for
the other parameters can be chosen like in SRFI 69 implementation.
Personally I prefer encoding such limited choices in the names -- if
you can pass a procedure, it is tempting to pass a procedure outside
the allowed set, and the subsequent failure is confusing.

We could have a procedure

(MAKE-HASH-TABLE* key=? key-hash rehash-after-gc? weakness 
initial-size)

or something, too.

   The other obvious possibility is to make a hash table constructor
   maker, i.e. to curry the construction of hash tables, e.g.:

 ((make-hash-table-constructor comparator weakness) initial-size)

   That requires the addition of the fewest new identifiers.  We can
   watch to see which combinations are actually used and then make
   shortcut identifiers for them.

This is what the proposed HASH-TABLE-CONSTRUCTOR already does, except
that it has two more parameters, for the hash function and for whether
the hash function is altered by garbage collection.

___
MIT-Scheme-devel mailing list
MIT-Scheme-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/mit-scheme-devel


Re: [MIT-Scheme-devel] specifying hash table weakness

2010-08-14 Thread Chris Hanson
In cases like these I prefer a small number of very general
definitions, so I'd favor having all the options be defined as
arguments.  Provided, of course, that there are compatibility bindings
for existing usage.

On Sat, Aug 14, 2010 at 8:12 PM, Taylor R Campbell campb...@mumble.net wrote:
   Date: Sat, 14 Aug 2010 19:20:06 -0700
   From: Arthur A. Gleckler a...@speechcode.com

    (MAKE-x-HASH-TABLE* weakness #!OPTIONAL initial-size),
     x \in {EQ, EQV, EQUAL, STRING}

   I wonder whether it makes sense to make X a parameter, too.  That
   wouldn't prevent special treatment of EQ.

 That would require adding some more parameters, unless there is only a
 fixed set of valid arguments for x so that the appropriate values for
 the other parameters can be chosen like in SRFI 69 implementation.
 Personally I prefer encoding such limited choices in the names -- if
 you can pass a procedure, it is tempting to pass a procedure outside
 the allowed set, and the subsequent failure is confusing.

 We could have a procedure

 (MAKE-HASH-TABLE* key=? key-hash rehash-after-gc? weakness 
 initial-size)

 or something, too.

   The other obvious possibility is to make a hash table constructor
   maker, i.e. to curry the construction of hash tables, e.g.:

     ((make-hash-table-constructor comparator weakness) initial-size)

   That requires the addition of the fewest new identifiers.  We can
   watch to see which combinations are actually used and then make
   shortcut identifiers for them.

 This is what the proposed HASH-TABLE-CONSTRUCTOR already does, except
 that it has two more parameters, for the hash function and for whether
 the hash function is altered by garbage collection.

 ___
 MIT-Scheme-devel mailing list
 MIT-Scheme-devel@gnu.org
 http://lists.gnu.org/mailman/listinfo/mit-scheme-devel


___
MIT-Scheme-devel mailing list
MIT-Scheme-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/mit-scheme-devel