[Chicken-users] Hash table equality pitfall

2012-03-01 Thread Peter Bex
Hi all!

I ran into some very tricky behavior with hash tables that I'd like to
document for posterity.

Turns out that (equal? (make-hash-table) (make-hash-table)) no longer
always returns #t in the current master, but it did in Chicken 4.7.0.

This is because of obscure internal details: the hash table's internal
hashing function is a closure which contains the randomization fixnum,
which usually differs between tables.

In Chicken 4.7.0 and earlier, equal? only *appears* to work correctly for
hash tables.  In fact, it works if the table is filled such that buckets
contain at most one value, or if they were filled in the same order:

#;1 (use srfi-69)
; loading library srfi-69 ...
#;2 (define x (make-hash-table size: 2))
#;3 (define y (make-hash-table size: 2))
#;4 (equal? x y)
#t
#;5 (hash-table-set! x 'a 1)
#;6 (hash-table-set! x 'b 2)
#;7 (hash-table-set! x 'c 3)
#;8 (hash-table-set! y 'a 1)
#;9 (hash-table-set! y 'b 2)
#;10 (hash-table-set! y 'c 3)
#;11 (equal? x y)
#t

However, if we change the fill order, it no longer compares equal:

#;1 (use srfi-69)
; loading library srfi-69 ...
#;2 (define x (make-hash-table size: 2))
#;3 (define y (make-hash-table size: 2))
#;4 (hash-table-set! x 'a 1)
#;5 (hash-table-set! x 'a 2)
#;6 (hash-table-set! x 'b 2)
#;7 (hash-table-set! x 'c 3)
#;8 (hash-table-set! y 'c 3)
#;9 (hash-table-set! y 'b 2)
#;10 (hash-table-set! y 'a 1)
#;11 (equal? x y)
#f

The reason for this is that equal? is a very generic procedure, which
knows about core datatypes and compares record type values field by field.
A hash table is just a record type, and it contains some internal
slots with the configuration details as well as a vector which holds the
hash table's buckets.  The bucket contents themselves are alists, which
are filled simply using alist-cons when you use hash-table-set!.  That's
why the ordering differs.

Of course, from the end-user perspective this is *very* tricky behaviour,
so I've added a note to the manual about this.  I checked the SRFI, but
it doesn't mention anything about hash tables having to compare equal to
eachother so Chicken's behaviour is strictly not in violation of any
standard :)  The only other implementation I was able to test with was
Racket (why is SRFI-69 so uncommon in Schemes?) and there hash tables
also don't compare with equal?

Luckily, with the new version the mistake of thinking that equal? is
defined for hash-tables is a lot harder to make since the randomization
factor will ensure they are almost always different.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music.
-- Donald Knuth

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Hash table equality pitfall

2012-03-01 Thread John Cowan
Peter Bex scripsit:

 Of course, from the end-user perspective this is *very* tricky
 behaviour, so I've added a note to the manual about this.  I checked
 the SRFI, but it doesn't mention anything about hash tables having to
 compare equal to eachother so Chicken's behaviour is strictly not in
 violation of any standard :)

The R5RS definition of `equal?` says it is recursive on pairs, vectors,
and strings, going on to say that `equal?` defers to `eqv?` on
other objects such as numbers and symbols.  It is not clear whether
this means all other objects (in which case `equal?` on hashtables
must be the same as `eqv?`), or only some other objects, including
numbers and symbols but allowing implementation-defined behavior on
implementation-defined objects.  Chicken evidently assumes the latter
interpretation.

In R6RS, `equal?` definitely defers to `eqv?` for all objects except
pairs, vectors, strings, and bytevectors.  R7RS currently uses the same
language as R5RS, with the same ambiguity, but requires descent into
bytevectors and records.

 The only other implementation I was able to test with was Racket (why
 is SRFI-69 so uncommon in Schemes?) and there hash tables also don't
 compare with equal?

I maintain a table of what Schemes implement which SRFIs at
http://tinyurl.com/scheme-s5 (Spreadsheet of Scheme Systems Supporting
SRFIs).  There is a portable R6RS implementation of SRFI-69 on top of
R6RS hashtables.  These are naturally not `equal?` in accordance with
the R6RS rule given above.

In addition, the following Schemes support SRFI-69 with `equal?`
descending into hash tables:  Kawa, Chibi.  The following Schemes also
support it, with `equal?` the same as `eqv?`:  Guile, STklos.

I think the main reason SRFI 69 doesn't have much support is its fairly
recent date (2005).  It would probably be straightforward to layer it
over the native support in most Schemes, but people haven't bothered to
do so.  In addition, the reference implementation is more of a proof of
concept than a drop-in implementation like those provided with SRFI-1,
SRFI-2, SRFI-43, etc.

 Luckily, with the new version the mistake of thinking that equal?
 is defined for hash-tables is a lot harder to make since the
 randomization factor will ensure they are almost always different.

I think that's good.  The behavior of `equal?` is primarily historical
anyway; people can and should write their own equivalence predicates to
solve their own problems.

-- 
John Cowanhttp://ccil.org/~cowan  co...@ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture.  And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein.  --Gospel of Tux

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Hash table equality pitfall

2012-03-01 Thread Peter Bex
On Thu, Mar 01, 2012 at 12:03:25PM -0500, John Cowan wrote:
 In addition, the following Schemes support SRFI-69 with `equal?`
 descending into hash tables:  Kawa, Chibi.

What does this mean in practice?  Do they do a dumb comparison like
Chicken does (ie, producing different results depending on the insertion
order) or do they check whether the hash tables have exactly the same
keys, each with identical corresponding values?

If they do it correctly, how do they deal with differences in initial
bucket size, and what do they do with hash tables having identical
key/values but different hashing or different comparison procedures?

 I think the main reason SRFI 69 doesn't have much support is its fairly
 recent date (2005).  It would probably be straightforward to layer it
 over the native support in most Schemes, but people haven't bothered to
 do so.

That's rather odd, considering there were quite a few people discussing
it, judging by the SRFI discussion archive (not as much as some other
SRFIs, but still a few).  It's also a very useful thing to have.
I guess that's just a Chickeneer's way of looking at it :)

 In addition, the reference implementation is more of a proof of
 concept than a drop-in implementation like those provided with SRFI-1,
 SRFI-2, SRFI-43, etc.

Why is it not drop-in?  Maybe I'm missing something, but from a quick
glance the implementation doesn't really look too different from
Chicken's implementation, conceptually.

  Luckily, with the new version the mistake of thinking that equal?
  is defined for hash-tables is a lot harder to make since the
  randomization factor will ensure they are almost always different.
 
 I think that's good.  The behavior of `equal?` is primarily historical
 anyway; people can and should write their own equivalence predicates to
 solve their own problems.

You could easily argue either way, but it's good that it doesn't seem to
be correct when you first try it.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music.
-- Donald Knuth

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Hash table equality pitfall

2012-03-01 Thread Peter Bex
On Thu, Mar 01, 2012 at 12:45:57PM -1000, Derrell Piper wrote:
 Um, your 5th line resets 'a to 2, which is why they're not comparing
 equal?.  Removing that yields the expected results on 4.7.0.5-st:

Whoops, thanks for pointing that out, Derrel.  I overlooked it (that's
what you get while quickly writing up such things on the sly at work!).

The fact is that it sizes the hash table only in terms of the next value
in hash-table-prime-lengths, which means it'll be 307 items big.

However, this does not invalidate my main point.  If you insert either
more than 307 items in different order or manage to hash two objects
to the same value you get this problem:

#;1 (use srfi-69)
; loading library srfi-69 ...
#;2 (define x (make-hash-table))
#;3 (hash-table-set! x a 1)
#;4 (hash-table-set! x \x00a 1)
#;5 (define y (make-hash-table))
#;6 (hash-table-set! y \x00a 1)
#;7 (hash-table-set! y a 1)
#;8 (equal? x y)
#f

while in 4.5.0,

#;1 (use srfi-69)
; loading library srfi-69 ...
#;2 (define x (make-hash-table))
#;3 (hash-table-set! x a 1)
#;4 (hash-table-set! x \x00a 1)
#;5 (define y (make-hash-table))
#;6 (hash-table-set! y a 1)
#;7 (hash-table-set! y \x00a 1)
#;8 (equal? x y)
#t

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music.
-- Donald Knuth

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Hash table equality pitfall

2012-03-01 Thread John Cowan
Alex Shinn scripsit:

  What does this mean in practice?  Do they do a dumb comparison like
  Chicken does (ie, producing different results depending on the insertion
  order) or do they check whether the hash tables have exactly the same
  keys, each with identical corresponding values?
 
 Chibi does a dumb comparison like Chicken does.

I'd guess that Kawa delegates to Java's java.util.AbstractMap#equals
method, which returns true if the keys and their values are equal;
it does not care whether the maps are of the same types.

-- 
Si hoc legere scis, nimium eruditionis habes.

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Hash table equality pitfall

2012-03-01 Thread John Cowan
Peter Bex scripsit:

 That's rather odd, considering there were quite a few people discussing
 it, judging by the SRFI discussion archive (not as much as some other
 SRFIs, but still a few).  It's also a very useful thing to have.
 I guess that's just a Chickeneer's way of looking at it :)

Chicken didn't have a pre-existing hash table implementation to support.
As I've said before, most applications don't need to be portable across
multiple Schemes, since almost all are highly portable themselves.
So if you are writing in MIT Scheme or Racket, you can use the hash
tables provided by MIT Scheme or Racket without a problem.  It's library
authors who benefit most from standardization.

 Why is it not drop-in?  Maybe I'm missing something, but from a quick
 glance the implementation doesn't really look too different from
 Chicken's implementation, conceptually.

For one thing, the identity hash function is the same as the main
hash function, which only works on standard R5RS types.
There may be other issues.

-- 
John Cowanco...@ccil.orghttp://ccil.org/~cowan
   There was an old manSaid with a laugh, I
 From Peru, whose lim'ricks all  Cut them in half, the pay is
   Look'd like haiku.  He  Much better for two.
 --Emmet O'Brien

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users