Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"

John Boyle Fri, 31 Aug 2012 01:31:57 -0700

Ray Dillinger mentions:
> structural isomorphism

The mathematical definition of isomorphism is that there exists a
one-to-one mapping between the objects in the first set and the objects in
the second set that preserves some property.  (Here the property would be
"which pointer-slots in the objects in the structure point to which other
objects in the structure, or to which atomic objects".)  I think that's
what you mean?  By the way, given that definition, isomorphism is trivial
to implement with an eq-hash, using O(n) space and O(n) hash insertions and
lookups: you establish the mapping naively as you look at each pair of
objects.  See http://paste.lisp.org/display/131368 for a sketch
implementation in Racket.

... I must say, I assumed that equal? did in fact mean
"isomorphic/one-to-one mapping", not "(possibly infinite) unfoldings".  I
was astonished to hear that it was the latter, and that #0=(x . #0#) was
equal? to #0=(x x . #0#), but Racket confirms it.  At first, I couldn't
imagine why it would be this way, but now I realize: the list ((x) (x)) is
not isomorphic to the list (#0=(x) #0#), yet they are considered equal?.

... I am tempted to suggest, as you seem to do, that we take a hybrid
approach and have "equal?" degrade to "isomorphic?" when a cycle is
detected (the way "write" is probably going to degrade to "write-shared"
when a cycle is detected).  However, that is ugly, and has expectably ugly
consequences.  In particular, it would imply that, if a = '(6 3), b = (list
a a), c = '((6 3) (6 3)), and d = #0=(1 #0#), then we have (equal? b c) and
(equal? d d) but not (equal? (cons b d) (cons c d)).

(Likewise, (write b) and (write c) would both look like ((6 3) (6 3)), but
(write (cons b d)) and (write (cons c d)) would probably look like (((6 3)
(6 3)) . #0=(1 #0#)) and ((#0=(6 3) #0#) . #1=(1 #1#)), respectively.  For
this and related reasons, you should probably not use "looks the same under
'write'" to define equal?, unless you intend to either adopt "degrades to
isomorphic?" semantics or absolutely specify the output of "write".)

...I still think "equal?, but degrades to isomorphic?" is the most useful
function for my purposes, but eh, whatever.  Incidentally, I can imagine a
version of equal? that... come to think of it, this is probably in fact
what you meant by your description of structural isomorphism.  Basically,
you would treat any irreducible group of objects--a group such that you
could follow pointers from any object in the group to any other object in
the group--as a separate kind of data structure (with an ordered list of
external pointers), and conceptually replace that group of objects (with
its downward pointers) with an instance of the new struct-type (with the
same downward pointers).  For example, #0=(<ptr1> <ptr2> . #0#) might
become (cyclic-list-2 <ptr1> <ptr2>), while #0=(<ptr1> <ptr2> #0# <ptr3> .
#0#) might become (generated-struct-name <ptr1> <ptr2> <ptr3>).  In this
way, you can convert arbitrary structures into trees, and compare them with
naive equal?.  Of course, it'd probably be rather complicated and expensive
to do this... but it seems to be a complete and internally consistent
specification, and it should satisfy any invariant about equal? that you
could come up with when working without cycles.  --And it would think that
#0=(1 . #0#), (1 . #0=(1 . #0#)), and #0=(1 1 . #0#) were all different,
while the unfolding equal? would think the opposite.

Actually, come to think of it, there is one fact about equal? that is true
for all non-cyclic structures that isn't true under unfolding: if (equal? a
b), then not (equal? (cons <something> a) b).  Therefore, it seems the
"equal?, replacing irreducible cycles with structs" approach is
semantically superior.

Is it worth arguing for?  I dunno.  It seems difficult to implement
efficiently, though suggestions are welcome.  (Can you quickly partition n
objects into groups such that each object lies in a group where you can
follow pointers from any element of a group to any other element of the
group, but that you can't go from one group to another and back?  That's
the start.)  Meanwhile, Will Clinger has a reference implementation of
"unfolding", Racket and probably others have full implementations, and R6RS
specifies it.  Is it worth bothering about?  Meh.

All this is sort of an unnatural problem anyway.  Comparing compound
structures based on their structure and leaves, rather than object
identity, is something you tend to do in the absence of set-ca/dr! (where
you care nothing for object identity, and might even use "hash-consing" to
share as much structure as physically possible), while cycles can only
arise through the use of set-ca/dr!.  Perhaps it's a bit much to expect a
single function to do well in both contexts; indeed, equal? originally
didn't handle cycles at all, and the desire for a safe version seems to
have been the reason for the "unfolding" extension in the first place.

Perhaps we can be satisfied with making a possibly arbitrary and dumb, but
consistent (and reflexive and symmetric and transitive, and not too hard to
implement efficiently), choice for how to extend equal?'s semantics to
cycles.  But let it be known that there are alternatives.

----

By the way, if the "(possibly infinite) unfoldings" definition is
considered confusing, I'll throw out some suggestions for a precise
definition:
- "Compound structures x and y are not equal? if there exists a sequence of
accesses made by composing [car, cdr, vector-ref, struct slot accessors,
and anything else] that, when applied to x and y respectively, yields two
objects that either have distinct types or are both atomic and not eqv?.
 Otherwise, they are equal?."

- "x and y are equal? if there exists a many-to-one mapping of the compound
objects in x and the compound objects in y to elements of a set z, such
that map(x) = map(y) and, for any compound same-typed elements A and B from
x and y respectively, map(A) = map(B) only if the contents of each slot of
A is either eqv? to, or maps to the same element of z as, the corresponding
slot of B."  This is essentially an inlined mathematical definition of
equivalence classes, as Clinger's code suggests:

- "x and y are equal? if there is a way of partitioning all the objects of
x and y collectively into separate groups ["equivalence classes"]--such
that we say A = B if and only if A and B belong to the same group or A and
B are both atoms and are eqv?--so that x = y and, for any elements A and B
from x and y respectively, if A = B, then each slot in A = corresponding
slot in B."
--John Boyle
*Science is what we understand well enough to explain to a computer. Art is
everything else we do.* --Knuth

On Thu, Aug 30, 2012 at 12:30 PM, Ray Dillinger <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 08/30/2012 11:57 AM, Ray Dillinger wrote:
>
> > The behavior I would prefer, however, is that outlined by Per
> > Bothner.
> >
> > To state it precisely in terms clear enough for the standard and
> > also calculated to tell people exactly how to implement it:
> >
> > "If one operand has a cycle and the other does not, then the
> > predicate returns #f.  If all have cycles, then iff the printed
> > representations of the structures are identical up to the point
> > when all have entered their cycles at least twice, then the
> > predicate returns #t."
>
> Actually, as I consider the above, there is one case where I'm
> not comfortable with this definition. Vectors can have elements
> whose printed representation is not part of the "infinite
> printed representation" if there is a cycle that starts before
> those elements.  Consider two arrays, each of which has itself
> as its own second element:
>
> V1 ==>   #1=#(1 #1 6)
> V2 ==>   #2=#(1 #2 9)
>
> (equal? V1 v2) ==> ?
>
> The infinite printed representation is identical, well past the
> point where both have entered their cycle at least twice. Both
> would show "#(1 #(1 #(1 #(1 #(1 #(1 ......"  But they can't be
> considered equal? because
>
> (equal? (vector-ref V1 2) (vector-ref V2 2)) => #f
>
> It requires cycle-aware print to distinguish the written
> representation.
>
> So, given the above, I think it's necessary that equal? should
> detect structural isomorphism (written representation under
> cycle-aware printing is equal) rather than generation of
> identical infinite sequences ("infinite" written representation
> up to second entry of cycle is equal).
>
>                                         Bear
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.12 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJQP79pAAoJEAOzWkqOibfNvzIH/jZBCxouUxtXCDHPvzGMzYwP
> dLj+1cNmUGLLLmCyuFyCYtR9y3H52zfKRoWUI9FXEpkgFsVSd5trWmpvs9Q+1het
> yY7NfnVYRCftFxRHoCBNKxax/L4X24AWrFlaLfLuy8T93VgJbMgG5R9fnUokeeqZ
> bhh8Yy22mjOHuAWitqsG7DqwsbYUeybReh4X29bQyEZce+z65US4MyHpTHMGGA4D
> v6jkHgum2BXNqebHPjTBvJ3YY0EtzKv3/Cu8R5BNHPJlly6yCiUYyyydcfZfVkgZ
> KfKLCL/b4fe0uCtcBWio8vCl/7q4OkqM/O+bifx1qhwl6AUanVOhyVSIfHS9/9Y=
> =dhoB
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Scheme-reports mailing list
> [email protected]
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>

_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"

Reply via email to