Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"

John Boyle Fri, 31 Aug 2012 11:37:08 -0700

John Cowan writes:
> Informally, we can say that your first two expressions can both be
> characterized as "a list with an infinite number of x's"
-snip-
> That's just a way of encoding the paradoxical fact that one plus infinity
> is still equal to infinity.  If you don't like that, avoid infinite
> structures or define your own equality.

I might define some kind of binary tree structure, where each node has a
pointer to its parent--or to the root of the tree.  I would characterize
this as a very finite graph, and would not consider a tree with two equal?
nodes added to it as the same as a tree with only one node.

Also, I might use a circular list to maintain a free-list of some kind of
object, null-ifying all fields of objects as they were "free"d and put back
on the list, and--though this is a complaint about write or read, not
equal?--I would be rather confounded if I wrote all data structures in the
program to a file, restarted the program, read everything back in, and
found that there was 1 object on the free-list rather than 10,000, or vice
versa.  True, I should have used "write-shared", but I can imagine some
users learning that "write"/"read" seemed to handle everything including
cycles just fine, and then being totally mystified at this behavior.

"infinite list" is only one valid interpretation; there are others.

> That's a non-conformant implementation of R7RS `write`, which is required
> to use labels only in the presence of cycles, not in the presence of
> structure-sharing.

Hah.  Do you mean to say that an implementation should detect the *exact*
amount of datum-labeling that is necessary to represent a list without
infinite loops, and do no more than that?  For example:

In this discussion, circular-list is defined thus:
  (define (circular-list . args)
    (let ((u (apply list args)))
      (set-cdr! (last-pair u) u)
      u))

So, according to you, a correct implementation of "write" should be able to
do this:
- If a = '(1 2) and b = (circular-list a a), then b writes as #0=((1 2) (1
2) . #0#), not as #0=(#1=(1 2) #1# . #0#).  Because the two (1 2)
structures don't need to be shared.
- If a = (circular-list 1 2) and b = (list a a), then b writes as (#0=(1 2
. #0#) #1=(1 2 . #1#)), not as (#0=(1 2 . #0#) #0#).  Because the two
pointers to the object a in the list b don't actually need to be shared,
although they still require datum labels within themselves.
- If a = (list 1 (cdr b)) and b = (circular-list 3 a) [which could be
accomplished through further assignment], then... we might naively write b
as #0=(3 . #1=((1 #1#) . #0#)), but it would save a datum label to
duplicate the first cons cell, as in (3 . #0=((1 #0#) 3 . #0#)).  So maybe
the latter should be preferred (though I don't think you actually said
this).

It seems that figuring out a minimal level of sharing requires identifying
sub-portions of the graph of objects that don't directly point to each
other once we've given names to key nodes--it's close to making "write"
perform the whole "replacing irreducible cycles with structs" approach I
suggested.  Furthermore, if you want it to figure out the absolute minimum
number of datum labels required to print a structure, that's a nontrivial
problem, and I think is well beyond the scope of what anyone wants "write"
to do.

The obvious thing to do is just to indiscriminately use write-shared for
the whole structure once a single cycle has been found; this is what Racket
does, and I would be amazed if an implementation did anything else.

It seems our choices are:
- Specify exactly how "write" manages to minimize sharing in the presence
of cycles.  This seems to mean implementation complification, bad
performance, and little benefit.
- Specify that "write" must act like "write-shared" in the presence of
cycles.  This seems to be the easiest to implement and seems already done.
- Specify merely that "write" must produce something that, when read back
in, will be equal? to the results of "write-shared".  I imagine "read" will
remain simple (will not, for example, read "(1 1 1 . #0=(1 . #0#))" into a
single cons cell--basically, it assumes the output was made by
"write-shared").  Under the "infinite unfoldings" definition of equal?, it
would then be permissible for "write" to produce any amount of "(1 1 1 1 1
..." before realizing "Oh crap there's a cycle" and printing some datum
labels.  ... I can see only one advantage to this--a possible performance
improvement from being lazy about checking for cycles, as was mentioned at
the beginning of this email thread.  A disadvantage is that, as I mentioned
earlier, structures may get a lot bigger after being written and read back
in.  (And in general, anyone who uses cyclical structures will probably
mutate them further, and if they don't remain "isomorphic" after writing
and reading--meaning mutations will have different effects--that will
probably mystify and frustrate the user.)

I think the disadvantages of the third option are serious and intolerable.
I do think "write" will have to check for cycles at the beginning--or write
to a buffer, and if it grows suspicious of cycles (say, when a 4K buffer is
full) and discovers they exist, will have to throw the buffer away and
start again [actually you can just overwrite at the start--or write
directly into the destination port, since write-shared will not need to
backtrack, and leave the write-buffer for future use].  Maybe that would be
acceptable?  So "write" remains basically as is, and "equal?" remains as
is.  (Yeah, sorry, David Wheeler.  Maybe the write-buffer suggestion will
help.)

> Fortunately, `equal?` is not in any way encoded into Scheme, other than
> its use in defining `assoc` and `member` (and these now allow you to
> provide your own equality predicate).

That's good.  By the way, when hash-tables are specified, will it be
possible to make a hash-table with an arbitrary equality predicate?
--John Boyle
*Science is what we understand well enough to explain to a computer. Art is
everything else we do.* --Knuth

On Fri, Aug 31, 2012 at 8:21 AM, John Cowan <[email protected]> wrote:

> John Boyle scripsit:
>
> > ... I must say, I assumed that equal? did in fact mean
> > "isomorphic/one-to-one mapping", not "(possibly infinite) unfoldings".  I
> > was astonished to hear that it was the latter, and that #0=(x . #0#) was
> > equal? to #0=(x x . #0#), but Racket confirms it.  At first, I couldn't
> > imagine why it would be this way, but now I realize: the list ((x) (x))
> is
> > not isomorphic to the list (#0=(x) #0#), yet they are considered equal?.
>
> Informally, we can say that your first two expressions can both be
> characterized as "a list with an infinite number of x's" and are thus
> equal; the last two expressions are similarly "a list of two elements,
> each of which is a list containing x".
>
> > ... I am tempted to suggest, as you seem to do, that we take a hybrid
> > approach and have "equal?" degrade to "isomorphic?" when a cycle is
> > detected (the way "write" is probably going to degrade to "write-shared"
> > when a cycle is detected).
>
> That's a non-conformant implementation of R7RS `write`, which is required
> to use labels only in the presence of cycles, not in the presence of
> structure-sharing.
>
> > ...I still think "equal?, but degrades to isomorphic?" is the most useful
> > function for my purposes, but eh, whatever.
>
> Fortunately, `equal?` is not in any way encoded into Scheme, other than
> its use in defining `assoc` and `member` (and these now allow you to
> provide your own equality predicate).  It's a straightforward matter to
> roll a predicate that does exactly what your particular application needs.
>
> > Actually, come to think of it, there is one fact about equal? that is
> true
> > for all non-cyclic structures that isn't true under unfolding: if
> (equal? a
> > b), then not (equal? (cons <something> a) b).
>
> That's just a way of encoding the paradoxical fact that one plus infinity
> is still equal to infinity.  If you don't like that, avoid infinite
> structures or define your own equality.
>
> > Meanwhile, Will Clinger has a reference implementation of "unfolding",
> > Racket and probably others have full implementations, and R6RS
> > specifies it.
>
> Indeed, Chez, Vicare, Ypsilon, Mosh, and IronScheme all implement it
> correctly.  For whatever reason, Larceny returns #f at the REPL.
>
> > By the way, if the "(possibly infinite) unfoldings" definition is
> > considered confusing, I'll throw out some suggestions for a precise
> > definition:
>
> Ack.  I think your definitions are more confusing than the R6RS version.
> What I really don't understand is what "regular trees" means in the
> phrase "(possibly infinite) unfoldings are equal as regular trees."
> I know what regular tree grammars are, but what are regular trees?
>
>
> --
> Clear?  Huh!  Why a four-year-old child         John Cowan
> could understand this report.  Run out          [email protected]
> and find me a four-year-old child.  I           http://www.ccil.org/~cowan
> can't make head or tail out of it.
>         --Rufus T. Firefly on government reports
>

_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"

Reply via email to