Re: [Scheme-reports] Write procedure is not backwards compatible

John Boyle Mon, 02 Jul 2012 04:20:07 -0700

> We definitely cannot use the SRFI-38 semantics, which
> provide no equivalent to write-simple (i.e. no procedure
> which guarantees to be fast).

I see a use case, where a user generates code with shared structure and
"write"s it to a file, then expects to be able to read that code back in
and execute it.

(define (make-plus-expression arglist)
  `(lambda ,arglist (+ ,@arglist)))

Natural implementations of "write" that handle cycles would probably just
dumbly handle all shared structure the same way.  It would probably get
written like this:

(write (make-plus-expression '(x y)))
=> (lambda #0=(x y) (+ . #0#))

But I have the impression that this would not be legal code even in R7RS,
let alone any previous Scheme versions.  R7RS draft 6 says: "It is an error
for a ⟨program⟩ or ⟨library⟩ to include circular references except in
literals"; this does not specifically allow or disallow shared structure
that isn't circular.  Racket, for one, mercilessly rejects the above
lambda-expression (or any #n= syntax) in a program, although it can be read
in with "read", in which case it prints it back as simply '(lambda (x y) (+
x y)).

Now, either you could specify (and implementers would implement) that
shared structure in code must be accepted; or the user would need to
"write" code without the shared-structure syntax.  I suspect the first
option would be an annoying pain to put into a "small language"
implementation, especially because they'd probably want to disallow
circular code (Shivers's "circularize" notwithstanding, see
http://www.ccs.neu.edu/home/shivers/newstyle.html ), and detecting cycles
adds more complexity (but is doable, as I shall describe soon).

Therefore, let us consider the second option, of "write"ing code without
the #n= syntax.  Obviously write-simple would accomplish this--but it would
loop infinitely if passed a circular structure.  The user might never
generate circular objects, but then he might encounter them occasionally
(especially with other-user-submitted content).  It seems like a bad kind
of bug, where you don't expect anything to go wrong, and it almost never
does--but when it does, it manifests as an infinite loop.  It would be nice
if the language had rock-solid primitives that just handled this stuff
safely for you...

I submit that it would be useful to have an "attempt-to-write-simply"
procedure (name is up for grabs), which would work like write-simple except
that it would detect cycles and would throw an error if it found any.
(This could even displace "write-simple", which might survive merely as an
unsafe operation used for efficiency reasons.)  You give the criterion
"guarantees to be fast".  Yes, that is desired... Well, here's what I've
come up with:

Given a structure with a total of n objects (anything that contains no
pointers can be considered an indivisible object for our purposes here),
which have m pointers to each other (note m ≥ n-1), and given some hash
table-like thing with O(log n) insertion and lookup, you can determine
whether there is a cycle in O(n log n + m) steps and O(n) total size of
intermediate objects (which all become garbage).

The algorithm I have in mind is simply to construct a reference-counted
copy of the graph of objects, then destroy the root node and perform a
reference-count garbage collection.  If any objects (or their
representations in the graph) are not destroyed by the end of this process,
then there is a cycle and we throw an error; otherwise, there is no cycle
and it is safe to use write-simple.

- Trace the graph of interesting (possibly pointer-containing) objects and
collect them all in the hash table.  O(n log n + m) time, O(n) space here;
this stuff is discussed in SRFI 38.
- Associate every object with a reference count, initially zero.  This
could be accomplished by replacing each x in the table with (cons 0 x).
O(n) time and space.
- For each object (iterate over the table), take each pointer in it and
increment the reference count of what it points to.  O(m) time.
- "Destroy" the root node: for each pointer in it, decrement the reference
count of what it points to, and if the refcount of the child object reaches
zero, push it onto a stack of objects to destroy.  (In fact, you shouldn't
just destroy the root object; check if its refcount is zero, and if not,
then you've found a cycle already.)  Iterate until the stack is empty.
O(m) time, potentially O(n) stack space.
- Look through the table and see if you find an object whose refcount isn't
zero.  (Or, in the previous step, maintain a count of how many objects you
destroyed, and see if that's less than the number of entries in the
table.)  O(n) (or O(1)) time.

This requires something like a hash table, as SRFI 38 describes.  I'm just
assuming O(log n) lookup and insertion, and O(n) total space, to store n
key-value pairs.  Can you require implementations to make good hash
tables?  Or maybe you describe this as an optional feature?

Performance: This process requires tracing all pointers in the objects a
total of three times each--3m.  One could combine the first three
steps--making reference counts as you found each object--to reduce this to
two times, so 2m.  It generates a hash table full of n objects, n cons
cells, and up to n stack space.  And probably once or twice it'll spend "n
log n" time in table creation, and two or three times it'll do n lookups in
the possibly log n table.

I think it's impossible to do much better than this in general.

So, how are we doing?  Is this cost acceptable?  Well, "write-simple" must
follow every pointer (hence O(m ≥ n-1) time, versus my O(n log n +
m)--seems comparable except maybe for gigantic structures), and in most
cases, the structure to be printed already resides in memory and takes up
O(n) space (versus my *additional* O(n) space).  The cost of cycle
detection would only be unacceptable if, say, you were trying to write a
structure that was bigger than, say, a third of all remaining memory.
Programs that specialize enough that a constant factor of 3 or 4 (in space;
I'd guess 10-100 in time) matters, or that write structures that are stored
in memory-mapped files on the disk (or swap space), can use "write-simple"
and worry about cycles themselves.

Any other objections?  A nice thing about "write-simple" is that it can
probably be made to generate zero garbage--just grow and shrink the program
stack, and write characters into a buffer (possibly specially handled).
This tends to mean fewer GC pauses and more deterministic performance.  But
that is something that should be solved through other means: real-time
garbage collection methods.  (I will take the opportunity to mention Henry
Baker's early work: http://www.pipeline.com/~hbaker1/RealTimeGC.html )  The
last thing that occurs to me is that "write-simple" can start printing
characters immediately, while this "attempt-to-write-simply" procedure
would do a lot of work before it printed anything.  Again, a user who
writes big enough objects that the time difference is noticeable, in an
environment where the difference is important, will probably work to
understand why his "write" is slow and learn about cycles and
"write-simple" and handle them himself.

I think difficulty of (internally) implementing eq hash tables or an
equivalent is the only real problem here.  Others know better than I about
that.  So I'll end here, with a pointer back to my "I submit that"
paragraph.
--John Boyle
*Science is what we understand well enough to explain to a computer. Art is
everything else we do.* --Knuth

On Sun, Jul 1, 2012 at 3:04 PM, Alex Shinn <[email protected]> wrote:

> On Mon, Jul 2, 2012 at 6:26 AM, Jonathan Rees <[email protected]> wrote:
> > Ouch! If true, I second this comment. Backward compatibility for write
> is pretty important. Consider the case of running an R5RS (or R7RS) program
> in an R7RS implementation to generate a file that will be consumed by an
> R5RS implementation.
> >
> > There is no mention of this incompatibility in the section "Language
> changes since R5RS".
>
> Ticket #442 added to vote on this in the next ballot.
>
> Requiring the output of R7RS programs to be readable
> by R5RS programs is forwards compatibility, not backwards,
> and is unachievable the moment you make any changes
> to the lexical syntax (e.g. with bytevector literals).
>
> You can make the case that passing simple lists of
> simple data-structures should be compatible between
> any Scheme-like system.  The WG will discuss and
> vote on this.  The community is encouraged to provide
> additional feedback and examples.
>
> We definitely cannot use the SRFI-38 semantics, which
> provide no equivalent to write-simple (i.e. no procedure
> which guarantees to be fast).
>
> --
> Alex
>
> >
> > Jonathan
> >
> > On Jul 1, 2012, at 12:36 AM, Marc Feeley wrote:
> >
> >> Formal Comment
> >>
> >> Submitter's name: Marc Feeley
> >> Submitter's email: feeley at iro.umontreal.ca
> >> Relevant draft: r7rs draft 6
> >>
> >> Type: defect
> >> Priority: major
> >> Relevant section of draft: 6.13.3. Output
> >>
> >> Summary: Write procedure is not backwards compatible
> >>
> >> R7RS introduces a new output procedure called write-simple, which has
> >> the same semantics as the R5RS write procedure.  On the other hand,
> >> the R7RS write procedure handles shared structures differently than
> >> the R5RS.  For example :
> >>
> >>   (let ((x (list 1 2))) (write (list x x)))
> >>
> >>       displays ((1 2) (1 2)) in an R5RS system
> >>   and displays (#0=(1 2) #0#) in an R7RS system
> >>
> >> To preserve backwards compatibility, it is the version of the write
> >> procedure which uses datum labels which should have a different name.
> >> In fact SRFI-38 has specified the name write-with-shared-structure for
> >> this output procedure.  This name should be maintained since it has
> >> been implemented with that name in some Scheme systems.
> >>
> >>
> >> _______________________________________________
> >> Scheme-reports mailing list
> >> [email protected]
> >> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
> >
> >
> > _______________________________________________
> > Scheme-reports mailing list
> > [email protected]
> > http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>
> _______________________________________________
> Scheme-reports mailing list
> [email protected]
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>

_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

Re: [Scheme-reports] Write procedure is not backwards compatible

Reply via email to