Re: the new gc asserts in master

2008-08-28 Thread Ludovic Courtès
Han-Wen Nienhuys [EMAIL PROTECTED] writes:

 Ludovic Courtès wrote:

 I'm still in favor of git revert since the log message makes it clear
 which patch was reverted and why.  We can then take our time and work
 out a proper fix, and finally re-merge the patch plus its fix.
 Furthermore, in the event that none of us finds a fix,
 `master' is left in the previous state, which is better IMO.

 'master' in its previous state grows the heap to 600M when running the
 1000-fold version of the srfi-18 test I posted. I think it's not a good
 solution.

 Commenting out the assert for x86-64 should yield better behavior.

Alright, then please go ahead.

Thanks,
Ludo'.





Re: the new gc asserts in master

2008-08-28 Thread Han-Wen Nienhuys
Han-Wen Nienhuys wrote:
 The use of scm_gc_mark() outside of GC is fundamentally broken, since it
 creates race conditions in the presence of threads.
 I was not aware that this was the case. 

 My impression was that the mark phase is global; it requires all threads
 that were in guile mode to go dormant, and those that were not in guile
 mode cannot enter guile mode until the mark is complete.
 
 Yes, the mark phase is global, but the thread locking is done in
 scm_i_gc; once the marking starts, there is only one thread.  Since
 scm_gc_mark is called from the smob mark functions, it does not force
 other threads to go dormant.  It could, but I suspect the lock would
 be a contention point.

It would be very cool to have thread-safe marking for a different reason:
marking is the expensive step in GC, so if we can do it in N threads
concurrently (on an SMP machine) we can speed it up by almost a factor of N.

To do it properly, you could do the bitvector marking with
a compare-and-swap instruction.
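
A minimal sketch of what compare-and-swap marking on a bitvector could look
like, using C11 atomics.  The `mark_bitmap' type and `mark_bitmap_try_mark'
are hypothetical names made up for illustration; this is not Guile's actual
mark-bit layout, only an example of the technique.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64u

/* Hypothetical mark bitmap: one bit per heap cell (not Guile's layout).  */
typedef struct
{
  _Atomic uint64_t *words;
  size_t n_cells;
} mark_bitmap;

/* Try to set the mark bit for CELL.  Return true if this caller set the
   bit (and should therefore trace the cell's children), false if the bit
   was already set, possibly by another marker thread.  */
static bool
mark_bitmap_try_mark (mark_bitmap *bm, size_t cell)
{
  assert (cell < bm->n_cells);

  _Atomic uint64_t *word = &bm->words[cell / WORD_BITS];
  uint64_t bit = UINT64_C (1) << (cell % WORD_BITS);
  uint64_t old = atomic_load_explicit (word, memory_order_relaxed);

  while (!(old & bit))
    {
      /* On failure, OLD is refreshed with the word's current value, so
         the loop re-checks whether someone else marked the cell.  */
      if (atomic_compare_exchange_weak_explicit (word, &old, old | bit,
                                                 memory_order_acq_rel,
                                                 memory_order_relaxed))
        return true;
    }

  return false;
}
```

With this shape, exactly one of several concurrent markers sees `true' for a
given cell, so each object's children are pushed for tracing only once.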

-- 
 Han-Wen Nienhuys - [EMAIL PROTECTED] - http://www.xs4all.nl/~hanwen





Re: the new gc asserts in master

2008-08-28 Thread Ludovic Courtès
Hello,

Thanks for the nice summary!

Andy Wingo [EMAIL PROTECTED] writes:

 But what if it goes like this:

   S becomes collectable in theory

   mark phase: S is indeed marked as collectable

   C is returned from a callback: get_ptr() returns S

   at some later time the card containing S is swept; S's free function
   is run, and S is marked as a free cell

   at some later point maybe S gets reused for some other purpose

   however S was already alive in scheme, and we are using it as a smob!

 The point is:

   You cannot do C->Scheme mapping reliably in the presence of lazy
   sweeping, because there is a time in which the object is marked as
   sweepable but not swept, but the C->Scheme code has no way of knowing
   this.

 (While talking with Ludovic we realized that his code has this problem.)

The code I had in mind is GnuTLS [0], but it's a slightly more specific
scenario and I now don't think the problem applies there.

In GnuTLS, there are session objects; session objects can have a
Scheme port attached to them, and the `session-record-port' procedure
returns that port.  What we want is:

  (eq? (session-record-port s) (session-record-port s))

IOW, there must be a mapping from the session to the port so that we
don't create a new port each time `session-record-port' is called.

Here's how it's achieved.  Let `s' be the C session object, `S' the
corresponding SMOB, and `P' the port (an `SCM').  We have the following
object graph:

               SCM_STREAM(P)
  S <------------------------------ P
  |                                 ^
  | SCM_SMOB_DATA(S)                |
  |                                 |
  |  _______________________________'
  | /  gnutls_session_get_ptr(s)
  v/
  s

The mark procedure of `S' marks `P'.  Thus, as long as `S' is reachable,
`P' is reachable.  In addition, as long as `P' is reachable, `S' is
reachable.

The important difference with the generalized scheme you described is
that when `S' becomes unreferenced by Scheme code, there's no way `s'
can suddenly reappear at the Scheme level because GnuTLS doesn't have
any function that would return `s'.  Thus, the race condition you
described cannot happen.

The key insight here is that `S' and `s' aggregate `P', i.e., the
lifetime of `S' and `s' is always greater than or equal to that of `P'.
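
To make the mapping concrete, here is a toy model of the object graph in
plain C.  It does not use libguile or GnuTLS: the three structs and
`session_record_port' below are hypothetical stand-ins for `s', `S', `P' and
the real procedure, and no garbage collection is modelled.  It only shows
how caching the port in the session's user pointer gives the `eq?' property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the C session object `s'; USER_PTR plays the role of the
   slot read by gnutls_session_get_ptr (s).  */
typedef struct c_session { void *user_ptr; } c_session;

/* Stand-in for the SMOB `S'; DATA plays the role of SCM_SMOB_DATA (S).  */
typedef struct smob { c_session *data; } smob;

/* Stand-in for the port `P'; STREAM plays the role of SCM_STREAM (P).  */
typedef struct port { smob *stream; } port;

/* Return the port attached to S, creating it on first use.  Later calls
   return the same pointer, which is the analogue of `eq?'.  */
static port *
session_record_port (smob *S)
{
  assert (S != NULL && S->data != NULL);

  c_session *s = S->data;
  port *P = s->user_ptr;

  if (P == NULL)
    {
      P = malloc (sizeof *P);
      if (P == NULL)
        return NULL;
      P->stream = S;       /* P keeps S reachable */
      s->user_ptr = P;     /* s remembers P for later calls */
    }

  return P;
}
```

In the real binding the S -> P edge is provided by the SMOB mark procedure
rather than by a plain C pointer.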


I presume that this scheme is applicable to many (most?) object-oriented
APIs.  It's actually what led to the inception of the `aggregated'
typespec in G-Wrap [1].

Now, I haven't considered call-backs.  But maybe a call-back can be seen
as a procedure that's aggregated by some object; in turn, the procedure
refers to other objects in its environment, such that the lifetime of the
objects involved is similarly hierarchical.

Sorry for the digression but I think it's important to know whether
Guile's API intrinsically makes it hard to handle such common use cases.

Thanks,
Ludo'.

[0] http://git.savannah.gnu.org/gitweb/?p=gnutls.git;a=blob;f=guile/src/core.c#l42
[1] http://www.nongnu.org/g-wrap/manual/Wrapping-a-C-Function.html





RFD: please drop ChangeLog updates

2008-08-28 Thread Han-Wen Nienhuys
Reasons:

* Much more detailed and inherently correct information can be gotten from 

  git log -- libguile/

  git log -- test-suite/

etc.  

* The ChangeLog duplicates the git log information if done correctly.  Hence 
it requires double work for the committer.

* Since updates to the ChangeLog always happen at the top, they virtually
always conflict on pulls and cherry-picks.  This makes it impossible to
use the power of git.  For example, rebase is the standard git approach
to creating a linear history of changes.  This is apparently something
the GUILE devs think is important, but the changes to ChangeLog ensure
that every cherry-pick will need manual conflict resolution.


-- 
 Han-Wen Nienhuys - [EMAIL PROTECTED] - http://www.xs4all.nl/~hanwen





Re: RFD: please drop ChangeLog updates

2008-08-28 Thread dsmich
 Han-Wen Nienhuys [EMAIL PROTECTED] wrote: 
 Reasons:
 
 * Much more detailed and inherently correct information can be gotten from 
 
   git log -- libguile/
 
   git log -- test-suite/
 
 etc.  
 
 * The ChangeLog duplicates the git log information if done correctly.  Hence 
 it requires double work for the committer.

Guile is distributed as a tarball, not a git repo.  Does it make sense to 
create the ChangeLog from the git log at make dist time?

-Dale