Re: MVar semantics: proposal

2006-04-11 Thread Jan-Willem Maessen
Sorry for the long delay in responding to this message---this issue  
takes all the brain cells I've got in one go.


Ordinarily I'd trim the foregoing discussion, but it had gone stale
enough that I've retained it:


On Apr 4, 2006, at 7:12 AM, Simon Marlow wrote:

> Jan-Willem - thanks for your thoughts on this, it's greatly
> appreciated.
>
> On 31 March 2006 18:49, Jan-Willem Maessen wrote:


> > [...]

> > My feeling is that the purely-functional portion of the Haskell
> > language already makes pretty stringent demands of memory
> > consistency.


> This is true - in GHC we are required to add a memory barrier to thunk
> update on architectures that don't have strong memory ordering, just to
> ensure that when you follow the pointer in an indirection you can
> actually see the value at the end of the pointer.
>
> Since x86 & x86_64 can implement strong memory ordering without
> (seemingly) too much overhead, surely adding the barrier instruction for
> other architectures shouldn't impose too much of a penalty, at least in
> theory?


Interesting question.  The currently-popular architectures can get by  
without too many memory barriers, in large part by requiring stores  
to commit to memory in order; my belief is that SPARC TSO can get by  
with no memory barriers for thunk update/read, and that PowerPC  
requires a write barrier (and perhaps read barriers).


It remains to be seen whether multi-core pipelines will change this
equation; there are reasons an architect might prefer to use a single
store pipeline for multiple threads, satisfying loads from one thread
with pending stores from another.  The practical upshot would be
weaker memory models all around.


Sadly, x86 has a bad record of bungling synchronization operations,  
and clear documentation on the x86 memory model is conspicuous by its  
absence.



> > In light of those demands, and the fact that mutable
> > state is used in pretty tightly-controlled ways, it's worth
> > considering much stronger memory models than the one you propose.
> > I'd even go so far as to say "IORefs and IOArrays are sequentially
> > consistent".


> Certainly possible; again on x86 & x86_64 it's a no-op, on other
> architectures it means adding a barrier to writeIORef.  In GHC we're
> already doing a write barrier (of the generational GC kind, not the
> microprocessor kind) in writeIORef anyway.

RE: MVar semantics: proposal

2006-04-04 Thread Simon Marlow
Jan-Willem - thanks for your thoughts on this, it's greatly appreciated.

On 31 March 2006 18:49, Jan-Willem Maessen wrote:

> [...]
> 
> My feeling is that the purely-functional portion of the Haskell
> language already makes pretty stringent demands of memory
> consistency.

This is true - in GHC we are required to add a memory barrier to thunk
update on architectures that don't have strong memory ordering, just to
ensure that when you follow the pointer in an indirection you can
actually see the value at the end of the pointer.

Since x86 & x86_64 can implement strong memory ordering without
(seemingly) too much overhead, surely adding the barrier instruction for
other architectures shouldn't impose too much of a penalty, at least in
theory?

> In light of those demands, and the fact that mutable
> state is used in pretty tightly-controlled ways, it's worth
> considering much stronger memory models than the one you propose.
> I'd even go so far as to say "IORefs and IOArrays are sequentially
> consistent".

Certainly possible; again on x86 & x86_64 it's a no-op, on other
architectures it means adding a barrier to writeIORef.  In GHC we're
already doing a write barrier (of the generational GC kind, not the
microprocessor kind) in writeIORef anyway.

> The only argument against this behavior is their use in
> the internals of arrays, file I/O, the FFI, etc., etc. (though really
> it's all about IOUArrays in the latter cases) where we might
> conceivably pay a bundle in performance.
>
> Another possibility is an algebraic model based on commuting IO
> actions.  That approach is a particular bias of mine, having tangled
> with these issues extensively in the past.  It'd go something like this:
>   * Any data written to an IORef can safely be read by another
>     thread; we cannot observe partially-written objects.
>   * readIORef commutes with readIORef.
>   * newIORef commutes with newIORef.
>   * writeIORef and newIORef commute with writeIORef or readIORef to
>     a different IORef.
>   * Nothing commutes with takeMVar, putMVar, or atomicModifyIORef.
>   * Nothing before a forkIO can be commuted to after forkIO.

Does this model mean anyt…

Re: MVar semantics: proposal

2006-03-31 Thread Jan-Willem Maessen

John -

You are, in effect, proposing a memory model for MVars and IORefs.   
The high-level model for programmers is "In order to communicate data  
between threads, you *must* use an MVar, and never an IORef."


But the devil is in the details.  I'd like to strongly urge *against*  
adopting the extremely loose model you have proposed.  The following  
things seem particularly important:


* reads and writes to IORefs should be atomic, meaning either a  
complete update is observed or no change is observed.  In the absence  
of this guarantee, misuse of IORefs can cause programs to crash in  
unrepeatable ways.  If the machine doesn't make this easy, the  
implementor ought to sweat a little so that Haskell programmers don't  
have to sweat at all.


* I assume forkIO constitutes a sequence point.  I suspect throwTo et  
al ought to as well.


* I would urge that atomicModifyIORef constitute a sequence point---I  
suspect it loses a great deal of its utility otherwise.
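
As an illustration of that utility (a sketch added here, not part of
the original message): a shared counter bumped from many threads stays
exact only because atomicModifyIORef performs its read-modify-write
atomically.

import Control.Concurrent
import Control.Monad (replicateM, replicateM_)
import Data.IORef

-- Illustrative sketch: ten threads each increment a shared IORef 1000
-- times.  Because the read-modify-write in atomicModifyIORef is
-- atomic, no increment is lost and this always prints 10000.
main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  dones <- replicateM 10 $ do
    done <- newEmptyMVar
    forkIO $ do
      replicateM_ 1000 $ atomicModifyIORef counter (\n -> (n + 1, ()))
      putMVar done ()
    return done
  mapM_ takeMVar dones
  print =<< readIORef counter

Replacing the atomicModifyIORef with a plain readIORef/writeIORef pair
reintroduces the lost-update race.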


Now, on to more difficult issues...  Consider the following example  
(untested):


import Control.Concurrent
import Data.IORef

data RefList a = Nil | Cons a (IORef (RefList a))

cons :: a -> RefList a -> IO (RefList a)
cons x xs = do
  a <- newIORef xs
  return (Cons x a)

hd :: RefList a -> a
hd (Cons a _) = a

tl :: RefList a -> IO (RefList a)
tl (Cons _ t) = readIORef t

setTl :: RefList a -> RefList a -> IO ()
setTl (Cons _ t) t' = writeIORef t t'

main = do a <- cons 'a' Nil
          forkIO $ do
            c <- cons 'c' Nil
            b <- cons 'b' Nil
            setTl b c
            setTl a b
          at <- tl a
          case at of
            Nil -> return ()
            Cons _ _ -> do
              putChar (hd at)
              att <- tl at
              return ()

This program is, by your informal model, buggy.  The question is  
this: how badly wrong is it?
Let's say at happens to read b.  Is (hd at) well defined?  That's  
assuming very strong consistency from the memory system already.  How  
about the IORef in at?  Is that fully allocated, and properly  
initialized?  Again, if it is, that implies some pretty strong  
consistency from the memory system.


Now, what about att?  By your argument, it may or may not be c.  We  
can ask the same questions about its contents assuming it happens to  
be c.
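
For contrast, here is a sketch (added here, not part of the original
message) of the same structure with the handoff done through an MVar,
per the proposed rule that cross-thread communication must use an
MVar.  It reuses the RefList definitions and imports above, and after
the takeMVar every question above has a definite answer.

-- Sketch reusing the definitions above; named mainMVar to avoid
-- clashing with the main above.
mainMVar :: IO ()
mainMVar = do
  a    <- cons 'a' Nil
  done <- newEmptyMVar
  forkIO $ do
    c <- cons 'c' Nil
    b <- cons 'b' Nil
    setTl b c
    setTl a b
    putMVar done ()          -- publishes all of the writes above
  takeMVar done              -- orders the reads below after the writes
  at <- tl a
  case at of
    Nil      -> return ()
    Cons _ _ -> do
      putChar (hd at)        -- now well-defined: prints 'b'
      att <- tl at
      case att of
        Nil      -> return ()
        Cons _ _ -> putChar (hd att)  -- and then 'c'

The putMVar/takeMVar pair does exactly the work the proposed model
assigns to MVars: everything written before the putMVar is visible
after the takeMVar.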


People have talked a lot about weakly-ordered NUMA machines for more  
than a decade, and they're always just a couple of years away.  In  
practical terms, non-atomic NUMA memory models tend to be so hard to  
program that these machines have never found any traction---you need  
to throw away all of your software, including your OS, and start  
afresh with programmers that are vastly more skilled than the ones  
who wrote the stuff you've already got.


My feeling is that the purely-functional portion of the Haskell  
language already makes pretty stringent demands of memory  
consistency.  In light of those demands, and the fact that mutable  
state is used in pretty tightly-controlled ways, it's worth  
considering much stronger memory models than the one you propose.   
I'd even go so far as to say "IORefs and IOArrays are sequentially  
consistent".  The only argument against this behavior is their use in  
the internals of arrays, file I/O, the FFI, etc., etc. (though really  
it's all about IOUArrays in the latter cases) where we might  
conceivably pay a bundle in performance.
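
What "sequentially consistent" buys can be made concrete with the
classic store-buffering litmus test; the sketch below is an addition,
not part of the original message.

import Control.Concurrent
import Data.IORef

-- Store-buffering litmus test (illustrative sketch).  Each thread
-- writes its own flag, then reads the other's.  If IORefs are
-- sequentially consistent, at least one thread must observe the
-- other's write, so (0,0) is impossible; weaker models permit (0,0).
main :: IO ()
main = do
  x  <- newIORef (0 :: Int)
  y  <- newIORef (0 :: Int)
  d1 <- newEmptyMVar
  d2 <- newEmptyMVar
  forkIO $ do writeIORef x 1; r <- readIORef y; putMVar d1 r
  forkIO $ do writeIORef y 1; r <- readIORef x; putMVar d2 r
  r1 <- takeMVar d1
  r2 <- takeMVar d2
  print (r1, r2)   -- (0,0) forbidden under sequential consistency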


Another possibility is an algebraic model based on commuting IO
actions.  That approach is a particular bias of mine, having tangled
with these issues extensively in the past.  It'd go something like this:
  * Any data written to an IORef can safely be read by another
    thread; we cannot observe partially-written objects.
  * readIORef commutes with readIORef.
  * newIORef commutes with newIORef.
  * writeIORef and newIORef commute with writeIORef or readIORef to
    a different IORef.
  * Nothing commutes with takeMVar, putMVar, or atomicModifyIORef.
  * Nothing before a forkIO can be commuted to after forkIO.
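
Read as reordering rules for a single thread, the model licenses an
implementation to swap adjacent commuting actions; the annotated
sketch below is an addition, not part of the original message.

import Data.IORef

-- Illustrative sketch of the commuting rules above.
example :: IORef Int -> IORef Int -> IO Int
example r1 r2 = do
  writeIORef r1 1    -- commutes with the next write: different IORefs
  writeIORef r2 2
  x <- readIORef r2  -- does NOT commute with the write to r2 above
  atomicModifyIORef r1 (\n -> (n + x, ()))  -- commutes with nothing:
                                            -- acts as a fence
  readIORef r1       -- cannot move before the atomicModifyIORef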

I think it's a Good Idea to choose a model that is conceptually  
simple now, at the cost of imposing a few constraints on  
implementors, rather than a complex specification which permits  
maximum implementation flexibility but is utterly opaque.   
Realistically, the machines which are likely to be built will make it  
easy to comply with a strong specification.


-Jan-Willem Maessen


Re: MVar semantics: proposal

2006-03-31 Thread John Meacham
On Fri, Mar 31, 2006 at 01:43:15PM +0100, Simon Marlow wrote:
> > We should drop atomicModifyIORef since we have MVars; for
> > architectures with only a test-and-set instruction and no atomic
> > exchange, supporting atomicModifyIORef would entail the same
> > overhead as MVars.
> 
> Slightly less overhead than an MVar, because you only need one
> lock/release to implement atomicModifyIORef, but two lock/release
> combinations are involved in an update of an MVar.

Hmm... is atomicModifyIORef meant to be atomic with respect to all other
IORef operations, or _just_ other atomicModifyIORef calls?  Because if
it's the second, that is a whole lot easier to implement, and I could be
on board with that. :)
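
The difference is observable.  Here is a sketch (an addition, not from
the original message) of what the weaker reading permits: if plain
writeIORef takes no part in atomicModifyIORef's exclusion, a write can
slip between the atomic read and the atomic write and be lost.

import Control.Concurrent
import Data.IORef

-- Illustrative race under the weaker reading.  If writeIORef ignores
-- the exclusion used by atomicModifyIORef, the increment may read 0,
-- the plain write of 100 may land, and the increment's write of 1 may
-- then clobber it: the program can print 1, 100, or 101.
main :: IO ()
main = do
  r    <- newIORef (0 :: Int)
  done <- newEmptyMVar
  forkIO $ do
    writeIORef r 100                         -- plain write, no lock
    putMVar done ()
  atomicModifyIORef r (\n -> (n + 1, ()))    -- atomic only w.r.t. other
                                             -- atomicModifyIORef calls
  takeMVar done
  print =<< readIORef r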

> 
> > atomicModifyIORef also cannot (easily) be implemented on
> > implementations 
> > that use update-in-place rather than indirections for thunk updates.
> 
> I don't follow you - how would that make it harder?

For instance, in a TIM implementation (do any exist?) you have both a
code pointer and a frame address representing a value, and most arches
don't have an atomic way to set two memory locations at once.  However,
such implementations could perhaps use a single indirection just for
implementing IORefs... jhc would be in this boat, as it modifies values
by rewriting nodes rather than by swizzling pointers, but I can make
IORefs go through an indirection if needed.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


RE: MVar semantics: proposal

2006-03-31 Thread Simon Marlow
Sequence points... yes, all seems reasonable to me.

On 31 March 2006 00:50, John Meacham wrote:

> We should drop atomicModifyIORef since we have MVars; for
> architectures with only a test-and-set instruction and no atomic
> exchange, supporting atomicModifyIORef would entail the same overhead
> as MVars.

Slightly less overhead than an MVar, because you only need one
lock/release to implement atomicModifyIORef, but two lock/release
combinations are involved in an update of an MVar.

atomicModifyIORef would be a sequence point, BTW.  Semantically, think
of it as having a hidden MVar attached to the IORef:

  withMVar m $ \_ -> do
    x <- readIORef r
    let (x', y) = f x
    writeIORef r x'
    return y

As long as you have some way to enforce exclusion with respect to other
atomicModifyIORef operations on a given IORef, you can implement it like
this.
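
Spelled out as a self-contained sketch (the names here are invented for
illustration, not an existing API), with the hidden MVar made explicit:

import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical pairing of an IORef with a lock MVar, as described
-- above; AtomicRef and friends are illustrative names, not a real API.
data AtomicRef a = AtomicRef (MVar ()) (IORef a)

newAtomicRef :: a -> IO (AtomicRef a)
newAtomicRef x = do
  lock <- newMVar ()
  ref  <- newIORef x
  return (AtomicRef lock ref)

-- Atomic only with respect to other atomicModifyRef calls on the same
-- AtomicRef: the lock serializes them, exactly as in the snippet above.
atomicModifyRef :: AtomicRef a -> (a -> (a, b)) -> IO b
atomicModifyRef (AtomicRef lock ref) f =
  withMVar lock $ \_ -> do
    x <- readIORef ref
    let (x', y) = f x
    writeIORef ref x'
    return y

Only atomicModifyRef takes the lock, which is precisely the weaker
reading John asks about: plain readIORef/writeIORef on the underlying
IORef would bypass it.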

> atomicModifyIORef also cannot (easily) be implemented on
> implementations 
> that use update-in-place rather than indirections for thunk updates.

I don't follow you - how would that make it harder?

Cheers,
Simon


Re: MVar semantics: proposal

2006-03-31 Thread John Meacham
Of course, let me know if I am just being overly paranoid about crazy
future NUMA machines.  We can just say there's a full read-write memory
barrier on every putMVar and takeMVar and leave it at that. :)

John


-- 
John Meacham - ⑆repetae.net⑆john⑈