But what if GHC decided to add another thunk, e.g.

Ugh. That is even more horrible.  Clearly I’m not very good at thinking about 
this stuff.  It smells wrong; we should find a simple, composable primitive 
that does the right thing.

I’d like to leave some breadcrumbs of this thread, at very least pointing to it 
so we don’t repeat the thinking later.

Meanwhile, for the present, what are we to do about LazyST?

Simon

From: Simon Marlow [mailto:marlo...@gmail.com]
Sent: 01 February 2017 09:02
To: Simon Peyton Jones <simo...@microsoft.com>
Cc: David Feuer <da...@well-typed.com>; ghc-devs@haskell.org
Subject: Re: Lazy ST vs concurrency

On 31 January 2017 at 10:02, Simon Peyton Jones 
<simo...@microsoft.com> wrote:
Huh. You are right.  That’s horrible.

Horrible indeed!


OK, here’s another idea.  Provide,
            applyOnce# :: (a->b) -> a -> b

which behaves like
            applyOnce f x = f x

but guarantees that any thunk  (applyOnce# f x) will be evaluated with atomic 
eager black-holing.

\(s :: State# s) ->
   let unsafePerformIO = \g -> g s
       thunk = applyOnce# unsafePerformIO (\s -> ...)
   in
      ...

But what if GHC decided to add another thunk, e.g.

let
  thunk =
    let x = applyOnce# unsafePerformIO (\s -> ...)
    in x

now we need atomicity on both thunks, but there's no way to tell.  (Of course 
GHC probably wouldn't do this particular transformation, but there is a whole 
range of other things that it might correctly do that would subvert the 
programmer's intention to make a single atomic thunk.)

noDuplicate# avoids this problem because it walks the whole stack, claiming the 
whole chain of thunks currently under evaluation.  But this is a messy 
solution, I don't like it either.

Cheers
Simon


Of course this does not guarantee safety.  But I think it’d give a per-thunk 
way to specify it.

Simon

From: Simon Marlow [mailto:marlo...@gmail.com]
Sent: 31 January 2017 09:25

To: Simon Peyton Jones <simo...@microsoft.com>
Cc: David Feuer <da...@well-typed.com>; ghc-devs@haskell.org
Subject: Re: Lazy ST vs concurrency

On 31 January 2017 at 09:11, Simon Peyton Jones 
<simo...@microsoft.com> wrote:
If we could identify exactly the thunks we wanted to be atomic, then yes, that 
would be better than a whole-module solution.  However I'm not sure how to do 
that - doing it on the basis of a free variable with State# type doesn't work 
if the State# is buried in a data structure or a function closure, for instance.

I disagree.  Having a free State# variable is precisely necessary and 
sufficient, I claim.  Can you provide a counter-example?

Sure, what I had in mind is something like this, defining a local 
unsafePerformIO:

\(s :: State# s) ->
   let unsafePerformIO = \g -> g s
       thunk = unsafePerformIO (\s -> ...)
   in
      ...

and "thunk" doesn't have a free variable of type State#.

Cheers
Simon


Informal proof:

•  The model is that a value of type (State# t) is a linear value that we 
mutate in place.  We must not consume it twice.

•  Evaluating a thunk that has a free (State# t) variable is precisely 
“consuming” it.  So we should only do that once.


I think -fatomic-eager-blackholing might "fix" it with less overhead, though

But precisely where would you have to use that flag?  Inlining could mean that 
the code appears anywhere!  Once we have the ability to atomically black-hole a 
thunk, we can just use my criterion above, I claim.

Stopgap story for 8.2.   I am far from convinced that putting unsafePerformIO 
in the impl of (>>=) for the ST monad will be correct; but if you tell me it 
is, and if it is surrounded with huge banners saying that this is the wrong 
solution, and pointing to a new ticket to fix it, then OK.

Arguably this isn't all that urgent, given that it's been broken for 8 years or 
so.


Simon

From: Simon Marlow [mailto:marlo...@gmail.com]
Sent: 31 January 2017 08:59
To: Simon Peyton Jones <simo...@microsoft.com>
Cc: David Feuer <da...@well-typed.com>; ghc-devs@haskell.org

Subject: Re: Lazy ST vs concurrency

On 30 January 2017 at 22:56, Simon Peyton Jones 
<simo...@microsoft.com> wrote:
We don’t want to do this on a per-module basis, do we, as 
-fatomic-eager-blackholing would suggest.  Rather, on a per-thunk basis, no?  
Which thunks, precisely?  I think perhaps precisely thunks one of whose free 
variables has type (State# s) for some s.  These are thunks that consume a 
state token, and must do so no more than once.

If we could identify exactly the thunks we wanted to be atomic, then yes, that 
would be better than a whole-module solution.  However I'm not sure how to do 
that - doing it on the basis of a free variable with State# type doesn't work 
if the State# is buried in a data structure or a function closure, for instance.

If entering such thunks was atomic, could we kill off noDuplicate#?

I still don’t understand exactly what noDuplicate# does, what problem it 
solves, and how the problem it solves relates to this LazyST problem.

Back in our "Haskell on a Shared Memory Multiprocessor" paper 
(http://simonmar.github.io/bib/papers/multiproc.pdf) we described a scheme 
to try to avoid duplication of work when multiple cores
evaluate the same thunk.  This is normally applied lazily, because it involves 
walking the stack and atomically black-holing thunks pointed to by update 
frames.  The noDuplicate# primop just invokes the stack walk immediately; the 
idea is to try to prevent multiple threads from evaluating a thunk containing 
unsafePerformIO.
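For reference, this is roughly how base builds unsafePerformIO on top of that barrier (a simplified sketch in the spirit of GHC.IO, not the exact definitions):

```haskell
-- Sketch: unsafePerformIO is unsafeDupablePerformIO preceded by a
-- noDuplicate barrier, which invokes the stack walk that atomically
-- claims the thunks currently under evaluation.  Simplified from GHC.IO.
import GHC.IO (noDuplicate)
import System.IO.Unsafe (unsafeDupablePerformIO)

unsafePerformIO' :: IO a -> a
unsafePerformIO' m = unsafeDupablePerformIO (noDuplicate >> m)
```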

It's expensive.  It's also not foolproof, because if you already happened to 
create two copies of the unsafePerformIO thunk then noDuplicate# can't help. 
I've never really liked it for these reasons, but I don't know a better way.  
We have unsafeDupablePerformIO, which doesn't call noDuplicate#, and the 
programmer can use it when the unsafePerformIO can safely be executed multiple 
times.


We need some kind of fix for 8.2.  Simon what do you suggest?

David's current fix would be OK (along with a clear notice in the release notes 
etc. to note that the implementation got slower).  I think 
-fatomic-eager-blackholing might "fix" it with less overhead, though.

Ben's suggestion:

> eagerlyBlackhole :: a -> a

is likely to be unreliable I think.  We lack the control in the source language 
to tie it to a particular thunk.

Cheers
Simon


Simon

From: Simon Marlow [mailto:marlo...@gmail.com]
Sent: 30 January 2017 21:51
To: David Feuer <da...@well-typed.com>
Cc: Simon Peyton Jones <simo...@microsoft.com>; ghc-devs@haskell.org
Subject: Re: Lazy ST vs concurrency

On 30 January 2017 at 16:18, David Feuer 
<da...@well-typed.com> wrote:
I forgot to CC ghc-devs the first time, so here's another copy.

I was working on #11760 this weekend, which has to do with concurrency
breaking lazy ST. I came up with what I thought was a pretty decent solution (
https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is quite
unhappy about the idea of sticking this weird unsafePerformIO-like code
(noDup, which I originally implemented as (unsafePerformIO . evaluate), but
which he finds ugly regardless of the details) into fmap and (>>=).  He's also
concerned that the noDuplicate# applications will kill performance in the
multi-threaded case, and suggests he would rather leave lazy ST broken, or
even remove it altogether, than use a fix that will make it slow sometimes,
particularly since there haven't been a lot of reports of problems in the
wild.
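For concreteness, this is the shape of lazy-ST computation that is at risk (a toy illustration, not the actual #11760 test case): the result is a lazily produced structure whose forcing drives the STRef writes, so two threads forcing it at once, with no duplication protection, could run those writes twice and interleave them.

```haskell
-- A toy illustration of the hazard: forcing the result list is what
-- performs the STRef writes.  Concurrent forcing without protection
-- could duplicate and interleave those writes.
import Control.Monad (replicateM)
import Control.Monad.ST.Lazy (runST)
import Data.STRef.Lazy (newSTRef, readSTRef, writeSTRef)

counts :: [Int]
counts = runST $ do
  ref <- newSTRef (0 :: Int)
  replicateM 100 $ do
    n <- readSTRef ref
    writeSTRef ref (n + 1)
    return n
```

Forced from a single thread this yields [0..99]; the bug is that nothing stops two threads from forcing the same suspension simultaneously.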

In a nutshell, I think we have to fix this despite the cost - the 
implementation is incorrect and unsafe.

Unfortunately the mechanisms we have right now to fix it aren't ideal - 
noDuplicate# is a bigger hammer than we need.  All we really need is some way 
to make a thunk atomic; that would require some special entry code in the thunk 
which did atomic eager-blackholing.  Hmm, now that I think about it, perhaps we
could just have a flag, -fatomic-eager-blackholing.  We already do this for 
CAFs, incidentally. The idea is to compare-and-swap the blackhole info pointer 
into the thunk, and if we didn't win the race, just re-enter the thunk (which 
is now a blackhole).  We already have the cmpxchg MachOp, so it shouldn't be
more than a few lines in the code generator to implement it.  It would be too 
expensive to do by default, but doing it just for Control.Monad.ST.Lazy should 
be ok and would fix the unsafety.
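The claiming protocol can be modelled in ordinary Haskell (a toy model with made-up names, using atomicModifyIORef' in place of the real cmpxchg on the thunk's info pointer):

```haskell
-- A toy model of atomic eager black-holing (illustrative names; the real
-- mechanism is a cmpxchg on the info pointer in generated code).  The
-- first forcer atomically swaps in a "black hole"; losers block on it.
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

data Cell a = Unevaluated (IO a) | BlackHole (MVar a) | Value a

newThunk :: IO a -> IO (IORef (Cell a))
newThunk act = newIORef (Unevaluated act)

force :: IORef (Cell a) -> IO a
force ref = do
  hole <- newEmptyMVar
  won <- atomicModifyIORef' ref $ \cell -> case cell of
    Unevaluated act -> (BlackHole hole, Just act)  -- we claimed it
    other           -> (other, Nothing)            -- someone else did
  case won of
    Just act -> do
      v <- act                                     -- evaluate exactly once
      atomicModifyIORef' ref (\_ -> (Value v, ()))
      putMVar hole v
      return v
    Nothing -> do
      cell <- readIORef ref
      case cell of
        Value v       -> return v
        BlackHole h   -> readMVar h                -- wait for the winner
        Unevaluated _ -> force ref                 -- raced; try again
```

Losing threads re-enter and block on the hole, which is exactly the "re-enter the thunk (which is now a blackhole)" behaviour described above.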

(I haven't really thought this through, just an idea off the top of my head, so 
there could well be something I'm overlooking here...)

Cheers
Simon


My view is that leaving it broken, even if it only causes trouble
occasionally, is simply not an option. If users can't rely on it to always
give correct answers, then it's effectively useless. And for the sake of
backwards compatibility, I think it's a lot better to keep it around, even if
it runs slowly multithreaded, than to remove it altogether.

Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST has
always been a bit of deep magic. You can't *really* carry a moment of time
around in your pocket and make its history happen only if necessary. We can
make it work in GHC because its execution model is entirely based around graph
reduction, so evaluation is capable of driving execution. Whereas lazy IO is
extremely tricky because it causes effects observable in the real world, lazy
ST is only *moderately* tricky, causing effects that we have to make sure
don't lead to weird interactions between threads. I don't think it's terribly
surprising that it needs to do a few more weird things to work properly.

David




_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
