Thoughts on async RTS API?

2021-12-14 Thread Cheng Shao
Hi devs,

To invoke Haskell computation in C, we need to call one of rts_eval*
functions, which enters the scheduler loop, and returns only when the
specified Haskell thread is finished or killed. We'd like to enhance
the scheduler and add async variants of the rts_eval* functions, which
take C callbacks to consume the Haskell thread result, kick off the
scheduler loop, and the loop is allowed to exit when the Haskell
thread is blocked. Sync variants of RTS API will continue to work with
unchanged behavior.

The main intended use case is async foreign calls for the WebAssembly
target. When an async foreign call is made, the Haskell thread will
block on an MVar to be fulfilled with the call result. But the
scheduler will eventually fail to find work due to empty run queue and
exit with error! We need a way to gracefully exit the scheduler, so
the RTS API caller can process the async foreign call, fulfill that
MVar and resume Haskell computation later.

Question I: does the idea of adding async RTS API sound acceptable by
GHC HQ? To be honest, it's not impossible to workaround lack of async
RTS API: reuse the awaitEvent() logic in non-threaded RTS, pretend
each async foreign call reads from a file descriptor and can be
handled by the POSIX select() function in awaitEvent(). But it'd
surely be nice to avoid such hacks and do things the principled way.

Question II: how to modify the scheduler loop to implement this
feature? Straightforward answer seems to be: check some RTS API
non-blocking flag, if present, allow early exit due to empty run
queue.

Thanks a lot for reading this, I appreciate any suggestions or
questions :)

Best regards,
Cheng
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Thoughts on async RTS API?

2021-12-15 Thread Ben Gamari
Cheng Shao  writes:

> Hi devs,
>
> To invoke Haskell computation in C, we need to call one of rts_eval*
> functions, which enters the scheduler loop, and returns only when the
> specified Haskell thread is finished or killed. We'd like to enhance
> the scheduler and add async variants of the rts_eval* functions, which
> take C callbacks to consume the Haskell thread result, kick off the
> scheduler loop, and the loop is allowed to exit when the Haskell
> thread is blocked. Sync variants of RTS API will continue to work with
> unchanged behavior.
>
> The main intended use case is async foreign calls for the WebAssembly
> target. When an async foreign call is made, the Haskell thread will
> block on an MVar to be fulfilled with the call result. But the
> scheduler will eventually fail to find work due to empty run queue and
> exit with error! We need a way to gracefully exit the scheduler, so
> the RTS API caller can process the async foreign call, fulfill that
> MVar and resume Haskell computation later.
>
> Question I: does the idea of adding async RTS API sound acceptable by
> GHC HQ? To be honest, it's not impossible to workaround lack of async
> RTS API: reuse the awaitEvent() logic in non-threaded RTS, pretend
> each async foreign call reads from a file descriptor and can be
> handled by the POSIX select() function in awaitEvent(). But it'd
> surely be nice to avoid such hacks and do things the principled way.
>
While the idea here sounds reasonable, I'm not sure I quite understand
how this will be used in Asterius's case. Specifically, I would be
worried about the lack of fairness in this scheme: no progress will be
made on any foreign call until all Haskell evaluation has blocked.
Is this really the semantics that you want?

> Question II: how to modify the scheduler loop to implement this
> feature? Straightforward answer seems to be: check some RTS API
> non-blocking flag, if present, allow early exit due to empty run
> queue.
>
`schedule` is already a very large function with loops, gotos,
mutability, and quite complex control flow. I would be reluctant
to add to this complexity without first carrying out some
simplification. Instead of adding yet another bail-out case to the loop,
I would probably rather try to extract the loop body into a new
function. That is, currently `schedule` is of the form:

// Perform work until we are asked to shut down.
Capability *schedule (Capability *initialCapability, Task *task) {
Capability *cap = initialCapability;
while (1) {
scheduleYield(&cap, task);

if (emptyRunQueue(cap)) {
continue;
}

if (shutting_down) {
return cap;
}

StgTSO *t = popRunQueue(cap);

if (! t.can_run_on_capability(cap)) {
// Push back on the run queue and loop around again to
// yield the capability to the appropriate task
pushOnRunQueue(cap, t);
continue;
}

runMutator(t);

if (needs_gc) {
scheduleDoGC();
}
}
}

I might rather extract this into something like:

enum ScheduleResult {
NoWork,  // There was no work to do
PerformedWork,   // Ran precisely one thread
Yield,   // The next thread scheduled to run cannot run on the
 // given capability; yield.
ShuttingDown,// We were asked to shut down
}

// Schedule at most one thread once
ScheduleResult scheduleOnce (Capability **cap, Task *task) {
if (emptyRunQueue(cap)) {
return NoWork;
}

if (shutting_down) {
return ShuttingDown;
}

StgTSO *t = popRunQueue(cap);

if (! t.can_run_on_capability(cap)) {
pushOnRunQueue(cap, t);
return Yield;
}

runMutator(t);

if (needs_gc) {
scheduleDoGC();
}

return PerformedWork;
}

This is just a sketch but I hope it's clear that with something like
this this you can easily implement the existing `schedule` function, as
well as your asynchronous variant. 

Cheers,

- Ben


signature.asc
Description: PGP signature
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Thoughts on async RTS API?

2021-12-15 Thread Cheng Shao
> While the idea here sounds reasonable, I'm not sure I quite understand
> how this will be used in Asterius's case. Specifically, I would be
> worried about the lack of fairness in this scheme: no progress will be
> made on any foreign call until all Haskell evaluation has blocked.
> Is this really the semantics that you want?

Asterius runtime scheduler divides work into individual "tick"s. Each
tick does some work, much like one single iteration in the while(1)
scheduler loop. Ticks are not synchronously invoked by previous ticks,
instead they are started asynchronously and placed inside the host
event loop, fully interleaved with other host events. This way,
Haskell concurrency works with host concurrency without requiring host
multi-threading.

It's possible to wait for run queue to be emptied, then process all
blocking foreign calls in one batch, similar to awaitEvent() logic in
non-threaded RTS. It's also possible to exit scheduler and resume it
many more times, similar to current Asterius scheduler. Both semantics
can be implemented, to guarantee fairness, the latter sounds more
preferrable. The key issue is finding a way to break up the current
while(1) loop in schedule() in a principled way.

> `schedule` is already a very large function with loops, gotos,
> mutability, and quite complex control flow. I would be reluctant
> to add to this complexity without first carrying out some
> simplification. Instead of adding yet another bail-out case to the loop,
> I would probably rather try to extract the loop body into a new
> function. That is, currently `schedule` is of the form:
>
> // Perform work until we are asked to shut down.
> Capability *schedule (Capability *initialCapability, Task *task) {
> Capability *cap = initialCapability;
> while (1) {
> scheduleYield(&cap, task);
>
> if (emptyRunQueue(cap)) {
> continue;
> }
>
> if (shutting_down) {
> return cap;
> }
>
> StgTSO *t = popRunQueue(cap);
>
> if (! t.can_run_on_capability(cap)) {
> // Push back on the run queue and loop around again to
> // yield the capability to the appropriate task
> pushOnRunQueue(cap, t);
> continue;
> }
>
> runMutator(t);
>
> if (needs_gc) {
> scheduleDoGC();
> }
> }
> }
>
> I might rather extract this into something like:
>
> enum ScheduleResult {
> NoWork,  // There was no work to do
> PerformedWork,   // Ran precisely one thread
> Yield,   // The next thread scheduled to run cannot run on the
>  // given capability; yield.
> ShuttingDown,// We were asked to shut down
> }
>
> // Schedule at most one thread once
> ScheduleResult scheduleOnce (Capability **cap, Task *task) {
> if (emptyRunQueue(cap)) {
> return NoWork;
> }
>
> if (shutting_down) {
> return ShuttingDown;
> }
>
> StgTSO *t = popRunQueue(cap);
>
> if (! t.can_run_on_capability(cap)) {
> pushOnRunQueue(cap, t);
> return Yield;
> }
>
> runMutator(t);
>
> if (needs_gc) {
> scheduleDoGC();
> }
>
> return PerformedWork;
> }
>
> This is just a sketch but I hope it's clear that with something like
> this this you can easily implement the existing `schedule` function, as
> well as your asynchronous variant.
>

Thanks for the sketch! I definitely agree we should simplify
schedule() in some way instead of adding ad-hoc bail out case. The
ScheduleResult type and scheduleOnce() function looks good to me,
although I need to do a lot more experiments to confirm.

Cheers,
Cheng
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Thoughts on async RTS API?

2021-12-16 Thread Cheng Shao
Hi Alex,

Thanks for reminding. hs_try_put_mvar() wouldn't work for our use
case. If the C function finishes work and calls hs_try_put_mvar()
synchronously, in Haskell takeMVar wouldn't block at all, which is all
fine. However, if the C function is expected to call hs_try_put_mvar()
asynchronously, the non-threaded RTS will hang!

Here's a minimal repro. It works with non-threaded RTS at first, but
if you change scheduleCallback() in C so hs_try_put_mvar() is only
invoked in a detached pthread, then the program hangs.

The proposed async RTS API and related scheduler refactorings can't be
avoided, if the MVar is intended to be fulfilled in an async manner,
using the non-threaded RTS, on a platform with extremely limited
syscall capabilities.

```haskell
import Control.Concurrent
import Control.Exception
import Foreign
import Foreign.C
import GHC.Conc

main :: IO ()
main = makeExternalCall >>= print

makeExternalCall :: IO CInt
makeExternalCall = mask_ $ do
  mvar <- newEmptyMVar
  sp <- newStablePtrPrimMVar mvar
  fp <- mallocForeignPtr
  withForeignPtr fp $ \presult -> do
(cap,_) <- threadCapability =<< myThreadId
scheduleCallback sp cap presult
takeMVar mvar
peek presult

foreign import ccall "scheduleCallback"
  scheduleCallback :: StablePtr PrimMVar
   -> Int
   -> Ptr CInt
   -> IO ()
```

```c
#include "HsFFI.h"
#include "Rts.h"
#include "RtsAPI.h"
#include 
#include 

struct callback {
HsStablePtr mvar;
int cap;
int *presult;
};

void* callback(struct callback *p)
{
usleep(1000);
*p->presult = 42;
hs_try_putmvar(p->cap,p->mvar);
free(p);
return NULL;
}

void scheduleCallback(HsStablePtr mvar, HsInt cap, int *presult)
{
pthread_t t;
struct callback *p = malloc(sizeof(struct callback));
p->mvar = mvar;
p->cap = cap;
p->presult = presult;
// pthread_create(&t, NULL, callback, p);
// pthread_detach(t);
callback(p);
}
```

On Thu, Dec 16, 2021 at 12:10 PM Alexander V Vershilov
 wrote:
>
> Hello, replying off-the thread as it would be basically an offtopic.
>
> But you can achieve the solution using MVars only.
> The idea is that you can call mkStablePtr on the MVar that way it will
> not be marked as dead, so RTS will not exit.
> Then you can use hs_try_put_mvar in C thread to call the thread back.
>
> On Wed, 15 Dec 2021 at 05:07, Cheng Shao  wrote:
> >
> > Hi devs,
> >
> > To invoke Haskell computation in C, we need to call one of rts_eval*
> > functions, which enters the scheduler loop, and returns only when the
> > specified Haskell thread is finished or killed. We'd like to enhance
> > the scheduler and add async variants of the rts_eval* functions, which
> > take C callbacks to consume the Haskell thread result, kick off the
> > scheduler loop, and the loop is allowed to exit when the Haskell
> > thread is blocked. Sync variants of RTS API will continue to work with
> > unchanged behavior.
> >
> > The main intended use case is async foreign calls for the WebAssembly
> > target. When an async foreign call is made, the Haskell thread will
> > block on an MVar to be fulfilled with the call result. But the
> > scheduler will eventually fail to find work due to empty run queue and
> > exit with error! We need a way to gracefully exit the scheduler, so
> > the RTS API caller can process the async foreign call, fulfill that
> > MVar and resume Haskell computation later.
> >
> > Question I: does the idea of adding async RTS API sound acceptable by
> > GHC HQ? To be honest, it's not impossible to workaround lack of async
> > RTS API: reuse the awaitEvent() logic in non-threaded RTS, pretend
> > each async foreign call reads from a file descriptor and can be
> > handled by the POSIX select() function in awaitEvent(). But it'd
> > surely be nice to avoid such hacks and do things the principled way.
> >
> > Question II: how to modify the scheduler loop to implement this
> > feature? Straightforward answer seems to be: check some RTS API
> > non-blocking flag, if present, allow early exit due to empty run
> > queue.
> >
> > Thanks a lot for reading this, I appreciate any suggestions or
> > questions :)
> >
> > Best regards,
> > Cheng
> > ___
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
>
> --
> --
> Alexander
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs