"containing" memory-consuming computations

2012-04-19 Thread Herbert Valerio Riedel
Hello GHC Devs,

One issue that's been bothering me when writing Haskell programs meant
to be long-running processes performing computations on external
input-data in terms of an event/request-loop (think web-services,
SQL-servers, or REPLs) is that it is desirable to be able to limit
resource-usage and to "contain" the effects of computations
which exhaust the resource-limits (i.e. w/o crashing and burning the
whole process).

For the time-dimension, I'm already using functions such as
System.Timeout.timeout which I can use to make sure that even a (forced)
pure computation doesn't require (significantly) more wall-clock time
than I expect it to.
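
A minimal sketch of that pattern (assuming the deepseq package is
available for forcing a value all the way to normal form):

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Force a pure value to normal form, giving up after the given number
-- of microseconds; Nothing signals a timeout.
timedEval :: NFData a => Int -> a -> IO (Maybe a)
timedEval micros x = timeout micros (evaluate (force x))

main :: IO ()
main = timedEval 100000 (sum [1 .. 1000 :: Integer]) >>= print
  -- prints: Just 500500
```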

But I'm missing a similar facility for constraining the
space-dimension. In some other languages such as C, I have (more or
less) the ability to check for /local/ out-of-memory conditions (e.g. by
checking the return value of malloc(3) for heap-allocations, or by
handling an OOM exception), roll back the computation, and skip to the
next computation request (which hopefully requires less memory...)

So, is there already any such facility provided by the GHC Platform I've
missed so far?

...and if not, would such a memory-limiting facility be reconcilable
with the GHC RTS architecture?

Cheers,
  hvr
-- 

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: "containing" memory-consuming computations

2012-04-19 Thread Ryan Newton
Hi Herbert,

It sounds like you're interested in running just one client computation at
a time?  In that case you don't have a disambiguation problem -- if the
total memory footprint crosses a threshold, you know who to blame.

At least this seems easier than needing per-computation or per-IO-thread
caps.  By the way, the folks who implemented Second Life did an interesting
job of that -- they hacked Mono to be able to execute untrusted code with
resource bounds.
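
A sketch of that single-computation variant, assuming the GHC.Stats
interface of later GHCs (the program must be run with +RTS -T so the RTS
collects statistics; the cap and polling interval are arbitrary):

```haskell
import Control.Concurrent (ThreadId, threadDelay)
import Control.Exception (Exception, throwTo)
import Control.Monad (forever, when)
import Data.Word (Word64)
import GHC.Stats (gc, gcdetails_live_bytes, getRTSStats)

data HeapCapExceeded = HeapCapExceeded deriving Show
instance Exception HeapCapExceeded

-- Poll the live-heap figure from the last GC, and blame the single
-- worker thread if it crosses the cap.  Requires +RTS -T.
watchdog :: Word64 -> ThreadId -> IO ()
watchdog capBytes worker = forever $ do
  threadDelay 100000  -- poll every 0.1 s
  s <- getRTSStats
  when (gcdetails_live_bytes (gc s) > capBytes) $
    throwTo worker HeapCapExceeded
```

Because the whole-process footprint is being measured, this only makes
sense when a single computation runs at a time.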

Cheers,
  -Ryan



Re: "containing" memory-consuming computations

2012-04-19 Thread Herbert Valerio Riedel
Ryan Newton  writes:

> It sounds like you're interested in running just one client computation at
> once? 

Actually, I was hoping for a more general solution, which is also
applicable to e.g. a web-server running with `+RTS -N8`, where each HTTP
request spawns a new thread, and multiple requests are supposed to run
in parallel and concurrently.

> Hence you don't have a disambiguation problem -- if the total memory
> footprint crosses a threshold you know who to blame.

I could use that, but then I'd have to clone the process (or start the
process multiple times, which would require duplicating all in-memory
data-structures), and the amount of parallelism/concurrency would be
limited by the number of cloned unix processes. Alas, this works only
for some use-cases (e.g. a single-user GHCi REPL).

> At least this seems easier than needing a per-computation or
> per-IO-thread caps.

How hard would per-IO-thread caps be?

> By the way, the folks who implement Second Life did an interesting job
> of that -- they hacked Mono to be able to execute untrusted code with
> resource bounds.



Re: "containing" memory-consuming computations

2012-04-19 Thread Bardur Arantsson

On 04/19/2012 12:45 PM, Herbert Valerio Riedel wrote:

> Hello GHC Devs,
>
> But I'm missing a similar facility for constraining the
> space-dimension. In some other languages such as C, I have (more or
> less) the ability to check for /local/ out-of-memory conditions (e.g. by
> checking the return value of e.g. malloc(3) for heap-allocations, or by
> handling an OOM exception), rollback the computation, and be able to
> skip to the next computation request (which hopefully requires less
> memory...)


Just a minor point which may be of interest: many operating systems
(including Linux, by default) overcommit memory, so getting a non-NULL
pointer back from malloc() is no guarantee that the memory will actually
be usable under memory pressure.

(There are good reasons for this behavior.)

Regards,




Re: "containing" memory-consuming computations

2012-04-20 Thread Simon Marlow

On 19/04/2012 11:45, Herbert Valerio Riedel wrote:

> For the time-dimension, I'm already using functions such as
> System.Timeout.timeout which I can use to make sure that even a (forced)
> pure computation doesn't require (significantly) more wall-clock time
> than I expect it to.

Note that timeout uses wall-clock time, but you're really interested in 
CPU time (presumably).  If there are other threads running, then using 
timeout will not do what you want.


You could track allocation and CPU usage per thread, but note that 
laziness could present a problem: if a thread evaluates a lazy 
computation created by another thread, it will be charged to the thread 
that evaluated it, not the thread that created it.  To get around this 
you would need to use the profiling system, which tracks costs 
independently of lazy evaluation.
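
Simon's point about laziness shifting the blame can be seen with the
per-thread allocation counter (a sketch; getAllocationCounter and
setAllocationCounter live in GHC.Conc in later base versions):

```haskell
import Control.Exception (evaluate)
import Data.Int (Int64)
import GHC.Conc (getAllocationCounter, setAllocationCounter)

-- Bytes allocated by the current thread while running an action; the
-- RTS decrements the counter as the thread allocates.
allocatedBy :: IO a -> IO (a, Int64)
allocatedBy act = do
  setAllocationCounter maxBound
  r <- act
  c <- getAllocationCounter
  pure (r, maxBound - c)

main :: IO ()
main = do
  let thunk = sum [1 .. 100000 :: Integer]  -- thunk created here...
  (_, n) <- allocatedBy (evaluate thunk)    -- ...but charged to the forcer
  putStrLn ("forcing it cost ~" ++ show n ++ " bytes of allocation")
```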


On 19/04/2012 17:04, Herbert Valerio Riedel wrote:

>> At least this seems easier than needing a per-computation or
>> per-IO-thread caps.
>
> How hard would per-IO-thread caps be?


For tracking "memory use", which I think is what you're asking for, it 
would be quite hard.  One problem is sharing: when a data structure is 
shared between multiple threads, which one should it be charged to?  Both?


To calculate the amount of memory use per thread you would need to run 
the GC multiple times, once per thread, and observe how much data is 
reachable.  I can't think of any fundamental difficulties with doing 
that, but it could be quite expensive.  There might be some tricky 
interactions with the reachability property of threads themselves: a 
blocked thread is only reachable if the object it is blocked on is also 
reachable.


Cheers,
Simon





Re: "containing" memory-consuming computations

2012-04-20 Thread Edward Z. Yang
So, it would be pretty interesting if we could have an ST s style
mechanism, where the data structure is not allowed to escape.
But I wonder if this would be too cumbersome for anyone to use.
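
A purely hypothetical sketch of what such an interface might look like
(Region, RRef, and runRegion are invented names; the phantom s and the
rank-2 type on runRegion are what keep region data from escaping,
exactly as with runST):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RankNTypes #-}

-- Hypothetical: a monad whose allocations could be metered, with
-- references tagged by a phantom region parameter s.
newtype Region s a = Region (IO a)
  deriving (Functor, Applicative, Monad)

newtype RRef s a = RRef a

newRef :: a -> Region s (RRef s a)
newRef x = Region (pure (RRef x))

readRef :: RRef s a -> Region s a
readRef (RRef x) = Region (pure x)

-- The rank-2 quantifier means a result type mentioning RRef s cannot
-- type-check, so region data cannot leak out of the region.
runRegion :: (forall s. Region s a) -> IO a
runRegion (Region io) = io
```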

Edward



Re: "containing" memory-consuming computations

2012-04-20 Thread Brandon Allbery
On Fri, Apr 20, 2012 at 12:56, Edward Z. Yang  wrote:

> So, it would be pretty interesting if we could have an ST s style
> mechanism, where the data structure is not allowed to escape.
> But I wonder if this would be too cumbersome for anyone to use.
>

Isn't this what monadic regions are for?

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms


Re: "containing" memory-consuming computations

2012-04-20 Thread Edward Z. Yang
Excerpts from Brandon Allbery's message of Fri Apr 20 19:31:54 -0400 2012:
> > So, it would be pretty interesting if we could have an ST s style
> > mechanism, where the data structure is not allowed to escape.
> > But I wonder if this would be too cumbersome for anyone to use.
> 
> Isn't this what monadic regions are for?

That's right!  But we have a hard enough time convincing people it's
worth it, just for file handles.

Edward



Re: "containing" memory-consuming computations

2012-07-16 Thread Ryan Newton
Simon mentioned a system of doing multiple GCs to measure actual live data.

But wouldn't a more limited alternative be capping *allocation* rather than
live data?  GHC already has a mechanism to preempt IO threads based on an
allocation trip wire.  In fact, that's *the* preemption mechanism.  Isn't
the only missing piece a primitive similar to Chez Scheme's
"make-engine":

   http://www.scheme.com/csug8/control.html#./control:s13

... which would transfer control to a child computation, but would return
control to the parent (along with a continuation) when its allocation
budget is exhausted?

Make-engine + Safe Haskell + timeouts should be everything one needs to
resist an adversarial untrusted program.  Maybe?

  -Ryan

P.S. Chez Scheme engines are actually metered by the number of procedure
calls, not by allocation, as far as I know.
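
If memory serves, later GHC releases grew a primitive close to this:
per-thread allocation limits, via setAllocationCounter and
enableAllocationLimit in GHC.Conc, with an AllocationLimitExceeded
exception. A sketch assuming those later APIs:

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (AllocationLimitExceeded (..), handle)
import Data.Int (Int64)
import GHC.Conc (enableAllocationLimit, setAllocationCounter)

-- Run an action on a fresh thread with a per-thread allocation budget;
-- the RTS raises AllocationLimitExceeded once the budget is spent.
forkWithAllocCap :: Int64 -> IO () -> IO ()
forkWithAllocCap budget act = () <$ forkIO worker
  where
    worker = handle (\AllocationLimitExceeded ->
                       putStrLn "allocation budget exhausted") $ do
      setAllocationCounter budget
      enableAllocationLimit
      act
```

Unlike the engine proposal, this kills the computation rather than
suspending it with a continuation, but it provides the same trip wire.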

