Re: [capnproto] flow limit related deadlock?

2021-11-24 Thread 'Kenton Varda' via Cap'n Proto
It sounds like what Go is providing today would be equivalent to a KJ API
that returns two promises: one which waits for backpressure, and one which
waits for the RPC return value. In principle, we could provide a new
version of `send()` in C++ which provides this. But if it only recognizes
backpressure from the first-hop socket, this could do more harm than
good, encouraging people to write apps that tend to cause proxy buffer bloat.
Backpressure in Cap'n Proto really needs to be based on return messages to
handle proxies correctly. This is what `-> stream` provides.

(Note we definitely wouldn't want to stall the whole KJ event loop when one
connection has backpressure. Applications may be doing many concurrent
tasks on the same event loop.)
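
For concreteness, here is a sketch of the shape such an API might take. This
is hypothetical: the `SplitSend` name and its members are invented for
illustration, and nothing like it exists in capnp today.

    #include <kj/async.h>
    #include <capnp/capability.h>

    // Hypothetical result of a send() variant that separates backpressure
    // from the RPC result. NOT part of the real Cap'n Proto API.
    template <typename Results>
    struct SplitSend {
      // Would resolve once it is a good time to issue the next call,
      // i.e. once backpressure (ideally end-to-end, not just the first
      // hop) has cleared.
      kj::Promise<void> accepted;

      // Would resolve when the Return message arrives, as send() does today.
      kj::Promise<capnp::Response<Results>> response;
    };

As noted above, if `accepted` only reflected the first-hop socket it would
hide backpressure from slower links behind a proxy, which is why `-> stream`,
driven by return messages, is the preferred mechanism.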

-Kenton

On Wed, Nov 24, 2021 at 6:05 AM Erin Shepherd  wrote:

> On Wed, 24 Nov 2021, at 02:28, Ian Denhardt wrote:
>
> Quoting Kenton Varda (2021-11-23 19:20:50)
> > Cap'n Proto doesn't provide any backpressure from the underlying TCP
> > connection to the app, except through streaming. If you just make a ton
> > of calls all at once without waiting for returns, you'll bloat your
> > memory with unsent messages. And possibly worse: if the capability
> > bounces through a proxy, and you have a fast connection (say, a unix
> > socket) to the proxy, but a slow connection from the proxy to the
> > eventual destination, you'll end up bloating the proxy's memory.
>
> This is not true of the Go implementation, which currently blocks
> until the message has been written to the socket (we don't currently
> treat streaming specially, but presumably we could keep the blocking
> API when we add that; I don't think we even need to treat the annotation
> specially, we should be able to do it for all calls). So I don't think
> this applies to a scenario where both sides of the connection work like
> the Go implementation. But I hadn't thought about the proxy issue
> (where the proxy might be using a different implementation); thank
> you for pointing that out.
>
>
> I guess this is a matter of futures implementation: callback-based futures
> (like kj) often do not provide such a backpressure mechanism. Other
> solutions (like Go fibers or Rust poll-based futures) can easily provide
> such a mechanism, though.
>
> If a concurrency limiter were provided at the capnp-server level (i.e.
> allow no more than X RPCs to be in flight at once), I wonder if that would
> help things?
>
> --
>
> As to the original topic: I wonder if a message could be added which
> advises the peer as to limits on the number of questions and/or the maximum
> size of them which may be in flight? The client might then be able to
> throttle itself and the sender could then reject (immediately, with an
> error) any calls once that limit was exceeded. This would keep the socket
> unblocked for finish messages.
>


Re: [capnproto] flow limit related deadlock?

2021-11-24 Thread Erin Shepherd
On Wed, 24 Nov 2021, at 02:28, Ian Denhardt wrote:
> Quoting Kenton Varda (2021-11-23 19:20:50)
> > Cap'n Proto doesn't provide any backpressure from the underlying TCP
> > connection to the app, except through streaming. If you just make a ton
> > of calls all at once without waiting for returns, you'll bloat your
> > memory with unsent messages. And possibly worse: if the capability
> > bounces through a proxy, and you have a fast connection (say, a unix
> > socket) to the proxy, but a slow connection from the proxy to the
> > eventual destination, you'll end up bloating the proxy's memory.
> 
> This is not true of the Go implementation, which currently blocks
> until the message has been written to the socket (we don't currently
> treat streaming specially, but presumably we could keep the blocking
> API when we add that; I don't think we even need to treat the annotation
> specially, we should be able to do it for all calls). So I don't think
> this applies to a scenario where both sides of the connection work like
> the Go implementation. But I hadn't thought about the proxy issue
> (where the proxy might be using a different implementation); thank
> you for pointing that out.

I guess this is a matter of futures implementation: callback-based futures 
(like kj) often do not provide such a backpressure mechanism. Other solutions 
(like Go fibers or Rust poll-based futures) can easily provide such a 
mechanism, though.

If a concurrency limiter were provided at the capnp-server level (i.e. allow no 
more than X RPCs to be in flight at once), I wonder if that would help things?
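
For concreteness, a minimal sketch of such a limiter on a KJ event loop
(hand-rolled for illustration; not an existing kj or capnp primitive, and
cancellation of waiters is not handled):

    #include <deque>
    #include <kj/async.h>

    // Illustrative in-process concurrency limiter for a KJ event loop.
    // A sketch only; cancellation of waiters is not handled.
    class CallLimiter {
    public:
      explicit CallLimiter(unsigned int limit): slots(limit) {}

      // Resolves when a call slot is free. The caller must invoke
      // release() when the corresponding RPC's Return arrives.
      kj::Promise<void> acquire() {
        if (slots > 0) {
          --slots;
          return kj::READY_NOW;
        }
        auto paf = kj::newPromiseAndFulfiller<void>();
        waiters.push_back(kj::mv(paf.fulfiller));
        return kj::mv(paf.promise);
      }

      void release() {
        if (!waiters.empty()) {
          waiters.front()->fulfill();  // hand the slot straight to a waiter
          waiters.pop_front();
        } else {
          ++slots;
        }
      }

    private:
      unsigned int slots;
      std::deque<kj::Own<kj::PromiseFulfiller<void>>> waiters;
    };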

--

As to the original topic: I wonder if a message could be added which advises 
the peer as to limits on the number of questions and/or the maximum size of 
them which may be in flight? The client might then be able to throttle itself 
and the sender could then reject (immediately, with an error) any calls once 
that limit was exceeded. This would keep the socket unblocked for finish 
messages.



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread Ian Denhardt
Quoting Kenton Varda (2021-11-23 19:20:50)

>On Tue, Nov 23, 2021 at 5:01 PM Ian Denhardt <[1]i...@zenhack.net>
>wrote:
> 
>  Ok, I think I get it, let me know if I have this right:
>  [...]
> 
>Right.

Ok, great. Thanks for your patience.

> Cap'n Proto doesn't provide any backpressure from the underlying TCP
> connection to the app, except through streaming. If you just make a ton
> of calls all at once without waiting for returns, you'll bloat your
> memory with unsent messages. And possibly worse: if the capability
> bounces through a proxy, and you have a fast connection (say, a unix
> socket) to the proxy, but a slow connection from the proxy to the
> eventual destination, you'll end up bloating the proxy's memory.

This is not true of the Go implementation, which currently blocks
until the message has been written to the socket (we don't currently
treat streaming specially, but presumably we could keep the blocking
API when we add that; I don't think we even need to treat the annotation
specially, we should be able to do it for all calls). So I don't think
this applies to a scenario where both sides of the connection work like
the Go implementation. But I hadn't thought about the proxy issue
(where the proxy might be using a different implementation); thank
you for pointing that out.

-Ian



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread 'Kenton Varda' via Cap'n Proto
On Tue, Nov 23, 2021 at 5:01 PM Ian Denhardt  wrote:

> Ok, I think I get it, let me know if I have this right:
>
> The correct thing to do is to handle congestion/flow control for
> multiple calls on each object individually, using something like
> the mechanisms provided by the C++ implementation's streaming
> construct. This is important so that calls on different objects
> originating from the same vat do not cause problems with one
> another (whereas from TCP's standpoint, they're the same stream,
> so it can't help). This also provides backpressure wrt. the
> receiving vat; it avoids queueing too many calls on the connection
> overall.
>

Right.


> Another check of my understanding: am I correct in thinking that in a
> client implementation that's just a single threaded loop calling
> methods on one object in sequence, it would not cause a problem to
> skip in-process flow control and let the TCP connection deal with it,
> since there is only one stream involved anyway? Assuming of course there
> is something like the flow limit provided by the vat on the other side
> of the connection.
>

Cap'n Proto doesn't provide any backpressure from the underlying TCP
connection to the app, except through streaming. If you just make a ton of
calls all at once without waiting for returns, you'll bloat your memory
with unsent messages. And possibly worse: if the capability bounces through
a proxy, and you have a fast connection (say, a unix socket) to the proxy,
but a slow connection from the proxy to the eventual destination, you'll
end up bloating the proxy's memory.

-Kenton


>
> -Ian
>
> Quoting Kenton Varda (2021-11-23 17:32:44)
> >On Tue, Nov 23, 2021 at 3:59 PM Ian Denhardt <[1]i...@zenhack.net>
> >wrote:
> >
> >  What are apps *supposed* to do here? It isn't clear to me where else
> >  the backpressure is supposed to come from?
> >
> >Apps should cap the number of write()s they have in-flight at once.
> >(`-> stream` helps a lot with this, as it'll automatically figure out
> >how many is a good number of requests to have in flight.)
> >
> >  Most apps are using sandstorm-http-bridge anyway, so they're just
> >  acting like normal http servers -- which generally write out data to
> >  the response stream as fast as the socket will take it. Skimming
> >  sendRequest() in the bridge's source, it looks like it just copies
> >  that data directly into the response stream. So I am confused as to
> >  what a "well written" app would do?
> >
> >sandstorm-http-bridge currently only does one outstanding write RPC at
> >a time. The code is convoluted but look at pumpWrites() -- it waits for
> >each send() to complete before performing the next one.
> >Historically there was a time where it didn't implement such a limit
> >and would just pump the whole response as fast as it got it, which led
> >to the need to do some enforcement on the supervisor side.
> >-Kenton
> >
> > References
> >
> >1. mailto:i...@zenhack.net
>



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread Ian Denhardt
Ok, I think I get it, let me know if I have this right:

The correct thing to do is to handle congestion/flow control for
multiple calls on each object individually, using something like
the mechanisms provided by the C++ implementation's streaming
construct. This is important so that calls on different objects
originating from the same vat do not cause problems with one
another (whereas from TCP's standpoint, they're the same stream,
so it can't help). This also provides backpressure wrt. the
receiving vat; it avoids queueing too many calls on the connection
overall.

Another check of my understanding: am I correct in thinking that in a
client implementation that's just a single threaded loop calling
methods on one object in sequence, it would not cause a problem to
skip in-process flow control and let the TCP connection deal with it,
since there is only one stream involved anyway? Assuming of course there
is something like the flow limit provided by the vat on the other side
of the connection.

-Ian

Quoting Kenton Varda (2021-11-23 17:32:44)
>On Tue, Nov 23, 2021 at 3:59 PM Ian Denhardt <[1]i...@zenhack.net>
>wrote:
> 
>  What are apps *supposed* to do here? It isn't clear to me where else
>  the backpressure is supposed to come from?
> 
>Apps should cap the number of write()s they have in-flight at once.
>(`-> stream` helps a lot with this, as it'll automatically figure out
>how many is a good number of requests to have in flight.)
> 
>  Most apps are using sandstorm-http-bridge anyway, so they're just
>  acting like normal http servers -- which generally write out data to
>  the response stream as fast as the socket will take it. Skimming
>  sendRequest() in the bridge's source, it looks like it just copies
>  that data directly into the response stream. So I am confused as to
>  what a "well written" app would do?
> 
>sandstorm-http-bridge currently only does one outstanding write RPC at
>a time. The code is convoluted but look at pumpWrites() -- it waits for
>each send() to complete before performing the next one.
>Historically there was a time where it didn't implement such a limit
>and would just pump the whole response as fast as it got it, which led
>to the need to do some enforcement on the supervisor side.
>-Kenton
> 
> References
> 
>1. mailto:i...@zenhack.net



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread 'Kenton Varda' via Cap'n Proto
On Tue, Nov 23, 2021 at 3:59 PM Ian Denhardt  wrote:

> What are apps *supposed* to do here? It isn't clear to me where else the
> backpressure is supposed to come from?
>

Apps should cap the number of write()s they have in-flight at once. (`->
stream` helps a lot with this, as it'll automatically figure out how many
is a good number of requests to have in flight.)
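
As a sketch of that pattern, assuming a coroutine-enabled KJ build and a
hypothetical schema method declared `write @0 (data :Data) -> stream;` (the
`ByteStream` name and generated accessors below are stand-ins):

    #include <kj/async-io.h>
    #include "stream.capnp.h"  // stand-in for the generated header

    // Pump an input stream through a streaming RPC method. For methods
    // declared `-> stream`, send() returns a kj::Promise<void> that
    // resolves when the RPC system considers it a good time to issue the
    // next call, so awaiting it caps the number of writes in flight.
    kj::Promise<void> pump(ByteStream::Client dest, kj::AsyncInputStream& in) {
      auto buffer = kj::heapArray<kj::byte>(8192);
      for (;;) {
        size_t n = co_await in.tryRead(buffer.begin(), 1, buffer.size());
        if (n == 0) co_return;  // EOF
        auto req = dest.writeRequest();
        req.setData(kj::arrayPtr(buffer.begin(), n));
        co_await req.send();  // waits if too many calls are outstanding
      }
    }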

> Most apps are using sandstorm-http-bridge anyway, so they're just acting
> like normal http servers -- which generally write out data to the
> response stream as fast as the socket will take it. Skimming
> sendRequest() in the bridge's source, it looks like it just copies that
> data directly into the response stream. So I am confused as to what a
> "well written" app would do?
>

sandstorm-http-bridge currently only does one outstanding write RPC at a
time. The code is convoluted but look at pumpWrites() -- it waits for each
send() to complete before performing the next one.

Historically there was a time where it didn't implement such a limit and
would just pump the whole response as fast as it got it, which led to the
need to do some enforcement on the supervisor side.

-Kenton



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread Ian Denhardt
(Adding Louis to cc per his request)

Quoting Kenton Varda (2021-11-23 14:50:20)
>On Tue, Nov 23, 2021 at 12:41 PM Ian Denhardt <[1]i...@zenhack.net>
>wrote:
> 
>  Wouldn't releasing it on return allow the caller to cause runaway
>  memory usage by just never sending the finish? The return entry needs
>  to be kept around in case calls are pipelined on it, and itself might
>  take up some space (arguably it should count against the limit too...).
> 
>In all honesty, this isn't a very good defense against resource
>exhaustion attacks either way. The problem that motivated it was
>actually badly-written apps that would stream a whole large HTTP
>response body without paying attention to backpressure.

What are apps *supposed* to do here? It isn't clear to me where else the
backpressure is supposed to come from?

Most apps are using sandstorm-http-bridge anyway, so they're just acting
like normal http servers -- which generally write out data to the
response stream as fast as the socket will take it. Skimming
sendRequest() in the bridge's source, it looks like it just copies that
data directly into the response stream. So I am confused as to what a
"well written" app would do?

>Technically the callee doesn't need to keep the whole response struct
>in memory, only any returned capabilities and their paths. The current
>implementation does keep the whole struct though.
>My general opinion about resource exhaustion is that it's not so much a
>security issue as an abuse issue. These attacks create a temporary
>nuisance / disruption, but no permanent damage. It's probably
>impossible to anticipate every attack. But, since the attacks don't
>actually get anything for the attacker, they are much more rare than
>you might think, and it's usually fine to respond reactively by
>revoking capabilities, banning users, blocking IPs, etc. If the problem
>recurs then you start investing new safeguards against the specific
>scenario.

I suppose that's a reasonable perspective; you're right that it's a
different beast (Mark Miller wrote
https://agoric.com/blog/all/taxonomy-of-security-issues/ which is
really useful for thinking about this stuff. In his terminology we are
talking about availability rather than integrity).

Every approach I've been able to think of for dealing with availability
attacks gracefully in the capnp implementation itself does seem very
fragile. Perhaps we could treat the flow limit as a soft limit, not
bound the space taken up by returns and table entries, but track it,
and provide a way to notify the user if it goes past a harder limit,
at which point they can drop the connection if they want. If we were
doing this at the process level I'd be inclined to just lean on the
kernel's resource management, but I'd like to at least find a way to
confine these issues to a single connection. Anyway, thanks for your
input.
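
To make that soft-limit idea concrete, something like the following shape
(invented names; nothing like this exists in any implementation today):

    #include <cstddef>
    #include <functional>

    // Sketch of a soft/hard flow-limit policy; invented, not implemented
    // anywhere. Tracked usage would include call params, returns, and
    // table entries.
    struct FlowLimits {
      size_t softLimit;  // above this, stop reading further calls
      size_t hardLimit;  // above this, notify the application

      // Called when tracked usage exceeds hardLimit; the application may
      // respond by dropping the offending connection.
      std::function<void(size_t currentUsage)> onHardLimitExceeded;
    };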

-Ian



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread 'Kenton Varda' via Cap'n Proto
On Tue, Nov 23, 2021 at 12:41 PM Ian Denhardt  wrote:

> Wouldn't releasing it on return allow the caller to cause runaway memory
> usage by just never sending the finish? The return entry needs to be kept
> around in case calls are pipelined on it, and itself might take up some
> space (arguably it should count against the limit too...).
>

In all honesty, this isn't a very good defense against resource exhaustion
attacks either way. The problem that motivated it was actually
badly-written apps that would stream a whole large HTTP response body
without paying attention to backpressure.

Technically the callee doesn't need to keep the whole response struct in
memory, only any returned capabilities and their paths. The current
implementation does keep the whole struct though.

My general opinion about resource exhaustion is that it's not so much a
security issue as an abuse issue. These attacks create a temporary
nuisance / disruption, but no permanent damage. It's probably impossible to
anticipate every attack. But, since the attacks don't actually get anything
for the attacker, they are much more rare than you might think, and it's
usually fine to respond reactively by revoking capabilities, banning users,
blocking IPs, etc. If the problem recurs then you start investing new
safeguards against the specific scenario.

-Kenton



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread Ian Denhardt
Quoting Kenton Varda (2021-11-23 13:02:32)

>Hmm, I think the intention was that the flow limit should be released
>on Return, independent of Finish. But I can totally believe I
>implemented it wrong. Could we just change it to be based on Return?
>FWIW by default there is no flow limit, it's only enabled in the
>Sandstorm supervisor to defend against an app sending excessive
>requests that end up queued in memory elsewhere in the system.

Wouldn't releasing it on return allow the caller to cause runaway memory
usage by just never sending the finish? The return entry needs to be kept
around in case calls are pipelined on it, and itself might take up some
space (arguably it should count against the limit too...).



Re: [capnproto] flow limit related deadlock?

2021-11-23 Thread 'Kenton Varda' via Cap'n Proto
Hmm, I think the intention was that the flow limit should be released on
Return, independent of Finish. But I can totally believe I implemented it
wrong. Could we just change it to be based on Return?
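
(A toy model of the accounting in question, with invented names, not capnp
internals:)

    #include <cstddef>

    // Toy model of flow-limit accounting; invented names, not capnp
    // internals. The open question is when the callee returns quota:
    // when it sends the Return, or only when the peer's Finish removes
    // the entry from the answers table.
    struct FlowAccounting {
      size_t flowLimit;
      size_t inFlightBytes = 0;

      void onCallReceived(size_t paramSize) { inFlightBytes += paramSize; }

      // Intended behavior: release the quota when the Return is sent.
      void onReturnSent(size_t paramSize) { inFlightBytes -= paramSize; }

      // Current behavior (per this thread): quota is instead released
      // here, when the Finish arrives, which can deadlock (see below).
      void onFinishReceived(size_t /*paramSize*/) {}

      bool shouldStallReads() const { return inFlightBytes >= flowLimit; }
    };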

FWIW by default there is no flow limit, it's only enabled in the Sandstorm
supervisor to defend against an app sending excessive requests that end up
queued in memory elsewhere in the system.

-Kenton

On Tue, Nov 23, 2021 at 11:51 AM Ian Denhardt  wrote:

> Hey all,
>
> A few days ago one of my co-maintainers (Louis) alerted me to a deadlock
> in the Go implementation. We've pinned down the cause, and while trying
> to figure out how to fix it, I looked into how the C++ implementation
> handles backpressure.
>
> From what I can tell, the only way backpressure is applied is via the
> flow limit, which limits the total size of arguments to in-flight
> incoming calls. The portion of the quota reserved by a call is returned
> to the pool when the call is removed from the questions table, which
> makes sense, since this is when the memory is actually freed.
>
> However, I see two possible deadlocks that could occur because of this.
>
> The one I am less concerned about is one where calls that depend on one
> another go back and forth between vats, until both vats exceed their
> quota and block on one another, causing a deadlock. I am less concerned
> about this since it is basically the RPC equivalent of a stack overflow,
> and could be turned from a crash into a thrown exception by adding a
> timeout or such.
>
> The one I'm more worried about comes up in the context of streaming;
> the problematic scenario is as follows:
>
> Alice in vat A is continuously streaming calls to Bob in vat B. It is
> possible and expected that at some point Alice will cause vat B's
> flow limit to be reached, at which point further calls will block
> until some outstanding calls return. Good so far: this backpressure
> is exactly what we want.
>
> The problem arises after the return message arrives at vat A. Vat A
> then sends a finish message, but this message is *behind other calls*,
> so it will not reach vat B until vat B reads in all outstanding calls.
> This will never happen, since vat B is waiting on the flow limit.
>
> I don't know how to avoid this problem with the current protocol as
> specified. One way that almost works is for vat A to just send the
> finish message for each streaming call before the next call is sent,
> relying on the -> stream annotations to know what calls it should do
> this for. But this doesn't quite work since vat B is allowed to
> cancel an ongoing call in response to a finish message. Some
> extension to the protocol to allow non-cancelling finish messages
> would solve this.
>
> Is there a solution that I haven't seen? Are there other ways
> of dealing with this in the protocol?
>
> -Ian
>