On Wed, Aug 07, 2024 at 12:43:29PM -0500, Eric Blake wrote:
> Allowing an unlimited number of clients to any web service is a recipe
> for a rudimentary denial of service attack: the client merely needs to
> open lots of sockets without closing them, until qemu no longer has
> any more fds available to allocate.
> 
> For qemu-nbd, we default to allowing only 1 connection unless more are
> explicitly asked for (-e or --shared); this was historically picked as
> a nice default (without an explicit -t, a non-persistent qemu-nbd goes
> away after a client disconnects, without needing any additional
> follow-up commands), and we are not going to change that interface now
> (besides, someday we want to point people towards qemu-storage-daemon
> instead of qemu-nbd).
> 
> But for qemu proper, the QMP nbd-server-start command has historically
> had a default of unlimited number of connections, in part because
> unlike qemu-nbd it is inherently persistent.  Allowing multiple client
> sockets is particularly useful for clients that can take advantage of
> MULTI_CONN (creating parallel sockets to increase throughput),
> although known clients that do so (such as libnbd's nbdcopy) typically
> use only 8 or 16 connections (the benefits of scaling diminish once
> more sockets are competing for kernel attention).  Picking a number
> large enough for typical use cases, but not unlimited, makes it
> slightly harder for a malicious client to perform a denial of service
> merely by opening lots of connections without progressing through the
> handshake.
> 
> This change does not eliminate CVE-2024-7409 on its own, but reduces
> the chance for fd exhaustion or unlimited memory usage as an attack
> surface.  On the other hand, by itself, it makes it more obvious that
> with a finite limit, we have the problem of an unauthenticated client
> holding 100 fds opened as a way to block out a legitimate client from
> being able to connect; thus, later patches will further add timeouts
> to reject clients that are not making progress.
> 
> Suggested-by: Daniel P. Berrangé <berra...@redhat.com>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>  qapi/block-export.json         | 4 ++--
>  include/block/nbd.h            | 7 +++++++
>  block/monitor/block-hmp-cmds.c | 3 ++-
>  blockdev-nbd.c                 | 8 ++++++++
>  4 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 665d5fd0262..ce33fe378df 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -28,7 +28,7 @@
>  # @max-connections: The maximum number of connections to allow at the
>  #     same time, 0 for unlimited.  Setting this to 1 also stops the
>  #     server from advertising multiple client support (since 5.2;
> -#     default: 0)
> +#     default: 100)
>  #
>  # Since: 4.2
>  ##
> @@ -63,7 +63,7 @@
>  # @max-connections: The maximum number of connections to allow at the
>  #     same time, 0 for unlimited.  Setting this to 1 also stops the
>  #     server from advertising multiple client support (since 5.2;
> -#     default: 0).
> +#     default: 100).
>  #
>  # Errors:
>  #     - if the server is already running
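
To make the change concrete: after this patch, a bare invocation
like the following (socket details purely illustrative):

  { "execute": "nbd-server-start",
    "arguments": {
      "addr": { "type": "inet",
                "data": { "host": "0.0.0.0", "port": "10809" } } } }

now behaves as if "max-connections": 100 had been given, where it
previously meant an unlimited number of connections.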

This is considered a backwards compatibility break.
An intentional one.

Our justification is that we believe it is the least bad option
for mitigating the DoS vector. Let's explore the reasoning behind
that belief...

An alternative would be to deprecate the absence of 'max-connections',
and after the deprecation period, make it into a mandatory
parameter, forcing apps to make a decision. This would be strictly
compliant with our QAPI change policy.

How does that differ in impact from changing the defaults...

  * Changed default
     - Small risk of breaking app if needing > 100 concurrent conns
     - Immediately closes the DoS hole for all apps using QEMU

  * Deprecation & eventual change to mandatory
     - Zero risk of breaking apps today
     - Forces all apps to be changed to pass max-connections
       in 3 releases' time
     - Does NOT close the DoS hole until the 3rd future release
       from now.

If we took the deprecation route, arguably any app which isn't
already setting max-connections would need to have a CVE filed
against it *today*, and thus effectively apps would be forced
into immediately setting max-connections, despite our deprecation
grace period supposedly giving them 3 releases to adjust.

If we took the changed default route and broke an app, it would
again have to be changed to set max-connections to a value that
satisfies its use case.
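
For example, something like this (a sketch; the socket address and
the value 1000 are illustrative, and passing 0 would restore the
old unlimited behaviour):

  { "execute": "nbd-server-start",
    "arguments": {
      "addr": { "type": "inet",
                "data": { "host": "0.0.0.0", "port": "10809" } },
      "max-connections": 1000 } }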


So....

If we take the deprecation route, we create work for /all/ app
developers using NBD to update their code now to fix the CVE.

If we take the changed defaults route, and our choice of 100
is sufficient, then no apps need changing.

If we take the changed defaults route, and our choice of 100
is NOT sufficient, then most apps still don't need changing, but
perhaps 1 or 2 do.

The unpleasant bit is that if 100 is insufficient, an app
maintainer or user might not become aware of this until they
have been in production for 12 months and suddenly hit a
rare load spike.


Overall though, changing the defaults still looks likely to
be the least disruptive option for fixing the DoS, providing
our choice of 100 is a good one.

I think system emulators are likely OK given the common use
cases for NBD in the context of a running VM (migration- and
backup-related).

I've got a nagging concern that someone could be doing something
adventurous with QSD though, e.g. using it to serve content
(perhaps read-only) to a huge pool of clients?
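
Such a deployment would need to opt back in to a higher limit
explicitly. A hedged sketch of what that could look like, via
QSD's --nbd-server option (image path, export name and the value
1000 are all illustrative):

  qemu-storage-daemon \
      --blockdev driver=file,node-name=disk0,filename=/path/to/image.raw,read-only=on \
      --nbd-server addr.type=inet,addr.host=0.0.0.0,addr.port=10809,max-connections=1000 \
      --export type=nbd,id=exp0,node-name=disk0,name=disk0,writable=off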

Probably that's just a risk we have to take, as any app
relying on the max-connections default is vulnerable.

There are no good answers AFAICT. Only bad answers. This
one feels least bad to me.

Reviewed-by: Daniel P. Berrangé <berra...@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

