Re: [ovs-dev] [RFC] ovn: minimize the impact of a compromised chassis

Russell Bryant Tue, 23 Aug 2016 14:21:06 -0700

On Tue, Aug 23, 2016 at 5:05 PM, Darrell Ball <[email protected]> wrote:


>
>
> On Mon, Aug 22, 2016 at 1:08 PM, Lance Richardson <[email protected]>
> wrote:
>
>> > From: "Ben Pfaff" <[email protected]>
>> > To: "Russell Bryant" <[email protected]>
>> > Cc: "Lance Richardson" <[email protected]>, "ovs dev" <
>> [email protected]>, "Russell Bryant" <[email protected]>
>> > Sent: Monday, August 22, 2016 1:22:43 PM
>> > Subject: Re: [ovs-dev] [RFC] ovn: minimize the impact of a compromised
>> chassis
>> >
>> > On Mon, Aug 22, 2016 at 01:14:03PM -0400, Russell Bryant wrote:
>> > > On Mon, Aug 22, 2016 at 12:30 PM, Ben Pfaff <[email protected]> wrote:
>> > >
>> > > > On Tue, Aug 16, 2016 at 09:30:21AM -0400, Lance Richardson wrote:
>> > > > > As described in ovn/TODO, these are the two main approaches that
>> could
>> > > > > be
>> > > > > used to minimize the impact of a compromised chassis on the rest
>> of an
>> > > > > OVN network:
>> > > > >
>> > > > >   1) Implement a role- or identity-based access control mechanism
>> for
>> > > > >      ovsdb-server and use it to limit ovn-controller write access
>> to
>> > > > >      tables in the southbound database.
>> > > > >
>> > > > > or
>> > > > >
>> > > > >   2) Disallow all write access to the southbound database by
>> > > > ovn-controller
>> > > > >      (as an optional mode or unconditionally) and provide
>> alternative
>> > > > >      mechanisms for updating the southbound database for entries
>> that
>> > > > >      are
>> > > > >      currently updated by ovn-controller.
>> > > > >
>> > > > > It is believed that option (1) would require somewhat more effort
>> than
>> > > > (2),
>> > > > > and, because it would involve significant modifications to
>> > > > > ovsdb-server,
>> > > > > would also be more likely to add risk and burden to non-OVN users.
>> > > > > Additionally, option (2) will likely place fewer requirements on
>> > > > alternative
>> > > > > databases (such as etcd), so the following implementation
>> discussion
>> > > > > only
>> > > > > considers option (2).
>> > > >
>> > > > I've always pushed back against adding granular access control
>> > > > mechanisms to OVSDB because I didn't believe it was likely that
>> anything
>> > > > that was simple enough to be in the "spirit of OVSDB" (heh) was also
>> > > > going to be sufficient to fit a real use case.  However, if we do
>> now
>> > > > have specific requirements for OVN, then I'd invite descriptions of
>> what
>> > > > access control mechanism would be sufficient.  If it's simple and
>> > > > general enough, then implementing it in OVSDB might totally make
>> sense.
>> > > >
>> > > > I don't think that the "risk and burden" of a simple and general
>> > > > mechanism is a real issue.
>> > >
>> > >
>> > > I think that push back makes sense.
>> > >
>> > > The proposal here was to take route #2.  The only OVSDB feature
>> required in
>> > > that case is to accept read-only connections, which could be on a
>> > > per-socket basis.  This seems much simpler all around, as long as we
>> can
>> > > all get on board with ovn-controller as a read-only client.
>> >
>> > I'm not actually saying we should choose #1.  I'm saying a couple of
>> > things.  First, changing OVSDB is not a huge deal; we do it when it
>> > makes sense.  Second, that it is possible that our specific application
>> > here is a better place to start for OVSDB access control than a blanket
>> > "we need access control for OVSDB" that I've heard a couple of times.
>> >
>>
>> Based on my own narrow view of the world, I think option #1 would need:
>>
>>    - The ability for ovsdb-server to associate a role/identity with each
>>      client connection.  For simplicity this could be a binary
>> "privileged"
>>      vs "non-privileged" association, perhaps using per-role SSL
>> certificates
>>      for TLS connections and treating unix socket connections as
>> "privileged".
>>    - A mechanism for mapping a role/identity to access rights on a
>> per-table
>>      and per-column basis.
>>    - A mechanism for enforcing access rights on a per-table or per-column
>> basis,
>>      in some cases also considering the identity of the client that
>> created
>>      the row.
>>
>> This infrastructure would be applied to OVN to implement the following:
>>     - These tables would be read-only for non-privileged clients:
>>       SB_Global, Logical_Flow, Multicast_Group, Datapath_Binding,
>> Address_Set,
>>       DHCP_Options, and DHCPv6_Options.
>>
>>     - The Chassis and Encap tables would allow insertions by
>> non-privileged clients
>>       and updates to existing rows only for the clients that inserted
>> them.
>>
>>     - The Port_Binding table would be writable only by privileged clients
>>       (ovn-northd) except for the "Chassis" column which should be
>> writable by any
>>       non-privileged client (note that this doesn't do a lot to minimize
>> harm from
>>       a compromised chassis).
>>
>>     - The MAC_Binding table should be writable by any non-privileged
>> client (which also
>>       doesn't do much to minimize harm from a compromised chassis).
>>
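
As a rough illustration only, the per-table policy listed above could be
written down as a small lookup like the one below. The function name, its
arguments, and the lowercase "chassis" column name are assumptions made for
this sketch; nothing like this exists in ovsdb-server today.

```python
# Hypothetical sketch of the option #1 policy; not ovsdb-server code.
READ_ONLY_TABLES = {
    "SB_Global", "Logical_Flow", "Multicast_Group", "Datapath_Binding",
    "Address_Set", "DHCP_Options", "DHCPv6_Options",
}
# Tables where a non-privileged client may insert rows and later
# update only the rows it inserted itself.
OWNER_TABLES = {"Chassis", "Encap"}


def may_write(table, column, op, privileged, client=None, row_owner=None):
    """Return True if a write is allowed under the policy sketched above.

    op is "insert" or "update"; row_owner is the identity of the client
    that originally inserted the row (assumed to be tracked somewhere).
    """
    if privileged:
        return True
    if table in READ_ONLY_TABLES:
        return False
    if table in OWNER_TABLES:
        if op == "insert":
            return True
        return op == "update" and client is not None and client == row_owner
    if table == "Port_Binding":
        # Only the chassis column is writable by non-privileged clients.
        return op == "update" and column == "chassis"
    if table == "MAC_Binding":
        return True
    return False
```

Note that the last two rules are exactly the ones that do little to contain
a compromised chassis, as the quoted text already points out.
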
>> > > Are you interested in looking closer at what #1 would look like, with
>> > > details of what the access control policy would look like?
>> >
>> > It'll probably be obvious, or close to obvious, what would be needed for
>> > #1 once we talk through what #2 needs.
>> >
>>
>> Here's a slightly more detailed breakdown of the work needed for option
>> #2:
>>
>>     ovsdb-server: Add support for "read-only" connections. Perhaps
>> something
>>       like "--remote ptcp:read-only:<port>[:<ip>]" and variations on that
>> theme
>>       for other connection types.
>>
>>     ovn-controller: Implement new approach for Chassis and Encap tables:
>>          - Remove code from ovn-controller for creating rows in these
>> tables.
>>          - Document how administrators create rows using ovn-sbctl in
>> ovn-controller
>>            man page.
>>          - Update all tests to manually create Chassis/Encap rows.
>>
>>     ovn-controller: Implement new approach for chassis column in
>> Port_Binding table:
>>          - Remove the code to update the chassis column from
>> ovn-controller.
>>          - Add new key to options column of Logical_Switch_Port in
>> OVN_Northbound
>>            database to specify chassis binding.
>>          - Change ovn-northd to update Port_Binding table in southbound
>> db based
>>            on chassis option from Logical_Switch_Port in northbound db.
>>          - Write upgrade helper script that sets chassis option for
>> existing
>>            Logical_Switch_Ports based on current values in Port_Binding
>> table of
>>            southbound db
>>          - Document OVN upgrade procedure, including the use of the
>> upgrade helper
>>            script.
>>
>>     ovn-controller: Rework MAC_Binding table
>>          - Propose details of chassis-local mac bindings storage, the two
>> main options
>>            are:
>>            + In ovn-controller memory (simple, but cache reset on
>> ovn-controller restart).
>>            + In Open_vSwitch database (more work, as we need cache
>> invalidation logic added).
>>          - Change ovn-controller to use local store for learned mac
>> bindings.
>>          - Remove code for updating MAC_Binding table from ovn-controller.
>>
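
For the first item in the breakdown above, a read-only connection could in
principle be enforced by rejecting any "transact" request that contains a
modifying operation. The sketch below is only that, a sketch: the operation
names come from RFC 7047, but the function, the error text, and the choice
to treat every other operation as non-modifying are assumptions.

```python
# Hypothetical sketch of a read-only check; not ovsdb-server code.
# RFC 7047 "transact" operations that modify the database.
WRITE_OPS = {"insert", "update", "mutate", "delete"}


def check_read_only(transact_params):
    """Check the params of an RFC 7047 "transact" request.

    transact_params has the form [db_name, op1, op2, ...], where each op
    is a dict with an "op" member.  Returns a list of error strings, which
    is empty when the request is acceptable on a read-only connection.
    """
    operations = transact_params[1:]
    return [
        "operation %d: '%s' not permitted on a read-only connection"
        % (index, op.get("op"))
        for index, op in enumerate(operations)
        if op.get("op") in WRITE_OPS
    ]
```

A request mixing a "select" with an "insert" would be refused as a whole,
since a transaction is all-or-nothing anyway.
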
>
> Regarding Option 2:
>
> Most distributed systems that share a common management plane would try to
> share
> mac bindings via the common management plane, even if each node maintains
> its own cache.
>

What specific systems are you referring to here?


> Throwing that out entirely because of a fear of a compromised chassis
> seems out of
> proportion to the potential problem. There can be 1000s of chassis part of
> the same
> logical network having packet flows needing the same binding.
>

It's not a fear.  It's a legitimate security issue.


> Furthermore, the risk of a compromised chassis may be very low in many use
> cases.
> The "one known target environment" alluded to in the problem description
> should not "rule all"
> by default.
>

The group that raised this to me was OpenShift (a Kubernetes-based
platform).  It's a show stopper for them, as I would expect for other
container-based systems.

The same issue applies to OpenStack, though it's less pressing there,
since other OpenStack components have similar problems anyway.


> Perhaps allowing ovn-controller to write to a candidate mac binding table
> (with some limitations
> as well) and having northd (possibly as background work) detect a
> consensus of binding from > X controller
> client sessions and then populate the actual mac binding table might
> mitigate the exploit concern.
> Only northd would be able to write to the actual mac binding table.
>
> If there is no binding consensus yet on the binding, then the default is
> for the interested
> controller to issue the arp request and use the local controller cache.
> This includes the
> degenerate case where there is only one controller interested in that
> particular mac binding.
>

That sounds like a potential improvement for dynamic mac bindings, at
least.  We still have Chassis, Encap, and Port_Binding to deal with.  It
would also require more complex RBAC capabilities to be added to ovsdb,
which I was hoping to avoid.
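
To make sure I understand the consensus idea quoted above, here is roughly
how I read the ovn-northd side of it.  Everything here is an assumption for
illustration: the candidate report format, the function name, and the rule
that a binding is not promoted when two different MACs both pass the
threshold.

```python
# Hypothetical sketch of consensus-based promotion; not ovn-northd code.
from collections import defaultdict


def promote_bindings(candidates, threshold):
    """Pick candidate MAC bindings that enough chassis agree on.

    candidates: iterable of (chassis, logical_port, ip, mac) reports.
    Returns {(logical_port, ip): mac} for every binding reported by more
    than `threshold` distinct chassis, provided no competing MAC for the
    same (logical_port, ip) also passes the threshold.
    """
    reporters = defaultdict(lambda: defaultdict(set))
    for chassis, port, ip, mac in candidates:
        # A set counts each chassis at most once per claimed MAC.
        reporters[(port, ip)][mac].add(chassis)

    promoted = {}
    for key, by_mac in reporters.items():
        winners = [mac for mac, who in by_mac.items() if len(who) > threshold]
        if len(winners) == 1:
            promoted[key] = winners[0]
    return promoted
```

Bindings that do not reach the threshold are simply absent from the result,
which matches the fallback described above of the interested controller
using its own ARP and local cache.
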

-- 
Russell Bryant
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev
