Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

Matt Riedemann Wed, 23 Sep 2015 09:23:58 -0700


On 9/23/2015 10:00 AM, Sylvain Bauza wrote:



On 23/09/2015 15:31, Matt Riedemann wrote:



On 6/25/2015 3:59 AM, Sylvain Bauza wrote:



On 24/06/2015 19:56, Joe Gordon wrote:



On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza wrote:

    Hi team,

    Some discussion took place on IRC about a publicly opened bug
    related to TrustedFilter [1].
    I want to take the opportunity to raise my concerns about that
    specific filter, explain why I dislike it and how I think we could
    improve the situation (and clarify everyone's thoughts).

    The current situation is this: Nova checks whether a host is
    compromised only when the scheduler is called, i.e. only when
    booting/migrating/evacuating/unshelving an instance (well, not
    exactly all the evacuate/live-migrate cases, but let's not discuss
    that now). When the request reaches the scheduler, all the
    hosts are checked against all the enabled filters, and the
    TrustedFilter makes an external HTTP(S) call to the
    Attestation API service (not handled by Nova) for *each host* to
    see whether the host is valid (not compromised).

    To be clear, that's the only in-tree scheduler filter which
    explicitly makes an external call to a separate service that Nova
    is not managing. I can see at least 3 reasons why that's bad:

    #1 : that's a terrible bottleneck for performance, because we're
    IO-blocking N times given N hosts (we're not even multiplexing the
    HTTP requests)
    #2 : all the filters check an internal Nova state for the
    host (called HostState) except the TrustedFilter, which means
    that conceptually we defer the decision to a 3rd-party engine
    #3 : the Attestation API service becomes a de facto dependency
    for Nova (since it's an in-tree filter) while it's not listed as a
    dependency and thus not gated.
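
To make reason #1 concrete, here is a minimal sketch of a filter pass that asks an external service about every candidate host, one after the other. It is not Nova's TrustedFilter code: `CountingAttestation` and `filter_trusted_hosts` are hypothetical names, and the HTTP(S) round trip is replaced by an in-memory lookup so the example stays self-contained.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for the external attestation lookup; the filter
# discussed in this thread performs an HTTP(S) call at this point.
AttestationLookup = Callable[[str], str]


class CountingAttestation:
    """Fake attestation client that records how many lookups were made."""

    def __init__(self, trust_levels: Dict[str, str]) -> None:
        self.trust_levels = trust_levels
        self.calls = 0

    def __call__(self, hostname: str) -> str:
        # In a real deployment each call would be a blocking round trip.
        self.calls += 1
        return self.trust_levels.get(hostname, "unknown")


def filter_trusted_hosts(hosts: Iterable[str],
                         lookup: AttestationLookup) -> List[str]:
    """Keep only hosts reported as trusted, asking the service once per host."""
    return [host for host in hosts if lookup(host) == "trusted"]
```

With N candidate hosts the lookup runs N times per scheduling request, and nothing here batches or parallelises those calls, which is the bottleneck described above.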


    All of these reasons could be acceptable if the filter covered the
    use case given in [1] (i.e. I want to make sure that if my
    host gets compromised, my instances will not be running on that
    host), but it just doesn't, because the check only happens when the
    scheduler is called, as described above.

    So, given that, here are my thoughts:
    a/ if a host gets compromised, we can just disable its service to
    prevent its selection as a valid destination host. There is no need
    for a specialised filter.
    b/ if a host is compromised, we can assume that the instances have
    to be resurrected elsewhere, i.e. we can call a nova evacuate
    c/ checking whether a host is compromised is not a Nova
    responsibility since it's already perfectly done by [2]

    In other words, I consider that "security" use case to be
    analogous to the HA use case [3], where we need a 3rd-party
    tool responsible for periodically checking the state of the hosts
    and, if one is compromised, calling the Nova API to fence the host
    and evacuate the compromised instances.
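
The flow proposed above (an external tool checks the hosts, then fences and evacuates the compromised ones) could be sketched roughly as follows. All names are hypothetical: `is_trusted`, `disable_service` and `evacuate_instances` are callbacks that a real tool would wire to the attestation service and to the Nova API; no actual client calls are shown.

```python
from typing import Callable, Iterable, List


def fence_compromised_hosts(
    hosts: Iterable[str],
    is_trusted: Callable[[str], bool],
    disable_service: Callable[[str], None],
    evacuate_instances: Callable[[str], None],
) -> List[str]:
    """Run one monitoring pass and return the hosts that were fenced."""
    fenced = []
    for host in hosts:
        if is_trusted(host):
            continue
        # Step a/: stop the scheduler from picking this host again.
        disable_service(host)
        # Step b/: move the instances off the compromised host.
        evacuate_instances(host)
        fenced.append(host)
    return fenced
```

A monitoring tool would call this periodically. Because the check runs outside the scheduler, a host that becomes compromised after its instances were booted is still caught.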

    Given that, I'm proposing to deprecate TrustedFilter and explicitly
    mention that it will be dropped from the tree in a later cycle:
    https://review.openstack.org/194592


Given people are using this, it is a negligible maintenance burden.  I
think deprecating with the intention of removing is not worth it.

Although it would be very useful to further document the risks with
this filter (live migration, possible performance issues etc.)


Well, I can understand that customers might not agree to removing
the filter because there is no clear alternative for them. That said, I
think saying that the filter is deprecated without saying when it would
be removed would help some contributors think about it and work
on a better solution, exactly like we did for the EC2 API.

To be clear, I want to freeze the filter by deprecating it and
explaining why it's wrong (by amending the devref section and emitting a
LOG warning saying it's deprecated) and then leave the filter
in-tree until we are sure that there is a good solution outside of Nova.

-Sylvain



    Thoughts ?
    -Sylvain



    [1] https://bugs.launchpad.net/nova/+bug/1456228
    [2] https://github.com/OpenAttestation/OpenAttestation
    [3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/



__________________________________________________________________________

    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
[email protected]?subject:unsubscribe
<http://[email protected]?subject:unsubscribe>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






I just reviewed the change https://review.openstack.org/#/c/194592/
and agree with Joe.

We can't justify deprecation and removal due to lack of CI testing -
there are many scheduler filters which aren't tested in the gate.  Or
if we can justify it that way, then we're setting a precedent.  So if
testing is the sore spot, then maybe we want Intel to look at setting
up 3rd party CI?  Maybe they could work it into their existing PCI CI?


Well, there is a difference between that filter and the others: we
could provide functional testing for the other filters
just by adding Tempest tests, while the TrustedFilter would require
far more than that (i.e. either pulling in OAT as a dependency for Nova,
or setting up a 3rd-party CI).

Tempest tests the API, so adding tests to Tempest would be tricky,
unless the test added is a scenario that would only be expected to
behave a certain way if a given filter is available in the configuration.


The scheduler_default_filters is really all we have in the gate runs today.


For sure, I'd love to see some effort put into providing integration with
OAT if that filter stays in-tree.

I also don't think we can justify the external dependency as grounds
for removal.  There are many possible configurations that require
external dependencies.  90% of cinder/neutron configurations probably
fall into this camp.

Fair enough, I just want to stress that some work has to be
done before this filter can be considered to have the same level of
confidence as the others.

From other parts of this thread it also sounds like there are
potentially alternatives to this filter but they aren't implemented,
or even written up in a spec.  Given there are users of this, I'd
think we'd want to see an agreed-upon alternative proposal to replace
this filter.


I totally support that. Like I said in my original email, this is not
only a dependency problem, but rather a design problem. If we want to
cover the given usecases, it requires more than just a filter, and IMHO
all of this needs to be done outside Nova.

I'm all for logging a warning that this filter is experimental
(meaning it's not tested in our CI system).  I don't think there is a
good reason to deprecate it right now though with an open-ended
removal date.
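
As a rough illustration of the "log a warning" idea, a filter could emit the message once when it is instantiated. This is only a sketch using Python's standard `logging` module: `TrustedFilterSketch` is a hypothetical stand-in, not Nova's filter class, and the logger name and message wording are assumptions.

```python
import logging

# Hypothetical logger name, chosen only for this example.
LOG = logging.getLogger("scheduler.filters.sketch")


class TrustedFilterSketch:
    """Stand-in filter that warns operators when it is enabled."""

    def __init__(self) -> None:
        # Emitted once per filter instance, i.e. when the scheduler loads it.
        LOG.warning(
            "%s is experimental: it is not tested in the CI system.",
            type(self).__name__,
        )
```

Warning at load time rather than on every scheduling request keeps the log readable while still making the filter's status visible to whoever enables it.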


That's a very valid point, I'm fine with that. Thanks for the idea.

-Sylvain



--

Thanks,

Matt Riedemann

