Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Dan Williams
On Fri, Oct 1, 2021 at 12:02 PM Alan Stern  wrote:
>
> On Fri, Oct 01, 2021 at 11:09:52AM -0700, Dan Williams wrote:
> > On Fri, Oct 1, 2021 at 9:47 AM Alan Stern  wrote:
> > >
> > > On Fri, Oct 01, 2021 at 09:13:54AM -0700, Dan Williams wrote:
> > > > Bear with me, and perhaps it's a lack of imagination on my part, but I
> > > > don't see how to get to a globally generic "authorized" sysfs ABI
> > > > given that USB and Thunderbolt want to do bus specific actions on
> > > > authorization toggle events. Certainly a default generic authorized
> > > > attribute can be defined for all the other buses that don't have
> > > > legacy here, but Thunderbolt will still require support for '2' as an
> > > > authorized value, and USB will still want to base probe decisions on
> > > > the authorization state of both the usb_device and the usb_interface.
> > >
> > > The USB part isn't really accurate (I can't speak for Thunderbolt).
> > > When a usb_device is deauthorized, the device will be unconfigured,
> > > deleting all its interfaces and removing the need for any probe
> > > decisions about them.  In other words, the probe decision for a
> > > usb_device or usb_interface depends only on the device's/interface's
> > > own authorization state.
> > >
> > > True, the interface binding code does contain a test of the device's
> > > authorization setting.  That test is redundant and can be removed.
> > >
> > > The actions that USB wants to take on authorization toggle events for
> > > usb_devices are: for authorize, select and install a configuration;
> > > for deauthorize, unconfigure the device.  Each of these could be
> > > handled simply enough just by binding/unbinding the device.  (There
> > > is some special code for handling wireless USB devices, but wireless
> > > USB is now defunct.)
> >
> > Ah, so are you saying that it would be sufficient for USB if the
> > generic authorized implementation did something like:
> >
> > dev->authorized = 1;
> > device_attach(dev);
> >
> > ...for the authorize case, and:
> >
> > dev->authorize = 0;
> > device_release_driver(dev);
> >
> > ...for the deauthorize case?
>
> Yes, I think so.  But I haven't tried making this change to test and
> see what really happens.

Sounds like a useful path for this effort to explore. Especially as
Greg seems to want the proposed "has_probe_authorization" flag in the
bus_type to disappear and make this all generic. It just seems that
Thunderbolt would need deeper surgery to move what it does in the
authorization toggle path into the probe and remove paths.

Mika, do you see a path for Thunderbolt to align its authorization
paths behind bus ->probe() ->remove() events similar to what USB might
be able to support for a generic authorization path?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Alan Stern
On Fri, Oct 01, 2021 at 11:09:52AM -0700, Dan Williams wrote:
> On Fri, Oct 1, 2021 at 9:47 AM Alan Stern  wrote:
> >
> > On Fri, Oct 01, 2021 at 09:13:54AM -0700, Dan Williams wrote:
> > > Bear with me, and perhaps it's a lack of imagination on my part, but I
> > > don't see how to get to a globally generic "authorized" sysfs ABI
> > > given that USB and Thunderbolt want to do bus specific actions on
> > > authorization toggle events. Certainly a default generic authorized
> > > attribute can be defined for all the other buses that don't have
> > > legacy here, but Thunderbolt will still require support for '2' as an
> > > authorized value, and USB will still want to base probe decisions on
> > > the authorization state of both the usb_device and the usb_interface.
> >
> > The USB part isn't really accurate (I can't speak for Thunderbolt).
> > When a usb_device is deauthorized, the device will be unconfigured,
> > deleting all its interfaces and removing the need for any probe
> > decisions about them.  In other words, the probe decision for a
> > usb_device or usb_interface depends only on the device's/interface's
> > own authorization state.
> >
> > True, the interface binding code does contain a test of the device's
> > authorization setting.  That test is redundant and can be removed.
> >
> > The actions that USB wants to take on authorization toggle events for
> > usb_devices are: for authorize, select and install a configuration;
> > for deauthorize, unconfigure the device.  Each of these could be
> > handled simply enough just by binding/unbinding the device.  (There
> > is some special code for handling wireless USB devices, but wireless
> > USB is now defunct.)
> 
> Ah, so are you saying that it would be sufficient for USB if the
> generic authorized implementation did something like:
> 
> dev->authorized = 1;
> device_attach(dev);
> 
> ...for the authorize case, and:
> 
> dev->authorize = 0;
> device_release_driver(dev);
> 
> ...for the deauthorize case?

Yes, I think so.  But I haven't tried making this change to test and 
see what really happens.

Alan Stern
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Dan Williams
On Fri, Oct 1, 2021 at 9:47 AM Alan Stern  wrote:
>
> On Fri, Oct 01, 2021 at 09:13:54AM -0700, Dan Williams wrote:
> > Bear with me, and perhaps it's a lack of imagination on my part, but I
> > don't see how to get to a globally generic "authorized" sysfs ABI
> > given that USB and Thunderbolt want to do bus specific actions on
> > authorization toggle events. Certainly a default generic authorized
> > attribute can be defined for all the other buses that don't have
> > legacy here, but Thunderbolt will still require support for '2' as an
> > authorized value, and USB will still want to base probe decisions on
> > the authorization state of both the usb_device and the usb_interface.
>
> The USB part isn't really accurate (I can't speak for Thunderbolt).
> When a usb_device is deauthorized, the device will be unconfigured,
> deleting all its interfaces and removing the need for any probe
> decisions about them.  In other words, the probe decision for a
> usb_device or usb_interface depends only on the device's/interface's
> own authorization state.
>
> True, the interface binding code does contain a test of the device's
> authorization setting.  That test is redundant and can be removed.
>
> The actions that USB wants to take on authorization toggle events for
> usb_devices are: for authorize, select and install a configuration;
> for deauthorize, unconfigure the device.  Each of these could be
> handled simply enough just by binding/unbinding the device.  (There
> is some special code for handling wireless USB devices, but wireless
> USB is now defunct.)

Ah, so are you saying that it would be sufficient for USB if the
generic authorized implementation did something like:

dev->authorized = 1;
device_attach(dev);

...for the authorize case, and:

dev->authorize = 0;
device_release_driver(dev);

...for the deauthorize case?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Alan Stern
On Fri, Oct 01, 2021 at 09:13:54AM -0700, Dan Williams wrote:
> Bear with me, and perhaps it's a lack of imagination on my part, but I
> don't see how to get to a globally generic "authorized" sysfs ABI
> given that USB and Thunderbolt want to do bus specific actions on
> authorization toggle events. Certainly a default generic authorized
> attribute can be defined for all the other buses that don't have
> legacy here, but Thunderbolt will still require support for '2' as an
> authorized value, and USB will still want to base probe decisions on
> the authorization state of both the usb_device and the usb_interface.

The USB part isn't really accurate (I can't speak for Thunderbolt). 
When a usb_device is deauthorized, the device will be unconfigured, 
deleting all its interfaces and removing the need for any probe 
decisions about them.  In other words, the probe decision for a 
usb_device or usb_interface depends only on the device's/interface's 
own authorization state.

True, the interface binding code does contain a test of the device's 
authorization setting.  That test is redundant and can be removed.

The actions that USB wants to take on authorization toggle events for 
usb_devices are: for authorize, select and install a configuration; 
for deauthorize, unconfigure the device.  Each of these could be 
handled simply enough just by binding/unbinding the device.  (There 
is some special code for handling wireless USB devices, but wireless 
USB is now defunct.)

Alan Stern
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Dan Williams
On Fri, Oct 1, 2021 at 12:03 AM Greg Kroah-Hartman
 wrote:
>
> On Thu, Sep 30, 2021 at 12:04:05PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> >
> >
> > On 9/30/21 8:23 AM, Greg Kroah-Hartman wrote:
> > > On Thu, Sep 30, 2021 at 08:18:18AM -0700, Kuppuswamy, Sathyanarayanan 
> > > wrote:
> > > >
> > > >
> > > > On 9/30/21 6:36 AM, Dan Williams wrote:
> > > > > > And in particular, not all virtio drivers are hardened -
> > > > > > I think at this point blk and scsi drivers have been hardened - so
> > > > > > treating them all the same looks wrong.
> > > > > My understanding was that they have been audited, Sathya?
> > > >
> > > > Yes, AFAIK, it has been audited. Andi also submitted some patches
> > > > related to it. Andi, can you confirm.
> > >
> > > What is the official definition of "audited"?
> >
> >
> > In our case (Confidential Computing platform), the host is an un-trusted
> > entity. So any interaction with host from the drivers will have to be
> > protected against the possible attack from the host. For example, if we
> > are accessing a memory based on index value received from host, we have
> > to make sure it does not lead to out of bound access or when sharing the
> > memory with the host, we need to make sure only the required region is
> > shared with the host and the memory is un-shared after use properly.
>
> You have not defined the term "audited" here at all in any way that can
> be reviewed or verified by anyone from what I can tell.

Agree.

>
> You have only described a new model that you wish the kernel to run in,
> one in which it does not trust the hardware at all.  That is explicitly
> NOT what the kernel has been designed for so far, and if you wish to
> change that, lots of things need to be done outside of simply running
> some fuzzers on a few random drivers.
>
> For one example, how do you ensure that the memory you are reading from
> hasn't been modified by the host between writing to it the last time you
> did?  Do you have a list of specific drivers and kernel options that you
> feel you now "trust"?  If so, how long does that trust last for?  Until
> someonen else modifies that code?  What about modifications to functions
> that your "audited" code touches?  Who is doing this auditing?  How do
> you know the auditing has been done correctly?  Who has reviewed and
> audited the tools that are doing the auditing?  Where is the
> specification that has been agreed on how the auditing must be done?
> And so on...
>
> I feel like there are a lot of different things all being mixed up here
> into one "oh we want this to happen!" type of thread.  Please let's just
> stick to the one request that I had here, which was to move the way that
> busses are allowed to authorize the devices they wish to control into a
> generic way instead of being bus-specific logic.

I want to continue to shave this proposal down, but that last sentence
reads as self-contradictory.

Bear with me, and perhaps it's a lack of imagination on my part, but I
don't see how to get to a globally generic "authorized" sysfs ABI
given that USB and Thunderbolt want to do bus specific actions on
authorization toggle events. Certainly a default generic authorized
attribute can be defined for all the other buses that don't have
legacy here, but Thunderbolt will still require support for '2' as an
authorized value, and USB will still want to base probe decisions on
the authorization state of both the usb_device and the usb_interface.

> Any requests outside of that type of functionality are just that,
> outside the scope of this patchset and should get their own patch series
> and discussion.

A way to shave this further would be to default authorize just enough
devices to do the rest of the authorization from initramfs. That would
seem to cut out 'virtio-net' and 'virtio-storage' from default
authorized list.

The generic authorized attribute would only need to appear when
'dev_default_authorization' is set to false.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/6] driver core: Add common support to skip probe for un-authorized devices

2021-10-01 Thread Andi Kleen




Forget about trust for the moment.  Let's say the goal is to prevent
the kernel from creating any bindings other that those in some small
"allowed" set.  To fully specify one of the allowed bindings, you
would have to provide both a device ID and a driver name.  But in
practice this isn't necessary, since a device with a given ID will
bind to only one driver in almost all cases, and hence giving just
the device ID is enough.

So to do what they want, all that's needed is to forbid any bindings
except where the device ID is "allowed".  Or to put it another way,
where the device's authorized flag (which can be initialized based on
the device ID) is set.

(The opposite approach, in which the drivers are "allowed" rather
than the device IDs, apparently has already been discussed and
rejected.  I'm not convinced that was a good decision, but...)

Does this seem like a fair description of the situation?


Yes. That's roughly what the patchkit under discussion implements.


-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/6] driver core: Add common support to skip probe for un-authorized devices

2021-10-01 Thread Alan Stern
On Fri, Oct 01, 2021 at 08:29:36AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 30, 2021 at 12:15:16PM -0700, Andi Kleen wrote:
> > 
> > On 9/30/2021 10:23 AM, Greg Kroah-Hartman wrote:
> > > On Thu, Sep 30, 2021 at 10:17:09AM -0700, Andi Kleen wrote:
> > > > The legacy drivers could be fixed, but nobody really wants to touch them
> > > > anymore and they're impossible to test.
> > > Pointers to them?
> > 
> > For example if you look over old SCSI drivers in drivers/scsi/*.c there is a
> > substantial number that has a module init longer than just registering a
> > driver. As a single example look at drivers/scsi/BusLogic.c
> 
> Great, send patches to fix them up instead of adding new infrastructure
> to the kernel.  It is better to remove code than add it.  You can rip
> the ISA code out of that driver and then you will not have the issue
> anymore.
> 
> Or again, just add that module to the deny list and never load it from
> userspace.
> 
> > There were also quite a few platform drivers like this.
> 
> Of course, platform drivers are horrible abusers of this.  Just like the
> recent one submitted by Intel that would bind to any machine it was
> loaded on and did not actually do any hardware detection assuming that
> it owned the platform:
>   
> https://lore.kernel.org/r/20210924213157.3584061-2-david.e@linux.intel.com
> 
> So yes, some drivers are horrible, it is our job to catch that and fix
> it.  If you don't want to load those drivers on your system, we have
> userspace solutions for that (you can have allow/deny lists there.)
> 
> > > > The drivers that probe something that is not enumerated in a standard 
> > > > way
> > > > have no choice, it cannot be implemented in a different way.
> > > PCI devices are not enumerated in a standard way???
> > 
> > The pci devices are enumerated in a standard way, but typically the driver
> > also needs something else outside PCI that needs some other probing
> > mechanism.
> 
> Like what?  What PCI drivers need outside connections to control the
> hardware?
> 
> > > > So instead we're using a "firewall" the prevents these drivers from 
> > > > doing
> > > > bad things by not allowing ioremap access unless opted in, and also do 
> > > > some
> > > > filtering on the IO ports The device filter is still the primary 
> > > > mechanism,
> > > > the ioremap filtering is just belts and suspenders for those odd cases.
> > > That's horrible, don't try to protect the kernel from itself.  Just fix
> > > the drivers.
> > 
> > I thought we had already established this last time we discussed it.
> > 
> > That's completely impractical. We cannot harden thousands of drivers,
> > especially since it would be all wasted work since nobody will ever need
> > them in virtual guests. Even if we could harden them how would such a work
> > be maintained long term? Using a firewall and filtering mechanism is much
> > saner for everyone.
> 
> I agree, you can not "harden" anything here.  That is why I asked you to
> use the existing process that explicitly moves the model to userspace
> where a user can say "do I want this device to be controlled by the
> kernel or not" which then allows you to pick and choose what drivers you
> want to have in your system.
> 
> You need to trust devices, and not worry about trusting drivers as you
> yourself admit :)

That isn't the way they see it.  In their approach you can never 
trust any devices at all.  Therefore devices can only be allowed to 
bind to a very small set of "hardened" drivers.  Never mind how they 
decide whether or not a driver is sufficiently "hardened".

> The kernel's trust model is that once we bind to them, we trust almost
> all device types almost explicitly.  If you wish to change that model,
> that's great, but it is a much larger discussion than this tiny patchset
> would require.

Forget about trust for the moment.  Let's say the goal is to prevent 
the kernel from creating any bindings other that those in some small 
"allowed" set.  To fully specify one of the allowed bindings, you 
would have to provide both a device ID and a driver name.  But in 
practice this isn't necessary, since a device with a given ID will 
bind to only one driver in almost all cases, and hence giving just 
the device ID is enough.

So to do what they want, all that's needed is to forbid any bindings 
except where the device ID is "allowed".  Or to put it another way, 
where the device's authorized flag (which can be initialized based on 
the device ID) is set.

(The opposite approach, in which the drivers are "allowed" rather 
than the device IDs, apparently has already been discussed and 
rejected.  I'm not convinced that was a good decision, but...)

Does this seem like a fair description of the situation?

Alan Stern
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Andi Kleen



On 10/1/2021 12:03 AM, Greg Kroah-Hartman wrote:

On Thu, Sep 30, 2021 at 12:04:05PM -0700, Kuppuswamy, Sathyanarayanan wrote:


On 9/30/21 8:23 AM, Greg Kroah-Hartman wrote:

On Thu, Sep 30, 2021 at 08:18:18AM -0700, Kuppuswamy, Sathyanarayanan wrote:


On 9/30/21 6:36 AM, Dan Williams wrote:

And in particular, not all virtio drivers are hardened -
I think at this point blk and scsi drivers have been hardened - so
treating them all the same looks wrong.

My understanding was that they have been audited, Sathya?

Yes, AFAIK, it has been audited. Andi also submitted some patches
related to it. Andi, can you confirm.

What is the official definition of "audited"?


In our case (Confidential Computing platform), the host is an un-trusted
entity. So any interaction with host from the drivers will have to be
protected against the possible attack from the host. For example, if we
are accessing a memory based on index value received from host, we have
to make sure it does not lead to out of bound access or when sharing the
memory with the host, we need to make sure only the required region is
shared with the host and the memory is un-shared after use properly.

You have not defined the term "audited" here at all in any way that can
be reviewed or verified by anyone from what I can tell.

You have only described a new model that you wish the kernel to run in,
one in which it does not trust the hardware at all.  That is explicitly
NOT what the kernel has been designed for so far,


It has been already done for a few USB/TB drivers, but yes not for the 
majority of the kernel.



  and if you wish to
change that, lots of things need to be done outside of simply running
some fuzzers on a few random drivers.


The goal is to do similar work as USB/TB did, but do it for a small set 
of virtio drivers and use a custom allow list for those for the specific 
secure guest cases.


(there are some other goals, but let's not discuss them here for now)




For one example, how do you ensure that the memory you are reading from
hasn't been modified by the host between writing to it the last time you
did?


It's similar techniques as we do on user space accesses. For example if 
you bound check some value the code needs to ensure it is cached in 
private memory, not reread from MMIO or shared memory. Of course that's 
a good idea anyways for performance because MMIO is slow.


In the concrete cases of virtio the main problem was the free list in 
shared memory, but that has been addressed now.





  Do you have a list of specific drivers and kernel options that you
feel you now "trust"?


For TDX it's currently only virtio net/block/console

But we expect this list to grow slightly over time, but not at a high 
rate (so hopefully <10)




If so, how long does that trust last for?  Until
someonen else modifies that code?  What about modifications to functions
that your "audited" code touches?  Who is doing this auditing?  How do
you know the auditing has been done correctly?  Who has reviewed and
audited the tools that are doing the auditing?  Where is the
specification that has been agreed on how the auditing must be done?
And so on...


Well, I mean we already have a similar situation with user space APIs. 
So it's not a new problem. For those we've done it for many years, with 
audits and extra fuzzing.


There are people working on the audit and fuzzing today. How exactly it 
will be ensured long term is still be worked out, but I expect we can 
work out something.




I feel like there are a lot of different things all being mixed up here
into one "oh we want this to happen!" type of thread.




Agreed. The thread ended up about a lot of stuff which is outside the 
scope of the patches.



   Please let's just
stick to the one request that I had here, which was to move the way that
busses are allowed to authorize the devices they wish to control into a
generic way instead of being bus-specific logic.

Any requests outside of that type of functionality are just that,
outside the scope of this patchset and should get their own patch series
and discussion.



Yes that's the intention. This patch kit is only about controlling what 
devices can enumerate.


Also please let's avoid the "trusted" term. It's really misleading and 
confusing in the context of confidential computing.



-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 1/1] virtio: write back features before verify

2021-10-01 Thread Cornelia Huck
On Fri, Oct 01 2021, Halil Pasic  wrote:

> On Thu, 30 Sep 2021 13:31:04 +0200
> Cornelia Huck  wrote:
>
>> On Thu, Sep 30 2021, Halil Pasic  wrote:
>> 
>> > On Thu, 30 Sep 2021 11:28:23 +0200
>> > Cornelia Huck  wrote:
>> >  
>> >> On Thu, Sep 30 2021, Halil Pasic  wrote:
>> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> >> > if (device_features & (1ULL << i))
>> >> > __virtio_set_bit(dev, i);
>> >> >  
>> >> > +   /* Write back features before validate to know endianness */
>> >> > +   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> >> > +   dev->config->finalize_features(dev);
>> >> 
>> >> This really looks like a mess :(
>> >> 
>> >> We end up calling ->finalize_features twice: once before ->validate, and
>> >> once after, that time with the complete song and dance. The first time,
>> >> we operate on one feature set; after validation, we operate on another,
>> >> and there might be interdependencies between the two (like a that a bit
>> >> is cleared because of another bit, which would not happen if validate
>> >> had a chance to clear that bit before).  
>> >
>> > Basically the second set is a subset of the first set.  
>> 
>> I don't think that's clear.
>
> Validate can only remove features, or? So I guess after validate
> is a subset of before validate.

I was thinking about (more-or-less hypothetical) interdependencies (see
above). But that's not terribly important.

>
>
>> 
>> >  
>> >> 
>> >> I'm not sure whether that is even a problem in the spec: while the
>> >> driver may read the config before finally accepting features  
>> >
>> > I'm not sure I'm following you. Let me please qoute the specification:
>> > """
>> > 4. Read device feature bits, and write the subset of feature bits
>> > understood by the OS and driver to the device. During this step the driver 
>> > MAY read (but MUST NOT write) the device-specific configuration fields to 
>> > check that it can support the device before accepting it. 
>> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature 
>> > bits after this step. 
>> > """
>> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
>> >   
>> 
>> Yes, exactly, it MAY read before accepting features. How does the device
>> know whether the config space is little-endian or not?
>> 
>
> Well that is what we are talking about. One can try to infer things from
> the spec. This reset dance I called ugly is probably the cleanest,
> because the spec says that re-nego should work.
>
>> >  
>> >> , it does
>> >> not really make sense to do so before a feature bit as basic as
>> >> VERSION_1 which determines the endianness has been negotiated.   
>> >
>> > Are you suggesting that ->verify() should be after
>> > virtio_finalize_features()?  
>> 
>> No, that would defeat the entire purpose of verify. After
>> virtio_finalize_features(), we are done with feature negotiation.
>>
>
> Exactly!

It seems we are in violent agreement :)

>  
>> > Wouldn't
>> > that mean that verify() can't reject feature bits? But that is the whole
>> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
>> > in config space"). Do you think that the commit in question is
>> > conceptually flawed? My understanding of the verify is, that it is supposed
>> > to fence features and feature bits we can't support, e.g. because of
>> > config space things, but I may be wrong.  
>> 
>> No, that commit is not really flawed on its own, I think the whole
>> procedure may be problematic.
>> 
>
> I agree! But that regression really hurts us. Maybe the best band-aid is
> to conditional-compile it (not compile the check if s390).

It's probably most likely to hit on s390 (big-endian, and devices with a
blocksize != 512 in common use); but I'd like to make that band-aid more
generic than "exclude for s390". A hack for honouring VERSION_1 before
negotiation has finished is probably better as a stop-gap before we
manage to figure out how to deal with this properly.

>
>> >
>> > The trouble is, feature bits are not negotiated one by one, but basically 
>> > all
>> > at once. I suppose, I did the next best thing to first negotiating
>> > VERSION_1.  
>> 
>> We probably need to special-case VERSION_1 to move at least forward;
>> i.e. proceed as if we accepted it when reading the config space.
>> 
>> The problem is that we do not know what the device assumes when we read
>> the config space prior to setting FEATURES_OK. It may assume
>> little-endian if it offered VERSION_1, or it may not. The spec does not
>> really say what happens before feature negotiation has finished.
>> 
> No it does not, but I hope, the implementations we care the most about do
> little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
> transitional device would have to act upon a feature that is set,
> because for legacy there is no FEATURES_OK. Where we can run into
> 

Re: [RFC PATCH 1/1] virtio: write back features before verify

2021-10-01 Thread Christian Borntraeger

Am 30.09.21 um 03:20 schrieb Halil Pasic:

This patch fixes a regression introduced by commit 82e89ea077b9
("virtio-blk: Add validation for block size in config space") and
enables similar checks in verify() on big endian platforms.

The problem with checking multi-byte config fields in the verify
callback, on big endian platforms, and with a possibly transitional
device is the following. The verify() callback is called between
config->get_features() and virtio_finalize_features(). That we have a
device that offered F_VERSION_1 then we have the following options
either the device is transitional, and then it has to present the legacy
interface, i.e. a big endian config space until F_VERSION_1 is
negotiated, or we have a non-transitional device, which makes
F_VERSION_1 mandatory, and only implements the non-legacy interface and
thus presents a little endian config space. Because at this point we
can't know if the device is transitional or non-transitional, we can't
know do we need to byte swap or not.

The virtio spec explicitly states that the driver MAY read config
between reading and writing the features so saying that first accessing
the config before feature negotiation is done is not an option. The
specification ain't clear about setting the features multiple times
before FEATURES_OK, so I guess that should be fine.

I don't consider this patch super clean, but frankly I don't think we
have a ton of options. Another option that may or man not be cleaner,
but is also IMHO much uglier is to figure out whether the device is
transitional by rejecting _F_VERSION_1, then resetting it and proceeding
according tho what we have figured out, hoping that the characteristics
of the device didn't change.

Signed-off-by: Halil Pasic 
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
space")
Reported-by: mark...@us.ibm.com


Just to make this more obvious. Since 5.14 DASD devices as backing for 
virtio-blk no
longer work as the block size is no longer reported to the guest. So we need a 
fix
for the issue.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 1/1] virtio: write back features before verify

2021-10-01 Thread Halil Pasic
On Thu, 30 Sep 2021 13:31:04 +0200
Cornelia Huck  wrote:

> On Thu, Sep 30 2021, Halil Pasic  wrote:
> 
> > On Thu, 30 Sep 2021 11:28:23 +0200
> > Cornelia Huck  wrote:
> >  
> >> On Thu, Sep 30 2021, Halil Pasic  wrote:
> >>   
> >> > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > ("virtio-blk: Add validation for block size in config space") and
> >> > enables similar checks in verify() on big endian platforms.
> >> >
> >> > The problem with checking multi-byte config fields in the verify
> >> > callback, on big endian platforms, and with a possibly transitional
> >> > device is the following. The verify() callback is called between
> >> > config->get_features() and virtio_finalize_features(). That we have a
> >> > device that offered F_VERSION_1 then we have the following options
> >> > either the device is transitional, and then it has to present the legacy
> >> > interface, i.e. a big endian config space until F_VERSION_1 is
> >> > negotiated, or we have a non-transitional device, which makes
> >> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > thus presents a little endian config space. Because at this point we
> >> > can't know if the device is transitional or non-transitional, we can't
> >> > know do we need to byte swap or not.
> >> >
> >> > The virtio spec explicitly states that the driver MAY read config
> >> > between reading and writing the features so saying that first accessing
> >> > the config before feature negotiation is done is not an option. The
> >> > specification ain't clear about setting the features multiple times
> >> > before FEATURES_OK, so I guess that should be fine.
> >> >
> >> > I don't consider this patch super clean, but frankly I don't think we
> >> > have a ton of options. Another option that may or man not be cleaner,
> >> > but is also IMHO much uglier is to figure out whether the device is
> >> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > according tho what we have figured out, hoping that the characteristics
> >> > of the device didn't change.
> >> >
> >> > Signed-off-by: Halil Pasic 
> >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in 
> >> > config space")
> >> > Reported-by: mark...@us.ibm.com
> >> > ---
> >> >  drivers/virtio/virtio.c | 4 
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > --- a/drivers/virtio/virtio.c
> >> > +++ b/drivers/virtio/virtio.c
> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> >  if (device_features & (1ULL << i))
> >> >  __virtio_set_bit(dev, i);
> >> >  
> >> > +/* Write back features before validate to know endianness */
> >> > +if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > +dev->config->finalize_features(dev);
> >> 
> >> This really looks like a mess :(
> >> 
> >> We end up calling ->finalize_features twice: once before ->validate, and
> >> once after, that time with the complete song and dance. The first time,
> >> we operate on one feature set; after validation, we operate on another,
> >> and there might be interdependencies between the two (like a that a bit
> >> is cleared because of another bit, which would not happen if validate
> >> had a chance to clear that bit before).  
> >
> > Basically the second set is a subset of the first set.  
> 
> I don't think that's clear.

Validate can only remove features, or? So I guess after validate
is a subset of before validate.


> 
> >  
> >> 
> >> I'm not sure whether that is even a problem in the spec: while the
> >> driver may read the config before finally accepting features  
> >
> > I'm not sure I'm following you. Let me please qoute the specification:
> > """
> > 4. Read device feature bits, and write the subset of feature bits
> > understood by the OS and driver to the device. During this step the driver 
> > MAY read (but MUST NOT write) the device-specific configuration fields to 
> > check that it can support the device before accepting it. 
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature 
> > bits after this step. 
> > """
> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
> >   
> 
> Yes, exactly, it MAY read before accepting features. How does the device
> know whether the config space is little-endian or not?
> 

Well that is what we are talking about. One can try to infer things from
the spec. This reset dance I called ugly is probably the cleanest,
because the spec says that re-nego should work.

> >  
> >> , it does
> >> not really make sense to do so before a feature bit as basic as
> >> VERSION_1 which determines the endianness has been negotiated.   
> >
> > Are you suggesting that ->verify() should be after
> > virtio_finalize_features()?  
> 

Re: [RFC PATCH 1/1] virtio: write back features before verify

2021-10-01 Thread Halil Pasic
On Thu, 30 Sep 2021 07:12:21 -0400
"Michael S. Tsirkin"  wrote:

> On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> > This patch fixes a regression introduced by commit 82e89ea077b9
> > ("virtio-blk: Add validation for block size in config space") and
> > enables similar checks in verify() on big endian platforms.
> > 
> > The problem with checking multi-byte config fields in the verify
> > callback, on big endian platforms, and with a possibly transitional
> > device is the following. The verify() callback is called between
> > config->get_features() and virtio_finalize_features(). That we have a
> > device that offered F_VERSION_1 then we have the following options
> > either the device is transitional, and then it has to present the legacy
> > interface, i.e. a big endian config space until F_VERSION_1 is
> > negotiated, or we have a non-transitional device, which makes
> > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> > thus presents a little endian config space. Because at this point we
> > can't know if the device is transitional or non-transitional, we can't
> > know do we need to byte swap or not.  
> 
> Hmm which transport does this refer to?

It is the same with virtio-ccw and virtio-pci. I see the same problem
with both on s390x. I didn't try with virtio-blk-pci-non-transitional
yet (have to figure out how to do that with libvirt) for pci I used
virtio-blk-pci.

> Distinguishing between legacy and modern drivers is transport
> specific.  PCI presents
> legacy and modern at separate addresses so distinguishing
> between these two should be no trouble.

You mean the device id? Yes that is bolted down in the spec, but
currently we don't exploit that information. Furthermore there
is a fat chance that with QEMU even the allegedly non-transitional
devices only present a little endian config space after VERSION_1
was negotiated. Namely get_config for virtio-blk is implemented in
virtio_blk_update_config() which does virtio_stl_p(vdev,
_size, blk_size) and in there we don't care
about transitional or not:

static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
{
#if defined(LEGACY_VIRTIO_IS_BIENDIAN)
return virtio_is_big_endian(vdev);
#elif defined(TARGET_WORDS_BIGENDIAN)
if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
/* Devices conforming to VIRTIO 1.0 or later are always LE. */
return false;
}
return true;
#else
return false;
#endif
}


> Channel i/o has versioning so same thing?
>

Don't think so. Both a transitional and a non-transitional device
would have to accept revisions higher than 0 if the driver tried to
negotiate those (and we do in our case).
 
> > The virtio spec explicitly states that the driver MAY read config
> > between reading and writing the features so saying that first accessing
> > the config before feature negotiation is done is not an option. The
> > specification ain't clear about setting the features multiple times
> > before FEATURES_OK, so I guess that should be fine.
> > 
> > I don't consider this patch super clean, but frankly I don't think we
> > have a ton of options. Another option that may or man not be cleaner,
> > but is also IMHO much uglier is to figure out whether the device is
> > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> > according tho what we have figured out, hoping that the characteristics
> > of the device didn't change.  
> 
> I am confused here. So is the problem at the device or at the driver level?

We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
validation for block size in config space") virtio-blk is broken on
s390.

The deeper problem is in the spec. We stated that the driver may read
config space before the feature negotiation is finalized, but we didn't
think enough about what happens when native endiannes is not little
endian in the different cases.

I believe, for non-transitional devices we have a problem in the host as
well (i.e. in QEMU).

> I suspect it's actually the host that has the issue, not
> the guest?

I tend to say we have a problem both in the host and in the guest. I'm
more concerned about the problem in the guest, because that is a really
nasty regression. For the host. I think for legacy we don't have a
problem, because both sides would operate on the assumption no
_F_VERSION_1, IMHO the implementation for the transitional devices is
correct. For non-transitional flavor, it depends on the device. For
example virtio-net and virtio-blk is broken, because we use primitives
like virtio_stl_p() and those don't do the right thing before feature
negotiation is completed. On the other hand virtio-crypto.c as a truly
non-transitional device uses stl_le_p() and IMHO does the right thing.

Thanks for your comments! I hope I managed to answer your questions. I
need some guidance on how do we want to move forward on this.

Regards,
Halil

> 
> 
> > Signed-off-by: Halil Pasic 
> > Fixes: 

Re: [PATCH v2 4/6] virtio: Initialize authorized attribute for confidential guest

2021-10-01 Thread Greg Kroah-Hartman
On Thu, Sep 30, 2021 at 12:04:05PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> 
> 
> On 9/30/21 8:23 AM, Greg Kroah-Hartman wrote:
> > On Thu, Sep 30, 2021 at 08:18:18AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> > > 
> > > 
> > > On 9/30/21 6:36 AM, Dan Williams wrote:
> > > > > And in particular, not all virtio drivers are hardened -
> > > > > I think at this point blk and scsi drivers have been hardened - so
> > > > > treating them all the same looks wrong.
> > > > My understanding was that they have been audited, Sathya?
> > > 
> > > Yes, AFAIK, it has been audited. Andi also submitted some patches
> > > related to it. Andi, can you confirm.
> > 
> > What is the official definition of "audited"?
> 
> 
> In our case (Confidential Computing platform), the host is an un-trusted
> entity. So any interaction with host from the drivers will have to be
> protected against the possible attack from the host. For example, if we
> are accessing a memory based on index value received from host, we have
> to make sure it does not lead to out of bound access or when sharing the
> memory with the host, we need to make sure only the required region is
> shared with the host and the memory is un-shared after use properly.

You have not defined the term "audited" here at all in any way that can
be reviewed or verified by anyone from what I can tell.

You have only described a new model that you wish the kernel to run in,
one in which it does not trust the hardware at all.  That is explicitly
NOT what the kernel has been designed for so far, and if you wish to
change that, lots of things need to be done outside of simply running
some fuzzers on a few random drivers.

For one example, how do you ensure that the memory you are reading from
hasn't been modified by the host between writing to it the last time you
did?  Do you have a list of specific drivers and kernel options that you
feel you now "trust"?  If so, how long does that trust last for?  Until
someonen else modifies that code?  What about modifications to functions
that your "audited" code touches?  Who is doing this auditing?  How do
you know the auditing has been done correctly?  Who has reviewed and
audited the tools that are doing the auditing?  Where is the
specification that has been agreed on how the auditing must be done?
And so on...

I feel like there are a lot of different things all being mixed up here
into one "oh we want this to happen!" type of thread.  Please let's just
stick to the one request that I had here, which was to move the way that
busses are allowed to authorize the devices they wish to control into a
generic way instead of being bus-specific logic.

Any requests outside of that type of functionality are just that,
outside the scope of this patchset and should get their own patch series
and discussion.

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/6] driver core: Add common support to skip probe for un-authorized devices

2021-10-01 Thread Greg Kroah-Hartman
On Thu, Sep 30, 2021 at 12:15:16PM -0700, Andi Kleen wrote:
> 
> On 9/30/2021 10:23 AM, Greg Kroah-Hartman wrote:
> > On Thu, Sep 30, 2021 at 10:17:09AM -0700, Andi Kleen wrote:
> > > The legacy drivers could be fixed, but nobody really wants to touch them
> > > anymore and they're impossible to test.
> > Pointers to them?
> 
> For example if you look over old SCSI drivers in drivers/scsi/*.c there is a
> substantial number that has a module init longer than just registering a
> driver. As a single example look at drivers/scsi/BusLogic.c

Great, send patches to fix them up instead of adding new infrastructure
to the kernel.  It is better to remove code than add it.  You can rip
the ISA code out of that driver and then you will not have the issue
anymore.

Or again, just add that module to the deny list and never load it from
userspace.

> There were also quite a few platform drivers like this.

Of course, platform drivers are horrible abusers of this.  Just like the
recent one submitted by Intel that would bind to any machine it was
loaded on and did not actually do any hardware detection assuming that
it owned the platform:

https://lore.kernel.org/r/20210924213157.3584061-2-david.e@linux.intel.com

So yes, some drivers are horrible, it is our job to catch that and fix
it.  If you don't want to load those drivers on your system, we have
userspace solutions for that (you can have allow/deny lists there.)

> > > The drivers that probe something that is not enumerated in a standard way
> > > have no choice, it cannot be implemented in a different way.
> > PCI devices are not enumerated in a standard way???
> 
> The pci devices are enumerated in a standard way, but typically the driver
> also needs something else outside PCI that needs some other probing
> mechanism.

Like what?  What PCI drivers need outside connections to control the
hardware?

> > > So instead we're using a "firewall" the prevents these drivers from doing
> > > bad things by not allowing ioremap access unless opted in, and also do 
> > > some
> > > filtering on the IO ports The device filter is still the primary 
> > > mechanism,
> > > the ioremap filtering is just belts and suspenders for those odd cases.
> > That's horrible, don't try to protect the kernel from itself.  Just fix
> > the drivers.
> 
> I thought we had already established this last time we discussed it.
> 
> That's completely impractical. We cannot harden thousands of drivers,
> especially since it would be all wasted work since nobody will ever need
> them in virtual guests. Even if we could harden them how would such a work
> be maintained long term? Using a firewall and filtering mechanism is much
> saner for everyone.

I agree, you can not "harden" anything here.  That is why I asked you to
use the existing process that explicitly moves the model to userspace
where a user can say "do I want this device to be controlled by the
kernel or not" which then allows you to pick and choose what drivers you
want to have in your system.

You need to trust devices, and not worry about trusting drivers as you
yourself admit :)

The kernel's trust model is that once we bind to them, we trust almost
all device types almost explicitly.  If you wish to change that model,
that's great, but it is a much larger discussion than this tiny patchset
would require.

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization