On 05/08/2018 10:44 AM, Stephen Bates wrote:
Hi Dan

    It seems unwieldy that this is a compile time option and not a runtime
    option. Can't we have a kernel command line option to opt-in to this
    behavior rather than require a wholly separate kernel image?
I think because of the security implications associated with p2pdma and ACS we wanted to make it very clear that people were choosing one (p2pdma) or the other (IOMMU groupings and isolation). However, personally, I would prefer including the option of a run-time kernel parameter too. In fact, a few months ago I proposed a small patch that did just that [1]. It never really went anywhere, but if people were open to the idea we could look at adding it to the series.
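
For what it's worth, a minimal sketch of what such a boot-time opt-in could look like. The parameter name pci_p2pdma_enable and the p2pdma_cmdline_enabled flag are my own illustration, not what the patch in [1] actually did:

#include <linux/init.h>
#include <linux/printk.h>

/* Hypothetical boot-time opt-in flag; illustrative only. */
static bool p2pdma_cmdline_enabled;

static int __init pci_p2pdma_enable_setup(char *str)
{
	/* "pci_p2pdma_enable" on the kernel command line opts in at boot. */
	p2pdma_cmdline_enabled = true;
	pr_info("PCI: p2pdma enabled on command line; ACS may be relaxed\n");
	return 0;
}
early_param("pci_p2pdma_enable", pci_p2pdma_enable_setup);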

It is equally clear whether it is a kernel command-line option or a CONFIG option.
One does not have access to the kernel command-line w/o a few privs.
A CONFIG option prevents a distribution from having a default, locked-down kernel 
_and_ the ability to be 'unlocked' if the customer/site is 'secure' via other 
means.
A run/boot-time option is more flexible and achieves the best of both.
Why is this text added in a follow-on patch and not the patch that
introduced the config option?

Because the ACS section was added later in the series and this information is 
associated with that additional functionality.
I'm also wondering if that command line option can take a 'bus device
function' address of a switch to limit the scope of where ACS is
disabled.

Well, p2p DMA is a function of a cooperating 'agent' somewhere above the two 
devices.
That agent should request that the kernel remove/circumvent ACS (i.e., enable 
p2p) between the two endpoints.
I recommend doing so via a sysfs method.

That way, the system can isolate the 'insecure' space between the two devices, 
likely configured on a separate switch, from the rest of the 
still-secured/ACS-enabled PCIe tree.
PCIe is effectively point-to-point; one might have multiple NICs/fabrics doing 
p2p to/from NVMe, but one can look at it as a list of pairs (nic1<->nvme1; 
nic2<->nvme2; ....).
A pair listing would be optimal, letting the kernel figure out the ACS path 
itself rather than requiring an error-prone 
endpoint-switch-switch-...-switch-endpoint entry.
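
To make that concrete, here is a rough sketch of what a pair-oriented sysfs store method could look like. The attribute, the two-BDF input format, and the pci_p2pdma_pair_enable() helper are all hypothetical:

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/pci.h>
#include <linux/sysfs.h>

/* Hypothetical: walk the PCI tree between a and b, relax ACS on the
 * bridges along that path only, and merge the two IOMMU groups. */
static int pci_p2pdma_pair_enable(struct pci_dev *a, struct pci_dev *b)
{
	return 0; /* path walking elided in this sketch */
}

/* e.g.: echo "0000:01:00.0 0000:02:00.0" > .../p2pdma_pairs */
static ssize_t p2pdma_pairs_store(struct kobject *kobj,
				  struct kobj_attribute *attr,
				  const char *buf, size_t count)
{
	unsigned int d1, b1, s1, f1, d2, b2, s2, f2;
	struct pci_dev *ep1, *ep2;
	int ret = -ENODEV;

	if (sscanf(buf, "%x:%x:%x.%x %x:%x:%x.%x",
		   &d1, &b1, &s1, &f1, &d2, &b2, &s2, &f2) != 8)
		return -EINVAL;

	ep1 = pci_get_domain_bus_and_slot(d1, b1, PCI_DEVFN(s1, f1));
	ep2 = pci_get_domain_bus_and_slot(d2, b2, PCI_DEVFN(s2, f2));
	if (ep1 && ep2)
		ret = pci_p2pdma_pair_enable(ep1, ep2);

	pci_dev_put(ep1);	/* NULL-safe */
	pci_dev_put(ep2);
	return ret ? ret : count;
}
static struct kobj_attribute p2pdma_pairs_attr = __ATTR_WO(p2pdma_pairs);
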
Additionally, systems that can (or prefer to) do p2p via a Root Port's IOMMU, 
which is not optimal but better than going all the way to/from memory, and 
which allows a security/IOVA check, can modify the point-to-point ACS algorithm 
to accommodate that over time (e.g., capability bits, be they hardware-defined 
or device-driver/extension/quirk-defined, for each bridge/RP in a PCI domain).

Kernels that never want to support p2p can simply be built without it enabled; 
the cmdline option is then moot.
Kernels built with it on *still* need the cmdline option, to be blunt that the 
kernel is enabling a feature that could render the entire (I/O sub)system 
insecure.
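
In code terms, the double gate might look something like this, reusing the hypothetical p2pdma_cmdline_enabled flag from the earlier sketch and assuming the series' CONFIG_PCI_P2PDMA symbol:

#include <linux/kconfig.h>

/* Sketch: both gates must be open before ACS is ever relaxed. */
static bool pci_p2pdma_allowed(void)
{
	if (!IS_ENABLED(CONFIG_PCI_P2PDMA))	/* built without it: moot */
		return false;

	/* Built with it: still require the explicit boot-time opt-in. */
	return p2pdma_cmdline_enabled;
}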

By this you mean the address of a RP, DSP, USP or MF EP below which we disable 
ACS? We could do that, but I don't think it avoids the issue of changes in 
IOMMU groupings as devices are added/removed. It simply changes the problem 
from affecting an entire PCI domain to affecting a subset of the domain. We can 
already handle this by doing p2pdma on one RP and normal IOMMU isolation on the 
other RPs in the system.

As devices are added, they start in ACS-enabled, secured mode.
As a sysfs entry modifies the p2p ability, the IOMMU group is modified as well.
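
A rough sketch of the group side of that, using the existing iommu_group_* APIs; the pci_p2pdma_merge_groups() wrapper is hypothetical and real group re-evaluation is more involved than this:

#include <linux/device.h>
#include <linux/iommu.h>

/* Sketch: once p2p is enabled between two devices, collapse them into
 * one IOMMU group so assignment code treats them as a single isolation
 * unit. */
static int pci_p2pdma_merge_groups(struct device *a, struct device *b)
{
	struct iommu_group *group = iommu_group_get(a);
	int ret;

	if (!group)
		return -ENODEV;

	iommu_group_remove_device(b);		/* leave b's old group */
	ret = iommu_group_add_device(group, b);	/* join a's group */

	iommu_group_put(group);
	return ret;
}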


btw -- IOMMU grouping is a host/HV control issue, not a VM control/knowledge 
issue,
       so I don't understand the comments about why VMs should need to know.
       -- Configure p2p _before_ assigning devices to VMs; IOMMU groups are
          checked at assignment time.
          -- So even if a device is hot-added in a separate IOMMU group,
             enabling p2p puts it into the same IOMMU group, and it can then
             only be assigned to the same VM.
       -- VMs don't know IOMMUs & ACS are involved now, and won't later, even
          if devices are dynamically added/removed.

Is there a thread I need to read up on to explain/clear up the thoughts above?

Stephen

[1] https://marc.info/?l=linux-doc&m=150907188310838&w=2
