On Fri, 24 Jun 2016 10:52:58 +0800
Yongji Xie <xyj...@linux.vnet.ibm.com> wrote:
> On 2016/6/24 0:12, Alex Williamson wrote:
> > On Mon, 30 May 2016 21:06:37 +0800
> > Yongji Xie <xyj...@linux.vnet.ibm.com> wrote:
> >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
> >> +{
> >> +  struct resource *res;
> >> +  int bar;
> >> +  struct vfio_pci_dummy_resource *dummy_res;
> >> +
> >> +  INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >> +
> >> +  for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
> >> +          res = vdev->pdev->resource + bar;
> >> +
> >> +          if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> >> +                  goto no_mmap;
> >> +
> >> +          if (!(res->flags & IORESOURCE_MEM))
> >> +                  goto no_mmap;
> >> +
> >> +          /*
> >> +           * The PCI core shouldn't set up a resource with a
> >> +           * type but zero size. But there may be bugs that
> >> +           * cause us to do that.
> >> +           */
> >> +          if (!resource_size(res))
> >> +                  goto no_mmap;
> >> +
> >> +          if (resource_size(res) >= PAGE_SIZE) {
> >> +                  vdev->bar_mmap_supported[bar] = true;
> >> +                  continue;
> >> +          }
> >> +
> >> +          if (!(res->start & ~PAGE_MASK)) {
> >> +                  /*
> >> +                   * Add a dummy resource to reserve the remainder
> >> +                   * of the exclusive page in case that hot-add
> >> +                   * device's bar is assigned into it.
> >> +                   */
> >> +                  dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
> >> +                  if (dummy_res == NULL)
> >> +                          goto no_mmap;
> >> +
> >> +                  dummy_res->resource.start = res->end + 1;
> >> +                  dummy_res->resource.end = res->start + PAGE_SIZE - 1;
> >> +                  dummy_res->resource.flags = res->flags;
> >> +                  if (request_resource(res->parent,
> >> +                                          &dummy_res->resource)) {
> >> +                          kfree(dummy_res);
> >> +                          goto no_mmap;
> >> +                  }  
> > Isn't it true that request_resource() only tells us that at a given
> > point in time, no other drivers have reserved that resource?  It seems
> > like it does not guarantee that the resource isn't routed to another
> > device or that another driver won't at some point attempt to request
> > that same resource.  So for example if a user constructs their initrd
> > to bind vfio-pci to devices before other modules load, this
> > request_resource() may succeed, at the expense of drivers loaded later
> > now failing.  The behavior will depend on driver load order and we're
> > not actually ensuring that the overflow resource is unused, just that
> > we got it first.  Can we do better?  Am I missing something that
> > prevents this?  Thanks,
> >
> > Alex  
> 
> Couldn't the PCI resource allocator prevent this?  When a PCI
> device is probed, it first finds an empty slot in the resource
> tree and then tries to request that resource in allocate_resource().
> And I'd like to know why a PCI device driver would attempt to
> call request_resource() at all?  Shouldn't this be done during
> PCI enumeration?

Hi Yongji,

Looks like most pci drivers call pci_request_regions().  From there the
call path is:

pci_request_selected_regions
  __pci_request_selected_regions
    __pci_request_region
      __request_mem_region
        __request_region
          __request_resource
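
To make that concrete, here's a minimal sketch of a typical driver
probe function (the driver "foo" and its probe are hypothetical,
purely for illustration); pci_request_regions() is the entry point
into the chain above and fails with -EBUSY if any BAR is already
claimed:

#include <linux/pci.h>

/* Hypothetical driver "foo": shown only to illustrate where the
 * __request_resource() chain above is entered. */
static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        /* Claims all of the device's BARs; returns -EBUSY if any
         * region is already held, e.g. by a dummy resource that
         * vfio-pci requested earlier. */
        err = pci_request_regions(pdev, "foo");
        if (err) {
                pci_disable_device(pdev);
                return err;
        }

        return 0;
}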

We sometimes see this driver ordering issue with users attempting to
blacklist native pci drivers in order to leave a device free for use
by vfio-pci.  If the device is a graphics card, the generic vesa or
uefi framebuffer driver can request device resources, causing a
failure when vfio-pci later tries to request those same resources.  I
expect that unless it's a boot device, like vga in my example, the
resources are not claimed until the driver opens the device, so the
request_resource() call doesn't occur until that point.
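
As a minimal sketch of that ordering hazard (the addresses, names,
and the demo function here are all made up for illustration):

#include <linux/ioport.h>
#include <linux/printk.h>

/* Whichever caller requests a range first wins; an overlapping
 * request made later returns NULL and the loser typically fails
 * its probe with -EBUSY. */
static void ordering_demo(void)
{
        /* vfio-pci's dummy request covering a page remainder. */
        struct resource *dummy =
                request_mem_region(0xe1240100, 0xf00, "vfio dummy");

        /* A driver loaded later, requesting an overlapping range,
         * now loses. */
        struct resource *late =
                request_mem_region(0xe1240800, 0x100, "late driver");

        pr_info("dummy=%p late=%p\n", dummy, late);  /* late == NULL */
}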

For another trivial example, look at /proc/iomem as you load and
unload a driver.  On my laptop, with e1000e unloaded, I see:

  e1200000-e121ffff : 0000:00:19.0
  e123e000-e123efff : 0000:00:19.0

When e1000e is loaded, each of these ranges is claimed by the e1000e
driver:

  e1200000-e121ffff : 0000:00:19.0
    e1200000-e121ffff : e1000e
  e123e000-e123efff : 0000:00:19.0
    e123e000-e123efff : e1000e
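
For reference, that nesting is a side effect of the call chain above:
for a MEM BAR, __request_region() walks down from iomem_resource and
inserts a new busy resource, named after the driver, beneath the
existing (non-busy) BAR resource.  A rough sketch of the effect, not
the literal kernel code:

#include <linux/ioport.h>
#include <linux/pci.h>

/* Rough sketch: claim one BAR the way pci_request_regions() does
 * for each of them.  The new resource ends up nested under the
 * BAR's own entry, which is what produces the indented e1000e
 * lines in /proc/iomem above. */
static struct resource *claim_bar(struct pci_dev *pdev, int bar,
                                  const char *name)
{
        return __request_region(&iomem_resource,
                                pci_resource_start(pdev, bar),
                                pci_resource_len(pdev, bar),
                                name, 0);
}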

Clearly the pci core knows the resource is associated with the
device, but I don't think we're tapping into that with
request_resource(); we're just potentially stealing resources that
another driver might otherwise have claimed, as I described above.
That's my suspicion at least; feel free to show otherwise if it's
incorrect.  Thanks,

Alex
