On Tue, Jan 19, 2021 at 5:03 PM Daniel Vetter <daniel.vet...@ffwll.ch> wrote:
>
> On Tue, Jan 19, 2021 at 4:20 PM Greg Kroah-Hartman
> <gre...@linuxfoundation.org> wrote:
> >
> > On Tue, Jan 19, 2021 at 03:34:47PM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 19, 2021 at 3:32 PM Greg Kroah-Hartman
> > > <gre...@linuxfoundation.org> wrote:
> > > >
> > > > On Tue, Jan 19, 2021 at 09:17:55AM +0100, Daniel Vetter wrote:
> > > > > On Fri, Nov 27, 2020 at 5:42 PM Daniel Vetter 
> > > > > <daniel.vet...@ffwll.ch> wrote:
> > > > > >
> > > > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > > > > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > > > > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > > > the default for all driver uses.
> > > > > >
> > > > > > Except there's two more ways to access PCI BARs: sysfs and proc mmap
> > > > > > support. Let's plug that hole.
> > > > > >
> > > > > > For revoke_devmem() to work we need to link our vma into the same
> > > > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > > > > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
> > > > > > > to adjust this at ->open time:
> > > > > >
> > > > > > > - for sysfs this is easy, now that binary attributes support this. We
> > > > > > >   just set bin_attr->mapping when mmap is supported
> > > > > > > - for procfs it's a bit more tricky, since procfs pci access has only
> > > > > > >   one file per device, and access to a specific resource first needs
> > > > > > >   to be set up with some ioctl calls. But mmap is only supported for
> > > > > > >   the same resources as sysfs exposes with mmap support, and otherwise
> > > > > > >   rejected, so we can set the mapping unconditionally at open time
> > > > > > >   without harm.
> > > > > >
> > > > > > A special consideration is for arch_can_pci_mmap_io() - we need to
> > > > > > > make sure that the ->f_mapping doesn't alias between ioport and iomem
> > > > > > > space. There's only 2 ways in-tree to support mmap of ioports: generic
> > > > > > > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
> > > > > > > architecture hand-rolling. Both approaches support ioport mmap through a
> > > > > > > special pfn range and not through magic pte attributes. Aliasing is
> > > > > > therefore not a problem.
> > > > > >
> > > > > > > The only difference in access checks left is that sysfs PCI mmap does
> > > > > > > not check for CAP_RAWIO. I'm not really sure whether that should be
> > > > > > added or not.
> > > > > >
> > > > > > Acked-by: Bjorn Helgaas <bhelg...@google.com>
> > > > > > Reviewed-by: Dan Williams <dan.j.willi...@intel.com>
> > > > > > Signed-off-by: Daniel Vetter <daniel.vet...@intel.com>
> > > > > > Cc: Jason Gunthorpe <j...@ziepe.ca>
> > > > > > Cc: Kees Cook <keesc...@chromium.org>
> > > > > > Cc: Dan Williams <dan.j.willi...@intel.com>
> > > > > > Cc: Andrew Morton <a...@linux-foundation.org>
> > > > > > Cc: John Hubbard <jhubb...@nvidia.com>
> > > > > > Cc: Jérôme Glisse <jgli...@redhat.com>
> > > > > > Cc: Jan Kara <j...@suse.cz>
> > > > > > Cc: Dan Williams <dan.j.willi...@intel.com>
> > > > > > Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> > > > > > Cc: linux...@kvack.org
> > > > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > > > Cc: linux-samsung-...@vger.kernel.org
> > > > > > Cc: linux-me...@vger.kernel.org
> > > > > > Cc: Bjorn Helgaas <bhelg...@google.com>
> > > > > > Cc: linux-...@vger.kernel.org
> > > > > > Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
> > > > > > --
> > > > > > v2:
> > > > > > > - Totally new approach: Adjust filp->f_mapping at open time. Note that
> > > > > > >   this now works on all architectures, not just those that support
> > > > > > >   ARCH_GENERIC_PCI_MMAP_RESOURCE
> > > > > > ---
> > > > > >  drivers/pci/pci-sysfs.c | 4 ++++
> > > > > >  drivers/pci/proc.c      | 1 +
> > > > > >  2 files changed, 5 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > > > > index d15c881e2e7e..3f1c31bc0b7c 100644
> > > > > > --- a/drivers/pci/pci-sysfs.c
> > > > > > +++ b/drivers/pci/pci-sysfs.c
> > > > > > @@ -929,6 +929,7 @@ void pci_create_legacy_files(struct pci_bus *b)
> > > > > >         b->legacy_io->read = pci_read_legacy_io;
> > > > > >         b->legacy_io->write = pci_write_legacy_io;
> > > > > >         b->legacy_io->mmap = pci_mmap_legacy_io;
> > > > > > +       b->legacy_io->mapping = iomem_get_mapping();
> > > > > >         pci_adjust_legacy_attr(b, pci_mmap_io);
> > > > > >         error = device_create_bin_file(&b->dev, b->legacy_io);
> > > > > >         if (error)
> > > > > > @@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
> > > > > >         b->legacy_mem->size = 1024*1024;
> > > > > >         b->legacy_mem->attr.mode = 0600;
> > > > > >         b->legacy_mem->mmap = pci_mmap_legacy_mem;
> > > > > > > +       b->legacy_mem->mapping = iomem_get_mapping();
> > > > >
> > > > > Unlike the normal pci stuff below, the legacy files here go boom
> > > > > because they're set up much earlier in the boot sequence. This only
> > > > > affects HAVE_PCI_LEGACY architectures, which aren't that many. So what
> > > > > should we do here now:
> > > > > - drop the devmem revoke for these
> > > > > - rework the init sequence somehow to set up these files a lot later
> > > > > - redo the sysfs patch so that it doesn't take an address_space
> > > > > pointer, but instead a callback to get at that (since at open time
> > > > > everything is set up). Imo rather ugly
> > > > > - ditch this part of the series (since there's not really any takers
> > > > > for the latter parts it might just not make sense to push for this)
> > > > > - something else?
> > > > >
> > > > > Bjorn, Greg, thoughts?
> > > >
> > > > What sysfs patch are you referring to here?
> > >
> > > Currently in linux-next:
> > >
> > > commit 74b30195395c406c787280a77ae55aed82dbbfc7 (HEAD ->
> > > topic/iomem-mmap-vs-gup, drm/topic/iomem-mmap-vs-gup)
> > > Author: Daniel Vetter <daniel.vet...@ffwll.ch>
> > > Date:   Fri Nov 27 17:41:25 2020 +0100
> > >
> > >    sysfs: Support zapping of binary attr mmaps
> > >
> > > Or the patch right before this one in this submission here:
> > >
> > > https://lore.kernel.org/dri-devel/20201127164131.2244124-12-daniel.vet...@ffwll.ch/
> >
> > Ah.  Hm, a callback in the sysfs file logic seems really hairy, so I
> > would prefer that not happen.  If no one really needs this stuff, why
> > not just drop it like you mention?
>
> Well it is needed, but just on architectures I don't care about much.
> Most relevant is perhaps powerpc (that's where Stephen hit the issue).
> I do wonder whether we could move the legacy pci files setup to where
> the modern stuff is set up from pci_create_resource_files() or maybe
> pci_create_sysfs_dev_files() even for HAVE_PCI_LEGACY. I think that
> might work, but since it's legacy flow on some funny architectures
> (alpha, itanium, that kind of stuff) I have no idea what kind of
> monsters I'm going to anger :-)

Back from a week of vacation, I looked at this again and I think it
shouldn't be hard to fix this with the same trick
pci_create_sysfs_dev_files() uses: as long as sysfs_initialized isn't
set we skip, and then later on, when the vfs is up and running, we can
initialize everything.

To be able to apply the same thing to pci_create_legacy_files() I
think all I need is to iterate over all struct pci_bus in
pci_sysfs_init() and we're good. Unfortunately I didn't find any
for_each_pci_bus(), so how do I do that?
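
To make the idea concrete, here is a rough user-space sketch of the
deferral pattern, not kernel code. Only the sysfs_initialized flag
corresponds to something named in this thread; struct mock_bus, the
mock_* functions and the all_buses list are made up for illustration,
and the list stands in for whatever bus iteration turns out to be
available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus; only what the sketch needs. */
struct mock_bus {
	struct mock_bus *next;		/* link in the list of all buses */
	bool legacy_files_created;	/* set once the files exist */
};

static bool sysfs_initialized;		/* same role as the flag in pci-sysfs.c */
static struct mock_bus *all_buses;	/* assumed way to reach every bus */

/* Called when a bus is discovered, possibly very early in boot. */
static void mock_register_bus(struct mock_bus *b)
{
	assert(b != NULL);
	b->legacy_files_created = false;
	b->next = all_buses;
	all_buses = b;
}

/* Skip while it is too early; the late init pass picks the bus up. */
static void mock_create_legacy_files(struct mock_bus *b)
{
	if (!sysfs_initialized)
		return;
	if (b->legacy_files_created)
		return;			/* already done, nothing to repeat */
	b->legacy_files_created = true;
}

/* Late init: flip the flag, then catch up on every bus seen so far. */
static void mock_sysfs_init(void)
{
	struct mock_bus *b;

	sysfs_initialized = true;
	for (b = all_buses; b != NULL; b = b->next)
		mock_create_legacy_files(b);
}
```

A bus registered before mock_sysfs_init() gets its files during the
catch-up loop; one registered afterwards gets them right away. The
open question above is what replaces the all_buses walk in the real
code.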

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
