Re: [PATCH 07/13] huge_memory: Allow mappings of PUD sized pages

2024-07-02 Thread Christoph Hellwig
On Tue, Jul 02, 2024 at 08:19:01PM +1000, Alistair Popple wrote:
> > (B) As long as we have subpage mapcounts, this prevents vmemmap
> > optimizations [1]. Is that only used for device-dax for now and are
> > there no plans to make use of that for fs-dax?
> 
> I don't have any plans to. This is purely focussed on refcounting pages
> "like normal" so we can get rid of all the DAX special casing.
> 
> > (C) We managed without so far :)
> 
> Indeed, although Christoph has asked repeatedly ([1], [2] and likely
> others) that this gets fixed and I finally got sick of it coming up
> everytime I need to touch something with ZONE_DEVICE pages :)
> 
> Also it removes the need for people to understand the special DAX page
> recounting scheme and ends up removing a bunch of cruft as a bonus:
> 
>  59 files changed, 485 insertions(+), 869 deletions(-)
> 
> And that's before I clean up all the pgmap reference handling. It also
> removes the pXX_trans_huge and pXX_leaf distinction. So we managed, but
> things could be better IMHO.

Yes.  I can't wait for this series making the finish line.  There might
be more chance for cleanups and optimizations around ZONE_DEVICE, but
this alone is a huge step forward.




Re: [PATCH 06/13] mm/memory: Add dax_insert_pfn

2024-07-02 Thread Christoph Hellwig
On Tue, Jul 02, 2024 at 09:18:31AM +0200, David Hildenbrand wrote:
> We have this comparably nasty vmf_insert_mixed() that FS dax abused to 
> insert into !VM_MIXED VMAs. Is that abuse now stopping and are there maybe 
> ways to get rid of vmf_insert_mixed()?

Unfortunately it is also used by a few drm drivers and not just DAX.



Re: [PATCH 10/13] fs/dax: Properly refcount fs dax pages

2024-06-26 Thread Christoph Hellwig
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index eb61598..b7a31ae 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -126,11 +126,11 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax 
> *dev_dax,
>   return VM_FAULT_SIGBUS;
>   }
>  
> - pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
> + pfn = phys_to_pfn_t(phys, 0);
>  
>   dax_set_mapping(vmf, pfn, fault_size);
>  
> - return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
> + return dax_insert_pfn(vmf->vma, vmf->address, pfn, vmf->flags & 
> FAULT_FLAG_WRITE);

Plenty overly long lines here and later.

Q: hould dax_insert_pfn take a vm_fault structure instead of the vma?
Or are the potential use cases that aren't from the fault path?
similar instead of the bool write passing the fault flags might actually
make things more readable than the bool.

Also at least currently it seems like there are no modular users despite
the export, or am I missing something?

> + blk_queue_flag_set(QUEUE_FLAG_DAX, q);

Just as a heads up, setting of these flags has changed a lot in
linux-next.

>  {
> + /*
> +  * Make sure we flush any cached data to the page now that it's free.
> +  */
> + if (PageDirty(page))
> + dax_flush(NULL, page_address(page), page_size(page));
> +

Adding the magic dax_dev == NULL case to dax_flush and going through it
vs just calling arch_wb_cache_pmem directly here seems odd.

But I also don't quite understand how it is related to the rest
of the patch anyway.

> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -373,6 +373,8 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>   unsigned long start = addr;
>  
>   ptl = pmd_trans_huge_lock(pmd, vma);
> + if (vma_is_dax(vma))
> + ptl = NULL;
>   if (ptl) {

This feels sufficiently magic to warrant a comment.

>   if (!pmd_present(*pmd))
>   goto out;
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index b7e1599..f11ee0d 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1016,7 +1016,8 @@ static void __ref __init_zone_device_page(struct page 
> *page, unsigned long pfn,
>*/
>   if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
>   pgmap->type == MEMORY_DEVICE_COHERENT ||
> - pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
> + pgmap->type == MEMORY_DEVICE_PCI_P2PDMA ||
> + pgmap->type == MEMORY_DEVICE_FS_DAX)
>   set_page_count(page, 0);
>  }

So we'll skip this for MEMORY_DEVICE_GENERIC only.  Does anyone remember
if that's actively harmful or just not needed?  If the latter it might
be simpler to just set the page count unconditionally here.




Re: [PATCH 05/13] mm: Allow compound zone device pages

2024-06-26 Thread Christoph Hellwig
On Thu, Jun 27, 2024 at 10:54:20AM +1000, Alistair Popple wrote:
>  static struct nouveau_dmem_chunk *nouveau_page_to_chunk(struct page *page)
>  {
> - return container_of(page->pgmap, struct nouveau_dmem_chunk, pagemap);
> + return container_of(page_dev_pagemap(page), struct nouveau_dmem_chunk, 
> pagemap);

Overly long line hee (and quite a few more).




Re: [PATCH 04/13] fs/dax: Add dax_page_free callback

2024-06-26 Thread Christoph Hellwig
On Thu, Jun 27, 2024 at 10:54:19AM +1000, Alistair Popple wrote:
> When a fs dax page is freed it has to notify filesystems that the page
> has been unpinned/unmapped and is free. Currently this involves
> special code in the page free paths to detect a transition of refcount
> from 2 to 1 and to call some fs dax specific code.
> 
> A future change will require this to happen when the page refcount
> drops to zero. In this case we can use the existing
> pgmap->ops->page_free() callback so wire that up for all devices that
> support FS DAX (nvdimm and virtio).

Given that ->page_ffree is only called from free_zone_device_folio
and right next to a switch on the the type, can't we just do the
wake_up_var there without the somewhat confusing indirect call that
just back in common code without any driver logic?




Re: [PATCH 03/13] fs/dax: Refactor wait for dax idle page

2024-06-26 Thread Christoph Hellwig
On Thu, Jun 27, 2024 at 10:54:18AM +1000, Alistair Popple wrote:
> A FS DAX page is considered idle when its refcount drops to one. This
> is currently open-coded in all file systems supporting FS DAX. Move
> the idle detection to a common function to make future changes easier.
> 
> Signed-off-by: Alistair Popple 
> Reviewed-by: Jan Kara 

I'm pretty sure I already review this ages ago, but:

Reviewed-by: Christoph Hellwig 




Re: [PATCH 02/13] pci/p2pdma: Don't initialise page refcount to one

2024-06-26 Thread Christoph Hellwig
On Thu, Jun 27, 2024 at 10:54:17AM +1000, Alistair Popple wrote:
> The reference counts for ZONE_DEVICE private pages should be
> initialised by the driver when the page is actually allocated by the
> driver allocator, not when they are first created. This is currently
> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.
> 
> Signed-off-by: Alistair Popple 
> ---
>  drivers/pci/p2pdma.c | 2 ++
>  mm/memremap.c| 8 
>  mm/mm_init.c | 4 +++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4f47a13..1e9ea32 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct 
> kobject *kobj,
>   goto out;
>   }
>  
> + set_page_count(virt_to_page(kaddr), 1);

Can we have a comment here?  Without that it feels a bit too much like
black magic when reading the code.

> + if (folio->page.pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + folio->page.pgmap->type == MEMORY_DEVICE_COHERENT)
> + put_dev_pagemap(folio->page.pgmap);
> + else if (folio->page.pgmap->type != MEMORY_DEVICE_PCI_P2PDMA)
>   /*
>* Reset the refcount to 1 to prepare for handing out the page
>* again.
>*/
>   folio_set_count(folio, 1);

Where the else if evaluates to MEMORY_DEVICE_FS_DAX ||
MEMORY_DEVICE_GENERIC.  Maybe make this a switch statement handling
all cases of the enum to make it clear and have the compiler generate
a warning when a new type is added without being handled here?

> @@ -1014,7 +1015,8 @@ static void __ref __init_zone_device_page(struct page 
> *page, unsigned long pfn,
>* which will set the page count to 1 when allocating the page.
>*/
>   if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + pgmap->type == MEMORY_DEVICE_COHERENT ||
> + pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
>   set_page_count(page, 0);

Similarly here a switch with explanation of what will be handled and
what not would be nice.




Re: [PATCH 06/13] mm/memory: Add dax_insert_pfn

2024-06-26 Thread Christoph Hellwig
On Thu, Jun 27, 2024 at 10:54:21AM +1000, Alistair Popple wrote:
> +extern void prep_compound_page(struct page *page, unsigned int order);

No need for the extern.

>  static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t 
> *pte,
> - unsigned long addr, struct page *page, pgprot_t prot)
> + unsigned long addr, struct page *page, pgprot_t prot, 
> bool mkwrite)

Overly long line.

> + retval = insert_page_into_pte_locked(vma, pte, addr, page, prot, 
> mkwrite);

.. same here.

> +vm_fault_t dax_insert_pfn(struct vm_area_struct *vma,
> + unsigned long addr, pfn_t pfn_t, bool write)

This could probably use a kerneldoc comment.




Re: [PATCH] Documentation: coding-style: don't encourage WARN*()

2024-04-15 Thread Christoph Hellwig
On Mon, Apr 15, 2024 at 10:35:21AM +0200, Greg KH wrote:
> On Mon, Apr 15, 2024 at 01:07:41AM -0700, Christoph Hellwig wrote:
> > No, this advice is wronger than wrong.  If you set panic_on_warn you
> > get to keep the pieces.  
> > 
> 
> But don't add new WARN() calls please, just properly clean up and handle
> the error.  And any WARN() that userspace can trigger ends up triggering
> syzbot reports which also is a major pain, even if you don't have
> panic_on_warn enabled.

Important distinction here:  WARN_ON_ONCE is for internal error
checking and absolutely intentional, and does not replace error
handling, that's why it passes the error value through.  OF course
it should not be trigger by user action.

> And I think the "do not use panic_on_warn" recommendation has been
> ignored, given the huge use of it by vendors who have enabled it (i.e.
> all Samsung phones and cloud servers).

Sucks for them.




Re: [PATCH] Documentation: coding-style: don't encourage WARN*()

2024-04-15 Thread Christoph Hellwig
No, this advice is wronger than wrong.  If you set panic_on_warn you
get to keep the pieces.  




Re: [PATCH v2] docs: submitting-patches: encourage direct notifications to commenters

2023-10-05 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH] docs: submitting-patches: encourage direct notifications to reviewers

2023-10-02 Thread Christoph Hellwig
On Mon, Oct 02, 2023 at 10:36:01AM +0300, Jani Nikula wrote:
> On Sun, 01 Oct 2023, Christoph Hellwig  wrote:
> > On Fri, Sep 29, 2023 at 09:24:57AM +0200, Thomas Weißschuh wrote:
> >> > This does not scale.
> >> 
> >> Could you elaborate in which way it doesn't scale?
> >
> > If I send a modest cross-subsystem series it often touches 20+
> > subsystems.  Between mailing lists and maintainers that's usually
> > already 60+ recipients.  If you now add a another 2-3 maintainers
> > we're just going to hit limits in mail servers.
> 
> I thought this was about adding people who have commented on previous
> versions to Cc. That's usually a very limited number.

Oh, that absolutely makes sense.  But maybe we need to stop overloading
the term reviewer :)



Re: [PATCH] docs: submitting-patches: encourage direct notifications to reviewers

2023-10-01 Thread Christoph Hellwig
On Fri, Sep 29, 2023 at 09:24:57AM +0200, Thomas Weißschuh wrote:
> > This does not scale.
> 
> Could you elaborate in which way it doesn't scale?

If I send a modest cross-subsystem series it often touches 20+
subsystems.  Between mailing lists and maintainers that's usually
already 60+ recipients.  If you now add a another 2-3 maintainers
we're just going to hit limits in mail servers.

> > Please read the mailinglist, that's the whole
> > point of having it.
> 
> When reviewing things in various subsystem this would require reading
> all of LKML, which is impractical.

Works with your maintainers to have useful lists for their subsystems.
That's again the point.


Re: [PATCH] docs: submitting-patches: encourage direct notifications to reviewers

2023-09-28 Thread Christoph Hellwig
NAK.

This does not scale.  Please read the mailinglist, that's the whole
point of having it.


Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()

2021-11-29 Thread Christoph Hellwig
On Mon, Nov 29, 2021 at 03:49:46PM +, Joao Martins wrote:
> Hmmm -- if by individual helpers moving to __dev_dax_{pte,pmd,pud}_fault()
> it would be slightly less straighforward. Unless you might mean to move
> to check_vma() (around the dax_alive() check) and that might actually
> remove the opencoding of dax_read_lock in dax_mmap() even.
> 
> I would rather prefer that this cleanup around dax_read_{un,}lock is
> a separate patch separate to this series, unless you feel strongly that
> it needs to be part of this set.

Feel free to keep it as-is.



Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()

2021-11-28 Thread Christoph Hellwig
On Fri, Nov 26, 2021 at 06:39:39PM +, Joao Martins wrote:
> @@ -230,23 +235,18 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault 
> *vmf,
>   id = dax_read_lock();
>   switch (pe_size) {
>   case PE_SIZE_PTE:
> - fault_size = PAGE_SIZE;
>   rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
>   break;
>   case PE_SIZE_PMD:
> - fault_size = PMD_SIZE;
>   rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
>   break;
>   case PE_SIZE_PUD:
> - fault_size = PUD_SIZE;
>   rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
>   break;
>   default:
>   rc = VM_FAULT_SIGBUS;
>   }
> 
>   dax_read_unlock(id);

I wonder if if would make sense to move dax_read_lock / dax_read_unlock
іnto the individul helpers as well now.  That way you could directly
return from the switch.  Aso it seems like pfn is only an input
parameter now and doesn't need to be passed by reference.



Re: [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages

2021-11-24 Thread Christoph Hellwig
On Wed, Nov 24, 2021 at 07:09:59PM +, Joao Martins wrote:
> Add a new @vmemmap_shift property for struct dev_pagemap which specifies that 
> a
> devmap is composed of a set of compound pages of order @vmemmap_shift, 
> instead of
> base pages. When a compound page devmap is requested, all but the first
> page are initialised as tail pages instead of order-0 pages.

Please wrap commit log lines after 73 characters.

>  #define for_each_device_pfn(pfn, map, i) \
> - for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = 
> pfn_next(pfn))
> + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = 
> pfn_next(map, pfn))

It would be nice to fix up this long line while you're at it.

>  static void dev_pagemap_kill(struct dev_pagemap *pgmap)
>  {
> @@ -315,8 +315,8 @@ static int pagemap_range(struct dev_pagemap *pgmap, 
> struct mhp_params *params,
>   memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>   PHYS_PFN(range->start),
>   PHYS_PFN(range_len(range)), pgmap);
> - percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> - - pfn_first(pgmap, range_id));
> + percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> + - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift);

In the Linux coding style the - goes ointo the first line.

But it would be really nice to clean this up with a helper ala pfn_len
anyway:

percpu_ref_get_many(pgmap->ref,
pfn_len(pgmap, range_id) >> pgmap->vmemmap_shift);



Re: [PATCH v5 6/8] device-dax: use struct_size()

2021-11-17 Thread Christoph Hellwig
> + pgmap = devm_kzalloc(
> +   dev, struct_size(pgmap, ranges, dev_dax->nr_range - 
> 1),
> +   GFP_KERNEL);

Keeping the dev argument on the previous line would not only make this
much more readable but also avoid the overly long line.



Re: [PATCH v5 8/8] device-dax: compound devmap support

2021-11-17 Thread Christoph Hellwig
On Fri, Nov 12, 2021 at 04:08:24PM +0100, Joao Martins wrote:
> Use the newly added compound devmap facility which maps the assigned dax
> ranges as compound pages at a page size of @align.
> 
> dax devices are created with a fixed @align (huge page size) which is
> enforced through as well at mmap() of the device. Faults, consequently
> happen too at the specified @align specified at the creation, and those
> don't change throughout dax device lifetime. MCEs unmap a whole dax
> huge page, as well as splits occurring at the configured page size.
> 
> Performance measured by gup_test improves considerably for
> unpin_user_pages() and altmap with NVDIMMs:
> 
> $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
> (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
> [altmap]
> (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
> 
>  $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
> (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
> [altmap with -m 127004]
> (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec 
> put:~563ms
> 
> .. as well as unpin_user_page_range_dirty_lock() being just as effective
> as THP/hugetlb[0] pages.
> 
> [0] 
> https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.mart...@oracle.com/
> 
> Signed-off-by: Joao Martins 
> Reviewed-by: Dan Williams 
> ---
>  drivers/dax/device.c | 57 ++--
>  1 file changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index a65c67ab5ee0..0c2ac97d397d 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -192,6 +192,42 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax 
> *dev_dax,
>  }
>  #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>  
> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
> +  unsigned long fault_size,
> +  struct address_space *f_mapping)
> +{
> + unsigned long i;
> + pgoff_t pgoff;
> +
> + pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));
> +
> + for (i = 0; i < fault_size / PAGE_SIZE; i++) {
> + struct page *page;
> +
> + page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
> + if (page->mapping)
> + continue;
> + page->mapping = f_mapping;
> + page->index = pgoff + i;
> + }
> +}

No need to pass f_mapping here, it must be vmf->vma->vm_file->f_mapping.

> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
> +  unsigned long fault_size,
> +  struct address_space *f_mapping)
> +{
> + struct page *head;
> +
> + head = pfn_to_page(pfn_t_to_pfn(pfn));
> + head = compound_head(head);
> + if (head->mapping)
> + return;
> +
> + head->mapping = f_mapping;
> + head->index = linear_page_index(vmf->vma,
> + ALIGN(vmf->address, fault_size));
> +}

Same here.

>   if (rc == VM_FAULT_NOPAGE) {
> - unsigned long i;
> - pgoff_t pgoff;
> + struct dev_pagemap *pgmap = dev_dax->pgmap;

If you're touching this anyway:  why not do an early return here for
the error case to simplify the flow?



Re: [PATCH v5 7/8] device-dax: ensure dev_dax->pgmap is valid for dynamic devices

2021-11-17 Thread Christoph Hellwig
> +bool static_dev_dax(struct dev_dax *dev_dax)
> +{
> + return is_static(dev_dax->region);
> +}
> +EXPORT_SYMBOL_GPL(static_dev_dax);

This function would massively benefit from documentic what a static
DAX region is and why someone would want to care.  Because even as
someone occasionally dabbling with the DAX code I have no idea at all
what that means.



Re: [PATCH v4 04/14] mm/memremap: add ZONE_DEVICE support for compound pages

2021-09-01 Thread Christoph Hellwig
On Fri, Aug 27, 2021 at 05:00:11PM +0100, Joao Martins wrote:
> So felt like doing it inline straight away inline when calling 
> percpu_ref_get_many():
>   
>   (pfn_end(pgmap, range_id) - pfn_first(pgmap, range_id)) / 
> pgmap_geometry(pgmap);
> 
> I can switch to a shift if you prefer:
> 
>   (pfn_end(pgmap, range_id) - pfn_first(pgmap, range_id))
>   << pgmap_geometry_order(pgmap);

Yes.  A shift is less overhead than a branch.

> > Also geometry sounds a bit strange, even if I can't really
> > offer anything better offhand.
> > 
> We started with @align (like in device dax core), and then we switched
> to @geometry because these are slightly different things (one relates
> to vmemmap metadata structure (number of pages) and the other is how
> the mmap is aligned to a page size. I couldn't suggest anything else,
> besides a more verbose name like vmemmap_align maybe.

It for sure isn't an alignment.  I think the term that comes closest
is a granularity.  But something like vmemmap_shift if switching to
a shift might be descriptive enough for the struct member name.



Re: [PATCH v4 04/14] mm/memremap: add ZONE_DEVICE support for compound pages

2021-08-27 Thread Christoph Hellwig
On Fri, Aug 27, 2021 at 03:58:09PM +0100, Joao Martins wrote:
> + * @geometry: structural definition of how the vmemmap metadata is populated.
> + *   A zero or 1 defaults to using base pages as the memmap metadata
> + *   representation. A bigger value will set up compound struct pages
> + *   representative of the requested geometry size.
>   * @ops: method table
>   * @owner: an opaque pointer identifying the entity that manages this
>   *   instance.  Used by various helpers to make sure that no
> @@ -114,6 +118,7 @@ struct dev_pagemap {
>   struct completion done;
>   enum memory_type type;
>   unsigned int flags;
> + unsigned long geometry;

So why not make this a shift as I suggested somewhere deep in the
last thread?  Also geometry sounds a bit strange, even if I can't really
offer anything better offhand.

> +static inline unsigned long pgmap_geometry(struct dev_pagemap *pgmap)
> +{
> + if (pgmap && pgmap->geometry)
> + return pgmap->geometry;

Why does this need to support a NULL pgmap?

> +static void __ref memmap_init_compound(struct page *head, unsigned long 
> head_pfn,

Overly long line.



Re: [PATCH] Documentation: admin-guide: add earlycon documentation for RISC-V

2019-10-17 Thread Christoph Hellwig
On Wed, Oct 09, 2019 at 12:53:50PM -0700, Paul Walmsley wrote:
> 
> Kernels booting on RISC-V can specify "earlycon" with no options on
> the Linux command line, and the generic DT earlycon support will query
> the "chosen/stdout-path" property (if present) to determine which
> early console device to use.  Document this appropriately in the
> admin-guide.

Jon already applied a patch from me removing the bogus arch restrictions
on the earlycon without arguments documentation, so this should not
be required.


[PATCH v2] Documentation: document earlycon without options for more platforms

2019-09-17 Thread Christoph Hellwig
The earlycon options without arguments is supposed to work on all
device tree platforms, not just arm64.

Signed-off-by: Christoph Hellwig 
---

Changes since v1:
 - add comma, fix typo

 Documentation/admin-guide/kernel-parameters.txt | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 4c1971960afa..d5956c29b93b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -977,12 +977,10 @@
 
earlycon=   [KNL] Output early console device and options.
 
-   [ARM64] The early console is determined by the
-   stdout-path property in device tree's chosen node,
-   or determined by the ACPI SPCR table.
-
-   [X86] When used with no options the early console is
-   determined by the ACPI SPCR table.
+   When used with no options, the early console is
+   determined by stdout-path property in device tree's
+   chosen node or the ACPI SPCR table if supported by
+   the platform.
 
cdns,[,options]
Start an early, polled-mode console on a Cadence
-- 
2.20.1



[PATCH] Documentation: document earlycon without options for more platforms

2019-09-16 Thread Christoph Hellwig
The earlycon options without arguments is supposed on all device
tree platforms, not just arm64.

Signed-off-by: Christoph Hellwig 
---
 Documentation/admin-guide/kernel-parameters.txt | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 4c1971960afa..fe81d8922edd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -977,12 +977,10 @@
 
earlycon=   [KNL] Output early console device and options.
 
-   [ARM64] The early console is determined by the
-   stdout-path property in device tree's chosen node,
-   or determined by the ACPI SPCR table.
-
-   [X86] When used with no options the early console is
-   determined by the ACPI SPCR table.
+   When used with no options the early console is
+   determined by stdout-path property in device tree's
+   chosen node or the ACPI SPCR table if supported by
+   the platform.
 
cdns,[,options]
Start an early, polled-mode console on a Cadence
-- 
2.20.1



Re: [PATCH] doc: replace IFF abbreviation with 'if and only if'

2019-09-09 Thread Christoph Hellwig
On Sat, Sep 07, 2019 at 12:51:16PM +0200, Federico Vaga wrote:
> In a normal piece of text the use of 'iff' does not guarantee a correct
> interpretation because it is easy to confuse it for a typo (if or iff?).
> 
> I believe that IFF should not be used outside a logical/mathematical
> expression. For this reason with this patch I am replacing 'iff' with
> 'if an only if' in text, and I am leaving it as it is in logical formulae.

Hell no.  If you want to avoid the usage in your own docs please go for
it, but as seen in your patch we commonly use and that is a good thing
as it brings the information across in a very compact way.


Re: [PATCH v2] dma-mapping: Fix filename references

2019-09-02 Thread Christoph Hellwig
On Mon, Sep 02, 2019 at 04:22:50PM +0300, Andy Shevchenko wrote:
> > > Any comment on this?
> > 
> > Fine with me, and I also agree with the glue code comment.
> 
> Are you going to apply this?

Thanks, applied to the dma-mapping tree for 5.4.


Re: [PATCH 0/9] drivers: add new variants of devm_platform_ioremap_resource()

2019-08-30 Thread Christoph Hellwig
On Thu, Aug 29, 2019 at 04:48:36PM +0200, Geert Uytterhoeven wrote:
> Hi Bartosz,
> 
> On Thu, Aug 29, 2019 at 4:38 PM Bartosz Golaszewski  wrote:
> > From: Bartosz Golaszewski 
> >
> > The new devm_platform_ioremap_resource() helper has now been widely
> > adopted and used in many drivers. Users of nocache and write-combined
> > ioremap() variants could profit from the same code shrinkage. This
> > series provides two new versions of devm_platform_ioremap_resource()
> > and uses it in a few example drivers with the assumption that - just
> > like was the case previously - a coccinelle script will be developed
> > to ease the transition for others.
> 
> Please be aware that the number of ioremap() variants is being
> reduced, as some of them are redundant (e.g. ioremap() already creates
> an uncached mapping, so ioremap_nocache() is not needed).
> So less is better than more ;-)

Yes.  If I can get the ia64 and openrisc patch in I plan to send Linus
a scripted removal of ioremap_nocache after -rc1.  I already have a
local patch for current mainline as of about two weeks ago.


Re: [PATCH v2] dma-mapping: Fix filename references

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 05:42:22PM +0300, Andy Shevchenko wrote:
> On Wed, Jun 19, 2019 at 05:19:55PM +0300, Andy Shevchenko wrote:
> > After the commit cf65a0f6f6ff
> > 
> >   ("dma-mapping: move all DMA mapping code to kernel/dma")
> > 
> > some of the files are referring to outdated information, i.e. old file names
> > of DMA mapping sources.
> > 
> > Fix it here.
> > 
> > Note, the lines with "Glue code for..." have been removed completely.
> 
> Any comment on this?

Fine with me, and I also agree with the glue code comment.


[PATCH] Documentation: move Documentation/virtual to Documentation/virt

2019-07-24 Thread Christoph Hellwig
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common 
"virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 Documentation/{virtual => virt}/index.rst   | 0
 .../{virtual => virt}/kvm/amd-memory-encryption.rst | 0
 Documentation/{virtual => virt}/kvm/api.txt | 2 +-
 Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt | 0
 Documentation/{virtual => virt}/kvm/arm/psci.txt| 0
 Documentation/{virtual => virt}/kvm/cpuid.rst   | 0
 Documentation/{virtual => virt}/kvm/devices/README  | 0
 .../{virtual => virt}/kvm/devices/arm-vgic-its.txt  | 0
 Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt | 0
 Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt| 0
 Documentation/{virtual => virt}/kvm/devices/mpic.txt| 0
 Documentation/{virtual => virt}/kvm/devices/s390_flic.txt   | 0
 Documentation/{virtual => virt}/kvm/devices/vcpu.txt| 0
 Documentation/{virtual => virt}/kvm/devices/vfio.txt| 0
 Documentation/{virtual => virt}/kvm/devices/vm.txt  | 0
 Documentation/{virtual => virt}/kvm/devices/xics.txt| 0
 Documentation/{virtual => virt}/kvm/devices/xive.txt| 0
 Documentation/{virtual => virt}/kvm/halt-polling.txt| 0
 Documentation/{virtual => virt}/kvm/hypercalls.txt  | 4 ++--
 Documentation/{virtual => virt}/kvm/index.rst   | 0
 Documentation/{virtual => virt}/kvm/locking.txt | 0
 Documentation/{virtual => virt}/kvm/mmu.txt | 2 +-
 Documentation/{virtual => virt}/kvm/msr.txt | 0
 Documentation/{virtual => virt}/kvm/nested-vmx.txt  | 0
 Documentation/{virtual => virt}/kvm/ppc-pv.txt  | 0
 Documentation/{virtual => virt}/kvm/review-checklist.txt| 2 +-
 Documentation/{virtual => virt}/kvm/s390-diag.txt   | 0
 Documentation/{virtual => virt}/kvm/timekeeping.txt | 0
 Documentation/{virtual => virt}/kvm/vcpu-requests.rst   | 0
 Documentation/{virtual => virt}/paravirt_ops.rst| 0
 Documentation/{virtual => virt}/uml/UserModeLinux-HOWTO.txt | 0
 MAINTAINERS | 6 +++---
 arch/powerpc/include/uapi/asm/kvm_para.h| 2 +-
 arch/x86/kvm/mmu.c  | 2 +-
 include/uapi/linux/kvm.h| 4 ++--
 tools/include/uapi/linux/kvm.h  | 4 ++--
 virt/kvm/arm/arm.c  | 2 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c| 2 +-
 virt/kvm/arm/vgic/vgic.h| 4 ++--
 40 files changed, 19 insertions(+), 19 deletions(-)
 rename Documentation/{virtual => virt}/index.rst (100%)
 rename Documentation/{virtual => virt}/kvm/amd-memory-encryption.rst (100%)
 rename Documentation/{virtual => virt}/kvm/api.txt (99%)
 rename Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt (100%)
 rename Documentation/{virtual => virt}/kvm/arm/psci.txt (100%)
 rename Documentation/{virtual => virt}/kvm/cpuid.rst (100%)
 rename Documentation/{virtual => virt}/kvm/devices/README (100%)
 rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-its.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/mpic.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/s390_flic.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/vcpu.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/vfio.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/vm.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/xics.txt (100%)
 rename Documentation/{virtual => virt}/kvm/devices/xive.txt (100%)
 rename Documentation/{virtual => virt}/kvm/halt-polling.txt (100%)
 rename Documentation/{virtual => virt}/kvm/hypercalls.txt (97%)
 rename Documentation/{virtual => virt}/kvm/index.rst (100%)
 rename Documentation/{virtual => virt}/kvm/locking.txt (100%)
 rename Documentation/{virtual => virt}/kvm/mmu.txt (99%)
 rename Documentation/{virtual => virt}/kvm/msr.txt (100%)
 rename Documentation/{virtual => virt}/kvm/nested-vmx.txt (100%)
 rename Documentation/{virtual => virt}/kvm/ppc-pv.txt (100%)
 rename Docu

Re: [linux-kernel-mentees] [PATCH v6] Doc : fs : convert xfs.txt to ReST

2019-07-09 Thread Christoph Hellwig
On Tue, Jul 09, 2019 at 01:48:59PM +0100, Sheriff Esseson wrote:
> Convert xfs.txt to ReST, rename and fix broken references, consequently.

The subject line still uses completely b0rked naming conventions.


Re: [Linux-kernel-mentees] [PATCH] Doc : fs : move xfs.txt to admin-guide

2019-07-08 Thread Christoph Hellwig
The subjet line seems to be a bit messed up.


Re: [PATCH 3/3] LICENSES: Rename other to deprecated

2019-05-14 Thread Christoph Hellwig
On Tue, May 14, 2019 at 12:26:32PM +0200, Borislav Petkov wrote:
> This breaks scripts/spdxcheck.py, it needs below hunk. Also, should
> "dual" be added to license_dirs too?

Yes.  In fact two people already submitted patches for that before
you, just waiting for them to get picked up.


[PATCH 1/5] ccio: allow large DMA masks

2019-02-15 Thread Christoph Hellwig
There is no harm in setting a 64-bit mask even if we don't need it,
and the current ccio code is one of the few place in the kernel
still rejecting larger than required DMA masks.

Signed-off-by: Christoph Hellwig 
---
 drivers/parisc/ccio-dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index 8d2fc84119c6..24a68fb7b2f9 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -710,8 +710,8 @@ ccio_dma_supported(struct device *dev, u64 mask)
return 0;
}
 
-   /* only support 32-bit devices (ie PCI/GSC) */
-   return (int)(mask == 0xUL);
+   /* only support 32-bit or better devices (ie PCI/GSC) */
+   return (int)(mask >= 0xUL);
 }
 
 /**
-- 
2.20.1



[PATCH 5/5] Documentation/DMA-API-HOWTO: update dma_mask sections

2019-02-15 Thread Christoph Hellwig
We don't require drivers to guess a DMA mask that might actually
match the system capabilities any more, so fix up the documentation
to clear this up.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 121 +++-
 1 file changed, 41 insertions(+), 80 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index f0cc3f772265..8e948fae34af 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -146,114 +146,75 @@ What about block I/O and networking buffers?  The block 
I/O and
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.
 
-DMA addressing limitations
+DMA addressing capabilities
 ==
 
-Does your device have any DMA addressing limitations?  For example, is
-your device only capable of driving the low order 24-bits of address?
-If so, you need to inform the kernel of this fact.
+By default, the kernel assumes that your device can address 32-bits of DMA
+addressing.  For a 64-bit capable device, this needs to be increased, and for
+a device with limitations, it needs to be decreased.
 
-By default, the kernel assumes that your device can address the full
-32-bits.  For a 64-bit capable device, this needs to be increased.
-And for a device with limitations, as discussed in the previous
-paragraph, it needs to be decreased.
+Special note about PCI: PCI-X specification requires PCI-X devices to support
+64-bit addressing (DAC) for all transactions.  And at least one platform (SGI
+SN2) requires 64-bit consistent allocations to operate correctly when the IO
+bus is in PCI-X mode.
 
-Special note about PCI: PCI-X specification requires PCI-X devices to
-support 64-bit addressing (DAC) for all transactions.  And at least
-one platform (SGI SN2) requires 64-bit consistent allocations to
-operate correctly when the IO bus is in PCI-X mode.
+For correct operation, you must set the DMA mask to inform the kernel about
+your devices DMA addressing capabilities.
 
-For correct operation, you must interrogate the kernel in your device
-probe routine to see if the DMA controller on the machine can properly
-support the DMA addressing limitation your device has.  It is good
-style to do this even if your device holds the default setting,
-because this shows that you did think about these issues wrt. your
-device.
-
-The query is performed via a call to dma_set_mask_and_coherent()::
+This is performed via a call to dma_set_mask_and_coherent()::
 
int dma_set_mask_and_coherent(struct device *dev, u64 mask);
 
-which will query the mask for both streaming and coherent APIs together.
-If you have some special requirements, then the following two separate
-queries can be used instead:
+which will set the mask for both streaming and coherent APIs together.  If you
+have some special requirements, then the following two separate calls can be
+used instead:
 
-   The query for streaming mappings is performed via a call to
+   The setup for streaming mappings is performed via a call to
dma_set_mask()::
 
int dma_set_mask(struct device *dev, u64 mask);
 
-   The query for consistent allocations is performed via a call
+   The setup for consistent allocations is performed via a call
to dma_set_coherent_mask()::
 
int dma_set_coherent_mask(struct device *dev, u64 mask);
 
-Here, dev is a pointer to the device struct of your device, and mask
-is a bit mask describing which bits of an address your device
-supports.  It returns zero if your card can perform DMA properly on
-the machine given the address mask you provided.  In general, the
-device struct of your device is embedded in the bus-specific device
-struct of your device.  For example, &pdev->dev is a pointer to the
-device struct of a PCI device (pdev is a pointer to the PCI device
-struct of your device).
+Here, dev is a pointer to the device struct of your device, and mask is a bit
+mask describing which bits of an address your device supports.  Often the
+device struct of your device is embedded in the bus-specific device struct of
+your device.  For example, &pdev->dev is a pointer to the device struct of a
+PCI device (pdev is a pointer to the PCI device struct of your device).
 
-If it returns non-zero, your device cannot perform DMA properly on
-this platform, and attempting to do so will result in undefined
-behavior.  You must either use a different mask, or not use DMA.
+These calls usually return zero to indicated your device can perform DMA
+properly on the machine given the address mask you provided, but they might
+return an error if the mask is too small to be supportable on the given
+system.  If it returns non-zero, your device cannot perform DMA properly on
+this platform, and attempting to do so will result in undefined behavior.
+You must not use DMA on this device unless the dma

allow larger than require DMA masks

2019-02-15 Thread Christoph Hellwig
Hi all,

this series finishes off converting our dma mask model to split between
device capabilities (dev->dma_mask and dev->coherent_dma_mask) and system
limitations (dev->bus_dma_mask).  We already accept larger than required
masks in most dma_map_ops implementation, in case of x86 and
implementations based on it since the dawn of time.  Only one parisc
and two sparc64 instances failed larger than required DMA masks, and
this series fixes that up and updates the documentation that devices
don't need to handle DMA mask fallbacks.



[PATCH 2/5] sparc64: refactor the ali DMA quirk

2019-02-15 Thread Christoph Hellwig
Do the quirk first in the dma_supported routines, as we don't need
any of the other checks for it, and remove the duplicate mask checking
that is already done by the callers.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/iommu.c |  7 +++---
 arch/sparc/kernel/kernel.h|  6 -
 arch/sparc/kernel/pci.c   | 46 ---
 arch/sparc/kernel/pci_sun4v.c |  5 +++-
 4 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index b1a09080e8da..0c253f1c852e 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -745,14 +745,13 @@ static int dma_4u_supported(struct device *dev, u64 
device_mask)
 {
struct iommu *iommu = dev->archdata.iommu;
 
+   if (ali_sound_dma_hack(dev, device_mask))
+   return 1;
+
if (device_mask > DMA_BIT_MASK(32))
return 0;
if ((device_mask & iommu->dma_addr_mask) == iommu->dma_addr_mask)
return 1;
-#ifdef CONFIG_PCI
-   if (dev_is_pci(dev))
-   return pci64_dma_supported(to_pci_dev(dev), device_mask);
-#endif
return 0;
 }
 
diff --git a/arch/sparc/kernel/kernel.h b/arch/sparc/kernel/kernel.h
index ddffd368e057..f6f498ba3198 100644
--- a/arch/sparc/kernel/kernel.h
+++ b/arch/sparc/kernel/kernel.h
@@ -45,7 +45,11 @@ void __irq_entry smp_receive_signal_client(int irq, struct 
pt_regs *regs);
 void __irq_entry smp_kgdb_capture_client(int irq, struct pt_regs *regs);
 
 /* pci.c */
-int pci64_dma_supported(struct pci_dev *pdev, u64 device_mask);
+#ifdef CONFIG_PCI
+int ali_sound_dma_hack(struct device *dev, u64 device_mask);
+#else
+#define ali_sound_dma_hack(dev, mask)  (0)
+#endif
 
 /* signal32.c */
 void do_sigreturn32(struct pt_regs *regs);
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index bcfec6a85d23..5ed43828e078 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -956,51 +956,35 @@ void arch_teardown_msi_irq(unsigned int irq)
 }
 #endif /* !(CONFIG_PCI_MSI) */
 
-static void ali_sound_dma_hack(struct pci_dev *pdev, int set_bit)
+/* ALI sound chips generate 31-bits of DMA, a special register
+ * determines what bit 31 is emitted as.
+ */
+int ali_sound_dma_hack(struct device *dev, u64 device_mask)
 {
+   struct iommu *iommu = dev->archdata.iommu;
struct pci_dev *ali_isa_bridge;
u8 val;
 
-   /* ALI sound chips generate 31-bits of DMA, a special register
-* determines what bit 31 is emitted as.
-*/
+   if (!dev_is_pci(dev))
+   return 0;
+
+   if (to_pci_dev(dev)->vendor != PCI_VENDOR_ID_AL ||
+   to_pci_dev(dev)->device != PCI_DEVICE_ID_AL_M5451 ||
+   device_mask != 0x7fff)
+   return 0;
+
ali_isa_bridge = pci_get_device(PCI_VENDOR_ID_AL,
 PCI_DEVICE_ID_AL_M1533,
 NULL);
 
pci_read_config_byte(ali_isa_bridge, 0x7e, &val);
-   if (set_bit)
+   if (iommu->dma_addr_mask & 0x8000)
val |= 0x01;
else
val &= ~0x01;
pci_write_config_byte(ali_isa_bridge, 0x7e, val);
pci_dev_put(ali_isa_bridge);
-}
-
-int pci64_dma_supported(struct pci_dev *pdev, u64 device_mask)
-{
-   u64 dma_addr_mask;
-
-   if (pdev == NULL) {
-   dma_addr_mask = 0x;
-   } else {
-   struct iommu *iommu = pdev->dev.archdata.iommu;
-
-   dma_addr_mask = iommu->dma_addr_mask;
-
-   if (pdev->vendor == PCI_VENDOR_ID_AL &&
-   pdev->device == PCI_DEVICE_ID_AL_M5451 &&
-   device_mask == 0x7fff) {
-   ali_sound_dma_hack(pdev,
-  (dma_addr_mask & 0x8000) != 0);
-   return 1;
-   }
-   }
-
-   if (device_mask >= (1UL << 32UL))
-   return 0;
-
-   return (device_mask & dma_addr_mask) == dma_addr_mask;
+   return 1;
 }
 
 void pci_resource_to_user(const struct pci_dev *pdev, int bar,
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index fa0e42b4cbfb..d30eb22b6e11 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -676,6 +676,9 @@ static int dma_4v_supported(struct device *dev, u64 
device_mask)
struct iommu *iommu = dev->archdata.iommu;
u64 dma_addr_mask = iommu->dma_addr_mask;
 
+   if (ali_sound_dma_hack(dev, device_mask))
+   return 1;
+
if (device_mask > DMA_BIT_MASK(32)) {
if (iommu->atu)
dma_addr_mask = iommu->atu->dma_addr_mask;
@@ -685,7 +688,7 @@ static int dma_4v_supported(struct device *dev, u64 
device_mask)
 
if ((device_mask &a

[PATCH 3/5] sparc64/iommu: allow large DMA masks

2019-02-15 Thread Christoph Hellwig
We've been moving to a model where the device just sets the DMA mask
supported by it, instead of having to fallback to something it thinks
the platform might support.  Sparc64 is the remaining holdout forcing
drivers to supply a matching mask.  Change dma_4u_supported to just
check if the supplied dma mask is large enough as nothing in the
iommu.c code (or the core DMA code) actually looks at the DMA mask
later on.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/iommu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 0c253f1c852e..4ae7388b1bff 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -748,11 +748,9 @@ static int dma_4u_supported(struct device *dev, u64 
device_mask)
if (ali_sound_dma_hack(dev, device_mask))
return 1;
 
-   if (device_mask > DMA_BIT_MASK(32))
+   if (device_mask < iommu->dma_addr_mask)
return 0;
-   if ((device_mask & iommu->dma_addr_mask) == iommu->dma_addr_mask)
-   return 1;
-   return 0;
+   return 1;
 }
 
 static const struct dma_map_ops sun4u_dma_ops = {
-- 
2.20.1



[PATCH 4/5] sparc64/pci_sun4v: allow large DMA masks

2019-02-15 Thread Christoph Hellwig
We've been moving to a model where the device just sets the DMA mask
supported by it, instead of having to fallback to something it thinks
the platform might support.  Sparc64 is the remaining holdout forcing
drivers to supply a matching mask.  Change dma_4v_supported to just
check if the supplied dma mask is large enough. and adjust the mapping
code to check ATU presence in addition to the DMA mask to decice on
the mapping method.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/pci_sun4v.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index d30eb22b6e11..a8af6023c126 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -92,7 +92,7 @@ static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   if (mask <= DMA_BIT_MASK(32)) {
+   if (mask <= DMA_BIT_MASK(32) || !pbm->iommu->atu) {
num = pci_sun4v_iommu_map(devhandle,
  HV_PCI_TSBID(0, entry),
  npages,
@@ -208,7 +208,7 @@ static void *dma_4v_alloc_coherent(struct device *dev, 
size_t size,
atu = iommu->atu;
 
mask = dev->coherent_dma_mask;
-   if (mask <= DMA_BIT_MASK(32))
+   if (mask <= DMA_BIT_MASK(32) || !atu)
tbl = &iommu->tbl;
else
tbl = &atu->tbl;
@@ -674,21 +674,12 @@ static void dma_4v_unmap_sg(struct device *dev, struct 
scatterlist *sglist,
 static int dma_4v_supported(struct device *dev, u64 device_mask)
 {
struct iommu *iommu = dev->archdata.iommu;
-   u64 dma_addr_mask = iommu->dma_addr_mask;
 
if (ali_sound_dma_hack(dev, device_mask))
return 1;
-
-   if (device_mask > DMA_BIT_MASK(32)) {
-   if (iommu->atu)
-   dma_addr_mask = iommu->atu->dma_addr_mask;
-   else
-   return 0;
-   }
-
-   if ((device_mask & dma_addr_mask) == dma_addr_mask)
-   return 1;
-   return 0;
+   if (device_mask < iommu->dma_addr_mask)
+   return 0;
+   return 1;
 }
 
 static const struct dma_map_ops sun4v_dma_ops = {
-- 
2.20.1



[PATCH] Documentation/DMA-ISA-LPC: fix an incorrect reference

2019-02-11 Thread Christoph Hellwig
AFAIK we never had a isa_virt_to_phys, it always was
isa_virt_to_bus.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-ISA-LPC.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt
index 8c2b8be6e45b..b1ec7b16c21f 100644
--- a/Documentation/DMA-ISA-LPC.txt
+++ b/Documentation/DMA-ISA-LPC.txt
@@ -52,8 +52,8 @@ Address translation
 ---
 
 To translate the virtual address to a bus address, use the normal DMA
-API. Do _not_ use isa_virt_to_phys() even though it does the same
-thing. The reason for this is that the function isa_virt_to_phys()
+API. Do _not_ use isa_virt_to_bus() even though it does the same
+thing. The reason for this is that the function isa_virt_to_bus()
 will require a Kconfig dependency to ISA, not just ISA_DMA_API which
 is really all you need. Remember that even though the DMA controller
 has its origins in ISA it is used elsewhere.
-- 
2.20.1



Re: [PATCH v2] dma-debug: add dumping facility via debugfs

2019-02-01 Thread Christoph Hellwig
Thanks!

I had to adjust the patch a bit for the debugfs cleanups form Greg
that I applied before, can you check the result here:


http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/0a3b192c26da198fce38e1ee242a34f558670246


Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-01-19 Thread Christoph Hellwig
This seems like a pretty horrible idea and waste of kernel memory.
Just add support to kbuild to store a compressed archive in initramfs
and unpack it in the right place.


Re: [PATCH 2/3] kbuild: generate asm-generic wrappers if mandatory headers are missing

2018-12-06 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 08:28:05PM +0900, Masahiro Yamada wrote:
> Some time ago, Sam pointed out a certain degree of overwrap between
> generic-y and mandatory-y. (https://lkml.org/lkml/2017/7/10/121)
> 
> I a bit tweaked the meaning of mandatory-y; now it defines the minimum
> set of ASM headers that all architectures must have.
> 
> If arch does not have specific implementation of a mandatory header,
> Kbuild will let it fallback to the asm-generic one by automatically
> generating a wrapper. This will allow to drop lots of redundant
> generic-y defines.
> 
> Previously, "mandatory" was used in the context of UAPI, but I guess
> this can be extended to kernel space ASM headers.

How useful is it to keep the generic-y behavior around at all vs making
everything useful mandatory?


Re: [PATCH] arm64: Make kpti command line options x86 compatible

2018-11-15 Thread Christoph Hellwig
On Tue, Nov 13, 2018 at 04:20:46PM +0100, Alexander Graf wrote:
> I've already stumbled over 2 cases where people got confused about how to
> disable kpti on AArch64. In both cases, they used existing x86_64 options
> and just applied that to an AArch64 system, expecting it to work.
> 
> I think it makes a lot of sense to have compatible kernel command line
> parameters whenever we can have them be compatible.
> 
> So this patch adds the pti= and no_pti kernel command line options, mapping
> them into the existing kpti= command line framework. It preserves the old
> syntax to maintain compatibility with older command lines.
> 
> While at it, the patch also marks the respective options as dual-arch.

Thanks.  Which also brings up my old complainst that arm64 and x86 should
use the same config option.  Bonus points for moving the parsing code
to a common file..


Re: Re-run features-refresh.sh

2018-11-15 Thread Christoph Hellwig
FYI, we really should kill ARCH_SG_CHAIN in its current form.

See my series here, which could use a review or two:

https://lkml.org/lkml/2018/11/9/958


Re: [dm-devel] [PATCH] dm: add secdel target

2018-10-19 Thread Christoph Hellwig
On Fri, Oct 19, 2018 at 02:49:44PM +0300, Vitaly Chikunov wrote:
> On Thu, Oct 18, 2018 at 11:19:45PM -0700, Christoph Hellwig wrote:
> > Just as a note:  the name is a complete misowner, a couple overwrite
> > are not in any way secure deletion.  So naming it this way and exposing
> > this as erase is a problem that is going to get back to bite us.
> 
> In what way it's not secure deletion?
> 
> It's secure deletion by overwriting discarded data instead of leaving it
> as is.

Overwriting data does not delete data.  Most certainly not in Flash based
SSDs, but also not in many storage arrays, or for that matter many modern
disks that have sectore remapping and various kinds of non-volatile
caches.  There is a reason why devices tend to have special commands to
perform secure erase - depending on the media they might or might not
overwrite internally, but at least they do it in a way that actually
works for the given media and device configuration.


> dm-erase or dm-wipe? dm-discerase?

dm-overwrite?

> But still provide REQ_OP_SECURE_ERASE
> support?

On the one hand that is highly misleading and would warrant a warning
(see above), on the other hand discard is purely advisory and can
be skipped any time, including by intermediate layers.  So I don't think
you can actually do what you want without major changes to the whole
I/O stack.


Re: [dm-devel] [PATCH] dm: add secdel target

2018-10-18 Thread Christoph Hellwig
Just as a note:  the name is a complete misowner, a couple overwrite
are not in any way secure deletion.  So naming it this way and exposing
this as erase is a problem that is going to get back to bite us.

If you really want this anyway at least give it a different way, and
do a one-time warning when th first erase comes in that it is not in
any meaninful way secure.


Re: [GIT PULL] IDA/IDR fixes for 4.19

2018-10-17 Thread Christoph Hellwig
On Tue, Oct 16, 2018 at 10:16:08AM -0700, Matthew Wilcox wrote:
> > >   git://git.infradead.org/users/willy/linux-dax.git ida-fixes-4.19-rc8
> > 
> > How about you at least test these in linux-next?  Putting things on top
> > of the most recent change is a huge tip-off that this branch got no
> > testing :(
> 
> One of the two changes is a comment line in a doc file.
> The other has been tested by 0day (which originally reported the issue).
> 
> I don't see how spending time in linux-next is going to achieve anything.

Especialy if we miss 4.19 final with that.  The documentation patch
avoids the probem of CC-BY-SA-4.0 including larger amounts of GPL files
in the output document, so we should have this before the release for
sure, and the other is just a test that isn't even in a normal kernel
build.

Also after idr.rst is removed we should include something like this:

---
>From c3660257f981e7a7254d18f52af64a2077f7bb49 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Thu, 18 Oct 2018 08:22:39 +0200
Subject: LICENSES: Remove CC-BY-SA-4.0 license text

Using non-GPL licenses for our documentation is rather problematic,
as it can directly include other files, which generally are GPLv2
licensed and thus not compatible.

Remove this license now that the only user (idr.rst) is gone to avoid
people semi-accidentally using it again.

Signed-off-by: Christoph Hellwig 
---
 LICENSES/other/CC-BY-SA-4.0 | 397 
 1 file changed, 397 deletions(-)
 delete mode 100644 LICENSES/other/CC-BY-SA-4.0

diff --git a/LICENSES/other/CC-BY-SA-4.0 b/LICENSES/other/CC-BY-SA-4.0
deleted file mode 100644
index f9158e831e79..
--- a/LICENSES/other/CC-BY-SA-4.0
+++ /dev/null
@@ -1,397 +0,0 @@
-Valid-License-Identifier: CC-BY-SA-4.0
-SPDX-URL: https://spdx.org/licenses/CC-BY-SA-4.0
-Usage-Guide:
-  To use the Creative Commons Attribution Share Alike 4.0 International
-  license put the following SPDX tag/value pair into a comment according to
-  the placement guidelines in the licensing rules documentation:
-SPDX-License-Identifier: CC-BY-SA-4.0
-License-Text:
-
-Creative Commons Attribution-ShareAlike 4.0 International
-
-Creative Commons Corporation ("Creative Commons") is not a law firm and
-does not provide legal services or legal advice. Distribution of Creative
-Commons public licenses does not create a lawyer-client or other
-relationship. Creative Commons makes its licenses and related information
-available on an "as-is" basis. Creative Commons gives no warranties
-regarding its licenses, any material licensed under their terms and
-conditions, or any related information. Creative Commons disclaims all
-liability for damages resulting from their use to the fullest extent
-possible.
-
-Using Creative Commons Public Licenses
-
-Creative Commons public licenses provide a standard set of terms and
-conditions that creators and other rights holders may use to share original
-works of authorship and other material subject to copyright and certain
-other rights specified in the public license below. The following
-considerations are for informational purposes only, are not exhaustive, and
-do not form part of our licenses.
-
-Considerations for licensors: Our public licenses are intended for use by
-those authorized to give the public permission to use material in ways
-otherwise restricted by copyright and certain other rights. Our licenses
-are irrevocable. Licensors should read and understand the terms and
-conditions of the license they choose before applying it. Licensors should
-also secure all rights necessary before applying our licenses so that the
-public can reuse the material as expected. Licensors should clearly mark
-any material not subject to the license. This includes other CC-licensed
-material, or material used under an exception or limitation to
-copyright. More considerations for licensors :
-wiki.creativecommons.org/Considerations_for_licensors
-
-Considerations for the public: By using one of our public licenses, a
-licensor grants the public permission to use the licensed material under
-specified terms and conditions. If the licensor's permission is not
-necessary for any reason - for example, because of any applicable exception
-or limitation to copyright - then that use is not regulated by the
-license. Our licenses grant only permissions under copyright and certain
-other rights that a licensor has authority to grant. Use of the licensed
-material may still be restricted for other reasons, including because
-others have copyright or other rights in the material. A licensor may make
-special requests, such as asking that all changes be marked or described.
-
-Although not required by our licenses, you are encouraged to respect those
-requests where reasonable. More considerations for the public :
-wiki.creativecommons.org/Considerations_for_licensees
-
-Creativ

Re: [PATCH v2 00/22] xfs-4.20: major documentation surgery

2018-10-15 Thread Christoph Hellwig
On Thu, Oct 11, 2018 at 11:27:35AM -0600, Jonathan Corbet wrote:
> On Sat, 6 Oct 2018 10:51:54 +1000
> Dave Chinner  wrote:
> 
> > Can you let us know whether the CC-by-SA 4.0 license is acceptible
> > or not? That's really the only thing that we need clarified at this
> > point - if it's OK I'll to pull this into the XFS tree for the 4.20
> > merge window. If not, we'll go back to the drawing board
> 
> OK, I've had a long conversation with the LF lawyer, and she said clearly
> that we really should not be introducing CC-SA material into the kernel
> source tree.  It's a pain; I really do like CC-SA better for
> documentation, but it's not GPL-compatible, and that creates a problem for
> the processed docs.

That was exactly my concern with this patchset.

Btw, I think we have the same issue with idr.rst, and unless we can
relicense it we should drop it from the tree, as it actually includes
kernel source files.


Re: [PATCH 13/18] wait: wait.h: Get rid of a kernel-doc/Sphinx warnings

2018-05-10 Thread Christoph Hellwig
>   * Use either while holding wait_queue_head::lock or when used for wakeups
> - * with an extra smp_mb() like:
> + * with an extra smp_mb() like::

Independent of any philosophical discussion not allowing a setence to
end with a single ':' is completely idiotic.  Please fix the tooling
instead to allow it, as it is very important for being able to just
write understandable comments.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] mm: replace __HAVE_ARCH_PTE_SPECIAL

2018-04-09 Thread Christoph Hellwig
> -#ifdef __HAVE_ARCH_PTE_SPECIAL
> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
>  # define HAVE_PTE_SPECIAL 1
>  #else
>  # define HAVE_PTE_SPECIAL 0

I'd say kill this odd indirection and just use the
CONFIG_ARCH_HAS_PTE_SPECIAL symbol directly.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/16] remove eight obsolete architectures

2018-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 11:42:25AM +0100, Arnd Bergmann wrote:
> Is anyone producing a chip that includes enough of the Privileged ISA spec
> to have things like system calls, but not the MMU parts?

Various SiFive SOCs seem to support M and U mode, but no S mode or
iommu.  That should be enough for nommu Linux running in M mode if
someone cares enough to actually port it.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/4] soc: qcom: Add GENI based QUP Wrapper driver

2018-03-08 Thread Christoph Hellwig
On Thu, Mar 08, 2018 at 01:24:45PM +, Robin Murphy wrote:
> Implementing dma_map_ops inside a driver for real hardware is almost always 
> the wrong thing to do.

Agreed.  dma_map_ops should be a platform decision based on the bus.

Even our dma_virt_ops basically just works around bad driver layering.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci: Add a acs_disable option for pci kernel parameter

2017-10-26 Thread Christoph Hellwig
On Thu, Oct 26, 2017 at 08:37:49PM -0600, sba...@raithlin.com wrote:
> From: Stephen Bates 
> 
> On some servers the BIOS sets up ACS on any valid pci_dev in the
> system. The kernel has no way of backing this out since the kernel
> only turns ACS capabilities on.
> 
> This patch adds a new boot option to the pci kernel parameter called
> "acs_disable" that will disable ACS. This is useful for PCI peer to
> peer communication but can cause problems when IOVA isolation is
> required and an IOMMU is enabled. Use with care.

Eww.  Can we please add smbios quirks for the systems where you've
observed this? (we probably also want to keep the option just in case).
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RESEND 08/12] ima: added parser for RPM data type

2017-08-01 Thread Christoph Hellwig
On Tue, Aug 01, 2017 at 12:20:36PM +0200, Roberto Sassu wrote:
> This patch introduces a parser for RPM packages. It extracts the digests
> from the RPMTAG_FILEDIGESTS header section and converts them to binary data
> before adding them to the hash table.
> 
> The advantage of this data type is that verifiers can determine who
> produced that data, as headers are signed by Linux distributions vendors.
> RPM headers signatures can be provided as digest list metadata.

Err, parsing arbitrary file formats has no business in the kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-25 Thread Christoph Hellwig
On Tue, Jul 25, 2017 at 11:35:08AM +0200, Jan Kara wrote:
> On Tue 25-07-17 10:01:58, Christoph Hellwig wrote:
> > On Tue, Jul 25, 2017 at 01:14:00AM +0300, Kirill A. Shutemov wrote:
> > > I guess it's up to filesystem if it wants to reuse the same spot to write
> > > data or not. I think your assumptions works for ext4 and xfs. I wouldn't
> > > be that sure for btrfs or other filesystems with CoW support.
> > 
> > Or XFS with reflinks for that matter.  Which currently can't be
> > combined with DAX, but I had a somewhat working version a few month
> > ago.
> 
> But in cases like COW when the block mapping changes, the process
> must run unmap_mapping_range() before installing the new PTE so that all
> processes mapping this file offset actually refault and see the new
> mapping. So this would go through pte_none() case. Am I missing something?

Yes, for DAX COW mappings we'd probably need something like this, unlike
the pagecache COW handling for which only the underlying block change,
but not the page.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-25 Thread Christoph Hellwig
On Tue, Jul 25, 2017 at 01:14:00AM +0300, Kirill A. Shutemov wrote:
> I guess it's up to filesystem if it wants to reuse the same spot to write
> data or not. I think your assumptions works for ext4 and xfs. I wouldn't
> be that sure for btrfs or other filesystems with CoW support.

Or XFS with reflinks for that matter.  Which currently can't be
combined with DAX, but I had a somewhat working version a few month
ago.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] changes.rst: explain the usage of virtual environment

2017-06-19 Thread Christoph Hellwig
On Mon, Jun 19, 2017 at 08:24:10AM -0300, Mauro Carvalho Chehab wrote:
> As the Sphinx build seems very fragile, specially for
> PDF output, add a notice about how to use it on a virtual
> environment.

Please don't.  python venv are good at one thing only, and that is
making a mess of your python module path.  Don't ever use them on a
system that has proper package management.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 25/34] swiotlb: Add warnings for use of bounce buffers with SME

2017-06-08 Thread Christoph Hellwig
On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.

And what would the action be?  Do we need a boot or other option to
disallow this fallback for people who care deeply?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] move visorbus out of staging to drivers/virt/visorbus

2017-06-07 Thread Christoph Hellwig
On Tue, Jun 06, 2017 at 04:49:09PM +0200, Greg KH wrote:
> On Mon, Jun 05, 2017 at 04:07:29PM -0400, David Kershner wrote:
> > This patchset moves drivers/staging/unisys/include to
> > include/linux/visorbus, and moves drivers/staging/unisys/visorbus to
> > drivers/virt/visorbus.
> 
> Um, are you thinking it is ready to be moved?  Have you asked for
> another review?
> 
> In a totally random chance, I was doing some driver core work today and
> I noticed that in drivers/staging/unisys/visorbus/visorbus_main.c, you
> have 2 tabs for your 'struct attribute' variables, which is really odd.

That's the least of the problems.  Just about any function in there is
doing functionally stupid things.  E.g. raw sg_phys() calls for I/O
instead of dma mapping routines or parsing SCSI INQUIRY data in the
driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] include: linux: visorbus: Add visorbus to include/linux directory

2017-06-07 Thread Christoph Hellwig
Please don't send any move patches but the actual code added to the
kernel proper.  And based on what's in linux-next I don't think this
giant pile of junk is anywhere near mergeable.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 08/22] PCI: dwc: designware: Add EP mode support

2017-03-08 Thread Christoph Hellwig
On Wed, Mar 08, 2017 at 03:32:03PM +, Joao Pinto wrote:
> #define PCIE_GET_ATU_INB_UNR_REG_ADDR(region, register)   \
>   ((0x3 << 20) | (region << 9) |  \
>   (0x1 << 8)

Can you turn this and any similar macro into inline functions?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/22] Documentation: PCI: Guide to use pci endpoint configfs

2017-02-17 Thread Christoph Hellwig
I'm commenting on the configfs layout here instead of the patch with the
code as the issues are easier to explain that way.  I think the layout
is a bit confusing and could be cleaner by making use of pre-created
entries and symlinks.  Here is my suggestion:

/sys/kernel/config/pci_ep/functions/
.. test/# a directory for each function 
driver
   ... user-specified-name1/
   ... user-specified-name2
.. nvme/
   ... user-specified-name42/   

Each directory under /sys/kernel/config/pci_ep/functions/ is owned
by a function drivers.  Under that function driver's directory you
can create a directory for each instance of a function driver.  The
configfs layout is controlled by the function driver.  E.g. your current
EPF fields would move into the test function driver, while the nvme
function would expose totally different fields.


/sys/kernel/config/pci_ep/controllers/
... dwc-0/
... function
... dwc-1/
... function
... vhost-0/
... function

Here you have a directory for each controller that can be bound
to a function.  The directories are pre-created for each
controller port that is EP capable.
Function is a symlink to the function instance above.
Additional parameters might also be present depending on the
EPC driver.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/22] PCI: endpoint: Introduce configfs entry for configuring EP functions

2017-02-17 Thread Christoph Hellwig
On Fri, Feb 17, 2017 at 03:20:23PM +0530, Kishon Vijay Abraham I wrote:
> Introduce a new configfs entry to configure the EP function (like
> configuring the standard configuration header entries) and to
> bind the EP function with EP controller.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  drivers/pci/endpoint/Kconfig  |   14 +-
>  drivers/pci/endpoint/Makefile |1 +
>  drivers/pci/endpoint/pci-ep-cfs.c |  427 
> +
>  3 files changed, 440 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/pci/endpoint/pci-ep-cfs.c
> 
> diff --git a/drivers/pci/endpoint/Kconfig b/drivers/pci/endpoint/Kconfig
> index 7eb1c79..8470f0b 100644
> --- a/drivers/pci/endpoint/Kconfig
> +++ b/drivers/pci/endpoint/Kconfig
> @@ -6,7 +6,6 @@ menu "PCI Endpoint"
>  
>  config PCI_ENDPOINT
>   bool "PCI Endpoint Support"
> - select CONFIGFS_FS
>   help
>  Enable this configuration option to support configurable PCI
>  endpoint. This should be enabled if the platform has a PCI
> @@ -14,8 +13,19 @@ config PCI_ENDPOINT
>  
>  Enabling this option will build the endpoint library, which
>  includes endpoint controller library and endpoint function
> -library.
> +library. This will also enable the configfs entry required to
> +configure the endpoint function and used to bind the
> +function with a endpoint controller.
>  
>  If in doubt, say "N" to disable Endpoint support.
>  
> +config PCI_ENDPOINT_CONFIGFS
> + bool "PCI Endpoint Configfs Support"
> + depends on PCI_ENDPOINT
> + select CONFIGFS_FS
> + help
> +This will enable the configfs entry that can be used to
> +configure the endpoint function and used to bind the
> +function with a endpoint controller.
> +
>  endmenu
> diff --git a/drivers/pci/endpoint/Makefile b/drivers/pci/endpoint/Makefile
> index dc1bc16..dd9163c 100644
> --- a/drivers/pci/endpoint/Makefile
> +++ b/drivers/pci/endpoint/Makefile
> @@ -4,3 +4,4 @@
>  
>  obj-$(CONFIG_PCI_ENDPOINT)   += pci-epc-core.o pci-epf-core.o\
>  pci-epc-mem.o
> +obj-$(CONFIG_PCI_ENDPOINT_CONFIGFS)  += pci-ep-cfs.o
> diff --git a/drivers/pci/endpoint/pci-ep-cfs.c 
> b/drivers/pci/endpoint/pci-ep-cfs.c
> new file mode 100644
> index 000..ed0f8c2
> --- /dev/null
> +++ b/drivers/pci/endpoint/pci-ep-cfs.c
> @@ -0,0 +1,427 @@
> +/**
> + * configfs to configure the PCI endpoint
> + *
> + * Copyright (C) 2017 Texas Instruments
> + * Author: Kishon Vijay Abraham I 
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 of
> + * the License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +struct pci_epf_info {
> + struct config_group group;
> + struct list_head list;
> + struct pci_epf *epf;
> +};
> +
> +struct pci_ep_info {
> + struct config_group group;
> + struct config_group pci_epf_group;
> + /* mutex to protect pci_epf list */
> + struct mutex lock;
> + struct list_head pci_epf;
> + const char *epc_name;
> + struct pci_epc *epc;
> +};
> +
> +static inline struct pci_epf_info *to_pci_epf_info(struct config_item *item)
> +{
> + return container_of(to_config_group(item), struct pci_epf_info, group);
> +}
> +
> +static inline struct pci_ep_info *to_pci_ep_info(struct config_item *item)
> +{
> + return container_of(to_config_group(item), struct pci_ep_info, group);
> +}
> +
> +#define PCI_EPF_HEADER_R(_name)  
>\
> +static ssize_t pci_epf_##_name##_show(struct config_item *item,  char 
> *page)\
> +{   \
> + struct pci_epf *epf = to_pci_epf_info(item)->epf;  \
> + if (!epf->header) {\
> + WARN_ON_ONCE("epf device not bound to function driver\n"); \

WARN_ON_ONCE takes a string to evaluate as argument, not a message

> + return 0;   

and if we return 0 here the callers will retry because that is
interpreted as a short read.  The code should be something like:

if (WARN_ON_ONCE(!epf->header))
return -EINVAL;

> + if (!epf->header) {\
>

two small PCI documentation fixups

2017-02-14 Thread Christoph Hellwig
Avoid mentioning deprecated APIs and make the text a little
easier to understand.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] PCI: update pci.txt to talk about pci_alloc_irq_vectors

2017-02-14 Thread Christoph Hellwig
instead of the deprecated pci_enable_msi* APIs.

Signed-off-by: Christoph Hellwig 
---
 Documentation/PCI/pci.txt | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/Documentation/PCI/pci.txt b/Documentation/PCI/pci.txt
index 77f49dc5be23..98875f8cb7ba 100644
--- a/Documentation/PCI/pci.txt
+++ b/Documentation/PCI/pci.txt
@@ -382,18 +382,18 @@ The fundamental difference between MSI and MSI-X is how 
multiple
 "vectors" get allocated. MSI requires contiguous blocks of vectors
 while MSI-X can allocate several individual ones.
 
-MSI capability can be enabled by calling pci_enable_msi() or
-pci_enable_msix() before calling request_irq(). This causes
-the PCI support to program CPU vector data into the PCI device
-capability registers.
-
-If your PCI device supports both, try to enable MSI-X first.
-Only one can be enabled at a time.  Many architectures, chip-sets,
-or BIOSes do NOT support MSI or MSI-X and the call to pci_enable_msi/msix
-will fail. This is important to note since many drivers have
-two (or more) interrupt handlers: one for MSI/MSI-X and another for IRQs.
-They choose which handler to register with request_irq() based on the
-return value from pci_enable_msi/msix().
+MSI capability can be enabled by calling pci_alloc_irq_vectors with the
+PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
+causes the PCI support to program CPU vector data into the PCI device
+capability registers.  Many architectures, chip-sets, or BIOSes do NOT
+support MSI or MSI-X and a call to pci_alloc_irq_vectors with just
+the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
+specify PCI_IRQ_LEGACY as well.
+
+Drivers that have different interrupt handlers for MSI/MSI-X and
+legacy INTx should chose the right one based on the msi_enabled
+and msix_enabled flags in the pci_dev structure after calling
+pci_alloc_irq_vectors.
 
 There are (at least) two really good reasons for using MSI:
 1) MSI is an exclusive interrupt vector by definition.
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] PCI: update MSI/MSI-X bits in PCIEBUS-HOWTO

2017-02-14 Thread Christoph Hellwig
Stop talking about low-level details that mention deprecated APIs and
concentrate on what service drivers should do and why.

Signed-off-by: Christoph Hellwig 
---
 Documentation/PCI/PCIEBUS-HOWTO.txt | 33 +++--
 1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/Documentation/PCI/PCIEBUS-HOWTO.txt 
b/Documentation/PCI/PCIEBUS-HOWTO.txt
index 6bd5f372adec..15f0bb3b5045 100644
--- a/Documentation/PCI/PCIEBUS-HOWTO.txt
+++ b/Documentation/PCI/PCIEBUS-HOWTO.txt
@@ -161,21 +161,13 @@ Since all service drivers of a PCI-PCI Bridge Port device 
are
 allowed to run simultaneously, below lists a few of possible resource
 conflicts with proposed solutions.
 
-6.1 MSI Vector Resource
-
-The MSI capability structure enables a device software driver to call
-pci_enable_msi to request MSI based interrupts. Once MSI interrupts
-are enabled on a device, it stays in this mode until a device driver
-calls pci_disable_msi to disable MSI interrupts and revert back to
-INTx emulation mode. Since service drivers of the same PCI-PCI Bridge
-port share the same physical device, if an individual service driver
-calls pci_enable_msi/pci_disable_msi it may result unpredictable
-behavior. For example, two service drivers run simultaneously on the
-same physical Root Port. Both service drivers call pci_enable_msi to
-request MSI based interrupts. A service driver may not know whether
-any other service drivers have run on this Root Port. If either one
-of them calls pci_disable_msi, it puts the other service driver
-in a wrong interrupt mode.
+6.1 MSI and MSI-X Vector Resource
+
+Once MSI or MSI-X interrupts are enabled on a device, it stays in this
+mode until they are disabled again.  Since service drivers of the same
+PCI-PCI Bridge port share the same physical device, if an individual
+service driver enables or disables MSI/MSI-X mode it may result
+unpredictable behavior.
 
 To avoid this situation all service drivers are not permitted to
 switch interrupt mode on its device. The PCI Express Port Bus driver
@@ -187,17 +179,6 @@ driver. Service drivers should use (struct 
pcie_device*)dev->irq to
 call request_irq/free_irq. In addition, the interrupt mode is stored
 in the field interrupt_mode of struct pcie_device.
 
-6.2 MSI-X Vector Resources
-
-Similar to the MSI a device driver for an MSI-X capable device can
-call pci_enable_msix to request MSI-X interrupts. All service drivers
-are not permitted to switch interrupt mode on its device. The PCI
-Express Port Bus driver is responsible for determining the interrupt
-mode and this should be transparent to service drivers. Any attempt
-by service driver to call pci_enable_msix/pci_disable_msix may
-result unpredictable behavior. Service drivers should use
-(struct pcie_device*)dev->irq and call request_irq/free_irq.
-
 6.3 PCI Memory/IO Mapped Regions
 
 Service drivers for PCI Express Power Management (PME), Advanced
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 31/37] misc: Add host side pci driver for pci test function device

2017-01-24 Thread Christoph Hellwig
On Thu, Jan 12, 2017 at 03:56:20PM +0530, Kishon Vijay Abraham I wrote:
> Add PCI endpoint test driver that can verify base address
> register, legacy interrupt/MSI interrupt and read/write/copy
> buffers between host and device. The corresponding pci-epf-test
> function driver should be used on the EP side.

Just curious:  what would you think of a text based (e.g. debugfs)
interface to avoid the need for a userspace tool here?

> +static const struct pci_device_id pci_endpoint_test_tbl[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_ANY_ID) },
> + { }
> +};
> +MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);

Also this looks really odd, and dangerous.  Probing for any
TI device will bind to all kinds of legit devices.  It would
be good if you could squeeze out a single id for this device
out of the TI group responsible for allocating it.  Otherwise
we might try some other venues, e.g. Red Hat through Qumranet
has PCI IDs available for virtio, which might have some left
for other Linux uses.

In general I fear the PCI ID allocation will become a worse
and worse issue once your framework goes in and we'll grow
more PCI device models in the kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/37] PCI: endpoint: Introduce configfs entry for configuring EP functions

2017-01-16 Thread Christoph Hellwig
On Mon, Jan 16, 2017 at 11:31:23AM +0530, Kishon Vijay Abraham I wrote:
> Actually not all devices have hardcoded headers. E.g the platform I'm using
> doesn't have hardcoded headers and it can be configured based on the function
> the user would like to use. If the devices are hardcoded, then using configfs
> can be skipped altogether. In such cases, APIs like pci_epf_create() can
> directly be used by the drivers instead of going via configfs.

That's exactly what I meant - the IDs need to be set by the driver for
the implemented PCI device, and it's not up to the gadget core to configur
e them, it's up to the implementation of the PCIe device which PCI it
exposes.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/37] PCI: endpoint: Introduce configfs entry for configuring EP functions

2017-01-13 Thread Christoph Hellwig
Hi Kishon,

a couple comments on the configfs layout based on my experiments with
your previous drop to implement a NVMe device using it.

I don't think most of these configfs files should be present here, as
they are properties of the implemented PCIe devices.  E.g. for my
NVMe device they will be sort of hardcoded most of the time, as they
would be for other devices that would always have a fixed vendor/device/
class ID, cacheline size, etc.

In the end what we'll to be able to do here is to be able to create
a directory for each function driver, which then can create it's own
attributes inside it.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] uapi: export all headers under uapi directories

2017-01-09 Thread Christoph Hellwig
On Fri, Jan 06, 2017 at 10:43:59AM +0100, Nicolas Dichtel wrote:
> Regularly, when a new header is created in include/uapi/, the developer
> forgets to add it in the corresponding Kbuild file. This error is usually
> detected after the release is out.
> 
> In fact, all headers under uapi directories should be exported, thus it's
> useless to have an exhaustive list.
> 
> After this patch, the following files, which were not exported, are now
> exported (with make headers_install_all):

... snip ...

> linux/genwqe/.install
> linux/genwqe/..install.cmd
> linux/cifs/.install
> linux/cifs/..install.cmd

I'm pretty sure these should not be exported!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fs: configfs: don't return anything from drop_link

2016-12-01 Thread Christoph Hellwig
Thanks a lot Andrzej!

I've applied the patch, but I undid the reformatting of the nvmet
code to keep the patch as simple as possible.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 09/11] scsi: ufs: connect to RPMB subsystem

2016-11-07 Thread Christoph Hellwig
On Mon, Nov 07, 2016 at 07:27:38PM +, Winkler, Tomas wrote:
> I  value your opinion but I'm not responsible for inventing RPMB 
> and/or its  implementation storage devices (eMMC, UFC, NVMe), it's pretty 
> much done deal out there in the wild. 
> I'm just trying to provide common API above it.

And the common API must go through the SCSI midlayer.  If it can't we
won't support it, so please drop the UFS patches from the series.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 09/11] scsi: ufs: connect to RPMB subsystem

2016-11-07 Thread Christoph Hellwig
On Mon, Nov 07, 2016 at 09:53:12PM +0200, Tomas Winkler wrote:
> Register UFS RPMB LUN with the RPMB subsystem and provide
> implementation for the RPMB access operations. RPMB partition is
> accessed via a sequence of security protocol in and security protocol
> out commands with UFS specific parameters. This multi step process is
> abstracted into 4 basic RPMB commands.

This is a giant layering violation - the security protocol is not something
up to the LLDD but the core code.

And honestly the idea of defintining a security protocol in the UFS
spec is just as braindead.  If you care about this please take it up
with T10 to get RPMB support into one of the core SCSI specs instead
of a transport.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-15 Thread Christoph Hellwig
On Wed, Oct 12, 2016 at 08:06:40AM +1100, Dave Chinner wrote:
> Um, I seem to have completely missed that change - when did that
> happen and why?
> 
> Oh, it was part of the misguided "enable DAX on block devices"
> changes -

o, that commit just switched it to use ->f_mapping:

-   return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp));
+   return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);

The original version of it goes all the way back to introducing the
current-day DAX code in d475c6346 ("dax,ext2: replace XIP read and write
with DAX I/O");

> Hence I'd suggest that DAX check in io_is_direct() should be removed
> ASAP; the filesystems don't need it as they check the inode DAX
> state directly, and the code it "fixed" is no longer in the tree.

As long as ext4 still uses the overloaded direct_IO we need the
checks for DAX in the filemap.c generic read/write code.  It seems
like that's only two spots anyway, but I'd feel much safer once ext4
is switched over to the iomap version of the dax code.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 05/11] pci: rename *host* directory to *controller*

2016-10-12 Thread Christoph Hellwig
This is a big and painful change.  I'd suggest to either drop it for
now or convince Bjorn to take it as a scripted renamed just after -rc1.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 02/11] pci: endpoint: introduce configfs entry for configuring EP functions

2016-10-12 Thread Christoph Hellwig
On Wed, Sep 14, 2016 at 10:41:58AM +0530, Kishon Vijay Abraham I wrote:
> diff --git a/drivers/pci/endpoint/Kconfig b/drivers/pci/endpoint/Kconfig
> index a6d827c..f1dd206 100644
> --- a/drivers/pci/endpoint/Kconfig
> +++ b/drivers/pci/endpoint/Kconfig
> @@ -13,7 +13,9 @@ config PCI_ENDPOINT
>  
>  Enabling this option will build the endpoint library, which
>  includes endpoint controller library and endpoint function
> -library.
> +library. This will also enable the configfs entry required to
> +configure the endpoint function and used to bind the
> +function with a endpoint controller.
>  
>  If in doubt, say "N" to disable Endpoint support.

This needs to grow a

depends on CONFIGFS_FS

> +/**
> + * pci-ep-cfs.c - configfs to configure the PCI endpoint

Please don't use the file name in the top of the file comment, it's
only bound to get out of date..

> +struct pci_epf_info {
> + struct config_item pci_epf;
> + struct pci_epf *epf;
> +};

Any reason not to simply embedd the config_item into the pci_epf structure?

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 00/11] pci: support for configurable PCI endpoint

2016-10-12 Thread Christoph Hellwig
On Mon, Sep 26, 2016 at 11:38:41AM +0530, Kishon Vijay Abraham I wrote:
> > Ok, so in theory there can be other hardware (and quite likely is)
> > that supports multiple functions, and we can extend the framework
> > to support them without major obstacles, but your hardware doesn't,
> > so you kept it simple with one hardcoded function, right?
> 
> right, PCIe can have upto 8 functions. So the issues with the current 
> framework
> has to be fixed. I don't expect major obstacles with this as of now.

I wouldn't be too worried about.  We have two kinds of functions in
PCIe: physical functions, or virtual functions using SR-IOV.

For the first one we pretty much just need the controller driver to
report them separately as there is almost no interaction between
functions.

SR-IOV support will be more interesting as the physical functions
controls creation of the associated virtual functions.  I'd like to
defer that problem until we get hold of a software programmable
controller that supports SR-IOV and has open documentation.  (That
beeing said, if someone has a pointer to such a beast send it my way!)

> > We should still find out whether it's important that you can have
> > a single PCI function with a software multi-function support of some
> > sort. We'd still be limited to six BARs in total, and would also need
> > something to identify those sub-functions, so implementing that might
> > get quite hairy.
> > 
> > Possibly this could be done at a higher level, e.g. by implementing
> > a PCI-virtio multiplexer that can host multiple virtio based devices
> > inside of a single PCI function. If we think that would be a good idea,
> > we should make sure the configfs interface is extensible enough to
> > handle that.
> 
> Okay. So here the main function (actual PCI function) *can* perform the work 
> of
> virtio muliplexer if the platform wants to support sub-functions or it can be 
> a
> normal PCI function. right?

I really don't think we should be worried about this multiplexer.  It's
not something real PCIe devices do (sane ones anyway, the rest is
handled by ad-hoc multiplexers), and we should avoid creating our own
magic periphals for it.

> > One use case I have in mind for this is to have a PCI function that
> > can use virtio to provide rootfs (virtio-blk or 9pfs), network
> > and console to the system that implements the PCI function (note
> > that this is the opposite direction of what almost everyone else
> > uses PCI devices for).
> 
> Do you mean the virtio should actually be in the host side? Even here the
> system that implements PCI function should have multiple functions right? (one
> for network, other for console etc..). So there should be a virtio multiplexer
> both in the host side and in the device side?

We already support virtio over phsysical PCIe buses to support intel MIC
devices.  Take a look at drivers/misc/mic/bus/vop_bus.c and
drivers/misc/mic/vop (yes, what a horrible place for that code, not my
fault)
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 01/11] pci: endpoint: add EP core layer to enable EP controller and EP functions

2016-10-12 Thread Christoph Hellwig
> +/**
> + * pci_epc_stop() - stop the PCI link
> + * @epc: the link of the EPC device that has to be stopped
> + *
> + * Invoke to stop the PCI link
> + */
> +void pci_epc_stop(struct pci_epc *epc)
> +{
> + if (IS_ERR(epc) || !epc->ops->stop)
> + return;
> +
> + spin_lock_irq(&epc->irq_lock);
> + epc->ops->stop(epc);
> + spin_unlock_irq(&epc->irq_lock);
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_stop);

Can you elaborate on the synchronization strategy here?  It seems
like irq_lock is generally taken irq save and just around method
calls.  Wou;dn't it be better to leave locking to the methods
themselves?

> +/**
> + * struct pci_epc - represents the PCI EPC device
> + * @dev: PCI EPC device
> + * @ops: function pointers for performing endpoint operations
> + * @mutex: mutex to protect pci_epc ops
> + */
> +struct pci_epc {
> + struct device   dev;
> + /* support only single function PCI device for now */
> + struct pci_epf  *epf;
> + const struct pci_epc_ops*ops;
> + spinlock_t  irq_lock;
> +};

And this still documentes a mutex instead of the irq save spinlock,
while we're at it..

> +/**
> + * struct pci_epf_bar - represents the BAR of EPF device
> + * @phys_addr: physical address that should be mapped to the BAR
> + * @size: the size of the address space present in BAR
> + */
> +struct pci_epf_bar {
> + dma_addr_t  phys_addr;
> + size_t  size;
> +};

Just curious: shouldn't this be a phys_addr_t instead of a dma_addr_t?


Otherwise this looks like a nice little framework to get started!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-10 Thread Christoph Hellwig
On Mon, Oct 10, 2016 at 05:07:45PM +1100, Dave Chinner wrote:
> > > *However*, the DAX IO path locking in XFS  has changed in 4.9-rc1 to
> > > match the buffered IO single writer POSIX semantics - the test is a
> > > bad test based on the fact it exercised a path that is under heavy
> > > development and so can't be used as a regression test across
> > > multiple kernels.
> > 
> > That being said - I wonder if we should allow the shared lock on DAX
> > files IFF the user is specifying O_DIRECT in the open mode..
> 
> It should do - if it doesn't then we screwed up the IO path
> selection logic in XFS and we'll need to fix it.

Depends on your defintion of "we".  The DAX code has always abused the
direct I/O path, and that abuse is ingrained in the VFS path in a way that
we can't easily undo it in XFS, e.g. take a look at io_is_direct and
iocb_flags in include/linux/fs.h, which ensure that DAX I/O will always
appear as IOCB_DIRECT to the fs.  It will take some time to untagle
this, but it's on my todo list once the last file system (ext4)
untangles the DAX and direct I/O path.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-09 Thread Christoph Hellwig
On Fri, Oct 07, 2016 at 08:47:51AM +1100, Dave Chinner wrote:
> Except that it's DAX, and in 4.7-rc1 that used shared locking at the
> XFS level and never took exclusive locks.
> 
> *However*, the DAX IO path locking in XFS  has changed in 4.9-rc1 to
> match the buffered IO single writer POSIX semantics - the test is a
> bad test based on the fact it exercised a path that is under heavy
> development and so can't be used as a regression test across
> multiple kernels.

That being said - I wonder if we should allow the shared lock on DAX
files IFF the user is specifying O_DIRECT in the open mode..
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] printk: introduce kptr_restrict level 3

2016-10-06 Thread Christoph Hellwig
On Thu, Oct 06, 2016 at 01:47:47PM +, Roberts, William C wrote:
> Out of tree modules still affect core kernel security.

So don't use them.

> I would also bet money, that somewhere
> In-tree someone has put a %p when they wanted a %pK.

So fix them.

> So this method is just quite error
> prone. We currently have a blacklist approach versus whitelist.

Or fix the entire thing, get rid of %pK and always protect %p if you
can show that it doesn't break anything.

But stop posting patches with bullshit arguments like out of tree
modules.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] printk: introduce kptr_restrict level 3

2016-10-06 Thread Christoph Hellwig
On Wed, Oct 05, 2016 at 02:04:46PM -0400, william.c.robe...@intel.com wrote:
> From: William Roberts 
> 
> Some out-of-tree modules do not use %pK and just use %p, as it's
> the common C paradigm for printing pointers. Because of this,
> kptr_restrict has no affect on the output and thus, no way to
> contain the kernel address leak.

So what?  We a) don't care about out of tree modules and b) you could
just triviall fix them up if you care.

No need to bloat the kernel with crap like this.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-10-04 Thread Christoph Hellwig
FYI, the patches look fine to me:

Acked-by: Christoph Hellwig 

but we're past the merge window for 4.9 now unfortunately.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Christoph Hellwig
On Sun, Sep 11, 2016 at 11:14:09AM -0600, Jason Gunthorpe wrote:
> > > We stil always have the common structure first.  And at least for
> > > cgroups supports that's what matters.
> > >
> > > Re the actual structures - we'll really need to make sure we
> > >
> > >  a) expose proper userspace abi headers in the kernel for all code
> > > in the RDMA subsystem
> > >  b) actually use that in the userspace components
> > >
> > > I've posted some initial work toward a) a while ago, and once we
> 
> Did it get merged? Do you have a pointer?

http://www.spinics.net/lists/linux-rdma/msg31958.html

> this without it would be very hard, as everything is cross-linked, I
> couldnn't unwind libibcm until I fixed a bit of verbs, and rdmacm can't
> even include its uapi header until the duplicate definitions in the
> verbs copy are delt with .. and I've also learned we are making
> changing to the kernel uapi header and since nothing uses them we never even
> compile test :( :( eg
> https://github.com/torvalds/linux/commit/b493d91d333e867a043f7ff1397bcba6e2d0dda2]

> However, everything under verbs is not straightforward. The files in
> userspace are not copies...
> 
> user:
> 
> struct ibv_query_device {
>__u32 command;
>__u16 in_words;
>__u16 out_words;
>__u64 response;
>__u64 driver_data[0];
> };
> 
> kernel:
> 
> struct ib_uverbs_query_device {
> __u64 response;
> __u64 driver_data[0];
> };

We'll obviously need different strutures for the libibvers API
and the kernel interface in this case, and we'll need to figure out
how to properly translate them.  I think a cast, plus compile time
type checking ala BUILD_BUG_ON is the way to go.

> eg the userspace version stuffs the header into the struct and the
> kernel version does not. Presumably this is for efficiency so that no
> copies are required when marshaling. This impacts everything :(
> 
> I'm thinking the best way forward might be to use a script and
> transform userspace into:
> 
> struct ibv_query_device {
>   struct ib_uverbs_cmd_hdr hdr;
>   struct ib_uverbs_query_device cmd;
> };

That would break the users of the interface.  However automatically
generating the user ABI from the kernel one might still be a good idea
in the long run.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] DMA-API-HOWTO: is no more

2016-09-11 Thread Christoph Hellwig
So don't mention it.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 781024e..494ffac 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -931,10 +931,8 @@ to "Closing".
 
 1) Struct scatterlist requirements.
 
-   Don't invent the architecture specific struct scatterlist; just use
-   . You need to enable
-   CONFIG_NEED_SG_DMA_LENGTH if the architecture supports IOMMUs
-   (including software IOMMU).
+   You need to enable CONFIG_NEED_SG_DMA_LENGTH if the architecture
+   supports IOMMUs (including software IOMMU).
 
 2) ARCH_DMA_MINALIGN
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-11 Thread Christoph Hellwig
On Sat, Sep 10, 2016 at 11:01:51AM -0600, Jason Gunthorpe wrote:
> Sadly, it isn't a step backwards, it is status quo - at least as far
> as the uapi is concerned.

Sort of, see below:

> struct mlx5_create_cq {
> struct ibv_create_cqibv_cmd;
> __u64   buf_addr;
> __u64   db_addr;
> __u32 cqe_size;
> };
> 
> struct iwch_create_cq {
> struct ibv_create_cq ibv_cmd;
> uint64_t user_rptr_addr;
> };
> 
> Love to hear ideas on a way forward that doesn't involve rewriting
> everything :(

We stil always have the common structure first.  And at least for
cgroups supports that's what matters.

Re the actual structures - we'll really need to make sure we

 a) expose proper userspace abi headers in the kernel for all code
in the RDMA subsystem
 b) actually use that in the userspace components

I've posted some initial work toward a) a while ago, and once we
agree on adopting your common repo I'd really like to start through
with that work.  I think it's a pre-requisite for any major new
userspace ABI work.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-10 Thread Christoph Hellwig
On Wed, Sep 07, 2016 at 11:51:42AM +0300, Matan Barak wrote:
> All recent proposals of the new ABI schema deals with extending the 
> flexibility of the current schema by letting drivers define their specific 
> types, actions, attributes, etc. Even more than that, the dispatching 
> starts from the driver and it chooses if it wants to use the common RDMA 
> core layer or have it's own wise implementation instead.
> Some drivers might even prefer not to implement the current verbs types.
> These decisions were made in the OFVWG meetings.

OFVWG meetings have absolutely zero relevance for Linux development.
More "flexibility" for drivers just means giving up on designing a
coherent API and leaving it to drivers authors to add crap to their
own drivers.  That's a major step backwards.

> Sounds reasonable, but what about drivers which ignore the common code and 
> implement it in their own way? What about drivers which don't support the 
> standard RDMA types at all?

They should not be using the code in drivers/infiniband.  usnic is such
an example of a driver that should never have been added in it's current
form.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-10 Thread Christoph Hellwig
On Wed, Sep 07, 2016 at 01:25:13PM +0530, Parav Pandit wrote:
> >  a) delay cgroups support until the grand rewrite is done
> >  b) add it now and deal with the consequences later
> >
> Can we do (b) now and differ adding any HW resources to cgroup until
> they are clearly called out.
> Architecture and APIs are already in place to support this.

Sounds fine to me.  The only thing I want to avoid is pie in the
sky "future proofing" that leads to horrible architectures.  And I assume
that's what Matan proposed.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-09-01 Thread Christoph Hellwig
On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote:
> Well, if I recall, the reason doing so last time was in order to allow 
> flexible updating of ib_core independently, which is obviously not a good 
> reason (to say the least).
>
> Since the new ABI will probably define new object types (all recent 
> proposals go this way), the current approach could lead to either trying to 
> map new objects to existing cgroup resource types, which could lead to some 
> weird non 1:1 mapping, or having a split definitions - such that each 
> driver will declare its objects both in the cgroups mechanism and in its 
> driver dispatch table.
> Even worse than that, drivers could simply ignore the cgroups support while 
> implementing their own resource types and get a very broken containers 
> support.

Sorry guys, but arbitrary extensibility for something not finished is the
worst idea ever.  We have two options here:

 a) delay cgroups support until the grand rewrite is done
 b) add it now and deal with the consequences later

That being said, adding random non-Verbs hardwasre to the RDMA core is
the second worst idea ever.  Guess I need to catch up with the
discussion and start using the flame thrower.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv11 0/3] rdmacg: IB/core: rdma controller support

2016-08-25 Thread Christoph Hellwig
On Wed, Aug 24, 2016 at 05:17:47PM -0400, Tejun Heo wrote:
> Looks good to me.  I just have a nit in the documentation.  Christoph,
> what do you think?

Looks reasonable from a quick look, but I didn't do a full review yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/10] docs: sphinxify sparse.txt and move to dev-tools

2016-08-18 Thread Christoph Hellwig
On Thu, Aug 18, 2016 at 05:46:30PM -0600, Jonathan Corbet wrote:
> So would the old hats be happier with a patch that looks like this?  The
> quality of the formatted output suffers slightly, but it's not a big
> deal...

I think this a major improvement.  The only thing that still strikes me
as rather odd are the "::" and ".." notations, but maybe I'll get used
to it..
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/10] docs: sphinxify sparse.txt and move to dev-tools

2016-08-09 Thread Christoph Hellwig
On Tue, Aug 09, 2016 at 10:28:38AM +0200, Daniel Vetter wrote:
> The point is to make the docs more discoverable by being able to
> cross-link them. Old hats like us don't need that, but it definitely
> has value in bringing new folks on board.

But do that in a way that keeps the old hats happy.  The crazy use of
punctuation and the weird quotes is an absolute no-go.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/10] docs: sphinxify sparse.txt and move to dev-tools

2016-08-09 Thread Christoph Hellwig
On Tue, Aug 09, 2016 at 11:19:50AM +0300, Jani Nikula wrote:
> On Tue, 09 Aug 2016, Christoph Hellwig  wrote:
> > The ugly format is a major regression over a proper simple text
> > file.  What's the point?
> 
> Major regression? Please be reasonable.
> 
> I think the changes are rather small, and it's a fair compromise between
> a simple text file and one that can be used to generate pretty
> documentation [1].

It's a lot less pretty to read - it look like a cat threw up to be
specific.  The point of the Document is to read it quickly in the kernel
tree.  If you want fancy websites write a separate document.

> [1] http://static.lwn.net/kerneldoc/dev-tools/sparse.html

And I wouldn't exactly call that pretty.  There is no value add
over a simple text file here, and it requires a browser to not look
ugly instead of a text editor, which is a giant usablity regression.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/10] docs: sphinxify sparse.txt and move to dev-tools

2016-08-09 Thread Christoph Hellwig
The ugly format is a major regression over a proper simple text
file.  What's the point?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] DMA-API-HOWTO: is no more

2016-07-11 Thread Christoph Hellwig
So don't mention it.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 781024e..494ffac 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -931,10 +931,8 @@ to "Closing".
 
 1) Struct scatterlist requirements.
 
-   Don't invent the architecture specific struct scatterlist; just use
-   . You need to enable
-   CONFIG_NEED_SG_DMA_LENGTH if the architecture supports IOMMUs
-   (including software IOMMU).
+   You need to enable CONFIG_NEED_SG_DMA_LENGTH if the architecture
+   supports IOMMUs (including software IOMMU).
 
 2) ARCH_DMA_MINALIGN
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "mm: rename _count, field of the struct page, to _refcount"

2016-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2016 at 11:22:46AM +0200, Vitaly Kuznetsov wrote:
> _count -> _refcount rename in commit 0139aa7b7fa12 ("mm: rename _count,
> field of the struct page, to _refcount") broke kdump. makedumpfile(8) does
> stuff like READ_MEMBER_OFFSET("page._count", page._count) and fails. While
> it is definitely possible to fix this particular tool I'm not sure about
> other tools which might be doing the same.
> 
> I suggest we remember the "we don't break userspace" rule and revert for
> 4.7 while it's not too late.

Err, sorry - this is not "userspace".  It's crazy crap digging into
kernel internal structure.

The rename was absolutely useful, so fix up your stinking pike in kdump.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add kernel parameter to blacklist modules

2016-06-14 Thread Christoph Hellwig
On Mon, Jun 13, 2016 at 08:32:41AM -0400, Prarit Bhargava wrote:
> Blacklisting a module in linux has long been a problem.  The process of
> blacklisting a module has changed over time, and it seems that every OS
> does it slightly differently and depends on the age of the init system
> used on that OS.

And why would we care about blacklisting a module?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 01/44] dma-mapping: Use unsigned long for dma_attrs

2016-06-13 Thread Christoph Hellwig
On Fri, Jun 10, 2016 at 04:49:47PM +0200, Luis R. Rodriguez wrote:
> Do we not expect the number of argument to grow ? This "cleanup" would
> do away with such possibilities, and then require adding the API later,
> and this requiring a full set of collateral evolutions again when this
> is needed. What was the original motivation for using this instead of
> the approach you are suggesting ?

We still got plenty of space for attrs.  If you're worried about running
out of 32 flags we could do a dma_attrs_t typedef that we could swich
to a u64.  That would have another advantage in that we could add a
__bitwise sparse annotation to avoid people passing the wrong kind of
flags.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2] dma-mapping: Use unsigned long for dma_attrs

2016-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2016 at 07:36:42AM +0200, Krzysztof Kozlowski wrote:
> > No really for this patch, but I would much prefer to document them next
> > to the code in the long run.  Also I really think these BIT() macros
> > are a distraction compared to the (1 << N) notation.
> 
> Not much difference to me but maybe plain number:
> ...   0x01u
> ...   0x02u
> ?

I prefer the little bit shifts, but even the explicit values are much
better than the obsfucating macros :)  Anyway, your patch and in the end
all three methods will get the work done.

> > I'd just kill this helper, much easier to simply open code it in the
> > caller.
> 
> Keeping it for now helps reducing the number of changes in the patch.
> The patch will be quite big as it has to replace all the uses atomically.
> 
> I can get rid of the helper in consecutive patch.

Sounds fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >