Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio




On 5/28/19 12:36 AM, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:
>>
>> Introduce a new pcibios function pcibios_ignore_alignment_request
>> which allows the PCI core to defer to platform-specific code to
>> determine whether or not to ignore alignment requests for PCI resources.
>>
>> The existing behavior is to simply ignore alignment requests when
>> PCI_PROBE_ONLY is set. This behavior is maintained by the
>> default implementation of pcibios_ignore_alignment_request.
>>
>> Signed-off-by: Shawn Anastasio 
>> ---
>>  drivers/pci/pci.c   | 9 +++--
>>  include/linux/pci.h | 1 +
>>  2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 8abc843b1615..8207a09085d1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>     return 0;
>>  }
>>
>> +int __weak pcibios_ignore_alignment_request(void)
>> +{
>> +   return pci_has_flag(PCI_PROBE_ONLY);
>> +}
>> +
>>  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>  static DEFINE_SPINLOCK(resource_alignment_lock);
>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>     p = resource_alignment_param;
>>     if (!*p && !align)
>>        goto out;
>> -   if (pci_has_flag(PCI_PROBE_ONLY)) {
>> +   if (pcibios_ignore_alignment_request()) {
>>        align = 0;
>> -      pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>> +      pr_info_once("PCI: Ignoring requested alignments\n");
>>        goto out;
>>     }

> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
>
> That said, the real issue here is that PCI_PROBE_ONLY probably
> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
> hotplugged devices are configured by firmware before it's passed to
> the guest and we need to keep the FW assignments otherwise things
> break. QEMU however doesn't do any BAR assignments and relies on that
> being handled by the guest. At boot time this is done by SLOF, but
> Linux only keeps SLOF around until it's extracted the device-tree.
> Once that's done SLOF gets blown away and the kernel needs to do its
> own BAR assignments. I'm guessing there's a hack in there to make it
> work today, but it's a little surprising that it works at all...

Interesting, I wasn't aware that hotplugged devices are configured
by the hypervisor on PowerVM. That at least means that this patch is
wrong as-is since it won't handle that properly. Definitely seems like
there will need to be different behavior here depending on the hypervisor.

That being said, wouldn't PCI_PROBE_ONLY still be set on pseries/kvm
(at least for initial boot) to observe SLOF's original BAR assignments?
Perhaps it should be un-set after initial PCI init?



> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
> might have something to add. He's sick at the moment, but I'll ask him
> to take a look at this once he's back among the living


Good to know. I'll await his comments before continuing here.


>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>>  int pcibios_alloc_irq(struct pci_dev *dev);
>>  void pcibios_free_irq(struct pci_dev *dev);
>>  resource_size_t pcibios_default_alignment(void);
>> +int pcibios_ignore_alignment_request(void);
>>
>>  #ifdef CONFIG_HIBERNATE_CALLBACKS
>>  extern struct dev_pm_ops pcibios_pm_ops;
>> --
>> 2.20.1



Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE

2019-05-27 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 23/05/2019 à 09:00, Christophe Leroy a écrit :
>
> [...]
>
>>> arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
>>> arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
>>> Makefile:1052: recipe for target 'vmlinux' failed
>>>
>>>> +.macro SYSCALL_ENTRY trapno intno
>>>> +    mfspr    r10, SPRN_SPRG_THREAD
>>>> +#ifdef CONFIG_KVM_BOOKE_HV
>>>> +BEGIN_FTR_SECTION
>>>> +    mtspr    SPRN_SPRG_WSCRATCH0, r10
>>>> +    stw    r11, THREAD_NORMSAVE(0)(r10)
>>>> +    stw    r13, THREAD_NORMSAVE(2)(r10)
>>>> +    mfcr    r13    /* save CR in r13 for now   */
>>>> +    mfspr    r11, SPRN_SRR1
>>>> +    mtocrf    0x80, r11    /* check MSR[GS] without clobbering reg */
>>>> +    bf    3, 1975f
>>>> +    b    kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1
>>>
>>> It seems to me that the "_SPRN_SRR1" on the end of this line
>>> isn't meant to be there...  However, it still fails to link with that
>>> removed.
>
> It looks like I missed the macro expansion.
>
> The called function should be kvmppc_handler_8_0x01B
>
> Seems like kisskb doesn't build any config like this.

I thought we did, ie:

http://kisskb.ellerman.id.au/kisskb/buildresult/13817941/

But clearly something is missing to trigger the bug.

cheers
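
For readers following along, the "missed macro expansion" can be illustrated with a C preprocessor analogy. This is only a sketch: the patch itself uses an assembler macro, the helper macro names below are hypothetical, and the two numeric values are simply read off the symbol name kvmppc_handler_8_0x01B quoted above. It shows how pasting the argument names directly yields the undefined symbol from the link error, while expanding them first yields the expected handler name.

```c
#include <string.h>

/* Values implied by the symbol name quoted in the thread; illustrative only. */
#define BOOKE_INTERRUPT_SYSCALL 8
#define SPRN_SRR1 0x01B

/* Direct pasting: operands of ## are NOT macro-expanded before pasting. */
#define NAIVE_HANDLER(intno, srr1) kvmppc_handler_##intno##_##srr1

/* One extra level of indirection: arguments are expanded, then pasted. */
#define PASTE4_(a, b, c, d) a##b##c##d
#define PASTE4(a, b, c, d) PASTE4_(a, b, c, d)
#define EXPANDED_HANDLER(intno, srr1) PASTE4(kvmppc_handler_, intno, _, srr1)

/* Turn the resulting identifier into a string so it can be inspected. */
#define STR_(x) #x
#define STR(x) STR_(x)

static const char *naive_name(void)
{
    return STR(NAIVE_HANDLER(BOOKE_INTERRUPT_SYSCALL, SPRN_SRR1));
}

static const char *expanded_name(void)
{
    return STR(EXPANDED_HANDLER(BOOKE_INTERRUPT_SYSCALL, SPRN_SRR1));
}

/* Nonzero when the two spellings differ, i.e. when code built with the
 * naive form would reference a symbol that nobody defines. */
static int names_differ(void)
{
    return strcmp(naive_name(), expanded_name()) != 0;
}
```

Here naive_name() gives the spelling seen in the linker error, and expanded_name() gives the spelling Christophe says the call should have used.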


Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Oliver
On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio  wrote:
>
> Introduce a new pcibios function pcibios_ignore_alignment_request
> which allows the PCI core to defer to platform-specific code to
> determine whether or not to ignore alignment requests for PCI resources.
>
> The existing behavior is to simply ignore alignment requests when
> PCI_PROBE_ONLY is set. This behavior is maintained by the
> default implementation of pcibios_ignore_alignment_request.
>
> Signed-off-by: Shawn Anastasio 
> ---
>  drivers/pci/pci.c   | 9 +++--
>  include/linux/pci.h | 1 +
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8abc843b1615..8207a09085d1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
> return 0;
>  }
>
> +int __weak pcibios_ignore_alignment_request(void)
> +{
> +   return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>  static DEFINE_SPINLOCK(resource_alignment_lock);
> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> p = resource_alignment_param;
> if (!*p && !align)
> goto out;
> -   if (pci_has_flag(PCI_PROBE_ONLY)) {
> +   if (pcibios_ignore_alignment_request()) {
> align = 0;
> -   pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> +   pr_info_once("PCI: Ignoring requested alignments\n");
> goto out;
> }

I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do its
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...

IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 4a5a84d7bdd4..47471dcdbaf9 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>  int pcibios_alloc_irq(struct pci_dev *dev);
>  void pcibios_free_irq(struct pci_dev *dev);
>  resource_size_t pcibios_default_alignment(void);
> +int pcibios_ignore_alignment_request(void);
>
>  #ifdef CONFIG_HIBERNATE_CALLBACKS
>  extern struct dev_pm_ops pcibios_pm_ops;
> --
> 2.20.1
>


[PATCH v2 2/3] powerpc/mm/hugetlb: Fix kernel crash if we fail to allocate page table caches

2019-05-27 Thread Aneesh Kumar K.V
We only check for hugetlb allocations, because with hugetlb we do conditional
registration. For PGD/PUD/PMD levels we register them always in
pgtable_cache_init.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 1de0f43a68e5..f55dc110f2ad 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -61,12 +61,17 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
num_hugepd = 1;
}
 
+   if (!cachep) {
+   WARN_ONCE(1, "No page table cache created for hugetlb tables");
+   return -ENOMEM;
+   }
+
new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
 
BUG_ON(pshift > HUGEPD_SHIFT_MASK);
BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
 
-   if (! new)
+   if (!new)
return -ENOMEM;
 
/*
-- 
2.21.0
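
A minimal userspace sketch of the guard this patch adds, assuming a hypothetical `struct table_cache` in place of a kmem cache and a plain flag in place of WARN_ONCE(). It is not the kernel code, only the shape of the check: refuse the allocation with -ENOMEM when the cache was never registered, instead of dereferencing a NULL cache.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache; a NULL pointer means the cache
 * was never registered. */
struct table_cache {
    size_t object_size;
};

/* Set once a missing cache has been reported (stands in for WARN_ONCE()). */
static int missing_cache_warned;

/* Allocate one zeroed table from cachep.
 * Returns 0 on success, -ENOMEM if the cache is missing or memory ran out. */
static int table_alloc(const struct table_cache *cachep, void **out)
{
    *out = NULL;

    if (!cachep) {
        missing_cache_warned = 1;
        return -ENOMEM;
    }

    *out = calloc(1, cachep->object_size);
    return *out ? 0 : -ENOMEM;
}
```

Both failure modes return the same error code, mirroring the patch, where the missing-cache case and the failed kmem_cache_alloc() case both return -ENOMEM.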



[PATCH v2 3/3] powerpc/mm/hugetlb: Don't enable HugeTLB if we don't have a page table cache

2019-05-27 Thread Aneesh Kumar K.V
This makes sure we don't enable HugeTLB if the cache is not configured.
I am still not sure about this. IMHO hugetlb support should be a hardware
support derivative and any cache allocation failure should be handled as I did
in the earlier patch. But then if we were not able to create hugetlb page table
cache, we can as well declare hugetlb support disabled thereby avoiding calling
into allocation routines.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f55dc110f2ad..d34540479b1a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -601,6 +601,7 @@ __setup("hugepagesz=", hugepage_setup_sz);
 
 static int __init hugetlbpage_init(void)
 {
+   bool configured = false;
int psize;
 
if (hugetlb_disabled) {
@@ -651,10 +652,15 @@ static int __init hugetlbpage_init(void)
pgtable_cache_add(pdshift - shift);
else if (IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) || IS_ENABLED(CONFIG_PPC_8xx))
pgtable_cache_add(PTE_T_ORDER);
+
+   configured = true;
}
 
-   if (IS_ENABLED(CONFIG_HUGETLB_PAGE_SIZE_VARIABLE))
-   hugetlbpage_init_default();
+   if (configured) {
+   if (IS_ENABLED(CONFIG_HUGETLB_PAGE_SIZE_VARIABLE))
+   hugetlbpage_init_default();
+   } else
+   pr_info("Failed to initialize. Disabling HugeTLB");
 
return 0;
 }
-- 
2.21.0
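
The control flow introduced here can be sketched in isolation as follows. The `struct hpage_size` record and its `usable` field are assumptions made for the example (in the patch the loop's various `continue` conditions play that role); the point is only that the feature stays enabled when at least one size makes it to the end of the loop body.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-size record; `usable` collapses all of the loop's
 * skip conditions into a single flag. */
struct hpage_size {
    unsigned int shift;
    bool usable;
};

/* Returns true if at least one huge page size was set up, meaning the
 * default-size initialisation may go ahead; false means "disable HugeTLB". */
static bool hugetlb_configured(const struct hpage_size *sizes, size_t n)
{
    bool configured = false;

    for (size_t i = 0; i < n; i++) {
        if (!sizes[i].usable)
            continue;           /* skipped sizes never set the flag */
        configured = true;      /* reached the end of the loop body */
    }
    return configured;
}
```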



[PATCH v2 1/3] powerpc/mm: Handle page table allocation failures

2019-05-27 Thread Aneesh Kumar K.V
This fixes a kernel crash caused by not handling page table allocation
failures while allocating a hugetlb page table.

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index b5d92dc32844..1de0f43a68e5 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift == PUD_SHIFT)
return (pte_t *)pu;
else if (pshift > PMD_SHIFT) {
@@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
if (pshift == PMD_SHIFT)
/* 16MB hugepage */
return (pte_t *)pm;
@@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift >= PUD_SHIFT) {
ptl = pud_lockptr(mm, pu);
hpdp = (hugepd_t *)pu;
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
ptl = pmd_lockptr(mm, pm);
hpdp = (hugepd_t *)pm;
}
-- 
2.21.0
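
The pattern behind these four hunks, reduced to a two-level table in userspace. Everything here is hypothetical (the `upper`/`lower` structs, the injectable allocator, the tiny fan-out); it only illustrates that each on-demand allocation in the walk needs its own NULL check before the next level is touched.

```c
#include <stdlib.h>

#define ENTRIES 4   /* tiny fan-out, purely for illustration */

struct lower { long slot[ENTRIES]; };
struct upper { struct lower *child[ENTRIES]; };

/* Injectable allocator so that an allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Walk to the slot for (i, j), allocating the lower level on demand.
 * Returns NULL on a bad index or when the allocation fails. */
static long *slot_alloc(struct upper *top, unsigned i, unsigned j, alloc_fn alloc)
{
    if (i >= ENTRIES || j >= ENTRIES)
        return NULL;

    if (!top->child[i]) {
        struct lower *l = alloc(sizeof(*l));

        if (!l)
            return NULL;    /* analogue of the checks added after pud_alloc()/pmd_alloc() */
        for (unsigned k = 0; k < ENTRIES; k++)
            l->slot[k] = 0;
        top->child[i] = l;
    }
    return &top->child[i]->slot[j];
}
```

Without the inner check, the final line would dereference a NULL child, which is the userspace equivalent of the crash the patch describes.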



[PATCH] powerpc/mm: Move some of the boot time info print to generic file

2019-05-27 Thread Aneesh Kumar K.V
With radix translation enabled we find in dmesg

 hash-mmu: ppc64_pft_size= 0x0
 hash-mmu: kernel vmalloc start   = 0xc008
 hash-mmu: kernel IO start= 0xc00a
 hash-mmu: kernel vmemmap start   = 0xc00c

This is because these pr_info calls are in hash_utils.c which has

 #define pr_fmt(fmt) "hash-mmu: " fmt

The information printed is generic, so move it to a generic file.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/setup-common.c| 4 
 arch/powerpc/mm/book3s64/hash_utils.c | 5 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index aad9f5df6ab6..a73a91f2c21f 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -810,6 +810,10 @@ static __init void print_system_info(void)
pr_info("mmu_features  = 0x%08x\n", cur_cpu_spec->mmu_features);
 #ifdef CONFIG_PPC64
pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features);
+   pr_info("ppc64_pft_size= 0x%llx\n", ppc64_pft_size);
+   pr_info("kernel vmalloc start   = 0x%lx\n", KERN_VIRT_START);
+   pr_info("kernel IO start= 0x%lx\n", KERN_IO_START);
+   pr_info("kernel vmemmap start   = 0x%lx\n", (unsigned long)vmemmap);
 #endif
 
print_system_hash_info();
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 919a861a8ec0..2f677914bfd2 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1950,11 +1950,6 @@ machine_device_initcall(pseries, hash64_debugfs);
 
 void __init print_system_hash_info(void)
 {
-   pr_info("ppc64_pft_size= 0x%llx\n", ppc64_pft_size);
-
if (htab_hash_mask)
pr_info("htab_hash_mask= 0x%lx\n", htab_hash_mask);
-   pr_info("kernel vmalloc start   = 0x%lx\n", KERN_VIRT_START);
-   pr_info("kernel IO start= 0x%lx\n", KERN_IO_START);
-   pr_info("kernel vmemmap start   = 0x%lx\n", (unsigned long)vmemmap);
 }
-- 
2.21.0
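
Why the prefix follows the file rather than the message can be seen in a small userspace sketch. The `pr_fmt` definition is the one quoted in the commit message; `pr_info_buf`, `format_vmalloc_start` and `has_hash_prefix` are hypothetical helpers that format into a buffer instead of the kernel log.

```c
#include <stdio.h>
#include <string.h>

/* File-local prefix, as quoted from hash_utils.c in the commit message. */
#define pr_fmt(fmt) "hash-mmu: " fmt

/* Userspace stand-in for pr_info(): the prefix is glued on at compile time
 * through string literal concatenation. */
#define pr_info_buf(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)

static int format_vmalloc_start(char *buf, size_t size, unsigned long start)
{
    return pr_info_buf(buf, size, "kernel vmalloc start   = 0x%lx\n", start);
}

/* True when a line carries the prefix of the file that printed it. */
static int has_hash_prefix(const char *line)
{
    return strncmp(line, "hash-mmu: ", strlen("hash-mmu: ")) == 0;
}
```

Any message printed from a file with this `pr_fmt` gets the "hash-mmu: " prefix, whether or not the hash MMU is in use, which is why moving the calls to a generic file changes the output.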



Re: kmemleak: 1157 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

2019-05-27 Thread Michael Ellerman
Mathieu Malaterre  writes:
> Hi there,
>
> Is there a way to dump more context (somewhere in of tree
> flattening?). I cannot make sense of the following:

Hmm. Not that I know of.

Those don't look related to OF flattening/unflattening. That's just
sysfs setup based on the unflattened device tree.

The allocations are happening in safe_name() AFAICS.

int __of_add_property_sysfs(struct device_node *np, struct property *pp)
{
...
pp->attr.attr.name = safe_name(&np->kobj, pp->name);

And the free is in __of_sysfs_remove_bin_file():

void __of_sysfs_remove_bin_file(struct device_node *np, struct property *prop)
{
if (!IS_ENABLED(CONFIG_SYSFS))
return;

sysfs_remove_bin_file(&np->kobj, &prop->attr);
kfree(prop->attr.attr.name);


There is this check which could be failing leading to us not calling the
free at all:

void __of_remove_property_sysfs(struct device_node *np, struct property *prop)
{
/* at early boot, bail here and defer setup to of_init() */
if (of_kset && of_node_is_attached(np))
__of_sysfs_remove_bin_file(np, prop);
}


So maybe stick a printk() in there to see if you're hitting that
condition, eg something like:

if (of_kset && of_node_is_attached(np))
__of_sysfs_remove_bin_file(np, prop);
else
printk("%s: leaking prop %s on node %pOF\n", __func__, 
prop->attr.attr.name, np);


cheers
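
The hypothesis above, a name duplicated on add but freed only when a guard holds, can be modelled in a few lines of userspace C. This is a sketch with hypothetical names (`struct prop`, `prop_add`, `prop_remove`, `live_names`), not the OF code; it only shows how a skipped removal path leaves the allocation behind.

```c
#include <stdlib.h>
#include <string.h>

struct prop {
    char *sysfs_name;   /* duplicated at add time, like the safe_name() result */
};

static int live_names;  /* crude allocation counter for the sketch */

static int prop_add(struct prop *p, const char *name)
{
    size_t len = strlen(name) + 1;

    p->sysfs_name = malloc(len);
    if (!p->sysfs_name)
        return -1;
    memcpy(p->sysfs_name, name, len);
    live_names++;
    return 0;
}

/* Same shape as the guarded removal quoted above: the free only happens
 * when both conditions hold. */
static void prop_remove(struct prop *p, int kset_ready, int node_attached)
{
    if (kset_ready && node_attached) {
        free(p->sysfs_name);
        p->sysfs_name = NULL;
        live_names--;
    }
    /* otherwise the name stays allocated; if the caller now drops p,
     * nothing references it any more and it is leaked */
}
```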

> kmemleak: 1157 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> Where:
>
> # head -40 /sys/kernel/debug/kmemleak
> unreferenced object 0xdf44d180 (size 8):
>   comm "swapper", pid 1, jiffies 4294892297 (age 4766.460s)
>   hex dump (first 8 bytes):
> 62 61 73 65 00 00 00 00  base
>   backtrace:
> [<0ca59825>] kstrdup+0x4c/0xb8
> [] kobject_set_name_vargs+0x34/0xc8
> [<661b4c86>] kobject_add+0x78/0x120
> [] __of_attach_node_sysfs+0xa0/0x14c
> [<2a143d10>] of_core_init+0x90/0x114
> [] driver_init+0x30/0x48
> [<84ed01b1>] kernel_init_freeable+0xfc/0x3fc
> [] kernel_init+0x20/0x110
> [] ret_from_kernel_thread+0x14/0x1c
> unreferenced object 0xdf44d178 (size 8):
>   comm "swapper", pid 1, jiffies 4294892297 (age 4766.460s)
>   hex dump (first 8 bytes):
> 6d 6f 64 65 6c 00 97 c8  model...
>   backtrace:
> [<0ca59825>] kstrdup+0x4c/0xb8
> [<0eeb0a3b>] __of_add_property_sysfs+0x88/0x12c
> [] __of_attach_node_sysfs+0xcc/0x14c
> [<2a143d10>] of_core_init+0x90/0x114
> [] driver_init+0x30/0x48
> [<84ed01b1>] kernel_init_freeable+0xfc/0x3fc
> [] kernel_init+0x20/0x110
> [] ret_from_kernel_thread+0x14/0x1c
> unreferenced object 0xdf4021e0 (size 16):
>   comm "swapper", pid 1, jiffies 4294892297 (age 4766.460s)
>   hex dump (first 16 bytes):
> 63 6f 6d 70 61 74 69 62 6c 65 00 01 00 00 00 00  compatible..
>   backtrace:
> [<0ca59825>] kstrdup+0x4c/0xb8
> [<0eeb0a3b>] __of_add_property_sysfs+0x88/0x12c
> [] __of_attach_node_sysfs+0xcc/0x14c
> [<2a143d10>] of_core_init+0x90/0x114
> [] driver_init+0x30/0x48
> [<84ed01b1>] kernel_init_freeable+0xfc/0x3fc
> [] kernel_init+0x20/0x110
> [] ret_from_kernel_thread+0x14/0x1c


Re: [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Oliver
On Tue, May 28, 2019 at 2:09 PM Shawn Anastasio  wrote:
>
>
>
> On 5/27/19 11:01 PM, Oliver wrote:
> > On Tue, May 28, 2019 at 8:56 AM Shawn Anastasio  wrote:
> >>
> >> Hello all,
> >>
> >> This patch set implements support for user-specified PCI resource
> >> alignment on the pseries platform for hotplugged PCI devices.
> >> Currently on pseries, PCI resource alignments specified with the
> >> pci=resource_alignment commandline argument are ignored, since
> >> the firmware is in charge of managing the PCI resources. In the
> >> case of hotplugged devices, though, the kernel is in charge of
> >> configuring the resources and should obey alignment requirements.
> >
> > Are you using hotplug to work around SLOF (the OF we use under qemu)
> > not aligning BARs to 64K? It looks like there is a commit in SLOF to
> > fix that 
> > (https://git.qemu.org/?p=SLOF.git;a=commit;f=board-qemu/slof/pci-phb.fs;h=1903174472f8800caf50c959b304501b4c01153c).
> >
>
> No, my application actually requires PCI hotplug at run-time.
>
> >> The current behavior of ignoring the alignment for hotplugged devices
> >> results in sub-page BARs landing between page boundaries and
> >> becoming un-mappable from userspace via the VFIO framework.
> >> This issue was observed on a pseries KVM guest with hotplugged
> >> ivshmem devices.
> >
> >> With these changes, users can specify an appropriate
> >> pci=resource_alignment argument on boot for devices they wish to use
> >> with VFIO.
> >>
> >> In the future, this could be extended to provide page-aligned
> >> resources by default for hotplugged devices, similar to what is done
> >> on powernv by commit 382746376993 ("powerpc/powernv: Override
> >> pcibios_default_alignment() to force PCI devices to be page aligned").
> >
> > Can we make aligning the BARs to PAGE_SIZE the default behaviour? The
> > BAR assignment process is complex enough as-is so I'd rather we didn't
> > add another platform hack into the mix.
>
> Absolutely. This will still require the existing changes so that the
> custom alignment isn't flat-out ignored on pseries, but I can set
> it to default to PAGE_SIZE as well, similar to how it's done on PowerNV.
> I've just pushed a v3 to fix a typo and I'll incorporate this change
> in v4.

I was thinking we could get rid of the ppc_md callback and do it in
kernel/pci-common.c. PowerNV is the only platform that implements the
callback and the pseries implementation is going to be identical so I
don't think there's much point in keeping the callback.

> >> Feedback is appreciated.
> >>
> >> Thanks,
> >> Shawn
> >>
> >> Shawn Anastasio (3):
> >>PCI: Introduce pcibios_ignore_alignment_request
> >>powerpc/64: Enable pcibios_after_init hook on ppc64
> >>powerpc/pseries: Allow user-specified PCI resource alignment after
> >>  init
> >>
> >>   arch/powerpc/include/asm/machdep.h |  6 --
> >>   arch/powerpc/kernel/pci-common.c   |  9 +
> >>   arch/powerpc/kernel/pci_64.c   |  4 
> >>   arch/powerpc/platforms/pseries/setup.c | 22 ++
> >>   drivers/pci/pci.c  |  9 +++--
> >>   5 files changed, 46 insertions(+), 4 deletions(-)
> >>
> >> --
> >> 2.20.1
> >>


Re: [TRIVIAL] [PATCH] powerpc/powernv-eeh: Consisely desribe what this file does

2019-05-27 Thread Stewart Smith
Oliver  writes:

> On Tue, May 28, 2019 at 1:29 PM Stewart Smith  wrote:
>>
>> If the previous comment made sense, continue debugging or call your
>> doctor immediately.
>>
>> Signed-off-by: Stewart Smith 
>> ---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index f38078976c5d..bea6708be065 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -1,7 +1,5 @@
>>  /*
>> - * The file intends to implement the platform dependent EEH operations on
>> - * powernv platform. Actually, the powernv was created in order to fully
>> - * hypervisor support.
>> + * PowerNV Platform dependent EEH operations
>>   *
>>   * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
>
> Stewart, Thanks for fixing it up. Since you're at it, Please replace
> the maintainer to yourself.

This message intends to implement the middle raising operations on the
finger platform. Actually, the EEH was created in order to fully
phalange extension.

:)

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [TRIVIAL] [PATCH] powerpc/powernv-eeh: Consisely desribe what this file does

2019-05-27 Thread Russell Currey
On Tue, 2019-05-28 at 13:29 +1000, Stewart Smith wrote:
> If the previous comment made sense, continue debugging or call your
> doctor immediately.
> 
> Signed-off-by: Stewart Smith 

This reply intends to implement the ack dependent EEH patch on powernv
platform.  Actually, the reply was created in order to fully ack
support.

Fully-ack-supported-by: Russell Currey 




Re: [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio




On 5/27/19 11:01 PM, Oliver wrote:
> On Tue, May 28, 2019 at 8:56 AM Shawn Anastasio  wrote:
>>
>> Hello all,
>>
>> This patch set implements support for user-specified PCI resource
>> alignment on the pseries platform for hotplugged PCI devices.
>> Currently on pseries, PCI resource alignments specified with the
>> pci=resource_alignment commandline argument are ignored, since
>> the firmware is in charge of managing the PCI resources. In the
>> case of hotplugged devices, though, the kernel is in charge of
>> configuring the resources and should obey alignment requirements.
>
> Are you using hotplug to work around SLOF (the OF we use under qemu)
> not aligning BARs to 64K? It looks like there is a commit in SLOF to
> fix that
> (https://git.qemu.org/?p=SLOF.git;a=commit;f=board-qemu/slof/pci-phb.fs;h=1903174472f8800caf50c959b304501b4c01153c).



No, my application actually requires PCI hotplug at run-time.


>> The current behavior of ignoring the alignment for hotplugged devices
>> results in sub-page BARs landing between page boundaries and
>> becoming un-mappable from userspace via the VFIO framework.
>> This issue was observed on a pseries KVM guest with hotplugged
>> ivshmem devices.
>>
>> With these changes, users can specify an appropriate
>> pci=resource_alignment argument on boot for devices they wish to use
>> with VFIO.
>>
>> In the future, this could be extended to provide page-aligned
>> resources by default for hotplugged devices, similar to what is done
>> on powernv by commit 382746376993 ("powerpc/powernv: Override
>> pcibios_default_alignment() to force PCI devices to be page aligned").
>
> Can we make aligning the BARs to PAGE_SIZE the default behaviour? The
> BAR assignment process is complex enough as-is so I'd rather we didn't
> add another platform hack into the mix.


Absolutely. This will still require the existing changes so that the
custom alignment isn't flat-out ignored on pseries, but I can set
it to default to PAGE_SIZE as well, similar to how it's done on PowerNV.
I've just pushed a v3 to fix a typo and I'll incorporate this change
in v4.


>> Feedback is appreciated.
>>
>> Thanks,
>> Shawn
>>
>> Shawn Anastasio (3):
>>    PCI: Introduce pcibios_ignore_alignment_request
>>    powerpc/64: Enable pcibios_after_init hook on ppc64
>>    powerpc/pseries: Allow user-specified PCI resource alignment after
>>      init
>>
>>   arch/powerpc/include/asm/machdep.h |  6 --
>>   arch/powerpc/kernel/pci-common.c   |  9 +
>>   arch/powerpc/kernel/pci_64.c   |  4 
>>   arch/powerpc/platforms/pseries/setup.c | 22 ++
>>   drivers/pci/pci.c  |  9 +++--
>>   5 files changed, 46 insertions(+), 4 deletions(-)
>>
>> --
>> 2.20.1



[PATCH v3 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio
Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1
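
The optional-hook idiom this patch relies on, shown in isolation. `struct machine_hooks` is a hypothetical, cut-down analogue of struct machdep_calls, and the counter exists only so the effect is observable; the point is that a platform which leaves the pointer NULL is simply skipped.

```c
#include <stddef.h>

/* Hypothetical, cut-down analogue of struct machdep_calls. */
struct machine_hooks {
    void (*after_init)(void);   /* optional; NULL when the platform has none */
};

static int after_init_calls;

/* Example hook a platform might install. */
static void count_after_init(void)
{
    after_init_calls++;
}

/* End-of-init step: call the hook only if one was installed.
 * Returns 1 if the hook ran, 0 if there was none. */
static int run_after_init(const struct machine_hooks *md)
{
    if (md->after_init) {
        md->after_init();
        return 1;
    }
    return 0;
}
```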



[PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio
Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c   | 9 +++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
 
 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
-- 
2.20.1
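
A userspace sketch of the idea, assuming a GCC/Clang toolchain for `__attribute__((weak))`. The names `probe_only`, `ignore_alignment_request` and `effective_alignment` are hypothetical stand-ins for the PCI_PROBE_ONLY flag, the new hook and its call site. The default policy is a weak definition, so another object file could replace it with a strong one without the caller changing.

```c
#include <stdbool.h>

/* Stands in for pci_has_flag(PCI_PROBE_ONLY). */
static bool probe_only;

/* Default policy. Being weak, it is used only when no other object file
 * provides a strong definition with the same name. */
__attribute__((weak)) int ignore_alignment_request(void)
{
    return probe_only;
}

/* Call site: the core asks the (possibly overridden) policy rather than
 * testing the flag itself. Returns 0 when the request is ignored. */
unsigned long effective_alignment(unsigned long requested)
{
    if (ignore_alignment_request())
        return 0;
    return requested;
}
```

With only this file linked, behavior matches the old flag test; a platform object defining a non-weak `ignore_alignment_request()` would take over at link time.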



[PATCH v3 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio
Changes from v2 to v3:
  - Fix wrong return type of ppc pcibios_ignore_alignment_request
(Not sure how my local compile didn't catch that!)

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 include/linux/pci.h|  1 +
 6 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1
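
To make the "sub-page BAR between page boundaries" problem concrete, here is a small arithmetic sketch. It assumes a 64K page size (the figure mentioned elsewhere in this thread); the addresses are made up and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the example: 64K. */
#define SKETCH_PAGE_SIZE UINT64_C(0x10000)

/* True when addr sits exactly on a page boundary. */
static bool is_page_aligned(uint64_t addr)
{
    return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Round addr up to the next page boundary (no change if already aligned). */
static uint64_t page_align_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

For instance, a 4K BAR placed at 0x200081000 starts in the middle of a 64K page, so it cannot be handed to userspace as a whole page of its own; aligning it up moves it to 0x200090000.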



[PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio
On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..8e0d73b4c188 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+int pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1
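
For readers skimming the thread, the decision this patch describes can be sketched in plain userspace C. This is only an illustration under stated assumptions: `probe_only`, `initial_init_done`, `struct platform_ops` and every function name below are hypothetical stand-ins, not the kernel's `pci_has_flag()`, `ppc_md` or the pseries code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI_PROBE_ONLY flag (not pci_has_flag()). */
static bool probe_only = true;

/* Hypothetical stand-in for "initial PCI init has completed". */
static bool initial_init_done;

/* Optional platform hook, loosely modelled on a callback table. */
struct platform_ops {
	int (*ignore_alignment_request)(void);
};

/* Platform policy: honor alignment requests once boot-time init is over. */
int demo_pseries_ignore_alignment_request(void)
{
	if (initial_init_done)
		return 0;
	return probe_only;
}

/* Generic entry point: use the hook if present, else the flag alone. */
int ignore_alignment_request(const struct platform_ops *ops)
{
	assert(ops != NULL);
	if (ops->ignore_alignment_request)
		return ops->ignore_alignment_request();
	return probe_only;
}

/* Would be called from an "after init" hook in this model. */
void mark_initial_init_done(void)
{
	initial_init_done = true;
}
```

In this model a platform without the hook keeps ignoring requests whenever the probe-only flag is set, while the platform with the hook starts honoring them after `mark_initial_init_done()` has run, which is the window in which hotplugged devices appear.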



Re: [TRIVIAL] [PATCH] powerpc/powernv-eeh: Consisely desribe what this file does

2019-05-27 Thread Oliver
On Tue, May 28, 2019 at 1:29 PM Stewart Smith  wrote:
>
> If the previous comment made sense, continue debugging or call your
> doctor immediately.
>
> Signed-off-by: Stewart Smith 
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index f38078976c5d..bea6708be065 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1,7 +1,5 @@
>  /*
> - * The file intends to implement the platform dependent EEH operations on
> - * powernv platform. Actually, the powernv was created in order to fully
> - * hypervisor support.
> + * PowerNV Platform dependent EEH operations
>   *
>   * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.

Stewart, thanks for fixing it up. While you're at it, please replace
the maintainer with yourself.

>   *
> --
> 2.21.0
>


Re: [RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Oliver
On Tue, May 28, 2019 at 8:56 AM Shawn Anastasio  wrote:
>
> Hello all,
>
> This patch set implements support for user-specified PCI resource
> alignment on the pseries platform for hotplugged PCI devices.
> Currently on pseries, PCI resource alignments specified with the
> pci=resource_alignment commandline argument are ignored, since
> the firmware is in charge of managing the PCI resources. In the
> case of hotplugged devices, though, the kernel is in charge of
> configuring the resources and should obey alignment requirements.

Are you using hotplug to work around SLOF (the OF we use under qemu)
not aligning BARs to 64K? It looks like there is a commit in SLOF to
fix that 
(https://git.qemu.org/?p=SLOF.git;a=commit;f=board-qemu/slof/pci-phb.fs;h=1903174472f8800caf50c959b304501b4c01153c).

> The current behavior of ignoring the alignment for hotplugged devices
> results in sub-page BARs landing between page boundaries and
> becoming un-mappable from userspace via the VFIO framework.
> This issue was observed on a pseries KVM guest with hotplugged
> ivshmem devices.

> With these changes, users can specify an appropriate
> pci=resource_alignment argument on boot for devices they wish to use
> with VFIO.
>
> In the future, this could be extended to provide page-aligned
> resources by default for hotplugged devices, similar to what is done
> on powernv by commit 382746376993 ("powerpc/powernv: Override
> pcibios_default_alignment() to force PCI devices to be page aligned").

Can we make aligning the BARs to PAGE_SIZE the default behaviour? The
BAR assignment process is complex enough as-is so I'd rather we didn't
add another platform hack into the mix.

> Feedback is appreciated.
>
> Thanks,
> Shawn
>
> Shawn Anastasio (3):
>   PCI: Introduce pcibios_ignore_alignment_request
>   powerpc/64: Enable pcibios_after_init hook on ppc64
>   powerpc/pseries: Allow user-specified PCI resource alignment after
> init
>
>  arch/powerpc/include/asm/machdep.h |  6 --
>  arch/powerpc/kernel/pci-common.c   |  9 +
>  arch/powerpc/kernel/pci_64.c   |  4 
>  arch/powerpc/platforms/pseries/setup.c | 22 ++
>  drivers/pci/pci.c  |  9 +++--
>  5 files changed, 46 insertions(+), 4 deletions(-)
>
> --
> 2.20.1
>
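
As a side note on the alignment discussion above, the arithmetic behind "sub-page BARs landing between page boundaries" can be sketched as follows. The 64K page size and the example addresses are assumptions chosen for illustration; none of these helpers are kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the illustration: 64K. */
#define DEMO_PAGE_SIZE 0x10000ULL

bool is_power_of_two(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if addr sits exactly on an align boundary. */
bool is_aligned(uint64_t addr, uint64_t align)
{
	assert(is_power_of_two(align));
	return (addr & (align - 1)) == 0;
}

/* Round addr up to the next align boundary (no change if already aligned). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(is_power_of_two(align));
	return (addr + align - 1) & ~(align - 1);
}
```

A 4K BAR placed at a hypothetical address such as 0x20001000 fails `is_aligned(..., DEMO_PAGE_SIZE)`, so it shares its 64K page with whatever else lands there. Requesting page alignment amounts to placing it at `align_up()` of that address instead.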


Re: [PATCH v3 2/3] arch: wire-up close_range()

2019-05-27 Thread Michael Ellerman
Christian Brauner  writes:
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 103655d84b4b..ba2c1f078cbd 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -515,3 +515,4 @@
>  431  common  fsconfigsys_fsconfig
>  432  common  fsmount sys_fsmount
>  433  common  fspick  sys_fspick
> +435  common  close_range sys_close_range

With a minor build fix the selftest passes for me on ppc64le:

  # ./close_range_test 
  1..9
  ok 1 do not allow invalid flag values for close_range()
  ok 2 close_range() from 3 to 53
  ok 3 fcntl() verify closed range from 3 to 53
  ok 4 close_range() from 54 to 95
  ok 5 fcntl() verify closed range from 54 to 95
  ok 6 close_range() from 96 to 102
  ok 7 fcntl() verify closed range from 96 to 102
  ok 8 close_range() closed single file descriptor
  ok 9 fcntl() verify closed single file descriptor
  # Pass 9 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0


Acked-by: Michael Ellerman  (powerpc)

cheers


[TRIVIAL] [PATCH] powerpc/powernv-eeh: Consisely desribe what this file does

2019-05-27 Thread Stewart Smith
If the previous comment made sense, continue debugging or call your
doctor immediately.

Signed-off-by: Stewart Smith 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index f38078976c5d..bea6708be065 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1,7 +1,5 @@
 /*
- * The file intends to implement the platform dependent EEH operations on
- * powernv platform. Actually, the powernv was created in order to fully
- * hypervisor support.
+ * PowerNV Platform dependent EEH operations
  *
  * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
  *
-- 
2.21.0



Re: [PATCH v2 2/2] tests: add close_range() tests

2019-05-27 Thread Michael Ellerman
Christian Brauner  writes:
> This adds basic tests for the new close_range() syscall.
> - test that no invalid flags can be passed
> - test that a range of file descriptors is correctly closed
> - test that a range of file descriptors is correctly closed if there
>   are already closed file descriptors in the range
> - test that max_fd is correctly capped to the current fdtable maximum
>
> Signed-off-by: Christian Brauner 
> Cc: Arnd Bergmann 
> Cc: Jann Horn 
> Cc: David Howells 
> Cc: Dmitry V. Levin 
> Cc: Oleg Nesterov 
> Cc: Linus Torvalds 
> Cc: Florian Weimer 
> Cc: linux-...@vger.kernel.org
> ---
> v1: unchanged
> v2:
> - Christian Brauner :
>   - verify that close_range() correctly closes a single file descriptor
> ---
>  tools/testing/selftests/Makefile  |   1 +
>  tools/testing/selftests/core/.gitignore   |   1 +
>  tools/testing/selftests/core/Makefile |   6 +
>  .../testing/selftests/core/close_range_test.c | 142 ++
>  4 files changed, 150 insertions(+)
>  create mode 100644 tools/testing/selftests/core/.gitignore
>  create mode 100644 tools/testing/selftests/core/Makefile
>  create mode 100644 tools/testing/selftests/core/close_range_test.c
>
> diff --git a/tools/testing/selftests/core/.gitignore b/tools/testing/selftests/core/.gitignore
> new file mode 100644
> index ..6e6712ce5817
> --- /dev/null
> +++ b/tools/testing/selftests/core/.gitignore
> @@ -0,0 +1 @@
> +close_range_test
> diff --git a/tools/testing/selftests/core/Makefile b/tools/testing/selftests/core/Makefile
> new file mode 100644
> index ..de3ae68aa345
> --- /dev/null
> +++ b/tools/testing/selftests/core/Makefile
> @@ -0,0 +1,6 @@
> +CFLAGS += -g -I../../../../usr/include/ -I../../../../include

Your second -I pulls the unexported kernel headers in, userspace
programs shouldn't include unexported kernel headers.

It breaks the build on powerpc with eg:

  powerpc64le-linux-gnu-gcc -g -I../../../../usr/include/ -I../../../../include close_range_test.c  -o /output/kselftest/core/close_range_test
  In file included from /usr/powerpc64le-linux-gnu/include/bits/fcntl-linux.h:346,
   from /usr/powerpc64le-linux-gnu/include/bits/fcntl.h:62,
   from /usr/powerpc64le-linux-gnu/include/fcntl.h:35,
   from close_range_test.c:5:
  ../../../../include/linux/falloc.h:13:2: error: unknown type name '__s16'
__s16  l_type;
^


Did you do that on purpose or just copy it from one of the other
Makefiles? :)

If you're just wanting to get the syscall number when the headers
haven't been exported, I think the best solution is to do eg:

diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index d6e6079d3d53..34c6f02f25de 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -14,6 +14,10 @@

 #include "../kselftest.h"

+#ifndef __NR_close_range
+#define __NR_close_range   435
+#endif
+
 static inline int sys_close_range(unsigned int fd, unsigned int max_fd,
  unsigned int flags)
 {


cheers
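
A more cautious variant of the wrapper idea above, sketched for userspace: instead of hard-coding a syscall number (the 435 proposed in this series is not guaranteed to match what a released kernel assigns), the hypothetical `demo_close_range()` below only issues the syscall when the installed headers define `__NR_close_range` and otherwise reports `ENOSYS`. The helper names are illustrative and are not part of the selftest.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* An fd is open if fcntl(F_GETFD) succeeds on it. */
bool fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical wrapper: never guesses a syscall number. */
int demo_close_range(unsigned int fd, unsigned int max_fd, unsigned int flags)
{
#ifdef __NR_close_range
	return (int)syscall(__NR_close_range, fd, max_fd, flags);
#else
	(void)fd;
	(void)max_fd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Mirrors the selftest's "fcntl() verify closed" step for one descriptor. */
void demo_verify_closed(int fd)
{
	assert(!fd_is_open(fd));
}
```

The trade-off is that a test built against old headers is skipped rather than run, which seems preferable to invoking an unrelated syscall by number.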


[PATCH v2 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio
Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1
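
The optional-hook pattern this patch relies on can be illustrated outside the kernel. The sketch below is an assumption-laden model: `struct machine_hooks`, `demo_bus_init()` and the counter are hypothetical names, and the real `pcibios_init()` does considerably more than the placeholder comment suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table with one optional hook. */
struct machine_hooks {
	void (*after_init)(void); /* may be NULL */
};

/* Counts how often the example hook has run. */
int after_init_calls;

void count_after_init(void)
{
	after_init_calls++;
}

/* Model of an init routine that calls the hook last, if one is set. */
int demo_bus_init(const struct machine_hooks *hooks)
{
	assert(hooks != NULL);

	/* ... bus probing would happen here ... */

	if (hooks->after_init)
		hooks->after_init();
	return 0;
}
```

Because the hook is checked for NULL before the call, platforms that do not care about the end of init need no changes at all.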



[PATCH v2 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio
Changes from v1 to v2:
  - Fix function declaration warnings caught by sparse

Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 include/linux/pci.h|  1 +
 6 files changed, 47 insertions(+), 4 deletions(-)

-- 
2.20.1



[PATCH v2 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio
On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+resource_size_t pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..07f03be02afe 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1



[PATCH v2 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio
Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c   | 9 +++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);
 
 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
-- 
2.20.1
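
The weak-default mechanism used by this patch can be tried in userspace with GCC or Clang. This is a minimal sketch, assuming the compiler supports `__attribute__((weak))`; `demo_probe_only`, `demo_ignore_alignment_request()` and `demo_specified_alignment()` are hypothetical names rather than the kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PCI_PROBE_ONLY flag. */
static bool demo_probe_only;

/*
 * Weak default. Another object file may supply a strong definition with
 * the same name, and the linker will then pick that one instead.
 */
__attribute__((weak)) int demo_ignore_alignment_request(void)
{
	return demo_probe_only;
}

/* Caller: a requested alignment is dropped when the policy says so. */
unsigned long demo_specified_alignment(unsigned long requested)
{
	/* Alignments are expected to be zero or a power of two. */
	assert((requested & (requested - 1)) == 0);

	if (demo_ignore_alignment_request())
		return 0;
	return requested;
}
```

With only this file linked, the weak default decides. An override would live in a separate translation unit, which is why it cannot be shown in the same block.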



[PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling

2019-05-27 Thread David Hildenbrand
We only want memory block devices for memory to be onlined/offlined
(add/remove from the buddy). This is required so user space can
online/offline memory and kdump gets notified about newly onlined memory.

Let's factor out creation/removal of memory block devices. This helps
to further cleanup arch_add_memory/arch_remove_memory() and to make
implementation of new features easier - especially sub-section
memory hot add from Dan.

Anshuman Khandual is currently working on arch_remove_memory(). I added
a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
implementation", that is sufficient as a first step in the context of
this series. (we don't cleanup page tables in case anything goes
wrong already)

Did a quick sanity test with DIMM plug/unplug, making sure all devices
and sysfs links properly get added/removed. Compile tested on s390x and
x86-64.

Based on next/master.

Next refactoring on my list will be making sure that remove_memory()
will never deal with zones / access "struct pages". Any kind of zone
handling will have to be done when offlining system memory / before
removing device memory. I am thinking about "remove_pfn_range_from_zone()",
to undo everything "move_pfn_range_to_zone()" did.

v2 -> v3:
- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()"
- Add "arm64/mm: Add temporary arch_remove_memory() implementation"
- Add "drivers/base/memory: Pass a block_id to init_memory_block()"
- Various changes to "mm/memory_hotplug: Create memory block devices
  after arch_add_memory()" and "mm/memory_hotplug: Remove memory block
  devices before arch_remove_memory()" due to switching from sections to
  block_id's.

v1 -> v2:
- s390x/mm: Implement arch_remove_memory()
-- remove mapping after "__remove_pages"

David Hildenbrand (11):
  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  s390x/mm: Fail when an altmap is used for arch_add_memory()
  s390x/mm: Implement arch_remove_memory()
  arm64/mm: Add temporary arch_remove_memory() implementation
  drivers/base/memory: Pass a block_id to init_memory_block()
  mm/memory_hotplug: Allow arch_remove_pages() without
CONFIG_MEMORY_HOTREMOVE
  mm/memory_hotplug: Create memory block devices after arch_add_memory()
  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  mm/memory_hotplug: Remove memory block devices before
arch_remove_memory()
  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
fail
  mm/memory_hotplug: Remove "zone" parameter from
sparse_remove_one_section

 arch/arm64/mm/mmu.c|  17 +
 arch/ia64/mm/init.c|   2 -
 arch/powerpc/mm/mem.c  |   2 -
 arch/s390/mm/init.c|  18 +++--
 arch/sh/mm/init.c  |   2 -
 arch/x86/mm/init_32.c  |   2 -
 arch/x86/mm/init_64.c  |   2 -
 drivers/base/memory.c  | 134 +++--
 drivers/base/node.c|  27 +++
 include/linux/memory.h |   6 +-
 include/linux/memory_hotplug.h |  12 +--
 include/linux/node.h   |   7 +-
 mm/memory_hotplug.c|  44 +--
 mm/sparse.c|  10 +--
 14 files changed, 140 insertions(+), 145 deletions(-)

-- 
2.20.1



Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Michael Ellerman
"Rafael J. Wysocki"  writes:
> On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
>> The declaration for pfn_is_nosave is only available in
>> kernel/power/power.h. Since this function can be overridden in arch code,
>> expose it globally. Having a prototype avoids warnings
>> (sometimes treated as errors with W=1) such as:
>> 
>>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
>> 
>> This moves the declaration into a globally visible header file and adds
>> the missing include to avoid a warning on powerpc. Also remove the
>> duplicated prototypes since they are no longer required.
>> 
>> Cc: Christophe Leroy 
>> Signed-off-by: Mathieu Malaterre 
>> ---
>> v2: As suggested by Christophe, remove duplicate prototypes
>> 
>>  arch/powerpc/kernel/suspend.c | 1 +
>>  arch/s390/kernel/entry.h  | 1 -
>>  include/linux/suspend.h   | 1 +
>>  kernel/power/power.h  | 2 --
>>  4 files changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/power/power.h b/kernel/power/power.h
>> index 9e58bdc8a562..44bee462ff57 100644
>> --- a/kernel/power/power.h
>> +++ b/kernel/power/power.h
>> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>>  static inline void hibernate_image_size_init(void) {}
>>  #endif /* !CONFIG_HIBERNATION */
>>  
>> -extern int pfn_is_nosave(unsigned long);
>> -
>>  #define power_attr(_name) \
>>  static struct kobj_attribute _name##_attr = {   \
>>  .attr   = { \
>> 
>
> With an ACK from the powerpc maintainers, I could apply this one.

Sent.

cheers


Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Michael Ellerman
Mathieu Malaterre  writes:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be overridden in arch code,
> expose it globally. Having a prototype avoids warnings
> (sometimes treated as errors with W=1) such as:
>
>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
>
> This moves the declaration into a globally visible header file and adds
> the missing include to avoid a warning on powerpc. Also remove the
> duplicated prototypes since they are no longer required.
>
> Cc: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: As suggested by Christophe, remove duplicate prototypes
>
>  arch/powerpc/kernel/suspend.c | 1 +
>  arch/s390/kernel/entry.h  | 1 -
>  include/linux/suspend.h   | 1 +
>  kernel/power/power.h  | 2 --
>  4 files changed, 2 insertions(+), 3 deletions(-)

Looks fine to me.

Acked-by: Michael Ellerman  (powerpc)

cheers
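
For context, the warning being fixed fires when a non-static function is defined without a declaration already in scope. A minimal sketch of the shape the patch aims for, with the declaration written inline where a shared header would normally supply it; the function name and the page-frame range are hypothetical.

```c
#include <assert.h>

/*
 * In a real tree this declaration would come from a shared header that
 * both the definition and its callers include.
 */
int demo_pfn_is_nosave(unsigned long pfn);

/* Hypothetical "nosave" page-frame range: [begin, end). */
static unsigned long demo_nosave_begin = 100;
static unsigned long demo_nosave_end = 200;

/* The definition now has a previous prototype, so no warning is emitted. */
int demo_pfn_is_nosave(unsigned long pfn)
{
	assert(demo_nosave_begin <= demo_nosave_end);
	return pfn >= demo_nosave_begin && pfn < demo_nosave_end;
}
```

Building this with `-Wmissing-prototypes` should stay quiet, whereas deleting the declaration brings back a "no previous prototype" diagnostic like the one quoted above.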


[RESEND PATCH 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-05-27 Thread Shawn Anastasio
Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
 drivers/pci/pci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
return 0;
 }
 
+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
p = resource_alignment_param;
if (!*p && !align)
goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
align = 0;
-   pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested alignments\n");
goto out;
}
 
-- 
2.20.1



[RESEND PATCH 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init

2019-05-27 Thread Shawn Anastasio
On pseries, custom PCI resource alignment specified with the commandline
argument pci=resource_alignment is disabled due to PCI resources being
managed by the firmware. However, in the case of PCI hotplug the
resources are managed by the kernel, so custom alignments should be
honored in these cases. This is done by only honoring custom
alignments after initial PCI initialization is done, to ensure that
all devices managed by the firmware are excluded.

Without this ability, sub-page BARs sometimes get mapped in between
page boundaries for hotplugged devices and are therefore unusable
with the VFIO framework. This change allows users to request
page alignment for devices they wish to access via VFIO using
the pci=resource_alignment commandline argument.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is
done on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned")

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 2fbfaa9176ed..46eb62c0954e 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -179,6 +179,9 @@ struct machdep_calls {
 
resource_size_t (*pcibios_default_alignment)(void);
 
+   /* Called when determining PCI resource alignment */
+   int (*pcibios_ignore_alignment_request)(void);
+
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ff4b7539cbdf..1a6ded45a701 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
return 0;
 }
 
+resource_size_t pcibios_ignore_alignment_request(void)
+{
+   if (ppc_md.pcibios_ignore_alignment_request)
+   return ppc_md.pcibios_ignore_alignment_request();
+
+   /* Fall back to default method of checking PCI_PROBE_ONLY */
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..c6af2ed8ee0f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
+int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
+
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
struct device_node *root;
@@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
 }
 #endif
 
+static void pseries_after_init(void)
+{
+   initial_pci_init_done = 1;
+}
+
+static int pseries_ignore_alignment_request(void)
+{
+   if (initial_pci_init_done)
+   /*
+* Allow custom alignments after init for things
+* like PCI hotplugging.
+*/
+   return 0;
+
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 static void __init pSeries_setup_arch(void)
 {
set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
}
 
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+   ppc_md.pcibios_after_init = pseries_after_init;
+   ppc_md.pcibios_ignore_alignment_request =
+   pseries_ignore_alignment_request;
 }
 
 static void pseries_panic(char *str)
-- 
2.20.1



[RESEND PATCH 2/3] powerpc/64: Enable pcibios_after_init hook on ppc64

2019-05-27 Thread Shawn Anastasio
Enable the pcibios_after_init hook on all powerpc platforms.
This hook is executed at the end of pcibios_init and was previously
only available on CONFIG_PPC32.

Since it is useful and not inherently limited to 32-bit mode,
remove the limitation and allow it on all powerpc platforms.

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/machdep.h | 3 +--
 arch/powerpc/kernel/pci_64.c   | 4 
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 2f0ca6560e47..2fbfaa9176ed 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -150,6 +150,7 @@ struct machdep_calls {
void(*init)(void);
 
void(*kgdb_map_scc)(void);
+#endif /* CONFIG_PPC32 */
 
/*
 * optional PCI "hooks"
@@ -157,8 +158,6 @@ struct machdep_calls {
/* Called at then very end of pcibios_init() */
void (*pcibios_after_init)(void);
 
-#endif /* CONFIG_PPC32 */
-
/* Called in indirect_* to avoid touching devices */
int (*pci_exclude_device)(struct pci_controller *, unsigned char, 
unsigned char);
 
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 9d8c10d55407..fba7fe6e4a50 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -68,6 +68,10 @@ static int __init pcibios_init(void)
 
printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
+   /* Call machine dependent post-init code */
+   if (ppc_md.pcibios_after_init)
+   ppc_md.pcibios_after_init();
+
return 0;
 }
 
-- 
2.20.1



[RESEND PATCH 0/3] Allow custom PCI resource alignment on pseries

2019-05-27 Thread Shawn Anastasio
Hello all,

This patch set implements support for user-specified PCI resource
alignment on the pseries platform for hotplugged PCI devices.
Currently on pseries, PCI resource alignments specified with the
pci=resource_alignment commandline argument are ignored, since
the firmware is in charge of managing the PCI resources. In the
case of hotplugged devices, though, the kernel is in charge of 
configuring the resources and should obey alignment requirements.
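
As a rough illustration of the policy described above, here is a minimal
user-space model (not kernel code). The struct and function names are
invented for this sketch; the two fields only stand in for the
PCI_PROBE_ONLY flag and for an "initial PCI scan finished" marker.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state, for illustration only. */
struct align_policy_state {
	bool probe_only;        /* models pci_has_flag(PCI_PROBE_ONLY) */
	bool initial_scan_done; /* models "initial pcibios init completed" */
};

/* Returns true when a requested alignment should be ignored. */
bool ignore_alignment_request(const struct align_policy_state *s)
{
	/* After the initial scan (e.g. hotplug), the kernel assigns resources. */
	if (s->initial_scan_done)
		return false;

	/* During boot, firmware-assigned resources are left alone. */
	return s->probe_only;
}
```

With this model a boot-time device under PCI_PROBE_ONLY keeps its
firmware-assigned layout, while a device added later has its alignment
request honored.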

The current behavior of ignoring the alignment for hotplugged devices
results in sub-page BARs landing between page boundaries and
becoming un-mappable from userspace via the VFIO framework.
This issue was observed on a pseries KVM guest with hotplugged
ivshmem devices.
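
To make the alignment problem concrete, here is a small stand-alone
sketch of the arithmetic involved. The 64K page size and the helper
names are assumptions made for this example, not values taken from the
patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration (64K). */
#define EXAMPLE_PAGE_SIZE 0x10000ULL

/* Round addr up to the next multiple of align (align must be a power of two). */
uint64_t example_align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* A BAR can only be mapped in whole pages if it starts on a page boundary. */
bool example_bar_is_page_aligned(uint64_t bar_start)
{
	return (bar_start & (EXAMPLE_PAGE_SIZE - 1)) == 0;
}
```

A 4K BAR placed at 0x200a1000 starts in the middle of a 64K page, so
mapping it would expose whatever else shares that page. Rounding the
start up to 0x200b0000 gives the BAR a page of its own.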
 
With these changes, users can specify an appropriate
pci=resource_alignment argument on boot for devices they wish to use 
with VFIO.

In the future, this could be extended to provide page-aligned
resources by default for hotplugged devices, similar to what is done
on powernv by commit 382746376993 ("powerpc/powernv: Override
pcibios_default_alignment() to force PCI devices to be page aligned").

Feedback is appreciated.

Thanks,
Shawn

Shawn Anastasio (3):
  PCI: Introduce pcibios_ignore_alignment_request
  powerpc/64: Enable pcibios_after_init hook on ppc64
  powerpc/pseries: Allow user-specified PCI resource alignment after
init

 arch/powerpc/include/asm/machdep.h |  6 --
 arch/powerpc/kernel/pci-common.c   |  9 +
 arch/powerpc/kernel/pci_64.c   |  4 
 arch/powerpc/platforms/pseries/setup.c | 22 ++
 drivers/pci/pci.c  |  9 +++--
 5 files changed, 46 insertions(+), 4 deletions(-)

-- 
2.20.1



Re: [PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()

2019-05-27 Thread Thiago Jung Bauermann


Michael Ellerman  writes:

> On Wed, 2019-05-22 at 22:01:58 UTC, Thiago Jung Bauermann wrote:
>> Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> changed kexec_add_buffer() to skip searching for a memory location if
>> kexec_buf.mem is already set, and use the address that is there.
>> 
>> In powerpc code we reuse a kexec_buf variable for loading both the kernel
>> and the initramfs by resetting some of the fields between those uses, but
>> not mem. This causes kexec_add_buffer() to try to load the kernel at the
>> same address where initramfs will be loaded, which is naturally rejected:
>> 
>>   # kexec -s -l --initrd initramfs vmlinuz
>>   kexec_file_load failed: Invalid argument
>> 
>> Setting the mem field before every call to kexec_add_buffer() fixes this
>> regression.
>> 
>> Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> Signed-off-by: Thiago Jung Bauermann 
>> Reviewed-by: Dave Young 
>
> Applied to powerpc fixes, thanks.
>
> https://git.kernel.org/powerpc/c/8b909e3548706cbebc0a676067b81aad

Thanks!!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



[Bug 203725] New: Build error: 'init_module' specifies less restrictive attribute than its target 'rtas_flash_init': 'cold'

2019-05-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203725

Bug ID: 203725
   Summary: Build error: 'init_module' specifies less restrictive
attribute than its target 'rtas_flash_init': 'cold'
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 4.19.46
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: ja...@bluehome.net
Regression: No

Created attachment 282969
  --> https://bugzilla.kernel.org/attachment.cgi?id=282969&action=edit
Build log

I recently upgraded to GCC 9.1. This error appears while building 4.19.46. I'm
building for ppc64el. GCC 8.3 doesn't generate an error, so that's my
workaround for now.

  powerpc64le-linux-gcc -Wp,-MD,arch/powerpc/kernel/.rtas_flash.o.d  -nostdinc
-isystem /home/jason/toolchain/bin/../lib/gcc/powerpc64le-linux/9.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -Iarch/powerpc
-DHAVE_AS_ATHIGH=1 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -fshort-wchar
-Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE
-DCC_HAVE_ASM_GOTO -mlittle-endian -m64 -msoft-float -pipe -Iarch/powerpc
-mtraceback=no -mabi=elfv2 -mcmodel=medium -mno-pointers-to-nested-functions
-mcpu=power8 -mtune=power9 -mno-altivec -mno-vsx -funit-at-a-time
-fno-dwarf2-cfi-asm -mno-string -Wa,-maltivec -Wa,-mpower4 -Wa,-many
-mno-strict-align -mlittle-endian -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-int-in-bool-context -O2 --param=allow-store-data-races=0
-Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable
-Wno-unused-const-variable -fno-var-tracking-assignments -pg -mprofile-kernel
-Wdeclaration-after-statement -Wno-pointer-sign -Wno-stringop-truncation
-fno-strict-overflow -fno-merge-all-constants -fmerge-constants
-fno-stack-check -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned -Werror 
-DMODULE -mno-save-toc-indirect -mcmodel=large 
-DKBUILD_BASENAME='"rtas_flash"' -DKBUILD_MODNAME='"rtas_flash"' -c -o
arch/powerpc/kernel/rtas_flash.o arch/powerpc/kernel/rtas_flash.c
In file included from arch/powerpc/kernel/rtas_flash.c:16:
./include/linux/module.h:133:6: error: 'init_module' specifies less restrictive
attribute than its target 'rtas_flash_init': 'cold'
[-Werror=missing-attributes]
  133 |  int init_module(void) __attribute__((alias(#initfn)));
  |  ^~~
arch/powerpc/kernel/rtas_flash.c:779:1: note: in expansion of macro
'module_init'
  779 | module_init(rtas_flash_init);
  | ^~~
arch/powerpc/kernel/rtas_flash.c:703:19: note: 'init_module' target declared
here
  703 | static int __init rtas_flash_init(void)
  |   ^~~
In file included from arch/powerpc/kernel/rtas_flash.c:16:
./include/linux/module.h:139:7: error: 'cleanup_module' specifies less
restrictive attribute than its target 'rtas_flash_cleanup': 'cold'
[-Werror=missing-attributes]
  139 |  void cleanup_module(void) __attribute__((alias(#exitfn)));
  |   ^~
arch/powerpc/kernel/rtas_flash.c:780:1: note: in expansion of macro
'module_exit'
  780 | module_exit(rtas_flash_cleanup);
  | ^~~
arch/powerpc/kernel/rtas_flash.c:759:20: note: 'cleanup_module' target declared
here
  759 | static void __exit rtas_flash_cleanup(void)
  |^~
cc1: all warnings being treated as errors
scripts/Makefile.build:309: recipe for target
'arch/powerpc/kernel/rtas_flash.o' failed
make[1]: *** [arch/powerpc/kernel/rtas_flash.o] Error 1
Makefile:1051: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203723] New: Build error: taking address of packed member of 'struct ftrace_graph_ent' may result in an unaligned pointer value

2019-05-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203723

Bug ID: 203723
   Summary: Build error: taking address of packed member of
'struct ftrace_graph_ent' may result in an unaligned
pointer value
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 4.14.122
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: ja...@bluehome.net
Regression: No

Created attachment 282967
  --> https://bugzilla.kernel.org/attachment.cgi?id=282967&action=edit
Build log

This error appears while building 4.14.122. I'm building with GCC 9.1 for
ppc64el.

make -f ./scripts/Makefile.build obj=arch/powerpc/kernel/trace
  powerpc64le-linux-gcc -m64 -Wp,-MD,arch/powerpc/kernel/trace/.ftrace.o.d 
-nostdinc -isystem
/home/jason/toolchain/bin/../lib/gcc/powerpc64le-linux/9.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-D__KERNEL__ -DCC_USING_MPROFILE_KERNEL -Iarch/powerpc -DHAVE_AS_ATHIGH=1 -Wall
-Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-fshort-wchar -Werror-implicit-function-declaration -Wno-format-security
-std=gnu89 -fno-PIE -msoft-float -pipe -Iarch/powerpc -mtraceback=no
-mabi=elfv2 -mcmodel=medium -mno-pointers-to-nested-functions -mcpu=power8
-mno-altivec -mno-vsx -funit-at-a-time -fno-dwarf2-cfi-asm -mno-string
-Wa,-maltivec -mlittle-endian -mno-strict-align -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-int-in-bool-context -Wno-attribute-alias -O2
--param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO -Wframe-larger-than=2048
-fno-stack-protector -Wno-unused-but-set-variable -Wno-unused-const-variable
-fno-var-tracking-assignments -Wdeclaration-after-statement -Wno-pointer-sign
-Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants
-fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init -Wno-packed-not-aligned -Werror -Werror   
-DKBUILD_BASENAME='"ftrace"'  -DKBUILD_MODNAME='"ftrace"' -c -o
arch/powerpc/kernel/trace/ftrace.o arch/powerpc/kernel/trace/ftrace.c
arch/powerpc/kernel/trace/ftrace.c: In function 'prepare_ftrace_return':
arch/powerpc/kernel/trace/ftrace.c:596:43: error: taking address of packed
member of 'struct ftrace_graph_ent' may result in an unaligned pointer value
[-Werror=address-of-packed-member]
  596 |  if (ftrace_push_return_trace(parent, ip, &trace.depth, 0,
  |   ^~~~
cc1: all warnings being treated as errors
scripts/Makefile.build:326: recipe for target
'arch/powerpc/kernel/trace/ftrace.o' failed
make[2]: *** [arch/powerpc/kernel/trace/ftrace.o] Error 1
scripts/Makefile.build:585: recipe for target 'arch/powerpc/kernel/trace'
failed
make[1]: *** [arch/powerpc/kernel/trace] Error 2
Makefile:1038: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v7 1/1] iommu: enhance IOMMU dma mode build options

2019-05-27 Thread Joerg Roedel
Hi Zhen Lei,

On Mon, May 20, 2019 at 09:59:47PM +0800, Zhen Lei wrote:
>  arch/ia64/kernel/pci-dma.c|  2 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
>  arch/s390/pci/pci_dma.c   |  2 +-
>  arch/x86/kernel/pci-dma.c |  7 ++---
>  drivers/iommu/Kconfig | 44 
> ++-
>  drivers/iommu/amd_iommu_init.c|  3 ++-
>  drivers/iommu/intel-iommu.c   |  2 +-
>  drivers/iommu/iommu.c |  3 ++-
>  8 files changed, 48 insertions(+), 18 deletions(-)

This needs Acks from the arch maintainers of ia64, powerpc, s390 and
x86, at least.

It is easier for them if you split it up into the Kconfig change and
separate patches per arch and per iommu driver. Then collect the Acks on
the individual patches.

Thanks,

Joerg


[PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section

2019-05-27 Thread David Hildenbrand
The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.

Reviewed-by: Dan Williams 
Signed-off-by: David Hildenbrand 
---
 include/linux/memory_hotplug.h | 2 +-
 mm/memory_hotplug.c| 2 +-
 mm/sparse.c| 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2f1f87e13baa..1a4257c5f74c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, 
unsigned long start_pfn,
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_one_section(int nid, unsigned long start_pfn,
  struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms,
+extern void sparse_remove_one_section(struct mem_section *ms,
unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 82136c5b4c5f..e48ec7b9dee2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct 
mem_section *ms,
start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
__remove_zone(zone, start_pfn);
 
-   sparse_remove_one_section(zone, ms, map_offset, altmap);
+   sparse_remove_one_section(ms, map_offset, altmap);
 }
 
 /**
diff --git a/mm/sparse.c b/mm/sparse.c
index d1d5e05f5b8d..1552c855d62a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, 
unsigned long *usemap,
free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-   unsigned long map_offset, struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long 
map_offset,
+  struct vmem_altmap *altmap)
 {
struct page *memmap = NULL;
unsigned long *usemap = NULL;
-- 
2.20.1



[PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail

2019-05-27 Thread David Hildenbrand
We really don't want anything during memory hotunplug to fail.
We always pass a valid memory block device, that check can go. Avoid
allocating memory and eventually failing. As we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack, having to guess about the stack size
of callers.
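
The idea of a static scratch mask can be sketched in plain C as follows.
This is only a user-space model: the array, the node limit and the
function name are made up for the example, and it relies on the same
assumption as the patch, namely that all callers are serialized by one
lock.

```c
#include <stdbool.h>
#include <string.h>

#define EXAMPLE_MAX_NODES 64

/* Static scratch mask: safe only because callers serialize on one lock. */
static bool example_seen[EXAMPLE_MAX_NODES];

/* Returns the number of distinct node ids found in nids[0..n). */
int example_count_unique_nodes(const int *nids, int n)
{
	int i, unique = 0;

	/* Reset the scratch mask on every call; no allocation can fail here. */
	memset(example_seen, 0, sizeof(example_seen));
	for (i = 0; i < n; i++) {
		int nid = nids[i];

		if (nid < 0 || nid >= EXAMPLE_MAX_NODES)
			continue;
		if (example_seen[nid])
			continue;	/* this node was already handled */
		example_seen[nid] = true;
		unique++;
	}
	return unique;
}
```

Because nothing is allocated, the function has no error path left, which
is what allows the return type to become void in the patch.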

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should be properly fenced off and never
removed.

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: Alex Deucher 
Cc: "David S. Miller" 
Cc: Mark Brown 
Cc: Chris Wilson 
Cc: David Hildenbrand 
Cc: Oscar Salvador 
Cc: Andrew Morton 
Cc: Jonathan Cameron 
Signed-off-by: David Hildenbrand 
---
 drivers/base/node.c  | 18 +-
 include/linux/node.h |  5 ++---
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04fdfa99b8bc..9be88fd05147 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block 
*mem_blk, void *arg)
 
 /*
  * Unregister memory block device under all nodes that it spans.
+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
  */
-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-   NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
+   static nodemask_t unlinked_nodes;
 
-   if (!mem_blk) {
-   NODEMASK_FREE(unlinked_nodes);
-   return -EFAULT;
-   }
-   if (!unlinked_nodes)
-   return -ENOMEM;
-   nodes_clear(*unlinked_nodes);
-
+   nodes_clear(unlinked_nodes);
sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct 
memory_block *mem_blk)
continue;
if (!node_online(nid))
continue;
-   if (node_test_and_set(nid, *unlinked_nodes))
+   if (node_test_and_set(nid, unlinked_nodes))
continue;
sysfs_remove_link(&node_devices[nid]->dev.kobj,
 kobject_name(&mem_blk->dev.kobj));
sysfs_remove_link(&mem_blk->dev.kobj,
 kobject_name(&node_devices[nid]->dev.kobj));
}
-   NODEMASK_FREE(unlinked_nodes);
-   return 0;
 }
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 02a29e71b175..548c226966a2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, 
unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
void *arg);
-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
   unsigned int cpu_nid,
@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct 
memory_block *mem_blk,
 {
return 0;
 }
-static inline int unregister_memory_block_under_nodes(struct memory_block 
*mem_blk)
+static inline void unregister_memory_block_under_nodes(struct memory_block 
*mem_blk)
 {
-   return 0;
 }
 
 static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
-- 
2.20.1



[PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()

2019-05-27 Thread David Hildenbrand
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().
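
The new helper walks a half-open range of memory block ids derived from
the start address and size. A simplified user-space model of that
computation is shown below; the 256 MiB block size and all names are
assumptions for this sketch (the kernel queries
memory_block_size_bytes() and goes through PFNs and section numbers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block size for the example (256 MiB). */
#define EXAMPLE_BLOCK_BYTES (256ULL << 20)

/* Start and size both have to be multiples of the block size. */
bool example_range_is_block_aligned(uint64_t start, uint64_t size)
{
	return start % EXAMPLE_BLOCK_BYTES == 0 &&
	       size % EXAMPLE_BLOCK_BYTES == 0;
}

/* Fills the half-open id range [*first, *end); false if misaligned. */
bool example_block_id_range(uint64_t start, uint64_t size,
			    uint64_t *first, uint64_t *end)
{
	if (!example_range_is_block_aligned(start, size))
		return false;

	*first = start / EXAMPLE_BLOCK_BYTES;
	*end = (start + size) / EXAMPLE_BLOCK_BYTES;
	return true;
}
```

For a 512 MiB area starting at 1 GiB this yields block ids 4 and 5, and
a misaligned request is rejected up front rather than partially handled.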

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: David Hildenbrand 
Cc: "mike.tra...@hpe.com" 
Cc: Andrew Morton 
Cc: Andrew Banman 
Cc: Ingo Molnar 
Cc: Alex Deucher 
Cc: "David S. Miller" 
Cc: Mark Brown 
Cc: Chris Wilson 
Cc: Oscar Salvador 
Cc: Jonathan Cameron 
Cc: Michal Hocko 
Cc: Pavel Tatashin 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Reviewed-by: Dan Williams 
Signed-off-by: David Hildenbrand 
---
 drivers/base/memory.c  | 37 ++---
 drivers/base/node.c| 11 ++-
 include/linux/memory.h |  2 +-
 include/linux/node.h   |  6 ++
 mm/memory_hotplug.c|  5 +++--
 5 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5a0370f0c506..f28efb0bf5c7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, 
unsigned long size)
return ret;
 }
 
-void unregister_memory_section(struct mem_section *section)
+/*
+ * Remove memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * have to be offline.
+ */
+void remove_memory_block_devices(unsigned long start, unsigned long size)
 {
+   const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+   const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
struct memory_block *mem;
+   int block_id;
 
-   if (WARN_ON_ONCE(!present_section(section)))
+   if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+!IS_ALIGNED(size, memory_block_size_bytes())))
return;
 
mutex_lock(&mem_sysfs_mutex);
-
-   /*
-* Some users of the memory hotplug do not want/need memblock to
-* track all sections. Skip over those.
-*/
-   mem = find_memory_block(section);
-   if (!mem)
-   goto out_unlock;
-
-   unregister_mem_sect_under_nodes(mem, __section_nr(section));
-
-   mem->section_count--;
-   if (mem->section_count == 0)
+   for (block_id = start_block_id; block_id != end_block_id; block_id++) {
+   mem = find_memory_block_by_id(block_id, NULL);
+   if (WARN_ON_ONCE(!mem))
+   continue;
+   mem->section_count = 0;
+   unregister_memory_block_under_nodes(mem);
unregister_memory(mem);
-   else
-   put_device(&mem->dev);
-
-out_unlock:
+   }
mutex_unlock(&mem_sysfs_mutex);
 }
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..04fdfa99b8bc 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block 
*mem_blk, void *arg)
return 0;
 }
 
-/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-   unsigned long phys_index)
+/*
+ * Unregister memory block device under all nodes that it spans.
+ */
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block 
*mem_blk,
return -ENOMEM;
nodes_clear(*unlinked_nodes);
 
-   sect_start_pfn = section_nr_to_pfn(phys_index);
-   sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+   sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+   sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
int nid;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index db3e8567f900..f26a5417ec5d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct 
notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size);
-extern void unregister_memory_section(struct mem_section *);
+void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- 

[PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API

2019-05-27 Thread David Hildenbrand
No longer needed, the callers of arch_add_memory() can handle this
manually.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Pavel Tatashin 
Cc: Wei Yang 
Cc: Joonsoo Kim 
Cc: Qian Cai 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Signed-off-by: David Hildenbrand 
---
 include/linux/memory_hotplug.h | 8 
 mm/memory_hotplug.c| 9 +++--
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2d4de313926d..2f1f87e13baa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 
size,
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
   unsigned long nr_pages, struct vmem_altmap *altmap);
 
-/*
- * Do we want sysfs memblock files created. This will allow userspace to online
- * and offline memory explicitly. Lack of this bit means that the caller has to
- * call move_pfn_range_to_zone to finish the initialization.
- */
-
-#define MHP_MEMBLOCK_API   (1<<0)
-
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long 
nr_pages,
   struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b1fde90bbf19..9a92549ef23b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct 
pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-   struct vmem_altmap *altmap, bool want_memblock)
+  struct vmem_altmap *altmap)
 {
int ret;
 
@@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
}
 
for (i = start_sec; i <= end_sec; i++) {
-   err = __add_section(nid, section_nr_to_pfn(i), altmap,
-   restrictions->flags & MHP_MEMBLOCK_API);
+   err = __add_section(nid, section_nr_to_pfn(i), altmap);
 
/*
 * EEXIST is finally dealt with by ioresource collision
@@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, 
void *arg)
  */
 int __ref add_memory_resource(int nid, struct resource *res)
 {
-   struct mhp_restrictions restrictions = {
-   .flags = MHP_MEMBLOCK_API,
-   };
+   struct mhp_restrictions restrictions = {};
u64 start, size;
bool new_node = false;
int ret;
-- 
2.20.1



[PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()

2019-05-27 Thread David Hildenbrand
Only memory to be added to the buddy and to be onlined/offlined by
user space using /sys/devices/system/memory/... needs (and should have!)
memory block devices.

Factor out creation of memory block devices. Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock parameter,
because it is now effectively stale.

Memory can be onlined by user space only after memory block devices have
been added. This implies that memory is not visible to user space at
all before arch_add_memory() succeeded.

While at it
- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
  duplicates

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: David Hildenbrand 
Cc: "mike.tra...@hpe.com" 
Cc: Andrew Morton 
Cc: Ingo Molnar 
Cc: Andrew Banman 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: Pavel Tatashin 
Cc: Qian Cai 
Cc: Wei Yang 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Signed-off-by: David Hildenbrand 
---
 drivers/base/memory.c  | 82 +++---
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c| 15 
 3 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index ac17c95a5f28..5a0370f0c506 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
return section_nr / sections_per_block;
 }
 
+static inline int pfn_to_block_id(unsigned long pfn)
+{
+   return base_memory_block_id(pfn_to_section_nr(pfn));
+}
+
 static int memory_subsys_online(struct device *dev);
 static int memory_subsys_offline(struct device *dev);
 
@@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long 
start_pfn)
  * A reference for the returned object is held and the reference for the
  * hinted object is released.
  */
-struct memory_block *find_memory_block_hinted(struct mem_section *section,
- struct memory_block *hint)
+static struct memory_block *find_memory_block_by_id(int block_id,
+   struct memory_block *hint)
 {
-   int block_id = base_memory_block_id(__section_nr(section));
struct device *hintdev = hint ? &hint->dev : NULL;
struct device *dev;
 
@@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct 
mem_section *section,
return to_memory_block(dev);
 }
 
+struct memory_block *find_memory_block_hinted(struct mem_section *section,
+ struct memory_block *hint)
+{
+   int block_id = base_memory_block_id(__section_nr(section));
+
+   return find_memory_block_by_id(block_id, hint);
+}
+
 /*
  * For now, we have a linear search to go find the appropriate
  * memory_block corresponding to a particular phys_index. If
@@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, 
int block_id,
unsigned long start_pfn;
int ret = 0;
 
+   mem = find_memory_block_by_id(block_id, NULL);
+   if (mem) {
+   put_device(&mem->dev);
+   return -EEXIST;
+   }
mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
@@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
return 0;
 }
 
+static void unregister_memory(struct memory_block *memory)
+{
+   if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+   return;
+
+   /* drop the ref. we got via find_memory_block() */
+   put_device(&memory->dev);
+   device_unregister(&memory->dev);
+}
+
 /*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int create_memory_block_devices(unsigned long start, unsigned long size)
 {
-   int block_id = base_memory_block_id(__section_nr(section));
-   int ret = 0;
+   const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+   int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
struct memory_block *mem;
+   unsigned long block_id;
+   int ret = 0;
 
-   mutex_lock(&mem_sysfs_mutex);
+   if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+!IS_ALIGNED(size, memory_block_size_bytes())))
+   return -EINVAL;
 
-   mem = find_memory_block(section);
-   if (mem) {
-   mem->section_count++;
-   put_device(&mem->dev);
-   } else {
+   mutex_lock(&mem_sysfs_mutex);
+   for (block_id = start_block_id; block_id != end_block_id; block_id++) {
ret = init_memory_block(, 

[PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE

2019-05-27 Thread David Hildenbrand
We want to improve error handling while adding memory by allowing
to use arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:

	arch_add_memory()
	rc = do_something();
	if (rc) {
		arch_remove_memory();
	}

We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.
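
As a rough illustration of the rollback pattern above, here is a minimal
user-space sketch. All names ending in _stub or _sketch, the range_mapped
flag, and the -12 error value are hypothetical stand-ins, not the kernel
functions themselves; they only model "add, try a later step, undo on
failure".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: true while the (imaginary) range is mapped. */
bool range_mapped;

/* Stand-in for arch_add_memory(); always succeeds in this sketch. */
static int arch_add_memory_stub(void)
{
	range_mapped = true;
	return 0;
}

/* Stand-in for arch_remove_memory(); undoes the step above. */
static void arch_remove_memory_stub(void)
{
	assert(range_mapped);	/* only undo what was actually added */
	range_mapped = false;
}

/* Hypothetical later step that may fail (e.g. creating block devices). */
static int do_something_stub(bool fail)
{
	return fail ? -12 /* value of -ENOMEM on Linux */ : 0;
}

/* Add memory and roll back the arch mapping if the later step fails. */
int add_memory_sketch(bool fail_later)
{
	int rc = arch_add_memory_stub();

	if (rc)
		return rc;

	rc = do_something_stub(fail_later);
	if (rc)
		arch_remove_memory_stub();
	return rc;
}
```

Calling add_memory_sketch(true) returns the error and leaves range_mapped
false again, which is the behaviour the series wants to make possible
without CONFIG_MEMORY_HOTREMOVE.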

Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: David Hildenbrand 
Cc: Oscar Salvador 
Cc: "Kirill A. Shutemov" 
Cc: Alex Deucher 
Cc: "David S. Miller" 
Cc: Mark Brown 
Cc: Chris Wilson 
Cc: Christophe Leroy 
Cc: Nicholas Piggin 
Cc: Vasily Gorbik 
Cc: Rob Herring 
Cc: Masahiro Yamada 
Cc: "mike.tra...@hpe.com" 
Cc: Andrew Banman 
Cc: Pavel Tatashin 
Cc: Wei Yang 
Cc: Arun KS 
Cc: Qian Cai 
Cc: Mathieu Malaterre 
Cc: Baoquan He 
Cc: Logan Gunthorpe 
Cc: Anshuman Khandual 
Signed-off-by: David Hildenbrand 
---
 arch/arm64/mm/mmu.c| 2 --
 arch/ia64/mm/init.c| 2 --
 arch/powerpc/mm/mem.c  | 2 --
 arch/s390/mm/init.c| 2 --
 arch/sh/mm/init.c  | 2 --
 arch/x86/mm/init_32.c  | 2 --
 arch/x86/mm/init_64.c  | 2 --
 drivers/base/memory.c  | 2 --
 include/linux/memory.h | 2 --
 include/linux/memory_hotplug.h | 2 --
 mm/memory_hotplug.c| 2 --
 mm/sparse.c| 6 --
 12 files changed, 28 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e569a543c384..9ccd7539f2d4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
   restrictions);
 }
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
 {
@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d28e29103bdb..aae75fd7b810 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
 {
@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e885fe2aafcc..e4bc2dc3f593 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
 struct vmem_altmap *altmap)
 {
@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
pr_warn("Hash collision while resizing HPT\n");
 }
 #endif
-#endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init mem_topology_setup(void)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14955e0a9fcf..ffb81fe95c77 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
 {
@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
vmem_remove_mapping(start, size);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 13c6a6bb5fd9..dfdbaa50946e 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
 {
@@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
zone = page_zone(pfn_to_page(start_pfn));
__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git 

[PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()

2019-05-27 Thread David Hildenbrand
We'll rework hotplug_memory_register() shortly, so it no longer consumes
a section.
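
For reference, the arithmetic that init_memory_block() performs on the
block id can be sketched in user space as follows. The struct and function
names are hypothetical; only the two formulas are taken from the diff
below, and sections_per_block is passed in as a parameter here instead of
being a global.

```c
#include <assert.h>

/* Hypothetical holder for the two fields set in init_memory_block(). */
struct block_sections {
	unsigned long start_section_nr;
	unsigned long end_section_nr;
};

/* Compute the first and last section number covered by a memory block. */
struct block_sections block_id_to_sections(int block_id,
					   unsigned long sections_per_block)
{
	struct block_sections s;

	assert(block_id >= 0 && sections_per_block > 0);
	s.start_section_nr = (unsigned long)block_id * sections_per_block;
	s.end_section_nr = s.start_section_nr + sections_per_block - 1;
	return s;
}
```

With 8 sections per block, block id 3 covers sections 24 through 31.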

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Signed-off-by: David Hildenbrand 
---
 drivers/base/memory.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f180427e48f4..f914fa6fe350 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
return ret;
 }
 
-static int init_memory_block(struct memory_block **memory,
-struct mem_section *section, unsigned long state)
+static int init_memory_block(struct memory_block **memory, int block_id,
+unsigned long state)
 {
struct memory_block *mem;
unsigned long start_pfn;
-   int scn_nr;
int ret = 0;
 
mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
 
-   scn_nr = __section_nr(section);
-   mem->start_section_nr =
-   base_memory_block_id(scn_nr) * sections_per_block;
+   mem->start_section_nr = block_id * sections_per_block;
mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
mem->state = state;
start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
 
if (section_count == 0)
return 0;
-   ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+   ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
+   MEM_ONLINE);
if (ret)
return ret;
mem->section_count = section_count;
@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
  */
 int hotplug_memory_register(int nid, struct mem_section *section)
 {
+   int block_id = base_memory_block_id(__section_nr(section));
int ret = 0;
struct memory_block *mem;
 
@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section *section)
mem->section_count++;
put_device(&mem->dev);
} else {
-   ret = init_memory_block(&mem, section, MEM_OFFLINE);
+   ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
goto out;
mem->section_count++;
-- 
2.20.1



[PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation

2019-05-27 Thread David Hildenbrand
A proper arch_remove_memory() implementation is on its way, which also
cleanly removes page tables in arch_add_memory() in case something goes
wrong.

As we want to use arch_remove_memory() in case something goes wrong
during memory hotplug after arch_add_memory() finished, let's add
a temporary hack that is sufficient until we get a proper
implementation that cleans up page table entries.

We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
patches.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Andrew Morton 
Cc: Ard Biesheuvel 
Cc: Chintan Pandya 
Cc: Mike Rapoport 
Cc: Jun Yao 
Cc: Yu Zhao 
Cc: Robin Murphy 
Cc: Anshuman Khandual 
Signed-off-by: David Hildenbrand 
---
 arch/arm64/mm/mmu.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a1bfc4413982..e569a543c384 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
   restrictions);
 }
+#ifdef CONFIG_MEMORY_HOTREMOVE
+void arch_remove_memory(int nid, u64 start, u64 size,
+   struct vmem_altmap *altmap)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct zone *zone;
+
+   /*
+* FIXME: Cleanup page tables (also in arch_add_memory() in case
+* adding fails). Until then, this function should only be used
+* during memory hotplug (adding memory), not for memory
+* unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
+* unlocked yet.
+*/
+   zone = page_zone(pfn_to_page(start_pfn));
+   __remove_pages(zone, start_pfn, nr_pages, altmap);
+}
+#endif
 #endif
-- 
2.20.1



[PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()

2019-05-27 Thread David Hildenbrand
Will come in handy when wanting to handle errors after
arch_add_memory().

Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: David Hildenbrand 
Cc: Vasily Gorbik 
Cc: Oscar Salvador 
Signed-off-by: David Hildenbrand 
---
 arch/s390/mm/init.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index d552e330fbcc..14955e0a9fcf 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
 {
-   /*
-* There is no hardware or firmware interface which could trigger a
-* hot memory remove on s390. So there is nothing that needs to be
-* implemented.
-*/
-   BUG();
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct zone *zone;
+
+   zone = page_zone(pfn_to_page(start_pfn));
+   __remove_pages(zone, start_pfn, nr_pages, altmap);
+   vmem_remove_mapping(start, size);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1



[PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()

2019-05-27 Thread David Hildenbrand
ZONE_DEVICE is not yet supported. Fail if an altmap is passed, so we
don't forget arch_add_memory()/arch_remove_memory() when unlocking
support.

Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: David Hildenbrand 
Cc: Vasily Gorbik 
Cc: Oscar Salvador 
Suggested-by: Dan Williams 
Signed-off-by: David Hildenbrand 
---
 arch/s390/mm/init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14d1eae9fe43..d552e330fbcc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -226,6 +226,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
unsigned long size_pages = PFN_DOWN(size);
int rc;
 
+   if (WARN_ON_ONCE(restrictions->altmap))
+   return -EINVAL;
+
rc = vmem_add_mapping(start, size);
if (rc)
return rc;
-- 
2.20.1



[PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()

2019-05-27 Thread David Hildenbrand
By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.
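
To make the problem concrete, here is a minimal user-space sketch that
compares the old and the new check. The 4 KiB page size and 128 MiB block
size are assumptions chosen for illustration, and the function names are
hypothetical; only the structure of the two conditions follows the diff
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SKETCH_BLOCK_SIZE (128ULL << 20)	/* assumed 128 MiB blocks */

/* True if x is a multiple of a; a must be a power of two. */
static bool is_aligned(uint64_t x, uint64_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x & (a - 1)) == 0;
}

/* Old check: start and size are first truncated to page granularity. */
bool range_ok_old(uint64_t start, uint64_t size)
{
	uint64_t block_nr_pages = SKETCH_BLOCK_SIZE >> SKETCH_PAGE_SHIFT;
	uint64_t nr_pages = size >> SKETCH_PAGE_SHIFT;
	uint64_t start_pfn = start >> SKETCH_PAGE_SHIFT;

	return nr_pages && is_aligned(start_pfn, block_nr_pages) &&
	       is_aligned(nr_pages, block_nr_pages);
}

/* New check: compare the byte values against the block size directly. */
bool range_ok_new(uint64_t start, uint64_t size)
{
	return size && is_aligned(start, SKETCH_BLOCK_SIZE) &&
	       is_aligned(size, SKETCH_BLOCK_SIZE);
}
```

A start address of one block plus 0x10 bytes passes the old check, because
the low 12 bits are shifted away before the alignment test, but it is
rejected by the new one.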

Cc: Andrew Morton 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Pavel Tatashin 
Cc: Qian Cai 
Cc: Wei Yang 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Reviewed-by: Dan Williams 
Reviewed-by: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e096c987d261..762887b2358b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1051,16 +1051,11 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-   unsigned long block_sz = memory_block_size_bytes();
-   u64 block_nr_pages = block_sz >> PAGE_SHIFT;
-   u64 nr_pages = size >> PAGE_SHIFT;
-   u64 start_pfn = PFN_DOWN(start);
-
/* memory range must be block size aligned */
-   if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
-   !IS_ALIGNED(nr_pages, block_nr_pages)) {
+   if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+   !IS_ALIGNED(size, memory_block_size_bytes())) {
pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
-  block_sz, start, size);
+  memory_block_size_bytes(), start, size);
return -EINVAL;
}
 
-- 
2.20.1



Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Rafael J. Wysocki
On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be overridden by
> architectures, expose it globally. Having a prototype avoids warnings
> (sometimes treated as errors with W=1) such as:
> 
>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
> 
> This moves the declaration into a globally visible header file and adds
> the missing include to avoid a warning on powerpc. It also removes the
> duplicated prototypes, since they are no longer required.
> 
> Cc: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: As suggested by Christophe, remove duplicate prototypes
> 
>  arch/powerpc/kernel/suspend.c | 1 +
>  arch/s390/kernel/entry.h  | 1 -
>  include/linux/suspend.h   | 1 +
>  kernel/power/power.h  | 2 --
>  4 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/suspend.c b/arch/powerpc/kernel/suspend.c
> index a531154cc0f3..9e1b6b894245 100644
> --- a/arch/powerpc/kernel/suspend.c
> +++ b/arch/powerpc/kernel/suspend.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include 
> +#include <linux/suspend.h>
>  #include 
>  #include 
>  
> diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
> index 20420c2b8a14..b2956d49b6ad 100644
> --- a/arch/s390/kernel/entry.h
> +++ b/arch/s390/kernel/entry.h
> @@ -63,7 +63,6 @@ void __init startup_init(void);
>  void die(struct pt_regs *regs, const char *str);
>  int setup_profiling_timer(unsigned int multiplier);
>  void __init time_init(void);
> -int pfn_is_nosave(unsigned long);
>  void s390_early_resume(void);
>  unsigned long prepare_ftrace_return(unsigned long parent, unsigned long sp, unsigned long ip);
>  
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 6b3ea9ea6a9e..e8b8a7bede90 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -395,6 +395,7 @@ extern bool system_entering_hibernation(void);
>  extern bool hibernation_available(void);
>  asmlinkage int swsusp_save(void);
>  extern struct pbe *restore_pblist;
> +int pfn_is_nosave(unsigned long pfn);
>  #else /* CONFIG_HIBERNATION */
>  static inline void register_nosave_region(unsigned long b, unsigned long e) {}
>  static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index 9e58bdc8a562..44bee462ff57 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>  static inline void hibernate_image_size_init(void) {}
>  #endif /* !CONFIG_HIBERNATION */
>  
> -extern int pfn_is_nosave(unsigned long);
> -
>  #define power_attr(_name) \
>  static struct kobj_attribute _name##_attr = {\
>   .attr   = { \
> 

With an ACK from the powerpc maintainers, I could apply this one.






Re: [PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region

2019-05-27 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> On 5/14/19 9:59 AM, Dan Williams wrote:
>> On Mon, May 13, 2019 at 7:55 PM Aneesh Kumar K.V
>>  wrote:
>>>
>>> We already add start_pad to resource->start but fail to section-align
>>> the start. This makes sure that, with an altmap, we compute the right
>>> first pfn when start_pad is zero and we align the start address down.
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   kernel/memremap.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/memremap.c b/kernel/memremap.c
>>> index a856cb5ff192..23d77b60e728 100644
>>> --- a/kernel/memremap.c
>>> +++ b/kernel/memremap.c
>>> @@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
>>>   {
>>>  const struct resource *res = &pgmap->res;
>>>  struct vmem_altmap *altmap = &pgmap->altmap;
>>> -   unsigned long pfn;
>>> +   unsigned long pfn = PHYS_PFN(res->start);
>>>
>>> -   pfn = res->start >> PAGE_SHIFT;
>>> +   pfn = SECTION_ALIGN_DOWN(pfn);
>> 
>> This does not seem right to me; it breaks the assumptions of where the
>> first expected valid pfn occurs in the passed-in range.
>> 
>
> How do we define the first valid pfn? Isn't that at pfn_sb->dataoff ?

for altmap the pfn_first should be

pfn_first = altmap->base_pfn + vmem_altmap_offset(altmap);

?
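
To spell the proposal out, here is a small user-space sketch. The struct is
a reduced, hypothetical copy of the vmem_altmap fields involved, and it
assumes that vmem_altmap_offset() returns reserve + free, which is my
reading of the kernel code at the time; please correct me if that is
wrong.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical copy of the struct vmem_altmap fields used here. */
struct altmap_sketch {
	unsigned long base_pfn;	/* first pfn of the device range */
	unsigned long reserve;	/* pfns reserved at the start of the range */
	unsigned long free;	/* pfns set aside for the memmap itself */
};

/* Assumed to mirror vmem_altmap_offset(): pfns to skip from base_pfn. */
unsigned long altmap_offset_sketch(const struct altmap_sketch *altmap)
{
	assert(altmap != NULL);
	return altmap->reserve + altmap->free;
}

/* Proposed first valid pfn when an altmap is in use. */
unsigned long pfn_first_sketch(const struct altmap_sketch *altmap)
{
	return altmap->base_pfn + altmap_offset_sketch(altmap);
}
```

With base_pfn = 0x100000, reserve = 2 and free = 0x200, the first valid pfn
would be 0x100202, independent of how res->start is aligned.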

-aneesh



Re: [PATCH 10/10] docs: fix broken documentation links

2019-05-27 Thread Rafael J. Wysocki
On Mon, May 20, 2019 at 4:48 PM Mauro Carvalho Chehab
 wrote:
>
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab 

For the ACPI part:

Acked-by: Rafael J. Wysocki