date:20180817

[PATCH] powerpc/fadump: cleanup crash memory ranges support

2018-08-17 Thread Hari Bathini

Commit 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array
index overflow") changed crash memory ranges to a dynamic array that
is reallocated on-demand with krealloc(). The relevant header for this
call was not included. The kernel compiles though. But be cautious and
add the header anyway.

Also, memory allocation logic in fadump_add_crash_memory() takes care
of memory allocation for crash memory ranges in all scenarios. Drop
unnecessary memory allocation in fadump_setup_crash_memory_ranges().

Fixes: 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index 
overflow")
Cc: Mahesh Salgaonkar 
Signed-off-by: Hari Bathini 
---

* Actually posted a V3 with this changes but V2 made it!
- https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=59839


 arch/powerpc/kernel/fadump.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 986ec47..a711d22 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1019,13 +1020,6 @@ static int fadump_setup_crash_memory_ranges(void)
pr_debug("Setup crash memory ranges.\n");
crash_mem_ranges = 0;
 
-   /* allocate memory for crash memory ranges for the first time */
-   if (!max_crash_mem_ranges) {
-   ret = allocate_crash_memory_ranges();
-   if (ret)
-   return ret;
-   }
-
/*
 * add the first memory chunk (RMA_START through boot_memory_size) as
 * a separate memory chunk. The reason is, at the time crash firmware

Re: [PATCH] powerpc64/ftrace: Include ftrace.h needed for enable/disable calls

2018-08-17 Thread Luke Dashjr

On Friday 17 August 2018 10:25:40 Naveen N. Rao wrote:
> Luke Dashjr wrote:
> > this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in
> > ftrace.h Without it included, the build fails.
>
> I'm unable to reproduce this. Can you share your .config and the build
> environment?

https://luke.dashjr.org/tmp/code/4.18-config.xz

Gentoo GNU/Linux (mostly stable keywords) on a Raptor Talos II POWER9 system.

> > Fixes: a4bc64d305af ("powerpc64/ftrace: Disable ftrace during kvm
> > entry/exit") Signed-off-by: Luke Dashjr 
> > Cc: sta...@vger.kernel.org
> > ---
> >  arch/powerpc/kvm/book3s_hv.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index ee4a8854985e..15c2c64291f4 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -46,6 +46,7 @@
> >  #include 
> >  #include 
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
>
> In any case, this change itself looks alright to me. So:
> Acked-by: Naveen N. Rao 
>
>
> Thanks,
> Naveen

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Greg Kroah-Hartman

On Fri, Aug 17, 2018 at 01:56:35PM +0200, David Hildenbrand wrote:
> E.g. When adding the memory block devices, we know that there won't be a
> driver to attach to (as there are no drivers for the "memory" subsystem)
> - the bus_probe_device() function that takes the device_lock() could
> pretty much be avoided for that case. But burying such special cases
> down in core driver code definitely won't make locking related to memory
> hotplug easier.

You don't have to have a driver for a device if you don't want to, or
you can just have a default one for all memory devices if you somehow
need it.  No reason to not do this if it makes things easier for you.

thanks,

greg k-h

Re: [PATCH] powerpc64/ftrace: Include ftrace.h needed for enable/disable calls

2018-08-17 Thread Naveen N. Rao

Luke Dashjr wrote:

On Friday 17 August 2018 10:25:40 Naveen N. Rao wrote:

Luke Dashjr wrote:
> this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in
> ftrace.h Without it included, the build fails.

I'm unable to reproduce this. Can you share your .config and the build
environment?

https://luke.dashjr.org/tmp/code/4.18-config.xz

Gentoo GNU/Linux (mostly stable keywords) on a Raptor Talos II POWER9 system.

Thanks. Your config is missing CONFIG_TRACEPOINTS and I hadn't tested 
that. The below fix looks good and I've confirmed that the other uses 
are fine as well.

- Naveen

Re: [PATCH 3/3] powerpc/mm/ Add proper pte access check helper

2018-08-17 Thread Christophe LEROY





Le 04/12/2017 à 03:19, Aneesh Kumar K.V a écrit :

pte_access_premitted get called in get_user_pages_fast path. If we have marked
the pte PROT_NONE, we should not allow a read access on the address. With
the current implementation we are not checking the READ and only check for
WRITE. This is needed on archs like ppc64 that implement PROT_NONE using
_PAGE_USER access instead of _PAGE_PRESENT. Also add pte_user check just to 
make sure
we are not accessing kernel mapping.

Even though there is code duplication, keeping the low level pte accessors
different for different platforms helps in code readability.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/book3s/32/pgtable.h | 23 +++
  arch/powerpc/include/asm/nohash/pgtable.h| 23 +++
  2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 016579ef16d3..30a155c0a6b0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -311,6 +311,29 @@ static inline int pte_present(pte_t pte)
return pte_val(pte) & _PAGE_PRESENT;
  }
  
+/*

+ * We only find page table entry in the last level
+ * Hence no need for other accessors
+ */
+#define pte_access_permitted pte_access_permitted
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+   unsigned long pteval = pte_val(pte);
+   /*
+* A read-only access is controlled by _PAGE_USER bit.
+* We have _PAGE_READ set for WRITE and EXECUTE
+*/
+   unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
+
+   if (write)
+   need_pte_bits |= _PAGE_WRITE;
+
+   if ((pteval & need_pte_bits) != need_pte_bits)
+   return false;
+
+   return true;
+}
+
  /* Conversion functions: convert a page and protection to a page entry,
   * and a page entry and page directory to the page they refer to.
   *
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 5c68f4a59f75..fc4376c8d444 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -45,6 +45,29 @@ static inline int pte_present(pte_t pte)
return pte_val(pte) & _PAGE_PRESENT;
  }
  
+/*

+ * We only find page table entry in the last level
+ * Hence no need for other accessors
+ */
+#define pte_access_permitted pte_access_permitted
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+   unsigned long pteval = pte_val(pte);
+   /*
+* A read-only access is controlled by _PAGE_USER bit.
+* We have _PAGE_READ set for WRITE and EXECUTE
+*/


Not fully right. asm/pte-common.h defines:

#define PAGE_NONE   __pgprot(_PAGE_BASE | _PAGE_NA)
#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
#define PAGE_SHARED_X   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \
 _PAGE_EXEC)
#define PAGE_COPY   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO)
#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \
 _PAGE_EXEC)
#define PAGE_READONLY   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO)
#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RO | \
 _PAGE_EXEC)

On the 8xx, _PAGE_USER = 0


+   unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
+
+   if (write)
+   need_pte_bits |= _PAGE_WRITE;
+
+   if ((pteval & need_pte_bits) != need_pte_bits)
+   return false;


This test is not fully correct:
- To check access(read) permission, you also have to check that _PAGE_NA 
is not set.
- To check write permission, you also have to check that neither 
_PAGE_NA nor _PAGE_RO are set.


On the 8xx, you have:
_PAGE_RW = _PAGE_WRITE = 0
_PAGE_NA = 0x0200
_PAGE_RO = 0x0600

Christophe


+
+   return true;
+}
+
  /* Conversion functions: convert a page and protection to a page entry,
   * and a page entry and page directory to the page they refer to.
   *

[PATCH v5] powerpc/topology: Get topology for shared processors at boot

2018-08-17 Thread Srikar Dronamraju

On a shared lpar, Phyp will not update the cpu associativity at boot
time. Just after the boot system does recognize itself as a shared lpar and
trigger a request for correct cpu associativity. But by then the scheduler
would have already created/destroyed its sched domains.

This causes
- Broken load balance across Nodes causing islands of cores.
- Performance degradation esp if the system is lightly loaded
- dmesg to wrongly report all cpus to be in Node 0.
- Messages in dmesg saying borken topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
  node sched domain"), can cause rcu stalls at boot up.

>From a scheduler maintainer's perspective, moving cpus from one node to
another or creating more numa levels after boot is not appropriate
without some notification to the user space.
https://lore.kernel.org/lkml/20150406214558.ga38...@linux.vnet.ibm.com/T/#u

The sched_domains_numa_masks table which is used to generate cpumasks is
only created at boot time just before creating sched domains and never
updated.  Hence, its better to get the topology correct before the sched
domains are created.

For example on 64 core Power 8 shared lpar, dmesg reports

[2.088360] Brought up 512 CPUs
[2.088368] Node 0 CPUs: 0-511
[2.088371] Node 1 CPUs:
[2.088373] Node 2 CPUs:
[2.088375] Node 3 CPUs:
[2.088376] Node 4 CPUs:
[2.088378] Node 5 CPUs:
[2.088380] Node 6 CPUs:
[2.088382] Node 7 CPUs:
[2.088386] Node 8 CPUs:
[2.088388] Node 9 CPUs:
[2.088390] Node 10 CPUs:
[2.088392] Node 11 CPUs:
...
[3.916091] BUG: arch topology borken
[3.916103]  the DIE domain not a subset of the NUMA domain
[3.916105] BUG: arch topology borken
[3.916106]  the DIE domain not a subset of the NUMA domain
...

numactl/lscpu output will still be correct with cores spreading across
all nodes.

Socket(s): 64
NUMA node(s):  12
Model: 2.0 (pvr 004d 0200)
Model name:POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type:   para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s):152-159,248-255,344-351,440-447
NUMA node11 CPU(s):160-167,256-263,352-359,448-455

Currently on this lpar, the scheduler detects 2 levels of Numa and
created numa sched domains for all cpus, but it finds a single DIE
domain consisting of all cpus. Hence it deletes all numa sched domains.

To address this, detect the shared processor and update topology soon after
cpus are setup so that correct topology is updated just before scheduler
creates sched domain.

With the fix, dmesg reports

[0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 
368-375 464-471
[0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 
376-383 472-479
[0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 
384-391 480-487
[0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 
392-399 488-495
[0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
[0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
[0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
[0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
[0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
[0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
[0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
[0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455

and lscpu would also report

Socket(s): 64
NUMA node(s):  12
Model: 2.0 (pvr 004d 0200)
Model name:POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type:   para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s):

[PATCH v5] powerpc/topology: Get topology for shared processors at boot

2018-08-17 Thread Srikar Dronamraju

On a shared lpar, Phyp will not update the cpu associativity at boot
time. Just after the boot system does recognize itself as a shared lpar and
trigger a request for correct cpu associativity. But by then the scheduler
would have already created/destroyed its sched domains.

This causes
- Broken load balance across Nodes causing islands of cores.
- Performance degradation esp if the system is lightly loaded
- dmesg to wrongly report all cpus to be in Node 0.
- Messages in dmesg saying borken topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
  node sched domain"), can cause rcu stalls at boot up.

>From a scheduler maintainer's perspective, moving cpus from one node to
another or creating more numa levels after boot is not appropriate
without some notification to the user space.
https://lore.kernel.org/lkml/20150406214558.ga38...@linux.vnet.ibm.com/T/#u

The sched_domains_numa_masks table which is used to generate cpumasks is
only created at boot time just before creating sched domains and never
updated.  Hence, its better to get the topology correct before the sched
domains are created.

For example on 64 core Power 8 shared lpar, dmesg reports

[2.088360] Brought up 512 CPUs
[2.088368] Node 0 CPUs: 0-511
[2.088371] Node 1 CPUs:
[2.088373] Node 2 CPUs:
[2.088375] Node 3 CPUs:
[2.088376] Node 4 CPUs:
[2.088378] Node 5 CPUs:
[2.088380] Node 6 CPUs:
[2.088382] Node 7 CPUs:
[2.088386] Node 8 CPUs:
[2.088388] Node 9 CPUs:
[2.088390] Node 10 CPUs:
[2.088392] Node 11 CPUs:
...
[3.916091] BUG: arch topology borken
[3.916103]  the DIE domain not a subset of the NUMA domain
[3.916105] BUG: arch topology borken
[3.916106]  the DIE domain not a subset of the NUMA domain
...

numactl/lscpu output will still be correct with cores spreading across
all nodes.

Socket(s): 64
NUMA node(s):  12
Model: 2.0 (pvr 004d 0200)
Model name:POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type:   para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s):152-159,248-255,344-351,440-447
NUMA node11 CPU(s):160-167,256-263,352-359,448-455

Currently on this lpar, the scheduler detects 2 levels of Numa and
created numa sched domains for all cpus, but it finds a single DIE
domain consisting of all cpus. Hence it deletes all numa sched domains.

To address this, detect the shared processor and update topology soon after
cpus are setup so that correct topology is updated just before scheduler
creates sched domain.

With the fix, dmesg reports

[0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 
368-375 464-471
[0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 
376-383 472-479
[0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 
384-391 480-487
[0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 
392-399 488-495
[0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
[0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
[0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
[0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
[0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
[0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
[0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
[0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455

and lscpu would also report

Socket(s): 64
NUMA node(s):  12
Model: 2.0 (pvr 004d 0200)
Model name:POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type:   para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s):

[PATCH 1/3] dmaengine: fsldma: move spin_lock_bh to spin_lock in tasklet

2018-08-17 Thread Barry Song

as you are already in a tasklet, it is unnecessary to call spin_lock_bh.

Signed-off-by: Barry Song <21cn...@gmail.com>
---
 drivers/dma/fsldma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 1117b51..9d360a3 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -987,7 +987,7 @@ static void dma_do_tasklet(unsigned long data)
 
chan_dbg(chan, "tasklet entry\n");
 
-   spin_lock_bh(>desc_lock);
+   spin_lock(>desc_lock);
 
/* the hardware is now idle and ready for more */
chan->idle = true;
@@ -995,7 +995,7 @@ static void dma_do_tasklet(unsigned long data)
/* Run all cleanup for descriptors which have been completed */
fsldma_cleanup_descriptors(chan);
 
-   spin_unlock_bh(>desc_lock);
+   spin_unlock(>desc_lock);
 
chan_dbg(chan, "tasklet exit\n");
 }
-- 
2.7.4

Re: [PATCH] powerpc/traps: Avoid rate limit messages from show unhandled signals

2018-08-17 Thread Murilo Opsfelder Araujo

Hi, Michael.

On Fri, Aug 17, 2018 at 04:55:00PM +1000, Michael Ellerman wrote:
> In the recent commit to add an explicit ratelimit state when showing
> unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an
> explicit ratelimit state for show_signal_msg()"), I put the check of
> show_unhandled_signals and the ratelimit state before the call to
> unhandled_signal() so as to avoid unnecessarily calling the latter
> when show_unhandled_signals is false.
> 
> However that causes us to check the ratelimit state on every call, so
> if we take a lot of *handled* signals that has the effect of making
> the ratelimit code print warnings that callbacks have been suppressed
> when they haven't.
> 
> So rearrange the code so that we check show_unhandled_signals first,
> then call unhandled_signal() and finally check the ratelimit state.
> 
> Signed-off-by: Michael Ellerman 

Nice catch.  Thanks for patching it.

Reviewed-by: Murilo Opsfelder Araujo 

> ---
>  arch/powerpc/kernel/traps.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 070e96f1773a..c85adb858271 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -315,22 +315,21 @@ void user_single_step_siginfo(struct task_struct *tsk,
>   info->si_addr = (void __user *)regs->nip;
>  }
>  
> -static bool show_unhandled_signals_ratelimited(void)
> +static void show_signal_msg(int signr, struct pt_regs *regs, int code,
> + unsigned long addr)
>  {
>   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
> - return show_unhandled_signals && __ratelimit();
> -}
>  
> -static void show_signal_msg(int signr, struct pt_regs *regs, int code,
> - unsigned long addr)
> -{
> - if (!show_unhandled_signals_ratelimited())
> + if (!show_unhandled_signals)
>   return;
>  
>   if (!unhandled_signal(current, signr))
>   return;
>  
> + if (!__ratelimit())
> + return;
> +
>   pr_info("%s[%d]: %s (%d) at %lx nip %lx lr %lx code %x",
>   current->comm, current->pid, signame(signr), signr,
>   addr, regs->nip, regs->link, code);
> -- 
> 2.17.1
> 

-- 
Murilo

Re: [PATCH 2/2] powerpc/process: Constify the number of insns printed by show instructions functions.

2018-08-17 Thread Murilo Opsfelder Araujo

Hi, Christophe.

On Tue, Aug 14, 2018 at 08:59:20AM +, Christophe Leroy wrote:
> instructions_to_print var is assigned value 16 and there is no
> way to change it.
> 
> This patch replaces it by a constant.
> 
> Signed-off-by: Christophe Leroy 

Reviewed-by: Murilo Opsfelder Araujo 

> ---
>  arch/powerpc/kernel/process.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index c722ce4ca1c0..6317f2ed04ab 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1259,17 +1259,16 @@ struct task_struct *__switch_to(struct task_struct 
> *prev,
>   return last;
>  }
>  
> -static int instructions_to_print = 16;
> +#define NR_INSN_TO_PRINT 16
>  
>  static void show_instructions(struct pt_regs *regs)
>  {
>   int i;
> - unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 *
> - sizeof(int));
> + unsigned long pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
>  
>   printk("Instruction dump:");
>  
> - for (i = 0; i < instructions_to_print; i++) {
> + for (i = 0; i < NR_INSN_TO_PRINT; i++) {
>   int instr;
>  
>   if (!(i % 8))
> @@ -1306,9 +1305,9 @@ void show_user_instructions(struct pt_regs *regs)
>   char buf[96]; /* enough for 8 times 9 + 2 chars */
>   int l = 0;
>  
> - pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
> + pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int));
>  
> - for (i = 0; i < instructions_to_print; i++) {
> + for (i = 0; i < NR_INSN_TO_PRINT; i++) {
>   int instr;
>  
>   if (!(i % 8) && (i > 0)) {
> -- 
> 2.13.3
> 

-- 
Murilo

Re: [PATCH 1/2] powerpc/process: fix nested output in show_user_instructions()

2018-08-17 Thread Murilo Opsfelder Araujo

Hi, Christophe.

On Tue, Aug 14, 2018 at 08:59:18AM +, Christophe Leroy wrote:
> When two processes crash at the same time, we sometimes encounter
> nesting in the middle of a line:
> 
> [4.365317] init[1]: segfault (11) at 0 nip 0 lr 0 code 1
> [4.370452] init[1]: code:    
> [4.372042] init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 
> 1 in sh[1000+14000]
> [4.386829]    
> [4.391542] init[1]: code:      
>   
> [4.400863] init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be0 
> 7d3e4b78 3bbd0c20 3b60
> [4.409867] init[74]: code: 3b9d0040 7c7fe02e 2f83 419e0028 <8923> 
> 2f89 41be001c 4b7f6e79

My smoke test passed with the two patches.

Perhaps adding an output sample of how messages would look like after your patch
could be an enhancement to the commit message.

Otherwise:

Reviewed-by: Murilo Opsfelder Araujo 

> This patch fixes it by preparing complete lines in a buffer and
> printing it at once.
> 
> Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()")
> Cc: Murilo Opsfelder Araujo 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/process.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 913c5725cdb2..c722ce4ca1c0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1303,32 +1303,33 @@ void show_user_instructions(struct pt_regs *regs)
>  {
>   unsigned long pc;
>   int i;
> + char buf[96]; /* enough for 8 times 9 + 2 chars */
> + int l = 0;
>  
>   pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
>  
> - pr_info("%s[%d]: code: ", current->comm, current->pid);
> -
>   for (i = 0; i < instructions_to_print; i++) {
>   int instr;
>  
>   if (!(i % 8) && (i > 0)) {
> - pr_cont("\n");
> - pr_info("%s[%d]: code: ", current->comm, current->pid);
> + pr_info("%s[%d]: code: %s\n", current->comm, 
> current->pid, buf);
> + l = 0;
>   }
>  
>   if (probe_kernel_address((unsigned int __user *)pc, instr)) {
> - pr_cont(" ");
> + l += sprintf(buf + l, " ");
>   } else {
>   if (regs->nip == pc)
> - pr_cont("<%08x> ", instr);
> + l += sprintf(buf + l, "<%08x> ", instr);
>   else
> - pr_cont("%08x ", instr);
> + l += sprintf(buf + l, "%08x ", instr);
>   }
>  
>   pc += sizeof(int);
>   }
>  
> - pr_cont("\n");
> + if (l)
> + pr_info("%s[%d]: code: %s\n", current->comm, current->pid, buf);
>  }
>  
>  struct regbit {
> -- 
> 2.13.3
> 

-- 
Murilo

Re: [PATCH v2 1/4] powerpc/tm: Remove msr_tm_active()

2018-08-17 Thread Breno Leitao

Hi Michael,

On 08/16/2018 09:49 PM, Michael Ellerman wrote:
> Michael Neuling  writes:
> 
>> On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
>>> Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
>>> CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
>>> returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
>>>
>>> This function is not necessary, since MSR_TM_ACTIVE() just do the same,
>>> checking for the TS bits and does not require any TM facility.
>>>
>>> This patchset remove every instance of msr_tm_active() and replaced it
>>> by MSR_TM_ACTIVE().
>>>
>>> Signed-off-by: Breno Leitao 
>>>
>>
>> Patch looks good... one minor nit below...
>>
>>>  
>>> -   if (!msr_tm_active(regs->msr) &&
>>> -   !current->thread.load_fp && !loadvec(current->thread))
>>> +   if (!current->thread.load_fp && !loadvec(current->thread)) {
>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> +   if (!MSR_TM_ACTIVE(regs->msr))
>>> +   return;
>>
>> Can you make a MSR_TM_ACTIVE() that returns false when
>> !CONFIG_PPC_TRANSACTIONAL_MEM. Then you don't need this inline #ifdef.
> 
> Is that safe?
> 
> I see ~50 callers of MSR_TM_ACTIVE(), are they all inside #ifdef TM ?

I checked all of them, and the only two that are not called inside a #ifdef
are at kvm/book3s_hv_tm.c. They are:

  kvm/book3s_hv_tm.c:   if (!MSR_TM_ACTIVE(msr)) {
  kvm/book3s_hv_tm.c:   if (MSR_TM_ACTIVE(msr) || !(vcpu->arch.texasr & 
TEXASR_FS)) {

All the others are being called inside the #ifdef

Other than that, I do not see why it would be a problem in the way I
implemented it, since it will return false for the two cases above, which
seems correct. Take a look on how the definition became:

  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
  #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
  #else
  #define MSR_TM_ACTIVE(x) 0
  #endif

I also tested it with different config files, and I didn't see any complain.
These are the platforms I built for.

* powernv_defconfig
* pseries_le_defconfig
* pseries_defconfig
* ppc64_defconfig
* ppc64e_defconfig
* pmac32_defconfig
* ppc44x_defconfig
* mpc85xx_smp_defconfig
* mpc85xx_defconfig
* ps3_defconfig

Anyway, if you have any other suggestion I can follow in order to guarantee
that I am not causing any regression, I would be happy. Touching these core
kernel macros is scary!

Thanks!

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread David Hildenbrand

On 17.08.2018 13:28, Heiko Carstens wrote:
> On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
 If there are no objections, I'll go into that direction. But I'll wait
 for more comments regarding the general concept first.
>>>
>>> It is the middle of the merge window, and maintainers are really busy
>>> right now.  I doubt you will get many review comments just yet...
>>>
>>
>> This has been broken since 2015, so I guess it can wait a bit :)
> 
> I hope you figured out what needs to be locked why. Your patch description
> seems to be "only" about locking order ;)

Well I hope so, too ... but there is a reason for the RFC mark ;) There
is definitely a lot of magic in the current code. And that's why it is
also not that obvious that locking is wrong.

To avoid/fix the locking order problem was the motivation for the
original patch that dropped mem_hotplug_lock on one path. So I focused
on that in my description.

> 
> I tried to figure out and document that partially with 55adc1d05dca ("mm:
> add private lock to serialize memory hotplug operations"), and that wasn't
> easy to figure out. I was especially concerned about sprinkling

Haven't seen that so far as that was reworked by 3f906ba23689
("mm/memory-hotplug: switch locking to a percpu rwsem"). Thanks for the
pointer. There is a long history to all this.

> lock/unlock_device_hotplug() calls, which has the potential to make it the
> next BKL thing.

Well, the thing with memory hotplug and device_hotplug_lock is that

a) ACPI already holds it while adding/removing memory via add_memory()
b) we hold it during online/offline of memory (via sysfs calls to
   device_online()/device_offline())

So it is already pretty much involved in all memory hotplug/unplug
activities on x86 (except paravirt). And as far as I understand, there
are good reasons to hold the lock in core.c and ACPI. (as mentioned by
Rafael)

The exceptions are add_memory() called on s390x, hyper-v, xen and ppc
(including manual probing). And device_online()/device_offline() called
from the kernel.

Holding device_hotplug_lock when adding/removing memory from the system
doesn't sound too wrong (especially as devices are created/removed). At
least that way (documenting and following the rules in the patch
description) we might at least get locking right.

I am very open to other suggestions (but as noted by Greg, many
maintainers might be busy by know).

E.g. When adding the memory block devices, we know that there won't be a
driver to attach to (as there are no drivers for the "memory" subsystem)
- the bus_probe_device() function that takes the device_lock() could
pretty much be avoided for that case. But burying such special cases
down in core driver code definitely won't make locking related to memory
hotplug easier.

Thanks for having a look!

-- 

Thanks,

David / dhildenb

Re: [PATCH v7 4/9] powerpc/pseries: Define MCE error event section.

2018-08-17 Thread Mahesh Jagannath Salgaonkar

On 08/16/2018 09:44 AM, Michael Ellerman wrote:
> Mahesh Jagannath Salgaonkar  writes:
>> On 08/08/2018 08:12 PM, Michael Ellerman wrote:
> ...
>>>
 +  union {
 +  struct {
 +  uint8_t ue_err_type;
 +  /* 
 +   * X1: Permanent or Transient UE.
 +   *  X   1: Effective address provided.
 +   *   X  1: Logical address provided.
 +   *XX2: Reserved.
 +   *  XXX 3: Type of UE error.
 +   */
>>>
>>> But which bit is bit 0? And is that the LSB or MSB?
>>
>> RTAS errorlog data in BE format, the leftmost bit is MSB 0 (1: Permanent
>> or Transient UE.). I Will update the comment above that properly points
>> out which one is MSB 0.
>>
>>>
>>>
 +  uint8_t reserved_1[6];
 +  __be64  effective_address;
 +  __be64  logical_address;
 +  } ue_error;
 +  struct {
 +  uint8_t soft_err_type;
 +  /* 
 +   * X1: Effective address provided.
 +   *  X   5: Reserved.
 +   *   XX 2: Type of SLB/ERAT/TLB error.
 +   */
 +  uint8_t reserved_1[6];
 +  __be64  effective_address;
 +  uint8_t reserved_2[8];
 +  } soft_error;
 +  } u;
 +};
 +#pragma pack(pop)
>>>
>>> Why not __packed ?
>>
>> Because when used __packed it added 1 byte extra padding between
>> reserved_1[6] and effective_address. That caused wrong effective address
>> to be printed on the console. Hence I switched to #pragma pack to force
>> 1 byte alignment for this structure alone.
> 
> OK, that's weird.
> 
> Do we really need to bother with all the union stuff? The only
> difference is the field names, and whether logical address has a value

Also the bit fields for UE and other sub errors differ. Yeah but we can
do away with union stuff.

> or not. What about:
> 
> struct pseries_mc_errorlog {
>   __be32  fru_id;
>   __be32  proc_id;
>   u8  error_type;
>   u8  sub_error_type;
>   u8  reserved_1[6];
>   __be64  effective_address;
>   __be64  logical_address;
> } __packed;

Sure will do.

Thanks
-Mahesh.

> 
> cheers
>

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Heiko Carstens

On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
> >> If there are no objections, I'll go into that direction. But I'll wait
> >> for more comments regarding the general concept first.
> > 
> > It is the middle of the merge window, and maintainers are really busy
> > right now.  I doubt you will get many review comments just yet...
> > 
> 
> This has been broken since 2015, so I guess it can wait a bit :)

I hope you figured out what needs to be locked why. Your patch description
seems to be "only" about locking order ;)

I tried to figure out and document that partially with 55adc1d05dca ("mm:
add private lock to serialize memory hotplug operations"), and that wasn't
easy to figure out. I was especially concerned about sprinkling
lock/unlock_device_hotplug() calls, which has the potential to make it the
next BKL thing.

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread David Hildenbrand

On 17.08.2018 12:06, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
>> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
>>> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:

 On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>> From: Vitaly Kuznetsov 
>>
>> Well require to call add_memory()/add_memory_resource() with
>> device_hotplug_lock held, to avoid a lock inversion. Allow external 
>> modules
>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>> lock device hotplug.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> [modify patch description]
>> Signed-off-by: David Hildenbrand 
>> ---
>>  drivers/base/core.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 04bbcd779e11..9010b9e942b5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>  {
>>  mutex_lock(_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>
>>  void unlock_device_hotplug(void)
>>  {
>>  mutex_unlock(_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> about letting stuff outside of the driver core mess with this, as people
> better know what they are doing.

 The only "problem" is that we have kernel modules (for paravirtualized
 devices) that call add_memory(). This is Hyper-V right now, but we might
 have other ones in the future. Without them we would not have to export
 it. We might also get kernel modules that want to call remove_memory() -
 which will require the device_hotplug_lock as of now.

 What we could do is

 a) add_memory() -> _add_memory() and don't export it
 b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
 We export that one.
 c) Use add_memory() in external modules only

 Similar wrapper would be needed e.g. for remove_memory() later on.
>>>
>>> That would be safer IMO, as it would prevent developers from using
>>> add_memory() without the lock, say.
>>>
>>> If the lock is always going to be required for add_memory(), make it
>>> hard (or event impossible) to use the latter without it.
>>>
>>
>> If there are no objections, I'll go into that direction. But I'll wait
>> for more comments regarding the general concept first.
> 
> It is the middle of the merge window, and maintainers are really busy
> right now.  I doubt you will get many review comments just yet...
> 

This has been broken since 2015, so I guess it can wait a bit :)

-- 

Thanks,

David / dhildenb

Re: [PATCH v2 3/4] powerpc/mm: fix a warning when a cache is common to PGD and hugepages

2018-08-17 Thread Christophe LEROY





Le 17/08/2018 à 05:32, Aneesh Kumar K.V a écrit :

On 08/14/2018 08:24 PM, Christophe Leroy wrote:

While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 
___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 
4.18.0-rc8-00664-g2dfff9121c55 #671

[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 0004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  
(4.18.0-rc8-00664-g2dfff9121c55)

[  423.733147] MSR:  00021032   CR: 24224848  XER: 2000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 
c0011b34 c7fa41e0 c455be30
[  423.733319] GPR08: 0001 c00103a0 c7fa41e0 c49afcc4 24282842 
10018840 c079b37c 0040
[  423.733319] GPR16: 73f0 00210d00  0001 c455a000 
0100 0200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 
c7fa41e0  9032

[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [0100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 
4bfffe24 81370010
[  423.735536] 71280004 41a2ff88 4840c571 4b80 <0fe0> 4bfffeb8 
81340010 712a0004

[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.

In this case it happens because PGD cache and 512k hugepte cache are
the same size (4k). While a cache with constructor is created for
the PGD, hugepages create cache without constructor and uses
kmem_cache_zalloc(). As both expect a cache with the same size,
the hugepages reuse the cache created for PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() by kmem_cache_alloc()



Can't we just do kmem_cache_alloc with gfp flags __GFP_ZERO? and remove 
the constructor completely?


I don't understand what you mean. That's exactly what I did in v1 (by 
using kmem_cache_zalloc()), and you commented that doing this we would 
zeroise at allocation whereas the constructors are called when adding 
memory to the slab and when freeing the allocated block. Or did I 
misunderstood your comment ?


static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
{
return kmem_cache_alloc(k, flags | __GFP_ZERO);
}

Christophe





-aneesh

[RFC 13/15] PCI: turn pcibios_alloc_irq into a callback

2018-08-17 Thread Arnd Bergmann

Weak functions are a bit confusing, and we can better deal with
this using a callback function. pcibios_free_irq() is actually
completely unused, but it seems better to treat it the same way
as the allocation, unless we want to remove it completely.

Signed-off-by: Arnd Bergmann 
---
 arch/arm64/kernel/pci.c  | 16 +++-
 drivers/pci/pci-driver.c | 13 +++--
 include/linux/pci.h  |  2 ++
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 0e2ea1c78542..3d196c68e362 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -22,19 +22,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ACPI
-/*
- * Try to assign the IRQ number when probing a new device
- */
-int pcibios_alloc_irq(struct pci_dev *dev)
-{
-   if (!acpi_disabled)
-   acpi_pci_irq_enable(dev);
-
-   return 0;
-}
-#endif
-
 /*
  * raw_pci_read/write - Platform-specific PCI config space access.
  */
@@ -93,6 +80,9 @@ int pcibios_root_bridge_prepare(struct pci_host_bridge 
*bridge)
 
ACPI_COMPANION_SET(>dev, adev);
set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
+
+   /* Try to assign the IRQ number when probing a new device */
+   bridge->alloc_irq = acpi_pci_irq_enable;
}
 
return 0;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..c96bc7bd56da 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -387,13 +387,22 @@ static int __pci_device_probe(struct pci_driver *drv, 
struct pci_dev *pci_dev)
return error;
 }
 
-int __weak pcibios_alloc_irq(struct pci_dev *dev)
+int pcibios_alloc_irq(struct pci_dev *dev)
 {
+   struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+
+   if (bridge->alloc_irq)
+   return bridge->alloc_irq(dev);
+
return 0;
 }
 
-void __weak pcibios_free_irq(struct pci_dev *dev)
+void pcibios_free_irq(struct pci_dev *dev)
 {
+   struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+
+   if (bridge->free_irq)
+   bridge->free_irq(dev);
 }
 
 #ifdef CONFIG_PCI_IOV
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d1072690cb4f..1296d9fcc5da 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -476,6 +476,8 @@ struct pci_host_bridge {
int (*map_irq)(const struct pci_dev *, u8, u8);
void (*release_fn)(struct pci_host_bridge *);
void (*bus_add_device)(struct pci_dev *pdev);
+   int (*alloc_irq)(struct pci_dev *);
+   int (*free_irq)(struct pci_dev *);
void*release_data;
struct msi_controller *msi;
unsigned intignore_reset_delay:1;   /* For entire hierarchy */
-- 
2.18.0

[RFC 06/15] powerpc/pci: fold pci_create_root_bus into pcibios_scan_phb

2018-08-17 Thread Arnd Bergmann

This slightly simplifies the pcibios_scan_phb() implementation, and
gives us an easier point to add further fields in the pci_host_bridge
structure.

I tried removing fields that are duplicated between pci_host_bridge
and pci_controller (which really serve the same purpose), but
ran into the problem that we can't call pci_alloc_host_bridge()
as early as pcibios_alloc_controller(). Some more refactoring
is needed for that, but it could noticably clean the powerpc code
up more.

Signed-off-by: Arnd Bergmann 
---
 arch/powerpc/include/asm/pci-bridge.h |  3 ++
 arch/powerpc/kernel/pci-common.c  | 72 ++-
 2 files changed, 30 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 94d449031b18..42ae567084d9 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -51,8 +51,11 @@ struct pci_controller_ops {
 
 /*
  * Structure of a PCI controller (host bridge)
+ * Some members here are duplicated in struct pci_host_bridge
+ * and should be moved there.
  */
 struct pci_controller {
+   struct pci_host_bridge *bridge;
struct pci_bus *bus;
char is_dynamic;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 57ca621a32f4..096011ec8670 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1587,81 +1587,63 @@ struct device_node *pcibios_get_phb_of_node(struct 
pci_bus *bus)
return of_node_get(hose->dn);
 }
 
-static struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
-{
-   int error;
-   struct pci_host_bridge *bridge;
-
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   return NULL;
-
-   bridge->dev.parent = parent;
-
-   list_splice_init(resources, >windows);
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_register_host_bridge(bridge);
-   if (error < 0)
-   goto err_out;
-
-   return bridge->bus;
-
-err_out:
-   kfree(bridge);
-   return NULL;
-}
-
 /**
  * pci_scan_phb - Given a pci_controller, setup and scan the PCI bus
  * @hose: Pointer to the PCI host controller instance structure
  */
 void pcibios_scan_phb(struct pci_controller *hose)
 {
-   LIST_HEAD(resources);
-   struct pci_bus *bus;
struct device_node *node = hose->dn;
int mode;
+   struct pci_host_bridge *bridge;
+   int error;
 
pr_debug("PCI: Scanning PHB %pOF\n", node);
 
+   /* The allocation should ideally be done in pcibios_alloc_controller(),
+* but pci_alloc_host_bridge() requires slab to work first */
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return;
+
/* Get some IO space for the new PHB */
pcibios_setup_phb_io_space(hose);
 
/* Wire up PHB bus resources */
-   pcibios_setup_phb_resources(hose, );
+   pcibios_setup_phb_resources(hose, >windows);
 
hose->busn.start = hose->first_busno;
hose->busn.end   = hose->last_busno;
hose->busn.flags = IORESOURCE_BUS;
-   pci_add_resource(, >busn);
+   pci_add_resource(>windows, >busn);
+
+   bridge->dev.parent = hose->parent;
+   bridge->sysdata = hose;
+   bridge->busnr = hose->first_busno;
+   bridge->ops = hose->ops;
 
-   /* Create an empty bus for the toplevel */
-   bus = pci_create_root_bus(hose->parent, hose->first_busno,
- hose->ops, hose, );
-   if (bus == NULL) {
+   error = pci_register_host_bridge(bridge);
+   if (error < 0) {
pr_err("Failed to create bus for PCI domain %04x\n",
hose->global_number);
-   pci_free_resource_list();
+   pci_free_host_bridge(bridge);
return;
}
-   hose->bus = bus;
+   hose->bridge = bridge;
+   hose->bus = bridge->bus;
 
/* Get probe mode and perform scan */
mode = PCI_PROBE_NORMAL;
if (node && hose->controller_ops.probe_mode)
-   mode = hose->controller_ops.probe_mode(bus);
+   mode = hose->controller_ops.probe_mode(bridge->bus);
pr_debug("probe mode: %d\n", mode);
if (mode == PCI_PROBE_DEVTREE)
-   of_scan_bus(node, bus);
+   of_scan_bus(node, bridge->bus);
 
if (mode == PCI_PROBE_NORMAL) {
-   pci_bus_update_busn_res_end(bus, 255);
-   hose->last_busno = pci_scan_child_bus(bus);
-   pci_bus_update_busn_res_end(bus, hose->last_busno);
+   pci_bus_update_busn_res_end(bridge->bus, 255);
+   hose->last_busno = pci_scan_child_bus(bridge->bus);
+   pci_bus_update_busn_res_end(bridge->bus,

[RFC 12/15] PCI: make pcibios_bus_add_device() a callback function

2018-08-17 Thread Arnd Bergmann

Weak functions are confusion, and we can now add callback pointers
to pci host bridges for any controller, so let's make this one
a callback rather than a __weak global function.

Signed-off-by: Arnd Bergmann 
---
 arch/powerpc/kernel/pci-common.c  | 7 +--
 arch/sh/drivers/pci/pci.c | 1 +
 arch/sh/drivers/pci/pcie-sh7786.c | 3 ++-
 arch/sh/include/asm/pci.h | 2 ++
 drivers/pci/bus.c | 8 +++-
 include/linux/pci.h   | 2 +-
 6 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 096011ec8670..afc9598e4349 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -270,12 +270,6 @@ int pcibios_sriov_disable(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_IOV */
 
-void pcibios_bus_add_device(struct pci_dev *pdev)
-{
-   if (ppc_md.pcibios_bus_add_device)
-   ppc_md.pcibios_bus_add_device(pdev);
-}
-
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
@@ -1617,6 +1611,7 @@ void pcibios_scan_phb(struct pci_controller *hose)
hose->busn.flags = IORESOURCE_BUS;
pci_add_resource(>windows, >busn);
 
+   bridge->bus_add_device = ppc_md->pcibios_bus_add_device;
bridge->dev.parent = hose->parent;
bridge->sysdata = hose;
bridge->busnr = hose->first_busno;
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 8256626bc53c..d5c01a86cde1 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -65,6 +65,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
bridge->ops = hose->pci_ops;
bridge->swizzle_irq = pci_common_swizzle;
bridge->map_irq = pcibios_map_platform_irq;
+   bridge->bus_add_device = hose->bus_add_device;
 
ret = pci_scan_root_bus_bridge(bridge);
if (ret) {
diff --git a/arch/sh/drivers/pci/pcie-sh7786.c 
b/arch/sh/drivers/pci/pcie-sh7786.c
index 3d81a8b80942..ab78356681a0 100644
--- a/arch/sh/drivers/pci/pcie-sh7786.c
+++ b/arch/sh/drivers/pci/pcie-sh7786.c
@@ -122,6 +122,7 @@ extern struct pci_ops sh7786_pci_ops;
.reg_base   = start,\
.mem_offset = 0,\
.io_offset  = 0,\
+   .bus_add_device = sh7786_pci_bus_add_device,\
 }
 
 static struct pci_channel sh7786_pci_channels[] = {
@@ -488,7 +489,7 @@ int pcibios_map_platform_irq(const struct pci_dev *pdev, u8 
slot, u8 pin)
 return evt2irq(0xae0);
 }
 
-void pcibios_bus_add_device(struct pci_dev *pdev)
+static void sh7786_pci_bus_add_device(struct pci_dev *pdev)
 {
pdev->dev.dma_pfn_offset = dma_pfn_offset;
 }
diff --git a/arch/sh/include/asm/pci.h b/arch/sh/include/asm/pci.h
index 10a36b1cf2ea..e605141cbcbe 100644
--- a/arch/sh/include/asm/pci.h
+++ b/arch/sh/include/asm/pci.h
@@ -35,6 +35,8 @@ struct pci_channel {
/* Optional error handling */
struct timer_list   err_timer, serr_timer;
unsigned interr_irq, serr_irq;
+
+   void (*bus_add_device)(struct pci_dev *pdev);
 };
 
 /* arch/sh/drivers/pci/pci.c */
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5cb40b2518f9..45873ac1a49c 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -299,7 +299,13 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
 
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
-void __weak pcibios_bus_add_device(struct pci_dev *pdev) { }
+static void pcibios_bus_add_device(struct pci_dev *pdev)
+{
+   struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+
+   if (bridge->bus_add_device)
+   bridge->bus_add_device(pdev);
+}
 
 /**
  * pci_bus_add_device - start driver for a single device
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1dd8a3ecf753..d1072690cb4f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -475,6 +475,7 @@ struct pci_host_bridge {
u8 (*swizzle_irq)(struct pci_dev *, u8 *); /* Platform IRQ swizzler */
int (*map_irq)(const struct pci_dev *, u8, u8);
void (*release_fn)(struct pci_host_bridge *);
+   void (*bus_add_device)(struct pci_dev *pdev);
void*release_data;
struct msi_controller *msi;
unsigned intignore_reset_delay:1;   /* For entire hierarchy */
@@ -880,7 +881,6 @@ extern struct list_head pci_root_buses; /* List of all 
known PCI buses */
 int no_pci_devices(void);
 
 void pcibios_resource_survey_bus(struct pci_bus *bus);
-void pcibios_bus_add_device(struct pci_dev *pdev);
 void pcibios_add_bus(struct pci_bus *bus);
 void pcibios_remove_bus(struct pci_bus *bus);
 void pcibios_fixup_bus(struct pci_bus *);
-- 
2.18.0

[RFC 15/15] PCI: make pcibios_add_bus/remove_bus callbacks

2018-08-17 Thread Arnd Bergmann

These are mostly not architecture specific but are meant for particular
PCI host bridge implementations, in particular for the ACPI version.

Turn them both into callback functions that are implemented by the
APCI PCI implementation as well as the one architecture that overrides
pcibios_remove_bus.

Signed-off-by: Arnd Bergmann 
---
 arch/arm64/kernel/pci.c | 10 --
 arch/ia64/pci/pci.c | 10 --
 arch/s390/pci/pci.c |  3 ++-
 arch/x86/pci/common.c   | 10 --
 drivers/acpi/pci_root.c |  2 ++
 drivers/pci/probe.c | 12 ++--
 include/linux/pci.h |  2 ++
 7 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 8958a7c32a9f..99fac25efe88 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -191,14 +191,4 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root 
*root)
return bus;
 }
 
-void pcibios_add_bus(struct pci_bus *bus)
-{
-   acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-   acpi_pci_remove_bus(bus);
-}
-
 #endif
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 511b8a058d80..f47e0920d308 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -367,16 +367,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
platform_pci_fixup_bus(b);
 }
 
-void pcibios_add_bus(struct pci_bus *bus)
-{
-   acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-   acpi_pci_remove_bus(bus);
-}
-
 void pcibios_set_master (struct pci_dev *dev)
 {
/* No special bus mastering setup handling */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index b21205f131ce..120beb83b6a5 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -769,7 +769,7 @@ static void zpci_free_domain(struct zpci_dev *zdev)
spin_unlock(_domain_lock);
 }
 
-void pcibios_remove_bus(struct pci_bus *bus)
+static void zpci_remove_bus(struct pci_bus *bus)
 {
struct zpci_dev *zdev = get_zdev_by_bus(bus);
 
@@ -801,6 +801,7 @@ static struct pci_bus *pci_scan_root_bus(struct device 
*parent, int bus,
bridge->sysdata = sysdata;
bridge->busnr = bus;
bridge->ops = ops;
+   bridge->remove_bus = zpci_remove_bus;
 
error = pci_scan_root_bus_bridge(bridge);
if (error < 0)
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 920d0885434c..987e6fefd5d3 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -168,16 +168,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
pcibios_fixup_device_resources(dev);
 }
 
-void pcibios_add_bus(struct pci_bus *bus)
-{
-   acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-   acpi_pci_remove_bus(bus);
-}
-
 /*
  * Only use DMI information to set this if nothing was passed
  * on the kernel command line (which was parsed earlier).
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 5da0f70c4e65..cf7a9a7bf1e7 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -911,6 +911,8 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
*root,
bridge->busnr = busnum;
bridge->ops = ops->pci_ops;
bridge->prepare = acpi_pci_root_bridge_prepare;
+   bridge->add_bus = acpi_pci_add_bus;
+   bridge->remove_bus = acpi_pci_remove_bus;
pci_set_host_bridge_release(bridge, acpi_pci_root_release_info,
info);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f493d7e299e6..86a678fa8c13 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2905,12 +2905,20 @@ unsigned int pci_scan_child_bus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL_GPL(pci_scan_child_bus);
 
-void __weak pcibios_add_bus(struct pci_bus *bus)
+void pcibios_add_bus(struct pci_bus *bus)
 {
+   struct pci_host_bridge *bridge = pci_find_host_bridge(bus);
+
+   if (bridge->add_bus)
+   bridge->add_bus(bus);
 }
 
-void __weak pcibios_remove_bus(struct pci_bus *bus)
+void pcibios_remove_bus(struct pci_bus *bus)
 {
+   struct pci_host_bridge *bridge = pci_find_host_bridge(bus);
+
+   if (bridge->remove_bus)
+   bridge->remove_bus(bus);
 }
 
 int pci_host_probe(struct pci_host_bridge *bridge)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 24216daef6f8..bc9635313747 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -479,6 +479,8 @@ struct pci_host_bridge {
void (*bus_add_device)(struct pci_dev *pdev);
int (*alloc_irq)(struct pci_dev *);
int (*free_irq)(struct pci_dev *);
+   void (*add_bus)(struct pci_bus *);
+   void (*remove_bus)(struct pci_bus *);
void*release_data;
struct msi_controller *msi;
unsigned intignore_reset_delay:1;   /* For entire hierarchy */
-- 
2.18.0

[RFC 09/15] PCI: xenfront: clean up pcifront_scan_root()

2018-08-17 Thread Arnd Bergmann

Merging pci_scan_root_bus() into pcifront_scan_root() simplifies
the implementation and makes it more readable. We can allocate
the pcifront_sd structure along with the bridge structure, which
helps manage its lifetime rules so we don't free it before the
device has been released.

There are two small issues that I noticed that could be improved:

- It seems we unregister the 'bus' device that is a child of the
  'pci_host_bridge' device after we unregister its parent in
  pcifront_free_roots(), which seems odd.
- We probably don't need an extra pci_bus_entry list at all,
  but could instead walk the children of the pcifront_device,
  which are all pci_host_bridge devices.

Signed-off-by: Arnd Bergmann 
---
 drivers/pci/xen-pcifront.c | 67 --
 1 file changed, 21 insertions(+), 46 deletions(-)

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 24070e1c5f22..a5eb6cb02bec 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -443,40 +443,12 @@ static int pcifront_scan_bus(struct pcifront_device *pdev,
return 0;
 }
 
-static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
-{
-   struct pci_host_bridge *bridge;
-   int error;
-
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   return NULL;
-
-   list_splice_init(resources, >windows);
-   bridge->dev.parent = parent;
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_scan_root_bus_bridge(bridge);
-   if (error < 0)
-   goto err_out;
-
-   return bridge->bus;
-
-err_out:
-   kfree(bridge);
-   return NULL;
-}
-
 static int pcifront_scan_root(struct pcifront_device *pdev,
 unsigned int domain, unsigned int bus)
 {
-   struct pci_bus *b;
-   LIST_HEAD(resources);
struct pcifront_sd *sd = NULL;
struct pci_bus_entry *bus_entry = NULL;
+   struct pci_host_bridge *bridge;
int err = 0;
static struct resource busn_res = {
.start = 0,
@@ -498,50 +470,55 @@ static int pcifront_scan_root(struct pcifront_device 
*pdev,
dev_info(>xdev->dev, "Creating PCI Frontend Bus %04x:%02x\n",
 domain, bus);
 
+   bridge = pci_alloc_host_bridge(sizeof(*sd));
+   if (!bridge)
+   return -ENOMEM;
+
bus_entry = kzalloc(sizeof(*bus_entry), GFP_KERNEL);
-   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
-   if (!bus_entry || !sd) {
+   sd = pci_host_bridge_priv(bridge);
+   if (!bus_entry) {
err = -ENOMEM;
goto err_out;
}
-   pci_add_resource(, _resource);
-   pci_add_resource(, _resource);
-   pci_add_resource(, _res);
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _res);
pcifront_init_sd(sd, domain, bus, pdev);
+   bridge->dev.parent = >xdev->dev;
+   bridge->sysdata = sd;
+   bridge->busnr = bus;
+   bridge->ops = _bus_ops;
 
pci_lock_rescan_remove();
 
-   b = pci_scan_root_bus(>xdev->dev, bus,
- _bus_ops, sd, );
-   if (!b) {
+   err = pci_scan_root_bus_bridge(bridge);
+   if (err < 0) {
dev_err(>xdev->dev,
"Error creating PCI Frontend Bus!\n");
-   err = -ENOMEM;
pci_unlock_rescan_remove();
-   pci_free_resource_list();
goto err_out;
}
 
-   bus_entry->bus = b;
+   bus_entry->bus = bridge->bus;
 
list_add(_entry->list, >root_buses);
 
/* pci_scan_root_bus skips devices which do not have a
* devfn==0. The pcifront_scan_bus enumerates all devfn. */
-   err = pcifront_scan_bus(pdev, domain, bus, b);
+   err = pcifront_scan_bus(pdev, domain, bus, bridge->bus);
 
/* Claim resources before going "live" with our devices */
-   pci_walk_bus(b, pcifront_claim_resource, pdev);
+   pci_walk_bus(bridge->bus, pcifront_claim_resource, pdev);
 
/* Create SysFS and notify udev of the devices. Aka: "going live" */
-   pci_bus_add_devices(b);
+   pci_bus_add_devices(bridge->bus);
 
pci_unlock_rescan_remove();
return err;
 
 err_out:
+   pci_free_host_bridge(bridge);
kfree(bus_entry);
-   kfree(sd);
 
return err;
 }
@@ -605,8 +582,6 @@ static void pcifront_free_roots(struct pcifront_device 
*pdev)
 
free_root_bus_devs(bus_entry->bus);
 
-   kfree(bus_entry->bus->sysdata);
-
device_unregister(bus_entry->bus->bridge);
pci_remove_bus(bus_entry->bus);
 
-- 
2.18.0

[RFC 05/15] PCI: move pci_create_root_bus into callers

2018-08-17 Thread Arnd Bergmann

There are only seven remaining callers of the old pci_scan_root_bus()
interface. Since we want to expose the pci_host_bridge structure
everywhere and discourage users from calling the old interfaces, let's
move the implementation into the respective callsites.

While this duplicates the source code, it makes the object code smaller
for almost all users by avoiding the global implementation, and it allows
further cleanup of the callers.

Signed-off-by: Arnd Bergmann 
---
 arch/powerpc/kernel/pci-common.c| 28 +++
 arch/sparc/kernel/pci.c | 28 +++
 drivers/acpi/pci_root.c | 30 -
 drivers/parisc/dino.c   | 28 +++
 drivers/parisc/lba_pci.c| 28 +++
 drivers/pci/controller/pci-hyperv.c | 28 +++
 drivers/pci/controller/vmd.c| 30 -
 drivers/pci/probe.c | 29 
 include/linux/pci.h |  3 ---
 9 files changed, 198 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 88e4f69a09e5..57ca621a32f4 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1587,6 +1587,34 @@ struct device_node *pcibios_get_phb_of_node(struct 
pci_bus *bus)
return of_node_get(hose->dn);
 }
 
+static struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   int error;
+   struct pci_host_bridge *bridge;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   bridge->dev.parent = parent;
+
+   list_splice_init(resources, >windows);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_register_host_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 /**
  * pci_scan_phb - Given a pci_controller, setup and scan the PCI bus
  * @hose: Pointer to the PCI host controller instance structure
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 17ea16a1337c..afbce59d9231 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -691,6 +691,34 @@ static void pci_claim_bus_resources(struct pci_bus *bus)
pci_claim_bus_resources(child_bus);
 }
 
+static struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   int error;
+   struct pci_host_bridge *bridge;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   bridge->dev.parent = parent;
+
+   list_splice_init(resources, >windows);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_register_host_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
 struct device *parent)
 {
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 7433035ded95..85dbcf47015b 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -873,6 +873,34 @@ static void acpi_pci_root_release_info(struct 
pci_host_bridge *bridge)
__acpi_pci_root_release_info(bridge->release_data);
 }
 
+static struct pci_bus *acpi_pci_create_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   int error;
+   struct pci_host_bridge *bridge;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   bridge->dev.parent = parent;
+
+   list_splice_init(resources, >windows);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_register_host_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 struct acpi_pci_root_ops *ops,
 struct acpi_pci_root_info *info,
@@ -902,7 +930,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
*root,
 
pci_acpi_root_add_resources(info);
pci_add_resource(>resources, >secondary);
-   bus = pci_create_root_bus(NULL, busnum, ops->pci_ops,
+   bus = acpi_pci_create_root_bus(NULL, busnum, ops->pci_ops,
  sysdata, >resources);

[RFC 01/15] PCI: clean up legacy host bridge scan functions

2018-08-17 Thread Arnd Bergmann

Aside from the modern pci_host_bridge based interfaces, we have a couple
of interfaces from old times that are still used in a couple of platforms:
pci_create_root_bus(), pci_scan_bus() and pci_scan_root_bus().

As a first step towards getting everybody to use the new interfaces,
this simplifies the latter two to call the pci_alloc_host_bridge() and
pci_register_host_bridge()/pci_scan_root_bus_bridge() interfaces directly.

The behavior should be entirely unchanged here, but we can then push
down the functions into the individual host implementations.

Signed-off-by: Arnd Bergmann 
---
 drivers/pci/probe.c | 86 -
 1 file changed, 45 insertions(+), 41 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ec784009a36b..b0f666271245 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -23,13 +23,6 @@
 #define CARDBUS_LATENCY_TIMER  176 /* secondary latency timer */
 #define CARDBUS_RESERVE_BUSNR  3
 
-static struct resource busn_resource = {
-   .name   = "PCI busn",
-   .start  = 0,
-   .end= 255,
-   .flags  = IORESOURCE_BUS,
-};
-
 /* Ugh.  Need to stop exporting this to modules. */
 LIST_HEAD(pci_root_buses);
 EXPORT_SYMBOL(pci_root_buses);
@@ -3060,53 +3053,64 @@ EXPORT_SYMBOL(pci_scan_root_bus_bridge);
 struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
struct pci_ops *ops, void *sysdata, struct list_head *resources)
 {
-   struct resource_entry *window;
-   bool found = false;
-   struct pci_bus *b;
-   int max;
-
-   resource_list_for_each_entry(window, resources)
-   if (window->res->flags & IORESOURCE_BUS) {
-   found = true;
-   break;
-   }
+   struct pci_host_bridge *bridge;
+   int error;
 
-   b = pci_create_root_bus(parent, bus, ops, sysdata, resources);
-   if (!b)
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
return NULL;
 
-   if (!found) {
-   dev_info(>dev,
-"No busn resource found for root bus, will use [bus 
%02x-ff]\n",
-   bus);
-   pci_bus_insert_busn_res(b, bus, 255);
-   }
+   list_splice_init(resources, >windows);
+   bridge->dev.parent = parent;
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
 
-   max = pci_scan_child_bus(b);
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err_out;
 
-   if (!found)
-   pci_bus_update_busn_res_end(b, max);
+   return bridge->bus;
 
-   return b;
+err_out:
+   kfree(bridge);
+   return NULL;
 }
 EXPORT_SYMBOL(pci_scan_root_bus);
 
+static struct resource busn_resource = {
+   .name   = "PCI busn",
+   .start  = 0,
+   .end= 255,
+   .flags  = IORESOURCE_BUS,
+};
+
 struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops,
void *sysdata)
 {
-   LIST_HEAD(resources);
-   struct pci_bus *b;
+   struct pci_host_bridge *bridge;
+   int error;
 
-   pci_add_resource(, _resource);
-   pci_add_resource(, _resource);
-   pci_add_resource(, _resource);
-   b = pci_create_root_bus(NULL, bus, ops, sysdata, );
-   if (b) {
-   pci_scan_child_bus(b);
-   } else {
-   pci_free_resource_list();
-   }
-   return b;
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   goto err;
+
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err;
+
+   return bridge->bus;
+
+err:
+   pci_free_host_bridge(bridge);
+   return NULL;
 }
 EXPORT_SYMBOL(pci_scan_bus);
 
-- 
2.18.0

[RFC 04/15] PCI: export pci_register_host_bridge

2018-08-17 Thread Arnd Bergmann

There are a couple of users of the old pci_create_root_bus() interface,
which calls pci_register_host_bridge() without actually scanning the bus.

In order to get those callers a little closer to the current method
of separating the allocation and probing of the host bridge, this
exports the internal interface to modules. If all the callers can
get moved over to pci_host_probe() or pci_scan_root_bus_bridge()
later, the export can be removed again.

Signed-off-by: Arnd Bergmann 
---
 drivers/pci/probe.c | 25 -
 include/linux/pci.h |  1 +
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cf169742c03e..5ca7d5941ad0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -762,7 +762,29 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
dev_set_msi_domain(>dev, d);
 }
 
-static int pci_register_host_bridge(struct pci_host_bridge *bridge)
+/*
+ * pci_register_host_bridge() - Register a host bridge without scanning
+ *
+ * @bridge: a newly allocated host bridge structure
+ *
+ * This is the core part of bringing up a new PCI host bridge,
+ * before we scan for attached devices and register them as
+ * pci_dev.
+ *
+ * For the most part, this is an implementation detail of the
+ * pci_host_probe() interface, which brings up the entire bus,
+ * bus some older platforms still call it directly and manually
+ * scan for devices.
+ *
+ * If your driver uses this, try to convert it to using
+ * pci_host_probe() instead.
+ *
+ * Return: zero on suggess, or a negative error code.
+ * Note: after pci_register_host_bridge() successfully returns,
+ * the pci_host_bridge device is alive in driver core, and must
+ * not be freed directly.
+ */
+int pci_register_host_bridge(struct pci_host_bridge *bridge)
 {
struct device *parent = bridge->dev.parent;
struct resource_entry *window, *n;
@@ -877,6 +899,7 @@ static int pci_register_host_bridge(struct pci_host_bridge 
*bridge)
kfree(bus);
return err;
 }
+EXPORT_SYMBOL_GPL(pci_register_host_bridge);
 
 static bool pci_bridge_child_ext_cfg_accessible(struct pci_dev *bridge)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d226e06fb5e5..e1337148cf9f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -909,6 +909,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, 
int bus,
struct pci_ops *ops, void *sysdata,
struct list_head *resources);
 int pci_host_probe(struct pci_host_bridge *bridge);
+int pci_register_host_bridge(struct pci_host_bridge *);
 int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int busmax);
 int pci_bus_update_busn_res_end(struct pci_bus *b, int busmax);
 void pci_bus_release_busn_res(struct pci_bus *b);
-- 
2.18.0

[RFC 00/15] PCI: turn some __weak functions into callbacks

2018-08-17 Thread Arnd Bergmann

Hi Bjorn and others,

Triggered by Christoph's patches, I had another go at converting
all of the remaining pci host bridge implementations to be based
on pci_alloc_host_bridge and a separate registration function.

This is made possible through work from Lorenzo and others to
convert many of the existing drivers, as well as the removal
of some of the older architectures that nobody used.

I'm adding a bit of duplication into the less maintained code
here, but it makes everything more consistent, and gives an
easy place to hook up callback functions etc.

The three parts of this series are:

a) push up the registration into the callers (this is where
   code gets added)
b) clean up some of the more common host bridge
   implementations again to integrate that code better.
   This could be done for the rest as well, or we could just
   leave them alone.
c) start moving the __weak functions into callbacks in
   pci_host_bridge. This is intentionally incomplete, since
   it is a lot of work to do it for all those functions,
   and I want to get consensus on the approach first, as well
   as maybe get other developers to help out with the rest.

Please have a look.

   Arnd

[1] https://lore.kernel.org/lkml/4288331.jNpl6KXlNO@wuerfel/
[2] https://patchwork.kernel.org/patch/10555657/

Arnd Bergmann (15):
  PCI: clean up legacy host bridge scan functions
  PCI: move pci_scan_bus into callers
  PCI: move pci_scan_root_bus into callers
  PCI: export pci_register_host_bridge
  PCI: move pci_create_root_bus into callers
  powerpc/pci: fold pci_create_root_bus into pcibios_scan_phb
  PCI/ACPI: clean up acpi_pci_root_create()
  x86: PCI: clean up pcibios_scan_root()
  PCI: xenfront: clean up pcifront_scan_root()
  sparc/PCI: simplify pci_scan_one_pbm
  PCI: hyperv: convert to pci_scan_root_bus_bridge
  PCI: make pcibios_bus_add_device() a callback function
  PCI: turn pcibios_alloc_irq into a callback
  PCI: make pcibios_root_bridge_prepare a callback
  PCI: make pcibios_add_bus/remove_bus callbacks

 arch/arm64/kernel/pci.c   |  40 ++-
 arch/ia64/pci/pci.c   |  25 +
 arch/ia64/sn/kernel/io_init.c |  27 +
 arch/microblaze/pci/pci-common.c  |  27 +
 arch/powerpc/include/asm/pci-bridge.h |   3 +
 arch/powerpc/kernel/pci-common.c  |  60 +--
 arch/s390/pci/pci.c   |  30 +-
 arch/sh/drivers/pci/pci.c |   1 +
 arch/sh/drivers/pci/pcie-sh7786.c |   3 +-
 arch/sh/include/asm/pci.h |   2 +
 arch/sparc/kernel/pci.c   |  40 ---
 arch/sparc/kernel/pcic.c  |  35 ++
 arch/x86/pci/acpi.c   |  15 +--
 arch/x86/pci/common.c |  42 
 arch/xtensa/kernel/pci.c  |  27 +
 drivers/acpi/pci_root.c   |  43 +---
 drivers/parisc/dino.c |  28 +
 drivers/parisc/lba_pci.c  |  28 +
 drivers/pci/bus.c |   8 +-
 drivers/pci/controller/pci-hyperv.c   |  47 
 drivers/pci/controller/vmd.c  |  30 +-
 drivers/pci/hotplug/ibmphp_core.c |  35 ++
 drivers/pci/pci-driver.c  |  13 ++-
 drivers/pci/probe.c   | 150 +-
 drivers/pci/xen-pcifront.c|  40 +++
 include/linux/acpi.h  |   2 +
 include/linux/pci.h   |  17 ++-
 27 files changed, 514 insertions(+), 304 deletions(-)

-- 
2.18.0

[RFC 10/15] sparc/PCI: simplify pci_scan_one_pbm

2018-08-17 Thread Arnd Bergmann

We no longer need a separate pci_create_root_bus() function, and
merging it into pci_scan_one_pbm() makes the implementation easier
to understand.

A possible future cleanup would move the allocation of the
pci_host_bridge structure into the callers of pci_scan_one_pbm,
and avoid duplication between pci_host_bridge and pci_pbm_info
fields.

Signed-off-by: Arnd Bergmann 
---
 arch/sparc/kernel/pci.c | 62 ++---
 1 file changed, 21 insertions(+), 41 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index afbce59d9231..0d34fb2ac55b 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -691,70 +691,50 @@ static void pci_claim_bus_resources(struct pci_bus *bus)
pci_claim_bus_resources(child_bus);
 }
 
-static struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
+struct device *parent)
 {
-   int error;
+   struct device_node *node = pbm->op->dev.of_node;
struct pci_host_bridge *bridge;
+   int ret;
 
bridge = pci_alloc_host_bridge(0);
if (!bridge)
return NULL;
 
bridge->dev.parent = parent;
+   bridge->sysdata = pbm;
+   bridge->busnr = pbm->pci_first_busno;
+   bridge->ops = pbm->pci_ops;
 
-   list_splice_init(resources, >windows);
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_register_host_bridge(bridge);
-   if (error < 0)
-   goto err_out;
-
-   return bridge->bus;
-
-err_out:
-   kfree(bridge);
-   return NULL;
-}
-
-struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
-struct device *parent)
-{
-   LIST_HEAD(resources);
-   struct device_node *node = pbm->op->dev.of_node;
-   struct pci_bus *bus;
-
-   printk("PCI: Scanning PBM %s\n", node->full_name);
-
-   pci_add_resource_offset(, >io_space,
+   pci_add_resource_offset(>windows, >io_space,
pbm->io_offset);
-   pci_add_resource_offset(, >mem_space,
+   pci_add_resource_offset(>windows, >mem_space,
pbm->mem_offset);
if (pbm->mem64_space.flags)
-   pci_add_resource_offset(, >mem64_space,
+   pci_add_resource_offset(>windows, >mem64_space,
pbm->mem64_offset);
pbm->busn.start = pbm->pci_first_busno;
pbm->busn.end   = pbm->pci_last_busno;
pbm->busn.flags = IORESOURCE_BUS;
-   pci_add_resource(, >busn);
-   bus = pci_create_root_bus(parent, pbm->pci_first_busno, pbm->pci_ops,
- pbm, );
-   if (!bus) {
+   pci_add_resource(>windows, >busn);
+
+   printk("PCI: Scanning PBM %s\n", node->full_name);
+   ret = pci_register_host_bridge(bridge);
+   if (!ret) {
printk(KERN_ERR "Failed to create bus for %s\n",
   node->full_name);
-   pci_free_resource_list();
+   pci_free_host_bridge(bridge);
return NULL;
}
 
-   pci_of_scan_bus(pbm, node, bus);
-   pci_bus_register_of_sysfs(bus);
+   pci_of_scan_bus(pbm, node, bridge->bus);
+   pci_bus_register_of_sysfs(bridge->bus);
 
-   pci_claim_bus_resources(bus);
+   pci_claim_bus_resources(bridge->bus);
 
-   pci_bus_add_devices(bus);
-   return bus;
+   pci_bus_add_devices(bridge->bus);
+   return bridge->bus;
 }
 
 int pcibios_enable_device(struct pci_dev *dev, int mask)
-- 
2.18.0

[RFC 03/15] PCI: move pci_scan_root_bus into callers

2018-08-17 Thread Arnd Bergmann

There are only six remaining callers of the old pci_scan_root_bus()
interface. Since we want to expose the pci_host_bridge structure
everywhere and discourage users from calling the old interfaces, let's
move the implementation into the respective callsites.

While this duplicates the source code, it makes the object code smaller
for almost all users by avoiding the global implementation, and it allows
further cleanup of the callers.

Signed-off-by: Arnd Bergmann 
---
 arch/ia64/sn/kernel/io_init.c| 27 +++
 arch/microblaze/pci/pci-common.c | 27 +++
 arch/s390/pci/pci.c  | 27 +++
 arch/x86/pci/common.c| 27 +++
 arch/xtensa/kernel/pci.c | 27 +++
 drivers/pci/probe.c  | 28 
 drivers/pci/xen-pcifront.c   | 27 +++
 include/linux/pci.h  |  3 ---
 8 files changed, 162 insertions(+), 31 deletions(-)

diff --git a/arch/ia64/sn/kernel/io_init.c b/arch/ia64/sn/kernel/io_init.c
index d63809a6adfa..e768702a7b45 100644
--- a/arch/ia64/sn/kernel/io_init.c
+++ b/arch/ia64/sn/kernel/io_init.c
@@ -213,6 +213,33 @@ sn_io_slot_fixup(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(sn_io_slot_fixup);
 
+static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   list_splice_init(resources, >windows);
+   bridge->dev.parent = parent;
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 /*
  * sn_pci_controller_fixup() - This routine sets up a bus's resources
  *consistent with the Linux PCI abstraction layer.
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index f34346d56095..302071385e1b 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -977,6 +977,33 @@ static void pcibios_setup_phb_resources(struct 
pci_controller *hose,
 (unsigned long)hose->io_base_virt - _IO_BASE);
 }
 
+static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   list_splice_init(resources, >windows);
+   bridge->dev.parent = parent;
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 static void pcibios_scan_phb(struct pci_controller *hose)
 {
LIST_HEAD(resources);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 9f6f392a4461..b21205f131ce 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -786,6 +786,33 @@ void pcibios_remove_bus(struct pci_bus *bus)
kfree(zdev);
 }
 
+static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   list_splice_init(resources, >windows);
+   bridge->dev.parent = parent;
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err_out;
+
+   return bridge->bus;
+
+err_out:
+   kfree(bridge);
+   return NULL;
+}
+
 static int zpci_scan_bus(struct zpci_dev *zdev)
 {
LIST_HEAD(resources);
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index d4ec117c1142..e740d9aa4024 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -453,6 +453,33 @@ void __init dmi_check_pciprobe(void)
dmi_check_system(pciprobe_dmi_table);
 }
 
+static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
+   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   list_splice_init(resources, >windows);
+   bridge->dev.parent = parent;
+   bridge->sysdata =

[RFC 14/15] PCI: make pcibios_root_bridge_prepare a callback

2018-08-17 Thread Arnd Bergmann

pcibios_root_bridge_prepare() is always used as a per host bridge
function, not per architecture.

Making it a callback in the pci_host_bridge instead lets the host
bridge implementation easily override it, and avoids the checks
in the architecture for which host bridge implementation is being
used.

Alternatively, we could probably just call the pcibios_root_bridge_prepare
after alloc_pci_host_bridge() here and get rid of it as a generic
interface altogether, but doing that has a slightly higher chance
of breaking something subtle.

Signed-off-by: Arnd Bergmann 
---
 arch/arm64/kernel/pci.c  | 18 --
 arch/ia64/pci/pci.c  | 15 ---
 arch/powerpc/kernel/pci-common.c |  9 +
 arch/x86/pci/acpi.c  | 15 ---
 drivers/acpi/pci_root.c  |  1 +
 drivers/pci/probe.c  | 28 
 include/linux/acpi.h |  2 ++
 include/linux/pci.h  |  3 +--
 8 files changed, 37 insertions(+), 54 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 3d196c68e362..8958a7c32a9f 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -71,19 +71,17 @@ int acpi_pci_bus_find_domain_nr(struct pci_bus *bus)
return root->segment;
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+int acpi_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-   if (!acpi_disabled) {
-   struct pci_config_window *cfg = bridge->bus->sysdata;
-   struct acpi_device *adev = to_acpi_device(cfg->parent);
-   struct device *bus_dev = >bus->dev;
+   struct pci_config_window *cfg = bridge->bus->sysdata;
+   struct acpi_device *adev = to_acpi_device(cfg->parent);
+   struct device *bus_dev = >bus->dev;
 
-   ACPI_COMPANION_SET(>dev, adev);
-   set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
+   ACPI_COMPANION_SET(>dev, adev);
+   set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
 
-   /* Try to assign the IRQ number when probing a new device */
-   bridge->alloc_irq = acpi_pci_irq_enable;
-   }
+   /* Try to assign the IRQ number when probing a new device */
+   bridge->alloc_irq = acpi_pci_irq_enable;
 
return 0;
 }
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 7ccc64d5fe3e..511b8a058d80 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -308,18 +308,11 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root 
*root)
>common, >controller);
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+int acpi_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-   /*
-* We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
-* here, pci_create_root_bus() has been called by someone else and
-* sysdata is likely to be different from what we expect.  Let it go in
-* that case.
-*/
-   if (!bridge->dev.parent) {
-   struct pci_controller *controller = bridge->bus->sysdata;
-   ACPI_COMPANION_SET(>dev, controller->companion);
-   }
+   struct pci_controller *controller = bridge->bus->sysdata;
+   ACPI_COMPANION_SET(>dev, controller->companion);
+
return 0;
 }
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index afc9598e4349..5e5c6dd7ebe8 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -771,14 +771,6 @@ int pci_proc_domain(struct pci_bus *bus)
return 1;
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
-{
-   if (ppc_md.pcibios_root_bridge_prepare)
-   return ppc_md.pcibios_root_bridge_prepare(bridge);
-
-   return 0;
-}
-
 /* This header fixup will do the resource fixup for all devices as they are
  * probed, but not for bridge ranges
  */
@@ -1612,6 +1604,7 @@ void pcibios_scan_phb(struct pci_controller *hose)
pci_add_resource(>windows, >busn);
 
bridge->bus_add_device = ppc_md->pcibios_bus_add_device;
+   bridge->prepare = ppc_md->pcibios_root_bridge_prepare;
bridge->dev.parent = hose->parent;
bridge->sysdata = hose;
bridge->busnr = hose->first_busno;
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 5559dcaddd5e..041b2003707c 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -382,18 +382,11 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root 
*root)
return bus;
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+int acpi_pci_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-   /*
-* We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
-* here, pci_create_root_bus() has been called by someone else and
-* sysdata is likely to be different from

[RFC 11/15] PCI: hyperv: convert to pci_scan_root_bus_bridge

2018-08-17 Thread Arnd Bergmann

create_root_hv_pci_bus() uses a rather generic method of probing the host
bridge, which can be simplified by just calling pci_scan_root_bus_bridge()
after setting up the pci_host_bridge structure.

Since we can no longer assign hbus->pci_bus in the middle, I just remove
that member completely and use the pci_host_bridge instead.

Ideally we'd convert it to pci_host_probe() for simplicity, but
that is a bit different and I could not easily test it. Using
pci_scan_root_bus_bridge should not change the behavior at all.

Signed-off-by: Arnd Bergmann 
---
 drivers/pci/controller/pci-hyperv.c | 75 +++--
 1 file changed, 28 insertions(+), 47 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index df7cddea8e30..49586aefa38b 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -443,7 +443,7 @@ struct hv_pcibus_device {
struct resource *high_mmio_res;
struct completion *survey_event;
struct completion remove_event;
-   struct pci_bus *pci_bus;
+   struct pci_host_bridge *bridge;
spinlock_t config_lock; /* Avoid two threads writing index page */
spinlock_t device_list_lock;/* Protect lists below */
void __iomem *cfg_addr;
@@ -1457,34 +1457,6 @@ static void prepopulate_bars(struct hv_pcibus_device 
*hbus)
spin_unlock_irqrestore(>device_list_lock, flags);
 }
 
-static struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
-{
-   int error;
-   struct pci_host_bridge *bridge;
-
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   return NULL;
-
-   bridge->dev.parent = parent;
-
-   list_splice_init(resources, >windows);
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_register_host_bridge(bridge);
-   if (error < 0)
-   goto err_out;
-
-   return bridge->bus;
-
-err_out:
-   kfree(bridge);
-   return NULL;
-}
-
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:  Root PCI bus, as understood by this driver
@@ -1493,25 +1465,34 @@ static struct pci_bus *pci_create_root_bus(struct 
device *parent, int bus,
  */
 static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
 {
-   /* Register the device */
-   hbus->pci_bus = pci_create_root_bus(>hdev->device,
-   0, /* bus number is always zero */
-   _pcifront_ops,
-   >sysdata,
-   >resources_for_children);
-   if (!hbus->pci_bus)
-   return -ENODEV;
+   struct pci_host_bridge *bridge;
+   int ret;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return -ENOMEM;
 
-   hbus->pci_bus->msi = >msi_chip;
-   hbus->pci_bus->msi->dev = >hdev->device;
+   hbus->bridge = bridge;
+   bridge->dev.parent = >hdev->device;
+   list_splice_init(>resources_for_children, >windows);
+   bridge->sysdata = >sysdata;
+   bridge->ops = _pcifront_ops;
+   bridge->msi = >msi_chip;
+   bridge->msi->dev = >hdev->device;
 
pci_lock_rescan_remove();
-   pci_scan_child_bus(hbus->pci_bus);
-   pci_bus_assign_resources(hbus->pci_bus);
-   pci_bus_add_devices(hbus->pci_bus);
-   pci_unlock_rescan_remove();
+   /* ideally we should use pci_host_probe here */
+   ret = pci_scan_root_bus_bridge(bridge);
+   if (ret < 0) {
+   pci_free_host_bridge(bridge);
+   goto error;
+   }
+   pci_bus_assign_resources(bridge->bus);
+   pci_bus_add_devices(bridge->bus);
hbus->state = hv_pcibus_installed;
-   return 0;
+error:
+   pci_unlock_rescan_remove();
+   return ret;
 }
 
 struct q_res_req_compl {
@@ -1769,7 +1750,7 @@ static void pci_devices_present_work(struct work_struct 
*work)
 * because there may have been changes.
 */
pci_lock_rescan_remove();
-   pci_scan_child_bus(hbus->pci_bus);
+   pci_scan_child_bus(hbus->bridge->bus);
pci_unlock_rescan_remove();
break;
 
@@ -2669,8 +2650,8 @@ static int hv_pci_remove(struct hv_device *hdev)
if (hbus->state == hv_pcibus_installed) {
/* Remove the bus from PCI's point of view. */
pci_lock_rescan_remove();
-   pci_stop_root_bus(hbus->pci_bus);
-   pci_remove_root_bus(hbus->pci_bus);
+   pci_stop_root_bus(hbus->bridge->bus);
+   pci_remove_root_bus(hbus->bridge->bus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_removed;
}
-- 
2.18.0

[RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-17 Thread Arnd Bergmann

The acpi_pci_create_root_bus() can be fully integrated into
acpi_pci_root_create(), improving a few things:

* We can call pci_scan_root_bus_bridge(), which registers and
  scans the bridge in one step.
* After a failure in pci_register_host_bridge(), we correctly
  clean up the resources.
* The bridge settings (release function, flags, operations etc)
  can get set up before registering the bridge.
* Further cleanup would be possible, removing duplication between
  pci_host_bridge and some ACPI structures.

Signed-off-by: Arnd Bergmann 
---
 drivers/acpi/pci_root.c | 68 +++--
 1 file changed, 24 insertions(+), 44 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 85dbcf47015b..5f73de3b67c8 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -873,34 +873,6 @@ static void acpi_pci_root_release_info(struct 
pci_host_bridge *bridge)
__acpi_pci_root_release_info(bridge->release_data);
 }
 
-static struct pci_bus *acpi_pci_create_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
-{
-   int error;
-   struct pci_host_bridge *bridge;
-
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   return NULL;
-
-   bridge->dev.parent = parent;
-
-   list_splice_init(resources, >windows);
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_register_host_bridge(bridge);
-   if (error < 0)
-   goto err_out;
-
-   return bridge->bus;
-
-err_out:
-   kfree(bridge);
-   return NULL;
-}
-
 struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 struct acpi_pci_root_ops *ops,
 struct acpi_pci_root_info *info,
@@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
*root,
int ret, busnum = root->secondary.start;
struct acpi_device *device = root->device;
int node = acpi_get_node(device->handle);
-   struct pci_bus *bus;
-   struct pci_host_bridge *host_bridge;
+   struct pci_host_bridge *bridge;
 
info->root = root;
info->bridge = device;
@@ -930,30 +901,39 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
*root,
 
pci_acpi_root_add_resources(info);
pci_add_resource(>resources, >secondary);
-   bus = acpi_pci_create_root_bus(NULL, busnum, ops->pci_ops,
- sysdata, >resources);
-   if (!bus)
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
goto out_release_info;
 
-   host_bridge = to_pci_host_bridge(bus->bridge);
+   list_splice_init(>resources, >windows);
+   bridge->sysdata = sysdata;
+   bridge->busnr = busnum;
+   bridge->ops = ops->pci_ops;
+   pci_set_host_bridge_release(bridge, acpi_pci_root_release_info,
+   info);
+
if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
-   host_bridge->native_pcie_hotplug = 0;
+   bridge->native_pcie_hotplug = 0;
if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
-   host_bridge->native_shpc_hotplug = 0;
+   bridge->native_shpc_hotplug = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
-   host_bridge->native_aer = 0;
+   bridge->native_aer = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
-   host_bridge->native_pme = 0;
+   bridge->native_pme = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
-   host_bridge->native_ltr = 0;
+   bridge->native_ltr = 0;
+
+   ret = pci_scan_root_bus_bridge(bridge);
+   if (ret < 0)
+   goto out_release_bridge;
 
-   pci_scan_child_bus(bus);
-   pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
-   info);
if (node != NUMA_NO_NODE)
-   dev_printk(KERN_DEBUG, >dev, "on NUMA node %d\n", node);
-   return bus;
+   dev_printk(KERN_DEBUG, >bus->dev, "on NUMA node %d\n", 
node);
+   return bridge->bus;
 
+out_release_bridge:
+   pci_free_host_bridge(bridge);
 out_release_info:
__acpi_pci_root_release_info(info);
return NULL;
-- 
2.18.0

[RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-17 Thread Arnd Bergmann

pcibios_scan_root() is now just a wrapper around pci_scan_root_bus(),
and merging the two into one makes it shorter and more readable.

We can also take advantage of pci_alloc_host_bridge() doing the
allocation of the sysdata for us, which helps if we ever want to
allow hot-unplugging the host bridge itself.

We might be able to simplify it further using pci_host_probe(),
but I wasn't sure about the resource registration there.

Signed-off-by: Arnd Bergmann 
---
 arch/x86/pci/common.c | 53 ++-
 1 file changed, 17 insertions(+), 36 deletions(-)

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index e740d9aa4024..920d0885434c 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -453,54 +453,35 @@ void __init dmi_check_pciprobe(void)
dmi_check_system(pciprobe_dmi_table);
 }
 
-static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
-   struct pci_ops *ops, void *sysdata, struct list_head *resources)
+void pcibios_scan_root(int busnum)
 {
+   struct pci_sysdata *sd;
struct pci_host_bridge *bridge;
int error;
 
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   return NULL;
+   bridge = pci_alloc_host_bridge(sizeof(sd));
+   if (!bridge) {
+   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
+   return;
+   }
+   sd = pci_host_bridge_priv(bridge);
 
-   list_splice_init(resources, >windows);
-   bridge->dev.parent = parent;
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
+   sd->node = x86_pci_root_bus_node(busnum);
+   x86_pci_root_bus_resources(busnum, >windows);
+   bridge->sysdata = sd;
+   bridge->busnr = busnum;
+   bridge->ops = _root_ops;
 
+   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
error = pci_scan_root_bus_bridge(bridge);
if (error < 0)
goto err_out;
 
-   return bridge->bus;
+   pci_bus_add_devices(bridge->bus);
+   return;
 
 err_out:
-   kfree(bridge);
-   return NULL;
-}
-
-void pcibios_scan_root(int busnum)
-{
-   struct pci_bus *bus;
-   struct pci_sysdata *sd;
-   LIST_HEAD(resources);
-
-   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
-   if (!sd) {
-   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
-   return;
-   }
-   sd->node = x86_pci_root_bus_node(busnum);
-   x86_pci_root_bus_resources(busnum, );
-   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
-   bus = pci_scan_root_bus(NULL, busnum, _root_ops, sd, );
-   if (!bus) {
-   pci_free_resource_list();
-   kfree(sd);
-   return;
-   }
-   pci_bus_add_devices(bus);
+   pci_free_host_bridge(bridge);
 }
 
 void __init pcibios_set_cache_line_size(void)
-- 
2.18.0

[RFC 02/15] PCI: move pci_scan_bus into callers

2018-08-17 Thread Arnd Bergmann

There are only two remaining callers of the old pci_scan_bus()
interface. Since we want to expose the pci_host_bridge structure
everywhere and discourage users from calling the old interfaces,
let's move the implementation into the respective callsites.

While this duplicates the source code, it makes the object code
smaller for all users by avoiding the global implementation,
and it allows further cleanup of the two callers.

Signed-off-by: Arnd Bergmann 
---
 arch/sparc/kernel/pcic.c  | 35 ++
 drivers/pci/hotplug/ibmphp_core.c | 35 ++
 drivers/pci/probe.c   | 36 ---
 include/linux/pci.h   |  1 -
 4 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c
index ee4c9a9a171c..0197b80fe590 100644
--- a/arch/sparc/kernel/pcic.c
+++ b/arch/sparc/kernel/pcic.c
@@ -387,6 +387,41 @@ int __init pcic_probe(void)
return 0;
 }
 
+static struct resource busn_resource = {
+   .name   = "PCI busn",
+   .start  = 0,
+   .end= 255,
+   .flags  = IORESOURCE_BUS,
+};
+
+static struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops,
+   void *sysdata)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err;
+
+   return bridge->bus;
+
+err_res:
+   pci_free_host_bridge(bridge);
+   return NULL;
+}
+
 static void __init pcic_pbm_scan_bus(struct linux_pcic *pcic)
 {
struct linux_pbm_info *pbm = >pbm;
diff --git a/drivers/pci/hotplug/ibmphp_core.c 
b/drivers/pci/hotplug/ibmphp_core.c
index 4ea57e9019f1..d35463ee96ba 100644
--- a/drivers/pci/hotplug/ibmphp_core.c
+++ b/drivers/pci/hotplug/ibmphp_core.c
@@ -717,6 +717,41 @@ static void ibm_unconfigure_device(struct pci_func *func)
pci_unlock_rescan_remove();
 }
 
+static struct resource busn_resource = {
+   .name   = "pci busn",
+   .start  = 0,
+   .end= 255,
+   .flags  = IORESOURCE_BUS,
+};
+
+static struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops,
+   void *sysdata)
+{
+   struct pci_host_bridge *bridge;
+   int error;
+
+   bridge = pci_alloc_host_bridge(0);
+   if (!bridge)
+   return NULL;
+
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   pci_add_resource(>windows, _resource);
+   bridge->sysdata = sysdata;
+   bridge->busnr = bus;
+   bridge->ops = ops;
+
+   error = pci_scan_root_bus_bridge(bridge);
+   if (error < 0)
+   goto err;
+
+   return bridge->bus;
+
+err:
+   pci_free_host_bridge(bridge);
+   return NULL;
+}
+
 /*
  * The following function is to fix kernel bug regarding
  * getting bus entries, here we manually add those primary
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b0f666271245..12c3aa63c34d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3078,42 +3078,6 @@ struct pci_bus *pci_scan_root_bus(struct device *parent, 
int bus,
 }
 EXPORT_SYMBOL(pci_scan_root_bus);
 
-static struct resource busn_resource = {
-   .name   = "PCI busn",
-   .start  = 0,
-   .end= 255,
-   .flags  = IORESOURCE_BUS,
-};
-
-struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops,
-   void *sysdata)
-{
-   struct pci_host_bridge *bridge;
-   int error;
-
-   bridge = pci_alloc_host_bridge(0);
-   if (!bridge)
-   goto err;
-
-   pci_add_resource(>windows, _resource);
-   pci_add_resource(>windows, _resource);
-   pci_add_resource(>windows, _resource);
-   bridge->sysdata = sysdata;
-   bridge->busnr = bus;
-   bridge->ops = ops;
-
-   error = pci_scan_root_bus_bridge(bridge);
-   if (error < 0)
-   goto err;
-
-   return bridge->bus;
-
-err:
-   pci_free_host_bridge(bridge);
-   return NULL;
-}
-EXPORT_SYMBOL(pci_scan_bus);
-
 /**
  * pci_rescan_bus_bridge_resize - Scan a PCI bus for devices
  * @bridge: PCI bridge for the bus to scan
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..d77ce35a2b33 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -905,7 +905,6 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct 
resource *res,
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-struct

Re: [PATCH] powerpc64/ftrace: Include ftrace.h needed for enable/disable calls

2018-08-17 Thread Naveen N. Rao


Luke Dashjr wrote:

this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in ftrace.h
Without it included, the build fails.


I'm unable to reproduce this. Can you share your .config and the build 
environment?




Fixes: a4bc64d305af ("powerpc64/ftrace: Disable ftrace during kvm entry/exit")
Signed-off-by: Luke Dashjr 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ee4a8854985e..15c2c64291f4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -46,6 +46,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 


In any case, this change itself looks alright to me. So:
Acked-by: Naveen N. Rao 


Thanks,
Naveen

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Greg Kroah-Hartman

On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> > On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:
> >>
> >> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> >>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>  From: Vitaly Kuznetsov 
> 
>  Well require to call add_memory()/add_memory_resource() with
>  device_hotplug_lock held, to avoid a lock inversion. Allow external 
>  modules
>  (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>  lock device hotplug.
> 
>  Signed-off-by: Vitaly Kuznetsov 
>  [modify patch description]
>  Signed-off-by: David Hildenbrand 
>  ---
>   drivers/base/core.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
>  diff --git a/drivers/base/core.c b/drivers/base/core.c
>  index 04bbcd779e11..9010b9e942b5 100644
>  --- a/drivers/base/core.c
>  +++ b/drivers/base/core.c
>  @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>   {
>   mutex_lock(_hotplug_lock);
>   }
>  +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> 
>   void unlock_device_hotplug(void)
>   {
>   mutex_unlock(_hotplug_lock);
>   }
>  +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >>>
> >>> If these are going to be "global" symbols, let's properly name them.
> >>> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> >>> about letting stuff outside of the driver core mess with this, as people
> >>> better know what they are doing.
> >>
> >> The only "problem" is that we have kernel modules (for paravirtualized
> >> devices) that call add_memory(). This is Hyper-V right now, but we might
> >> have other ones in the future. Without them we would not have to export
> >> it. We might also get kernel modules that want to call remove_memory() -
> >> which will require the device_hotplug_lock as of now.
> >>
> >> What we could do is
> >>
> >> a) add_memory() -> _add_memory() and don't export it
> >> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> >> We export that one.
> >> c) Use add_memory() in external modules only
> >>
> >> Similar wrapper would be needed e.g. for remove_memory() later on.
> > 
> > That would be safer IMO, as it would prevent developers from using
> > add_memory() without the lock, say.
> > 
> > If the lock is always going to be required for add_memory(), make it
> > hard (or event impossible) to use the latter without it.
> > 
> 
> If there are no objections, I'll go into that direction. But I'll wait
> for more comments regarding the general concept first.

It is the middle of the merge window, and maintainers are really busy
right now.  I doubt you will get many review comments just yet...

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread David Hildenbrand

On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:
>>
>> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
>>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
 From: Vitaly Kuznetsov 

 Well require to call add_memory()/add_memory_resource() with
 device_hotplug_lock held, to avoid a lock inversion. Allow external modules
 (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
 lock device hotplug.

 Signed-off-by: Vitaly Kuznetsov 
 [modify patch description]
 Signed-off-by: David Hildenbrand 
 ---
  drivers/base/core.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 04bbcd779e11..9010b9e942b5 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
  {
  mutex_lock(_hotplug_lock);
  }
 +EXPORT_SYMBOL_GPL(lock_device_hotplug);

  void unlock_device_hotplug(void)
  {
  mutex_unlock(_hotplug_lock);
  }
 +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>>>
>>> If these are going to be "global" symbols, let's properly name them.
>>> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
>>> about letting stuff outside of the driver core mess with this, as people
>>> better know what they are doing.
>>
>> The only "problem" is that we have kernel modules (for paravirtualized
>> devices) that call add_memory(). This is Hyper-V right now, but we might
>> have other ones in the future. Without them we would not have to export
>> it. We might also get kernel modules that want to call remove_memory() -
>> which will require the device_hotplug_lock as of now.
>>
>> What we could do is
>>
>> a) add_memory() -> _add_memory() and don't export it
>> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
>> We export that one.
>> c) Use add_memory() in external modules only
>>
>> Similar wrapper would be needed e.g. for remove_memory() later on.
> 
> That would be safer IMO, as it would prevent developers from using
> add_memory() without the lock, say.
> 
> If the lock is always going to be required for add_memory(), make it
> hard (or event impossible) to use the latter without it.
> 

If there are no objections, I'll go into that direction. But I'll wait
for more comments regarding the general concept first.

Thanks!

-- 

Thanks,

David / dhildenb

[PATCH] poewrpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-17 Thread Mahesh J Salgaonkar

From: Mahesh Salgaonkar 

With the powrpc next commit e7e81847478 (poewrpc/mce: Fix SLB rebolting
during MCE recovery path.), the SLB error recovery is broken. The
commit missed a crucial change of OR-ing index value to RB[52-63] which
selects the SLB entry while rebolting. This patch fixes that.

Signed-off-by: Mahesh Salgaonkar 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/mm/slb.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 0b095fa54049..6dd9913425bc 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -101,9 +101,12 @@ void __slb_restore_bolted_realmode(void)
 
 /* No isync needed because realmode. */
for (index = 0; index < SLB_NUM_BOLTED; index++) {
+   unsigned long rb = be64_to_cpu(p->save_area[index].esid);
+
+   rb = (rb & ~0xFFFul) | index;
asm volatile("slbmte  %0,%1" :
 : "r" (be64_to_cpu(p->save_area[index].vsid)),
-  "r" (be64_to_cpu(p->save_area[index].esid)));
+  "r" (rb));
}
 }

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:
>
> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> > On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >> From: Vitaly Kuznetsov 
> >>
> >> Well require to call add_memory()/add_memory_resource() with
> >> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >> lock device hotplug.
> >>
> >> Signed-off-by: Vitaly Kuznetsov 
> >> [modify patch description]
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  drivers/base/core.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 04bbcd779e11..9010b9e942b5 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>  {
> >>  mutex_lock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>
> >>  void unlock_device_hotplug(void)
> >>  {
> >>  mutex_unlock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >
> > If these are going to be "global" symbols, let's properly name them.
> > device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> > about letting stuff outside of the driver core mess with this, as people
> > better know what they are doing.
>
> The only "problem" is that we have kernel modules (for paravirtualized
> devices) that call add_memory(). This is Hyper-V right now, but we might
> have other ones in the future. Without them we would not have to export
> it. We might also get kernel modules that want to call remove_memory() -
> which will require the device_hotplug_lock as of now.
>
> What we could do is
>
> a) add_memory() -> _add_memory() and don't export it
> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> We export that one.
> c) Use add_memory() in external modules only
>
> Similar wrapper would be needed e.g. for remove_memory() later on.

That would be safer IMO, as it would prevent developers from using
add_memory() without the lock, say.

If the lock is always going to be required for add_memory(), make it
hard (or event impossible) to use the latter without it.

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 10:41 AM Greg Kroah-Hartman
 wrote:
>
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> > From: Vitaly Kuznetsov 
> >
> > Well require to call add_memory()/add_memory_resource() with
> > device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> > (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> > lock device hotplug.
> >
> > Signed-off-by: Vitaly Kuznetsov 
> > [modify patch description]
> > Signed-off-by: David Hildenbrand 
> > ---
> >  drivers/base/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 04bbcd779e11..9010b9e942b5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >  {
> >   mutex_lock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >
> >  void unlock_device_hotplug(void)
> >  {
> >   mutex_unlock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.

Well, device_hotplug_lock is the name of the lock itself. :-)

> But I am _really_ nervous about letting stuff outside of the driver core mess
> with this, as people better know what they are doing.
>
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

Because, in general, memory hotplug and hotplug of devices are not
independent.  CPUs and memory may only be possible to take away
together and that may be the case for other devices too (say, it
wouldn't be a good idea to access a memory block that has just gone
away from a device, for DMA and the like).  That's why
device_hotplug_lock was introduced in the first place.

That said I agree that exporting this to drivers doesn't feel particularly safe.

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread David Hildenbrand

On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>> From: Vitaly Kuznetsov 
>>
>> Well require to call add_memory()/add_memory_resource() with
>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>> lock device hotplug.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> [modify patch description]
>> Signed-off-by: David Hildenbrand 
>> ---
>>  drivers/base/core.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 04bbcd779e11..9010b9e942b5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>  {
>>  mutex_lock(_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>  
>>  void unlock_device_hotplug(void)
>>  {
>>  mutex_unlock(_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> 
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> about letting stuff outside of the driver core mess with this, as people
> better know what they are doing.

The only "problem" is that we have kernel modules (for paravirtualized
devices) that call add_memory(). This is Hyper-V right now, but we might
have other ones in the future. Without them we would not have to export
it. We might also get kernel modules that want to call remove_memory() -
which will require the device_hotplug_lock as of now.

What we could do is

a) add_memory() -> _add_memory() and don't export it
b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
We export that one.
c) Use add_memory() in external modules only

Similar wrapper would be needed e.g. for remove_memory() later on.

> 
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

The main problem is that add_memory() will first take the
mem_hotplug_lock, to later on create and attach the memory block devices
(to a bus without any drivers) via bus_probe_device(). Here, we will
take the device_lock() of these devices.

Setting a device online from user space (.online = true) will first take
the device_hotplug_lock, then the device_lock(), and _right now_ not the
mem_hotplug_lock (see cover letter/patch 2).

We have to take mem_hotplug_lock here, but doing it down in e.g.
online_pages() will right now create the possibility for a lock
inversion. So one alternative is to take the mem_hotplug_lock in core.c
before calling device_online()/device_offline(). But this feels very wrong.

Thanks!

> 
> thanks,
> 
> greg k-h
> 

-- 

Thanks,

David / dhildenb

Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Greg Kroah-Hartman

On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> From: Vitaly Kuznetsov 
> 
> Well require to call add_memory()/add_memory_resource() with
> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> lock device hotplug.
> 
> Signed-off-by: Vitaly Kuznetsov 
> [modify patch description]
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/base/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 04bbcd779e11..9010b9e942b5 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>  {
>   mutex_lock(_hotplug_lock);
>  }
> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>  
>  void unlock_device_hotplug(void)
>  {
>   mutex_unlock(_hotplug_lock);
>  }
> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);

If these are going to be "global" symbols, let's properly name them.
device_hotplug_lock/unlock would be better.  But I am _really_ nervous
about letting stuff outside of the driver core mess with this, as people
better know what they are doing.

Can't we just "lock" the memory stuff instead?  Why does the entirety of
the driver core need to be messed with here?

thanks,

greg k-h

Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread David Hildenbrand

On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> a) device_lock()
>> b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>>mem_hotplug_begin() and the calls device_online() - which takes
>>device_lock() - .online does no longer call mem_hotplug_begin(), so
>>effectively calls online_pages() without mem_hotplug_lock. onlining/
>>offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>>onlining memory during add_memory() does not take care of that. (I
>>didn't follow how strictly this is needed, but there seems to be a
>>reason because it is documented at device_online() and
>>device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>>(and device_online()) without locks. This was introduced after
>>30467e0b3be. And skimming over the code, I assume it could need some
>>more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>> "I don't think resolving a locking dependency is appropriate by
>>  just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>> 2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>>  drivers/acpi/acpi_memhotplug.c|  1 +
>>  drivers/base/memory.c | 18 +-
>>  drivers/hv/hv_balloon.c   |  4 +++
>>  drivers/s390/char/sclp_cmd.c  |  3 ++
>>  drivers/xen/balloon.c |  3 ++
>>  mm/memory_hotplug.c   | 42 ++-
>>  7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
>> b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>> int i, ret = 0;
>> struct memtrace_entry *ent;
>>
>> +   /* add_memory() requires device_hotplug_lock */
>> +   lock_device_hotplug();
>> for (i = memtrace_array_nr - 1; i >= 0; i--) {
>> ent = _array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>> pr_info("Added trace memory back to node %d\n", ent->nid);
>> ent->size = ent->start = ent->nid = -1;
>> }
>> +   unlock_device_hotplug();
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
>> acpi_memory_device *mem_device)
>> if (node < 0)
>> node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> +   /* we already hold the device_hotplug lock at this point */
>> result = add_memory(node, info->start_addr, info->length);
>>
>> /*
> 
> A very minor nit here: I would

Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
>mem_hotplug_begin() and the calls device_online() - which takes
>device_lock() - .online does no longer call mem_hotplug_begin(), so
>effectively calls online_pages() without mem_hotplug_lock. onlining/
>offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
>onlining memory during add_memory() does not take care of that. (I
>didn't follow how strictly this is needed, but there seems to be a
>reason because it is documented at device_online() and
>device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>(and device_online()) without locks. This was introduced after
>30467e0b3be. And skimming over the code, I assume it could need some
>more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
>  just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>  drivers/acpi/acpi_memhotplug.c|  1 +
>  drivers/base/memory.c | 18 +-
>  drivers/hv/hv_balloon.c   |  4 +++
>  drivers/s390/char/sclp_cmd.c  |  3 ++
>  drivers/xen/balloon.c |  3 ++
>  mm/memory_hotplug.c   | 42 ++-
>  7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> +   /* add_memory() requires device_hotplug_lock */
> +   lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = _array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> +   unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> +   /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what

[PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread David Hildenbrand

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order. Looking at 1., this order is not always
satisfied when calling device_online() - essentially we simply don't take
one of both locks anymore - and fixing this would require us to
take the mem_hotplug_lock in core driver code (online_store()), which
sounds wrong.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock. onlining/
   offlining of pages is no longer serialised across different devices.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that. (I
   didn't follow how strictly this is needed, but there seems to be a
   reason because it is documented at device_online() and
   device_offline()).

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   (and device_online()) without locks. This was introduced after
   30467e0b3be. And skimming over the code, I assume it could need some
   more care in regards to locking.

ACPI code already holds the device_hotplug_lock, and as we are
effectively hotplugging memory block devices, requiring to hold that
lock does not sound too wrong, although not chosen in [1], as
"I don't think resolving a locking dependency is appropriate by
 just serializing them with another lock."
I think this is the cleanest solution.

Requiring add_memory()/add_memory_resource() to be called under
device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
online_pages/offline_pages() fixes 1. and 3.

Fixup all callers of add_memory/add_memory_resource to hold the lock if
not already done.

So this is essentially a revert of 30467e0b3be, implementation of what
was suggested in [1] by Vitaly, applied to the current tree.

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c |  3 ++
 drivers/acpi/acpi_memhotplug.c|  1 +
 drivers/base/memory.c | 18 +-
 drivers/hv/hv_balloon.c   |  4 +++
 drivers/s390/char/sclp_cmd.c  |  3 ++
 drivers/xen/balloon.c |  3 ++
 mm/memory_hotplug.c   | 42 ++-
 7 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..4c2737a33020 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -206,6 +206,8 @@ static int memtrace_online(void)
int i, ret = 0;
struct memtrace_entry *ent;
 
+   /* add_memory() requires device_hotplug_lock */
+   lock_device_hotplug();
for (i = memtrace_array_nr - 1; i >= 0; i--) {
ent = _array[i];
 
@@ -244,6 +246,7 @@ static int memtrace_online(void)
pr_info("Added trace memory back to node %d\n", ent->nid);
ent->size = ent->start = ent->nid = -1;
}
+   unlock_device_hotplug();
if (ret)
return ret;
 
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..e7a4c7900967 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
+   /* we already hold the device_hotplug lock at this point */
result = add_memory(node, info->start_addr, info->length);
 
/*
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..f60507f994df 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -341,19 +341,11 @@ store_mem_state(struct device *dev,
goto err;
}
 
-   /*
-* Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-* the correct memory block to online before doing device_online(dev),
-* which

[PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread David Hildenbrand

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock(),
and the approach to fix it fixed one inversion, but dropped the
mem_hotplug_lock on another instance (.online).

As far as I understand from the code and from b93e0f329e24 ("mm,
memory_hotplug: get rid of zonelists_mutex"), mem_hotplug_lock is required
because we assume that
"both memory online and offline are fully serialized."
and this is not the case if we only hold the device_lock().

I propose the general rules:

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock. For now only done in ACPI code.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and true in ACPI code.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages. For now this is only true for the first two
   instances.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural. Second patch could
maybe split up.

But let's first hear if this is actually a problem and if there migh be
alternatives (or cleanups). Only tested with DIMM-based hotplug.

David Hildenbrand (1):
  mm/memory_hotplug: fix online/offline_pages called w.o.
mem_hotplug_lock

Vitaly Kuznetsov (1):
  drivers/base: export lock_device_hotplug/unlock_device_hotplug

 arch/powerpc/platforms/powernv/memtrace.c |  3 ++
 drivers/acpi/acpi_memhotplug.c|  1 +
 drivers/base/core.c   |  2 ++
 drivers/base/memory.c | 18 +-
 drivers/hv/hv_balloon.c   |  4 +++
 drivers/s390/char/sclp_cmd.c  |  3 ++
 drivers/xen/balloon.c |  3 ++
 mm/memory_hotplug.c   | 42 ++-
 8 files changed, 57 insertions(+), 19 deletions(-)

-- 
2.17.1

[PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread David Hildenbrand

From: Vitaly Kuznetsov 

Well require to call add_memory()/add_memory_resource() with
device_hotplug_lock held, to avoid a lock inversion. Allow external modules
(e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
lock device hotplug.

Signed-off-by: Vitaly Kuznetsov 
[modify patch description]
Signed-off-by: David Hildenbrand 
---
 drivers/base/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 04bbcd779e11..9010b9e942b5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -700,11 +700,13 @@ void lock_device_hotplug(void)
 {
mutex_lock(_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(lock_device_hotplug);
 
 void unlock_device_hotplug(void)
 {
mutex_unlock(_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(unlock_device_hotplug);
 
 int lock_device_hotplug_sysfs(void)
 {
-- 
2.17.1

[PATCH v2] powerpc/powernv/pci: Work around races in PCI bridge enabling

2018-08-17 Thread Benjamin Herrenschmidt

The generic code is race when multiple children of a PCI bridge try to
enable it simultaneously.

This leads to drivers trying to access a device through a not-yet-enabled
bridge, and this EEH errors under various circumstances when using parallel
driver probing.

There is work going on to fix that properly in the PCI core but it will
take some time.

x86 gets away with it because (outside of hotplug), the BIOS enables all
the bridges at boot time.

This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's suitable
for backporting to stable and distros, while the proper PCI fix will probably
be significantly more invasive.

Signed-off-by: Benjamin Herrenschmidt 
CC: sta...@vger.kernel.org
---

v2. Fix unused variables warning.

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5bd0eb6681bc..ca720d530c6d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3367,12 +3367,49 @@ static void pnv_pci_ioda_create_dbgfs(void)
 #endif /* CONFIG_DEBUG_FS */
 }
 
+static void pnv_pci_enable_bridge(struct pci_bus *bus)
+{
+   struct pci_dev *dev = bus->self;
+   struct pci_bus *child;
+
+   /* Empty bus ? bail */
+   if (list_empty(>devices))
+   return;
+
+   /*
+* If there's a bridge associated with that bus enable it. This works
+* around races in the generic code if the enabling is done during
+* parallel probing. This can be removed once those races have been
+* fixed.
+*/
+   if (dev) {
+   int rc = pci_enable_device(dev);
+   if (rc)
+   pci_err(dev, "Error enabling bridge (%d)\n", rc);
+   pci_set_master(dev);
+   }
+
+   /* Perform the same to child busses */
+   list_for_each_entry(child, >children, node)
+   pnv_pci_enable_bridge(child);
+}
+
+static void pnv_pci_enable_bridges(void)
+{
+   struct pci_controller *hose;
+
+   list_for_each_entry(hose, _list, list_node)
+   pnv_pci_enable_bridge(hose->bus);
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
pnv_pci_ioda_setup_PEs();
pnv_pci_ioda_setup_iommu_api();
pnv_pci_ioda_create_dbgfs();
 
+   pnv_pci_enable_bridges();
+
 #ifdef CONFIG_EEH
pnv_eeh_post_init();
 #endif

Re: [PATCH] powerpc/powernv/pci: Work around races in PCI bridge enabling

2018-08-17 Thread kbuild test robot

Hi Benjamin,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18 next-20180817]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Benjamin-Herrenschmidt/powerpc-powernv-pci-Work-around-races-in-PCI-bridge-enabling/20180817-113453
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_pci_enable_bridges':
>> arch/powerpc/platforms/powernv/pci-ioda.c:3262:18: error: unused variable 
>> 'pdev' [-Werror=unused-variable]
 struct pci_dev *pdev;
 ^~~~
>> arch/powerpc/platforms/powernv/pci-ioda.c:3261:18: error: unused variable 
>> 'bus' [-Werror=unused-variable]
 struct pci_bus *bus;
 ^~~
>> arch/powerpc/platforms/powernv/pci-ioda.c:3260:18: error: unused variable 
>> 'phb' [-Werror=unused-variable]
 struct pnv_phb *phb;
 ^~~
   cc1: all warnings being treated as errors

vim +/pdev +3262 arch/powerpc/platforms/powernv/pci-ioda.c

  3256  
  3257  static void pnv_pci_enable_bridges(void)
  3258  {
  3259  struct pci_controller *hose;
> 3260  struct pnv_phb *phb;
> 3261  struct pci_bus *bus;
> 3262  struct pci_dev *pdev;
  3263  
  3264  list_for_each_entry(hose, _list, list_node)
  3265  pnv_pci_enable_bridge(hose->bus);
  3266  }
  3267  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[PATCH] powerpc/traps: Avoid rate limit messages from show unhandled signals

2018-08-17 Thread Michael Ellerman

In the recent commit to add an explicit ratelimit state when showing
unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an
explicit ratelimit state for show_signal_msg()"), I put the check of
show_unhandled_signals and the ratelimit state before the call to
unhandled_signal() so as to avoid unnecessarily calling the latter
when show_unhandled_signals is false.

However that causes us to check the ratelimit state on every call, so
if we take a lot of *handled* signals that has the effect of making
the ratelimit code print warnings that callbacks have been suppressed
when they haven't.

So rearrange the code so that we check show_unhandled_signals first,
then call unhandled_signal() and finally check the ratelimit state.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/traps.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 070e96f1773a..c85adb858271 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -315,22 +315,21 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }
 
-static bool show_unhandled_signals_ratelimited(void)
+static void show_signal_msg(int signr, struct pt_regs *regs, int code,
+   unsigned long addr)
 {
static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
-   return show_unhandled_signals && __ratelimit();
-}
 
-static void show_signal_msg(int signr, struct pt_regs *regs, int code,
-   unsigned long addr)
-{
-   if (!show_unhandled_signals_ratelimited())
+   if (!show_unhandled_signals)
return;
 
if (!unhandled_signal(current, signr))
return;
 
+   if (!__ratelimit())
+   return;
+
pr_info("%s[%d]: %s (%d) at %lx nip %lx lr %lx code %x",
current->comm, current->pid, signame(signr), signr,
addr, regs->nip, regs->link, code);
-- 
2.17.1

49 matches

Mail list logo