[PATCH] SB600 for the Nemo board has non-zero devices on non-root bus

2017-12-15 Thread Christian Zigotzky

On 04 December 2017 at 12:40PM, Darren Stevens wrote:
> Hello Bjorn
>
> Firstly sorry for not being able to join in this discussion, I have been
> moving house and only got my X1000 set up again yesterday..
>
> On 30/11/2017, Bjorn Helgaas wrote:
>> I *think* something like the patch below should make this work if you
>> use the "pci=pcie_scan_all" parameter.  We have some x86 DMI quirks
>> that set PCI_SCAN_ALL_PCIE_DEVS automatically.  I don't know how to do
>> something similar on powerpc, but maybe you do?
>
> Actually the root ports on the Nemo's PA6T processor don't respond to the
> SB600 unless we turn on a special 'relax pci-e' bit in one of its control
> registers. We use a small out of tree init routine to do this, and there
> would be the ideal place to put a call to
> pci_set_flag(PCI_SCAN_ALL_PCIE_DEVS).
>
> This patch fixes the last major hurdle to getting the X1000 fully 
supported in

> the linux kernel, so thanks very much for that.
>
> Regards
> Darren
>
>


On 15 December 2017 at 09:25PM, Bjorn Helgaas wrote:
> On Fri, Dec 15, 2017 at 09:04:51AM +0100, Christian Zigotzky wrote:
>> On 09 December 2017 at 7:03PM, Christian Zigotzky wrote:
>>> On 08 December 2017 at 12:59PM, Michael Ellerman wrote:

> Darren's idea of doing it at the same time you tweak the SB600 "relax
> pci-e" bit is ideal because then the two pieces are obviously
> connected and it wouldn't affect any other systems at all.

 Yes that would be ideal. That patch is currently out-of-tree I gather,
 but I guess everyone who's using these machines must have that patch
 anyway.

 Darren what does that code look like? Can we get it upstream and close
 the loop on this?

 cheers

>>>
>>> Hi Michael,
>>>
>>> Please find attached the code.
>>>
>>> Thanks,
>>> Christian
>>
>> Hi All,
>>
>> I haven't received any response yet. Is this the correct patch you
>> are looking for?
>
> This is a powerpc patch that doesn't affect the PCI core, so I would
> say this is Michael's bailiwick.
>
> I guess you're only looking for a hint about whether this is the right
> approach, because it's obviously fully baked yet (no changelog,
> signed-off-by, etc, not a "safe for all powerpc" run-time solution,
> not in Linux indentation style, etc).
>
> It looks like the "pasemi,1682m-iob" DT property is required and
> possibly sufficient to identify this system at run-time.
>
> My advice is to finish that stuff up, post it to the powerpc
> maintainers and the linuxppc-dev@lists.ozlabs.org list, and go from
> there.
>

Darren,

Where is this small out of tree init routine in our patch? I haven't 
found it yet. Please post this routine here. Please find attached our 
latest Nemo patch.


Thanks,
Christian
diff -rupN a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
--- a/arch/powerpc/platforms/pasemi/pci.c	2017-11-16 08:18:35.078874462 +0100
+++ b/arch/powerpc/platforms/pasemi/pci.c	2017-11-16 08:17:22.034367975 +0100
@@ -27,6 +27,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include 
@@ -108,6 +109,69 @@ static int workaround_5945(struct pci_bu
 	return 1;
 }
 
+#ifdef CONFIG_PPC_PASEMI_NEMO
+static int sb600_bus = 5;
+static void __iomem *iob_mapbase = NULL;
+
+static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
+ int offset, int len, u32 *val);
+
+static void sb600_set_flag(int bus)
+{
+struct resource res;
+struct device_node *dn;
+   struct pci_bus *busp;
+   u32 val;
+   int err;
+
+   if (sb600_bus == -1)
+   {
+   busp = pci_find_bus(0, 0);
+   pa_pxp_read_config(busp, PCI_DEVFN(17,0), PCI_SECONDARY_BUS, 1, &val);
+
+   sb600_bus = val;
+
+   printk(KERN_CRIT "NEMO SB600 on bus %d.\n",sb600_bus);
+   }
+
+   if (iob_mapbase == NULL)
+   {
+dn = of_find_compatible_node(NULL, "isa", "pasemi,1682m-iob");
+if (!dn)
+{
+   printk(KERN_CRIT "NEMO SB600 missing iob node\n");
+   return;
+   }
+
+   err = of_address_to_resource(dn, 0, &res);
+of_node_put(dn);
+
+   if (err)
+   {
+   printk(KERN_CRIT "NEMO SB600 missing resource\n");
+   return;
+   }
+
+   printk(KERN_CRIT "NEMO SB600 IOB base %08lx\n",res.start);
+
+   iob_mapbase = ioremap(res.start + 0x100, 0x94);
+   }
+
+   if (iob_mapbase != NULL)
+   {
+   if (bus == sb600_bus)
+   {
+   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) | 0x800);
+   }
+   else
+   {
+   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) & ~0x800);
+   }
+   }
+}
+#endif
+
+
 static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
 			  int offset, int len, u32 *val)
 {

Re: [PATCH v3 03/11] ASoC: fsl_ssi: Refine all comments

2017-12-15 Thread Nicolin Chen
Hi,

I am outside so can't use mutt. Sorry for that.

This comment is going to be replaced in the 2nd set anyway because the
whole function will be replaced.

And please point out all comments that you think I need to rework. I am
totally fine to do that. I don't think every single one is bad. And this
patch has to go in as it also adds a lot of new comments.

Thank you for your effort
Nicolin

On Dec 15, 2017 20:43, "Timur Tabi"  wrote:

On 12/13/17 5:18 PM, Nicolin Chen wrote:

> -* We are running on a SoC which does not support online SSI
> -* reconfiguration, so we have to enable all necessary flags at
> once
> -* even if we do not use them later (capture and playback
> configuration)
> +* Online configuration is not supported
> +* Enable or Disable all necessary bits at once
>

This is an example of a bad change, IMHO.  The original was written in
elegant prose.  The new version is just two short sentences.


Re: [PATCH v3 00/11] ASoC: fsl_ssi: Clean up - coding style level

2017-12-15 Thread Timur Tabi

On 12/13/17 5:18 PM, Nicolin Chen wrote:

Additionally, in order to fix/work-around hardware bugs and design
flaws, the driver made a lot of compromise so now its program flow
looks very complicated and it's getting hard to maintain or update.

So I am going to clean up the driver on both coding style level and
program flow level.


I'm okay with everything except patch #3.


Re: [PATCH v3 03/11] ASoC: fsl_ssi: Refine all comments

2017-12-15 Thread Timur Tabi

On 12/13/17 5:18 PM, Nicolin Chen wrote:

-* We are running on a SoC which does not support online SSI
-* reconfiguration, so we have to enable all necessary flags at once
-* even if we do not use them later (capture and playback configuration)
+* Online configuration is not supported
+* Enable or Disable all necessary bits at once


This is an example of a bad change, IMHO.  The original was written in 
elegant prose.  The new version is just two short sentences.


Re: [PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> There is no clear separation between the two, so merge them.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Logan Gunthorpe 

Looks good,

Reviewed-by: Dan Williams 


Re: [PATCH 09/17] mm: split altmap memory map allocation from normal case

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> No functional changes, just untangling the call chain.

I'd also mention that creating more helper functions in the altmap_
namespace helps document why altmap is passed all around the hotplug
code.

>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Logan Gunthorpe 

Reviewed-by: Dan Williams 


Re: [PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> Pass the vmem_altmap two levels down instead of needing a lookup.
>
> Signed-off-by: Christoph Hellwig 

Given the fact that HMM and now P2P are attracted to
devm_memremap_pages() I think this churn is worth it. vmem_altmap is
worth being considered a first class citizen of memory hotplug and not
a hidden hack.

Reviewed-by: Dan Williams 


Re: [PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking a few levels into the callchain.
>
> Signed-off-by: Christoph Hellwig 

Now I remember why I went with the radix lookup, laziness!

This looks good to me, I appreciate you digging in.

Reviewed-by: Dan Williams 


Re: [PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig wip

I assume that "wip" is a typo?

Otherwise,

Reviewed-by: Dan Williams 


Re: [PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate

2017-12-15 Thread Dan Williams
[ cc Michal ]

On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking a few levels into the callchain.
>
> Signed-off-by: Christoph Hellwig 

I know Michal has concerns about the complexity of the memory hotplug
implementation, but I think this just means I need to go write up
better kerneldoc for the vmem_altmap definition so that memory hotplug
developers know what's happening.

Other than that:

Reviewed-by: Dan Williams 

Including the patch for Michal just in case he doesn't have it in his archives.

> ---
>  arch/arm64/mm/mmu.c|  6 --
>  arch/ia64/mm/discontig.c   |  3 ++-
>  arch/powerpc/mm/init_64.c  |  7 ++-
>  arch/s390/mm/vmem.c|  3 ++-
>  arch/sparc/mm/init_64.c|  2 +-
>  arch/x86/mm/init_64.c  |  4 ++--
>  include/linux/memory_hotplug.h |  3 ++-
>  include/linux/mm.h |  6 --
>  mm/memory_hotplug.c|  7 ---
>  mm/sparse-vmemmap.c|  7 ---
>  mm/sparse.c| 20 
>  11 files changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 267d2b79d52d..ec8952ff13be 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -654,12 +654,14 @@ int kern_addr_valid(unsigned long addr)
>  }
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  #if !ARM64_SWAPPER_USES_SECTION_MAPS
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node,
> +   struct vmem_altmap *altmap)
>  {
> return vmemmap_populate_basepages(start, end, node);
>  }
>  #else  /* !ARM64_SWAPPER_USES_SECTION_MAPS */
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node,
> +   struct vmem_altmap *altmap)
>  {
> unsigned long addr = start;
> unsigned long next;
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 9b2d994cddf6..1555aecaaf85 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -754,7 +754,8 @@ void arch_refresh_nodedata(int update_node, pg_data_t 
> *update_pgdat)
>  #endif
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node,
> +   struct vmem_altmap *altmap)
>  {
> return vmemmap_populate_basepages(start, end, node);
>  }
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index a07722531b32..779b74a96b8f 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -183,7 +183,8 @@ static __meminit void vmemmap_list_populate(unsigned long 
> phys,
> vmemmap_list = vmem_back;
>  }
>
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node,
> +   struct vmem_altmap *altmap)
>  {
> unsigned long page_size = 1 << 
> mmu_psize_defs[mmu_vmemmap_psize].shift;
>
> @@ -193,16 +194,12 @@ int __meminit vmemmap_populate(unsigned long start, 
> unsigned long end, int node)
> pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
>
> for (; start < end; start += page_size) {
> -   struct vmem_altmap *altmap;
> void *p;
> int rc;
>
> if (vmemmap_populated(start, page_size))
> continue;
>
> -   /* altmap lookups only work at section boundaries */
> -   altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
> -
> p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
> if (!p)
> return -ENOMEM;
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 3316d463fc29..c44ef0e7c466 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -211,7 +211,8 @@ static void vmem_remove_range(unsigned long start, 
> unsigned long size)
>  /*
>   * Add a backed mem_map array to the virtual mem_map array.
>   */
> -int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node)
> +int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
> node,
> +   struct vmem_altmap *altmap)
>  {
> unsigned long pgt_prot, sgt_prot;
> unsigned long address = start;
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 55ba62957e64..42d27a1a042a 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2628,7 +2628,7 @@ EXPORT_SYMBOL(_PAGE_CACHE);
>
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  int __meminit vmemmap_pop

Re: [PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> We can just pass this on instead of having to do a radix tree lookup
> without proper locking 2 levels into the callchain.
>
> Signed-off-by: Christoph Hellwig 

Yeah, the lookup of vmem_altmap is too magical and surprising this is better.

Reviewed-by: Dan Williams 


Re: [PATCH 03/17] mm: don't export __add_pages

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> This function isn't used by any modules, and is only to be called
> from core MM code.  This includes the calls for the add_pages wrapper
> that might be inlined.
>
> Signed-off-by: Christoph Hellwig 

Looks good,

Reviewed-by: Dan Williams 


Re: [PATCH 02/17] mm: don't export arch_add_memory

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> Only x86_64 and sh export this symbol, and it is not used by any
> modular code.
>
> Signed-off-by: Christoph Hellwig 

Looks good,

Reviewed-by: Dan Williams 


Re: [PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free

2017-12-15 Thread Dan Williams
On Fri, Dec 15, 2017 at 6:09 AM, Christoph Hellwig  wrote:
> Currently all calls to those functions are eliminated by the compiler when
> CONFIG_ZONE_DEVICE is not set, but this soon won't be the case.
>
> Signed-off-by: Christoph Hellwig 

Looks good,

Reviewed-by: Dan Williams 


Re: [PATCH] Fix parse_args cycle limit check.

2017-12-15 Thread Randy Dunlap
On 12/15/2017 01:41 PM, Michal Suchanek wrote:
> Actually args are supposed to be renamed to next so both and args hold the
> previous argument so both can be passed to the callback. This additionla patch

additional

> should fix up the rename.

Would you try rewriting the first sentence, please? I don't get it.

> ---
>  kernel/params.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)


-- 
~Randy


[PATCH] PCI: Add #defines for Completion Timeout Disable feature

2017-12-15 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Add #defines for the Completion Timeout Disable feature and use them.  No
functional change intended.

Signed-off-by: Bjorn Helgaas 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |6 +++---
 include/uapi/linux/pci_regs.h|2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 4650fb294e7a..2f7cd0ef3cdc 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1654,14 +1654,14 @@ static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
  2, devctl);
 
-   /* Disable Completion Timeout */
+   /* Disable Completion Timeout if possible */
eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
 4, &cap2);
-   if (cap2 & 0x10) {
+   if (cap2 & PCI_EXP_DEVCAP2_COMP_TMOUT_DIS) {
eeh_ops->read_config(pdn,
 edev->pcie_cap + PCI_EXP_DEVCTL2,
 4, &cap2);
-   cap2 |= 0x10;
+   cap2 |= PCI_EXP_DEVCTL2_COMP_TMOUT_DIS;
eeh_ops->write_config(pdn,
  edev->pcie_cap + PCI_EXP_DEVCTL2,
  4, cap2);
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 70c2b2ade048..9dc67643fc18 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -622,6 +622,7 @@
  * safely.
  */
 #define PCI_EXP_DEVCAP236  /* Device Capabilities 2 */
+#define  PCI_EXP_DEVCAP2_COMP_TMOUT_DIS0x0010 /* Completion 
Timeout Disable supported */
 #define  PCI_EXP_DEVCAP2_ARI   0x0020 /* Alternative Routing-ID */
 #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE  0x0040 /* Atomic Op routing */
 #define PCI_EXP_DEVCAP2_ATOMIC_COMP64  0x0100 /* Atomic 64-bit compare */
@@ -631,6 +632,7 @@
 #define  PCI_EXP_DEVCAP2_OBFF_WAKE 0x0008 /* Re-use WAKE# for OBFF */
 #define PCI_EXP_DEVCTL240  /* Device Control 2 */
 #define  PCI_EXP_DEVCTL2_COMP_TIMEOUT  0x000f  /* Completion Timeout Value */
+#define  PCI_EXP_DEVCTL2_COMP_TMOUT_DIS0x0010  /* Completion Timeout 
Disable */
 #define  PCI_EXP_DEVCTL2_ARI   0x0020  /* Alternative Routing-ID */
 #define PCI_EXP_DEVCTL2_ATOMIC_REQ 0x0040  /* Set Atomic requests */
 #define PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK 0x0080 /* Block atomic egress */



[PATCH] Optimize final quote removal.

2017-12-15 Thread Michal Suchanek
This is additional patch that avoids the memmove when processing the quote on
the end of the parameter.

---
 lib/cmdline.c   | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/cmdline.c b/lib/cmdline.c
index c5335a79a177..b1d8a0dc60fc 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -191,7 +191,13 @@ bool parse_option_str(const char *str, const char *option)
return false;
 }
 
+#define break_arg_end(i) { \
+   if (isspace(args[i]) && !in_quote && !backslash && !in_single) \
+   break; \
+   }
+
 #define squash_char { \
+   break_arg_end(i + 1); \
memmove(args + 1, args, i); \
args++; \
i--; \
@@ -209,8 +215,7 @@ char *next_arg(char *args, char **param, char **val)
char *next;
 
for (i = 0; args[i]; i++) {
-   if (isspace(args[i]) && !in_quote && !backslash && !in_single)
-   break;
+   break_arg_end(i);
 
if ((equals == 0) && (args[i] == '='))
equals = i;
-- 
2.13.6



[PATCH] Fix parse_args cycle limit check.

2017-12-15 Thread Michal Suchanek
Actually args are supposed to be renamed to next so both and args hold the
previous argument so both can be passed to the callback. This additionla patch
should fix up the rename.

---
 kernel/params.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/params.c b/kernel/params.c
index 69ff58e69887..efb4dfaa6bc5 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -182,17 +182,18 @@ char *parse_args(const char *doing,
 
if (*args)
pr_debug("doing %s, parsing ARGS: '%s'\n", doing, args);
+   else
+   return err;
 
-   next = next_arg(args, ¶m, &val);
-   while (*next) {
+   do {
int ret;
int irq_was_disabled;
 
-   args = next;
next = next_arg(args, ¶m, &val);
+
/* Stop at -- */
if (!val && strcmp(param, "--") == 0)
-   return err ?: args;
+   return err ?: next;
irq_was_disabled = irqs_disabled();
ret = parse_one(param, val, args, next, doing, params, num,
min_level, max_level, arg, unknown);
@@ -215,9 +216,10 @@ char *parse_args(const char *doing,
   doing, val ?: "", param);
break;
}
-
err = ERR_PTR(ret);
-   }
+
+   args = next;
+   } while (*args);
 
return err;
 }
-- 
2.13.6



Re: Mac Mini G4 defconfig ?

2017-12-15 Thread Mathieu Malaterre
On Fri, Dec 15, 2017 at 9:52 PM, Mathieu Malaterre  wrote:
> On Fri, Dec 15, 2017 at 8:50 PM, Mathieu Malaterre  wrote:
>> Hi there,
>>
>> Does anyone has working defconfig for a Mac Mini G4 ?
>>
>> Here is what I tried:
>>
>> $ cat ./arch/powerpc/configs/g4_defconfig
>> CONFIG_PPC_FPU=y
>> CONFIG_ALTIVEC=y
>> $ make ARCH=powerpc g4_defconfig
>> $ make -j8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- V=1
>> set -e; : '  CHK include/config/kernel.release'; mkdir -p
>
> That is odd. Doing a quick git bisect:
>
> $ git checkout 3298b690b21cdbe6b2ae8076d9147027f396f2b1
> $ make -n ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -f ./Makefile
> silentoldconfig V=1
> + make -n ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -f ./Makefile
> silentoldconfig V=1
> /bin/sh: line 0: [: -ge: unary operator expected
> make -f ./scripts/Makefile.build obj=scripts/basic
>
> I cannot make sense of this shell error, maybe this is unrelated but
> things start breaking around this commit.

Even if I discard this shell error, the next error comes with:

433dc2ebe7d17dd21cba7ad5c362d37323592236 is the first bad commit

With:

$ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- V=1
[...]
  powerpc-linux-gnu-gcc -Wp,-MD,kernel/.bounds.s.d  -nostdinc -isystem
 -I./arch/powerpc/include -I./arch/powerpc/include/generated
-I./include -I./arch/powerpc/include/uapi
-I./arch/powerpc/include/generated/uapi -I./include/uapi
-I./include/generated/uapi -include ./include/linux/kconfig.h
-D__KERNEL__ -Iarch/powerpc -Wall -Wundef -Wstrict-prototypes
-Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar
-Werror-implicit-function-declaration -Wno-format-security -std=gnu89
-pipe -Iarch/powerpc -ffixed-r2 -mmultiple -mcpu=powerpc -Wa,-maltivec
-mbig-endian -fno-delete-null-pointer-checks -Wno-frame-address -O2
-Wno-maybe-uninitialized --param=allow-store-data-races=0
-DCC_HAVE_ASM_GOTO -Wframe-larger-than=1024 -fno-stack-protector
-Wno-unused-but-set-variable -Wno-unused-const-variable
-fomit-frame-pointer -fno-var-tracking-assignments
-Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow
-fconserve-stack -Werror=implicit-int -Werror=strict-prototypes
-Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init-DKBUILD_BASENAME='"bounds"'
-DKBUILD_MODNAME='"bounds"'  -fverbose-asm -S -o kernel/bounds.s
kernel/bounds.c
In file included from ./include/linux/page-flags.h:9:0,
 from kernel/bounds.c:9:
./include/linux/bug.h:4:21: fatal error: asm/bug.h: No such file or directory
 #include 
 ^
compilation terminated.
Kbuild:20: recipe for target 'kernel/bounds.s' failed
make[1]: *** [kernel/bounds.s] Error 1
Makefile:1051: recipe for target 'prepare0' failed
make: *** [prepare0] Error 2


Comments ?


Re: Mac Mini G4 defconfig ?

2017-12-15 Thread Mathieu Malaterre
On Fri, Dec 15, 2017 at 8:50 PM, Mathieu Malaterre  wrote:
> Hi there,
>
> Does anyone has working defconfig for a Mac Mini G4 ?
>
> Here is what I tried:
>
> $ cat ./arch/powerpc/configs/g4_defconfig
> CONFIG_PPC_FPU=y
> CONFIG_ALTIVEC=y
> $ make ARCH=powerpc g4_defconfig
> $ make -j8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- V=1
> set -e; : '  CHK include/config/kernel.release'; mkdir -p

That is odd. Doing a quick git bisect:

$ git checkout 3298b690b21cdbe6b2ae8076d9147027f396f2b1
$ make -n ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -f ./Makefile
silentoldconfig V=1
+ make -n ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -f ./Makefile
silentoldconfig V=1
/bin/sh: line 0: [: -ge: unary operator expected
make -f ./scripts/Makefile.build obj=scripts/basic

I cannot make sense of this shell error, maybe this is unrelated but
things start breaking around this commit.


Re: [PATCH v9 1/8] lib/cmdline.c: remove quotes symmetrically

2017-12-15 Thread Michal Suchánek
On Wed, 15 Nov 2017 20:46:56 +0530
Hari Bathini  wrote:

> From: Michal Suchanek 
> 
> Remove quotes from argument value only if there is qoute on both
> sides.
> 
> Signed-off-by: Michal Suchanek 
> ---
>  lib/cmdline.c |   10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/cmdline.c b/lib/cmdline.c
> index 171c19b..6d398a8 100644
> --- a/lib/cmdline.c
> +++ b/lib/cmdline.c
> @@ -227,14 +227,12 @@ char *next_arg(char *args, char **param, char
> **val) *val = args + equals + 1;
>  
>   /* Don't include quotes in value. */
> - if (**val == '"') {
> - (*val)++;
> - if (args[i-1] == '"')
> - args[i-1] = '\0';
> + if ((args[i-1] == '"') && ((quoted) || (**val ==
> '"'))) {
> + args[i-1] = '\0';
> + if (!quoted)
> + (*val)++;
>   }
>   }
> - if (quoted && args[i-1] == '"')
> - args[i-1] = '\0';
>  
>   if (args[i]) {
>   args[i] = '\0';
> 

This was only useful as separate patch with the incremental fadump
update. Since the fadump update is squashed in this refresh series this
can be squashed with the following lib/cmdline patch as well.

Thanks

Michal


Re: [PATCH v9 2/8] boot/param: add pointer to current and next argument to unknown parameter callback

2017-12-15 Thread Michal Suchánek
Hello,

On Wed, 15 Nov 2017 20:47:14 +0530
Hari Bathini  wrote:

> From: Michal Suchanek 
> 
> Add pointer to current and next argument to make parameter processing
> more robust. This can make parameter processing easier and less error
> prone in cases where the parameters need to be enforced/ignored based
> on firmware/system state.
> 
> Signed-off-by: Michal Suchanek 
> Signed-off-by: Hari Bathini 

> @@ -179,16 +183,18 @@ char *parse_args(const char *doing,
>   if (*args)
>   pr_debug("doing %s, parsing ARGS: '%s'\n", doing,
> args); 
> - while (*args) {
> + next = next_arg(args, ¶m, &val);
> + while (*next) {
>   int ret;
>   int irq_was_disabled;
>  
> - args = next_arg(args, ¶m, &val);
> + args = next;
> + next = next_arg(args, ¶m, &val);
>   /* Stop at -- */

The [PATCH v8 5/6] you refreshed here moves the while(*next) to the end
of the cycle for a reason. Checking *args at the start is mostly
equivalent checking *next at the end. Checking *next at the start on
the other hand skips the last argument.

The "mostly" part is that there is a bug here because *args is not
checked at the start of the cycle making it possible to crash if it is
0. To fix that the if(*args) above should be extended to wrap the cycle.

Thanks

Michal


Re: [PATCH] SB600 for the Nemo board has non-zero devices on non-root bus

2017-12-15 Thread Bjorn Helgaas
On Fri, Dec 15, 2017 at 09:04:51AM +0100, Christian Zigotzky wrote:
> On 09 December 2017 at 7:03PM, Christian Zigotzky wrote:
> > On 08 December 2017 at 12:59PM, Michael Ellerman wrote:
> > >
> > >> Darren's idea of doing it at the same time you tweak the SB600 "relax
> > >> pci-e" bit is ideal because then the two pieces are obviously
> > >> connected and it wouldn't affect any other systems at all.
> > >
> > > Yes that would be ideal. That patch is currently out-of-tree I gather,
> > > but I guess everyone who's using these machines must have that patch
> > > anyway.
> > >
> > > Darren what does that code look like? Can we get it upstream and close
> > > the loop on this?
> > >
> > > cheers
> > >
> >
> > Hi Michael,
> >
> > Please find attached the code.
> >
> > Thanks,
> > Christian
> 
> Hi All,
> 
> I haven't received any response yet. Is this the correct patch you
> are looking for?

This is a powerpc patch that doesn't affect the PCI core, so I would
say this is Michael's bailiwick.

I guess you're only looking for a hint about whether this is the right
approach, because it's obviously fully baked yet (no changelog,
signed-off-by, etc, not a "safe for all powerpc" run-time solution,
not in Linux indentation style, etc).

It looks like the "pasemi,1682m-iob" DT property is required and
possibly sufficient to identify this system at run-time.

My advice is to finish that stuff up, post it to the powerpc
maintainers and the linuxppc-dev@lists.ozlabs.org list, and go from
there.

> diff -rupN a/arch/powerpc/platforms/pasemi/pci.c 
> b/arch/powerpc/platforms/pasemi/pci.c
> --- a/arch/powerpc/platforms/pasemi/pci.c 2017-11-16 08:18:35.078874462 
> +0100
> +++ b/arch/powerpc/platforms/pasemi/pci.c 2017-11-16 08:17:22.034367975 
> +0100
> @@ -27,6 +27,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -108,6 +109,69 @@ static int workaround_5945(struct pci_bu
>   return 1;
>  }
>  
> +#ifdef CONFIG_PPC_PASEMI_NEMO
> +static int sb600_bus = 5;
> +static void __iomem *iob_mapbase = NULL;
> +
> +static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
> + int offset, int len, u32 *val);
> +
> +static void sb600_set_flag(int bus)
> +{
> +struct resource res;
> +struct device_node *dn;
> +   struct pci_bus *busp;
> +   u32 val;
> +   int err;
> +
> +   if (sb600_bus == -1)
> +   {
> +   busp = pci_find_bus(0, 0);
> +   pa_pxp_read_config(busp, PCI_DEVFN(17,0), PCI_SECONDARY_BUS, 
> 1, &val);
> +
> +   sb600_bus = val;
> +
> +   printk(KERN_CRIT "NEMO SB600 on bus %d.\n",sb600_bus);
> +   }
> +
> +   if (iob_mapbase == NULL)
> +   {
> +dn = of_find_compatible_node(NULL, "isa", "pasemi,1682m-iob");
> +if (!dn)
> +{
> +   printk(KERN_CRIT "NEMO SB600 missing iob node\n");
> +   return;
> +   }
> +
> +   err = of_address_to_resource(dn, 0, &res);
> +of_node_put(dn);
> +
> +   if (err)
> +   {
> +   printk(KERN_CRIT "NEMO SB600 missing resource\n");
> +   return;
> +   }
> +
> +   printk(KERN_CRIT "NEMO SB600 IOB base %08lx\n",res.start);
> +
> +   iob_mapbase = ioremap(res.start + 0x100, 0x94);
> +   }
> +
> +   if (iob_mapbase != NULL)
> +   {
> +   if (bus == sb600_bus)
> +   {
> +   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) | 
> 0x800);
> +   }
> +   else
> +   {
> +   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) & 
> ~0x800);
> +   }
> +   }
> +}
> +#endif
> +
> +
>  static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
> int offset, int len, u32 *val)
>  {
> @@ -126,6 +190,10 @@ static int pa_pxp_read_config(struct pci
>  
>   addr = pa_pxp_cfg_addr(hose, bus->number, devfn, offset);
>  
> +#ifdef CONFIG_PPC_PASEMI_NEMO
> +   sb600_set_flag(bus->number);
> +#endif
> +
>   /*
>* Note: the caller has already checked that offset is
>* suitably aligned and that len is 1, 2 or 4.
> @@ -210,6 +278,9 @@ static int __init pas_add_bridge(struct
>   /* Interpret the "ranges" property */
>   pci_process_bridge_OF_ranges(hose, dev, 1);
>  
> + /* Scan for an isa bridge. */
> + isa_bridge_find_early(hose);
> +
>   return 0;
>  }



Mac Mini G4 defconfig ?

2017-12-15 Thread Mathieu Malaterre
Hi there,

Does anyone has working defconfig for a Mac Mini G4 ?

Here is what I tried:

$ cat ./arch/powerpc/configs/g4_defconfig
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
$ make ARCH=powerpc g4_defconfig
$ make -j8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- V=1
set -e; : '  CHK include/config/kernel.release'; mkdir -p
include/config/; echo "4.15.0-rc3$(/bin/sh ./scripts/setlocalversion
.)" < include/config/auto.conf > include/config/kernel.release.tmp; if
[ -r include/config/kernel.release ] && cmp -s
include/config/kernel.release include/config/kernel.release.tmp; then
rm -f include/config/kernel.release.tmp; else : '  UPD
include/config/kernel.release'; mv -f
include/config/kernel.release.tmp include/config/kernel.release; fi
make -f ./scripts/Makefile.asm-generic \
src=uapi/asm obj=arch/powerpc/include/generated/uapi/asm
set -e; : '  CHK include/generated/uapi/linux/version.h'; mkdir -p
include/generated/uapi/linux/; (echo \#define LINUX_VERSION_CODE
265984; echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8)
+ (c))';) < Makefile > include/generated/uapi/linux/version.h.tmp; if
[ -r include/generated/uapi/linux/version.h ] && cmp -s
include/generated/uapi/linux/version.h
include/generated/uapi/linux/version.h.tmp; then rm -f
include/generated/uapi/linux/version.h.tmp; else : '  UPD
include/generated/uapi/linux/version.h'; mv -f
include/generated/uapi/linux/version.h.tmp
include/generated/uapi/linux/version.h; fi
make -f ./scripts/Makefile.build obj=scripts/basic
rm -f include/linux/version.h
make -f ./scripts/Makefile.asm-generic \
src=asm obj=arch/powerpc/include/generated/asm
(cat /dev/null; ) > scripts/basic/modules.order
rm -f .tmp_quiet_recordmcount
make -f ./scripts/Makefile.build obj=scripts
make -f ./scripts/Makefile.build obj=scripts/dtc need-builtin=
make -f ./scripts/Makefile.build obj=scripts/mod need-builtin=
(cat /dev/null; ) > scripts/mod/modules.order
(cat /dev/null; ) > scripts/dtc/modules.order
  powerpc-linux-gnu-gcc -Wp,-MD,scripts/mod/.devicetable-offsets.s.d
-nostdinc -isystem  -I./arch/powerpc/include
-I./arch/powerpc/include/generated  -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/kconfig.h -D__KERNEL__ -Iarch/powerpc -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-fshort-wchar -Werror-implicit-function-declaration
-Wno-format-security -std=gnu89 -pipe -Iarch/powerpc -ffixed-r2
-mmultiple -mcpu=powerpc -O2 -fomit-frame-pointer
-DKBUILD_BASENAME='"devicetable_offsets"'
-DKBUILD_MODNAME='"devicetable_offsets"'  -fverbose-asm -S -o
scripts/mod/devicetable-offsets.s scripts/mod/devicetable-offsets.c
In file included from ./include/linux/string.h:6:0,
 from ./include/uapi/linux/uuid.h:22,
 from ./include/linux/uuid.h:19,
 from ./include/linux/mod_devicetable.h:13,
 from scripts/mod/devicetable-offsets.c:3:
./include/linux/compiler.h:242:25: fatal error: asm/barrier.h: No such
file or directory
 #include 
 ^
compilation terminated.
scripts/Makefile.build:150: recipe for target
'scripts/mod/devicetable-offsets.s' failed
make[2]: *** [scripts/mod/devicetable-offsets.s] Error 1
scripts/Makefile.build:569: recipe for target 'scripts/mod' failed
make[1]: *** [scripts/mod] Error 2
Makefile:576: recipe for target 'scripts' failed
make: *** [scripts] Error 2
make: *** Waiting for unfinished jobs


[PATCH 17/17] memremap: merge find_dev_pagemap into get_dev_pagemap

2017-12-15 Thread Christoph Hellwig
There is only one caller of the trivial function find_dev_pagemap left,
so just merge it into the caller.

Signed-off-by: Christoph Hellwig 
---
 kernel/memremap.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index fd0e7c44e6bd..c04000361664 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -306,14 +306,6 @@ static void devm_memremap_pages_release(void *data)
  "%s: failed to free all reserved pages\n", __func__);
 }
 
-/* assumes rcu_read_lock() held at entry */
-static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
-{
-   WARN_ON_ONCE(!rcu_read_lock_held());
-
-   return radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
-}
-
 /**
  * devm_memremap_pages - remap and provide memmap backing for the given 
resource
  * @dev: hosting device for @res
@@ -466,7 +458,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 
/* fall back to slow path lookup */
rcu_read_lock();
-   pgmap = find_dev_pagemap(phys);
+   pgmap = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
pgmap = NULL;
rcu_read_unlock();
-- 
2.14.2



[PATCH 16/17] memremap: change devm_memremap_pages interface to use struct dev_pagemap

2017-12-15 Thread Christoph Hellwig
From: Logan Gunthorpe 

This new interface is similar to how struct device (and many others)
work. The caller initializes a 'struct dev_pagemap' as required
and calls 'devm_memremap_pages'. This allows the pagemap structure to
be embedded in another structure and thus container_of can be used. In
this way application specific members can be stored in a containing
struct.

This will be used by the P2P infrastructure and HMM could probably
be cleaned up to use it as well (instead of having it's own, similar
'hmm_devmem_pages_create' function).

Signed-off-by: Logan Gunthorpe 
Signed-off-by: Christoph Hellwig 
---
 drivers/dax/pmem.c| 20 +---
 drivers/nvdimm/nd.h   |  9 ---
 drivers/nvdimm/pfn_devs.c | 25 ++--
 drivers/nvdimm/pmem.c | 37 -
 drivers/nvdimm/pmem.h |  1 +
 include/linux/memremap.h  |  6 ++---
 kernel/memremap.c | 50 ---
 tools/testing/nvdimm/test/iomap.c |  7 +++---
 8 files changed, 75 insertions(+), 80 deletions(-)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 8d8c852ba8f2..31b6ecce4c64 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -21,6 +21,7 @@
 struct dax_pmem {
struct device *dev;
struct percpu_ref ref;
+   struct dev_pagemap pgmap;
struct completion cmp;
 };
 
@@ -69,20 +70,23 @@ static int dax_pmem_probe(struct device *dev)
struct nd_namespace_common *ndns;
struct nd_dax *nd_dax = to_nd_dax(dev);
struct nd_pfn *nd_pfn = &nd_dax->nd_pfn;
-   struct vmem_altmap __altmap, *altmap = NULL;
 
ndns = nvdimm_namespace_common_probe(dev);
if (IS_ERR(ndns))
return PTR_ERR(ndns);
nsio = to_nd_namespace_io(&ndns->dev);
 
+   dax_pmem = devm_kzalloc(dev, sizeof(*dax_pmem), GFP_KERNEL);
+   if (!dax_pmem)
+   return -ENOMEM;
+
/* parse the 'pfn' info block via ->rw_bytes */
rc = devm_nsio_enable(dev, nsio);
if (rc)
return rc;
-   altmap = nvdimm_setup_pfn(nd_pfn, &res, &__altmap);
-   if (IS_ERR(altmap))
-   return PTR_ERR(altmap);
+   rc = nvdimm_setup_pfn(nd_pfn, &dax_pmem->pgmap);
+   if (rc)
+   return rc;
devm_nsio_disable(dev, nsio);
 
pfn_sb = nd_pfn->pfn_sb;
@@ -94,10 +98,6 @@ static int dax_pmem_probe(struct device *dev)
return -EBUSY;
}
 
-   dax_pmem = devm_kzalloc(dev, sizeof(*dax_pmem), GFP_KERNEL);
-   if (!dax_pmem)
-   return -ENOMEM;
-
dax_pmem->dev = dev;
init_completion(&dax_pmem->cmp);
rc = percpu_ref_init(&dax_pmem->ref, dax_pmem_percpu_release, 0,
@@ -110,7 +110,8 @@ static int dax_pmem_probe(struct device *dev)
if (rc)
return rc;
 
-   addr = devm_memremap_pages(dev, &res, &dax_pmem->ref, altmap);
+   dax_pmem->pgmap.ref = &dax_pmem->ref;
+   addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
 
@@ -120,6 +121,7 @@ static int dax_pmem_probe(struct device *dev)
return rc;
 
/* adjust the dax_region resource to the start of data */
+   memcpy(&res, &dax_pmem->pgmap.res, sizeof(res));
res.start += le64_to_cpu(pfn_sb->dataoff);
 
rc = sscanf(dev_name(&ndns->dev), "namespace%d.%d", ®ion_id, &id);
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index e958f3724c41..8d6375ee0fda 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -368,15 +368,14 @@ unsigned int pmem_sector_size(struct nd_namespace_common 
*ndns);
 void nvdimm_badblocks_populate(struct nd_region *nd_region,
struct badblocks *bb, const struct resource *res);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
-struct vmem_altmap *nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-   struct resource *res, struct vmem_altmap *altmap);
+int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio);
 void devm_nsio_disable(struct device *dev, struct nd_namespace_io *nsio);
 #else
-static inline struct vmem_altmap *nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
-   struct resource *res, struct vmem_altmap *altmap)
+static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
+  struct dev_pagemap *pgmap)
 {
-   return ERR_PTR(-ENXIO);
+   return -ENXIO;
 }
 static inline int devm_nsio_enable(struct device *dev,
struct nd_namespace_io *nsio)
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 65cc171c721d..6f58615ddb85 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -541,9 +541,10 @@ static unsigned long init_altmap_reserve(resource_size_t 
base)
return reserve;
 }
 
-static struct vmem_altmap *__nvdimm

[PATCH 15/17] memremap: drop private struct page_map

2017-12-15 Thread Christoph Hellwig
From: Logan Gunthorpe 

'struct page_map' is a private structure of 'struct dev_pagemap' but the
latter replicates all the same fields as the former so there isn't much
value in it. Thus drop it in favour of a completely public struct.

This is a clean up in preperation for a more generally useful
'devm_memeremap_pages' interface.

Signed-off-by: Logan Gunthorpe 
Signed-off-by: Christoph Hellwig 
---
 include/linux/memremap.h |  5 ++--
 kernel/memremap.c| 66 ++--
 mm/hmm.c |  2 +-
 3 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 3fddcfe57bb0..1cb5f39d25c1 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -113,8 +113,9 @@ typedef void (*dev_page_free_t)(struct page *page, void 
*data);
 struct dev_pagemap {
dev_page_fault_t page_fault;
dev_page_free_t page_free;
-   struct vmem_altmap *altmap;
-   const struct resource *res;
+   struct vmem_altmap altmap;
+   bool altmap_valid;
+   struct resource res;
struct percpu_ref *ref;
struct device *dev;
void *data;
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 901404094df1..97782215bbd4 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -188,13 +188,6 @@ static RADIX_TREE(pgmap_radix, GFP_KERNEL);
 #define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
 #define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
 
-struct page_map {
-   struct resource res;
-   struct percpu_ref *ref;
-   struct dev_pagemap pgmap;
-   struct vmem_altmap altmap;
-};
-
 static unsigned long order_at(struct resource *res, unsigned long pgoff)
 {
unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
@@ -260,22 +253,21 @@ static void pgmap_radix_release(struct resource *res)
synchronize_rcu();
 }
 
-static unsigned long pfn_first(struct page_map *page_map)
+static unsigned long pfn_first(struct dev_pagemap *pgmap)
 {
-   struct dev_pagemap *pgmap = &page_map->pgmap;
-   const struct resource *res = &page_map->res;
-   struct vmem_altmap *altmap = pgmap->altmap;
+   const struct resource *res = &pgmap->res;
+   struct vmem_altmap *altmap = &pgmap->altmap;
unsigned long pfn;
 
pfn = res->start >> PAGE_SHIFT;
-   if (altmap)
+   if (pgmap->altmap_valid)
pfn += vmem_altmap_offset(altmap);
return pfn;
 }
 
-static unsigned long pfn_end(struct page_map *page_map)
+static unsigned long pfn_end(struct dev_pagemap *pgmap)
 {
-   const struct resource *res = &page_map->res;
+   const struct resource *res = &pgmap->res;
 
return (res->start + resource_size(res)) >> PAGE_SHIFT;
 }
@@ -285,13 +277,12 @@ static unsigned long pfn_end(struct page_map *page_map)
 
 static void devm_memremap_pages_release(struct device *dev, void *data)
 {
-   struct page_map *page_map = data;
-   struct resource *res = &page_map->res;
+   struct dev_pagemap *pgmap = data;
+   struct resource *res = &pgmap->res;
resource_size_t align_start, align_size;
-   struct dev_pagemap *pgmap = &page_map->pgmap;
unsigned long pfn;
 
-   for_each_device_pfn(pfn, page_map)
+   for_each_device_pfn(pfn, pgmap)
put_page(pfn_to_page(pfn));
 
if (percpu_ref_tryget_live(pgmap->ref)) {
@@ -304,24 +295,22 @@ static void devm_memremap_pages_release(struct device 
*dev, void *data)
align_size = ALIGN(resource_size(res), SECTION_SIZE);
 
mem_hotplug_begin();
-   arch_remove_memory(align_start, align_size, pgmap->altmap);
+   arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
+   &pgmap->altmap : NULL);
mem_hotplug_done();
 
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
pgmap_radix_release(res);
-   dev_WARN_ONCE(dev, pgmap->altmap && pgmap->altmap->alloc,
-   "%s: failed to free all reserved pages\n", __func__);
+   dev_WARN_ONCE(dev, pgmap->altmap.alloc,
+ "%s: failed to free all reserved pages\n", __func__);
 }
 
 /* assumes rcu_read_lock() held at entry */
 static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
-   struct page_map *page_map;
-
WARN_ON_ONCE(!rcu_read_lock_held());
 
-   page_map = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
-   return page_map ? &page_map->pgmap : NULL;
+   return radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
 }
 
 /**
@@ -349,7 +338,6 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res,
unsigned long pfn, pgoff, order;
pgprot_t pgprot = PAGE_KERNEL;
struct dev_pagemap *pgmap;
-   struct page_map *page_map;
int error, nid, is_ram, i = 0;
 
align_start = res->start & ~(SECTION_SIZE - 1);
@@ -370,21 +358,19 @@ void *devm_memremap_pages(struct device *dev, stru

[PATCH 14/17] memremap: simplify duplicate region handling in devm_memremap_pages

2017-12-15 Thread Christoph Hellwig
__radix_tree_insert already checks for duplicates and returns -EEXIST in
that case, so remove the duplicate (and racy) duplicates check.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Logan Gunthorpe 
---
 kernel/memremap.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 891491ddccdb..901404094df1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -395,17 +395,6 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res,
align_end = align_start + align_size - 1;
 
foreach_order_pgoff(res, order, pgoff) {
-   struct dev_pagemap *dup;
-
-   rcu_read_lock();
-   dup = find_dev_pagemap(res->start + PFN_PHYS(pgoff));
-   rcu_read_unlock();
-   if (dup) {
-   dev_err(dev, "%s: %pr collides with mapping for %s\n",
-   __func__, res, dev_name(dup->dev));
-   error = -EBUSY;
-   break;
-   }
error = __radix_tree_insert(&pgmap_radix,
PHYS_PFN(res->start) + pgoff, order, page_map);
if (error) {
-- 
2.14.2



[PATCH 13/17] memremap: remove to_vmem_altmap

2017-12-15 Thread Christoph Hellwig
All callers are gone now.

Signed-off-by: Christoph Hellwig 
---
 include/linux/memremap.h |  9 -
 kernel/memremap.c| 26 --
 2 files changed, 35 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 26e8aaba27d5..3fddcfe57bb0 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -26,15 +26,6 @@ struct vmem_altmap {
unsigned long alloc;
 };
 
-#ifdef CONFIG_ZONE_DEVICE
-struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
-#else
-static inline struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
-{
-   return NULL;
-}
-#endif
-
 /*
  * Specialize ZONE_DEVICE memory into multiple types each having differents
  * usage.
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 26764085785d..891491ddccdb 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -475,32 +475,6 @@ void vmem_altmap_free(struct vmem_altmap *altmap, unsigned 
long nr_pfns)
altmap->alloc -= nr_pfns;
 }
 
-struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start)
-{
-   /*
-* 'memmap_start' is the virtual address for the first "struct
-* page" in this range of the vmemmap array.  In the case of
-* CONFIG_SPARSEMEM_VMEMMAP a page_to_pfn conversion is simple
-* pointer arithmetic, so we can perform this to_vmem_altmap()
-* conversion without concern for the initialization state of
-* the struct page fields.
-*/
-   struct page *page = (struct page *) memmap_start;
-   struct dev_pagemap *pgmap;
-
-   /*
-* Unconditionally retrieve a dev_pagemap associated with the
-* given physical address, this is only for use in the
-* arch_{add|remove}_memory() for setting up and tearing down
-* the memmap.
-*/
-   rcu_read_lock();
-   pgmap = find_dev_pagemap(__pfn_to_phys(page_to_pfn(page)));
-   rcu_read_unlock();
-
-   return pgmap ? pgmap->altmap : NULL;
-}
-
 /**
  * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
  * @pfn: page frame number to lookup page_map
-- 
2.14.2



[PATCH 12/17] mm: optimize dev_pagemap reference counting around get_dev_pagemap

2017-12-15 Thread Christoph Hellwig
Change the calling convention so that get_dev_pagemap always consumes the
previous reference instead of doing this using an explicit earlier call to
put_dev_pagemap in the callers.

The callers will still need to put the final reference after finishing the
loop over the pages.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Logan Gunthorpe 
---
 kernel/memremap.c | 17 +
 mm/gup.c  |  7 +--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 43d94db97ff4..26764085785d 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -506,22 +506,23 @@ struct vmem_altmap *to_vmem_altmap(unsigned long 
memmap_start)
  * @pfn: page frame number to lookup page_map
  * @pgmap: optional known pgmap that already has a reference
  *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
+ * If @pgmap is non-NULL and covers @pfn it will be returned as-is.  If @pgmap
+ * is non-NULL but does not cover @pfn the reference to it while be released.
  */
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap)
 {
-   const struct resource *res = pgmap ? pgmap->res : NULL;
resource_size_t phys = PFN_PHYS(pfn);
 
/*
-* In the cached case we're already holding a live reference so
-* we can simply do a blind increment
+* In the cached case we're already holding a live reference.
 */
-   if (res && phys >= res->start && phys <= res->end) {
-   percpu_ref_get(pgmap->ref);
-   return pgmap;
+   if (pgmap) {
+   const struct resource *res = pgmap ? pgmap->res : NULL;
+
+   if (res && phys >= res->start && phys <= res->end)
+   return pgmap;
+   put_dev_pagemap(pgmap);
}
 
/* fall back to slow path lookup */
diff --git a/mm/gup.c b/mm/gup.c
index d3fb60e5bfac..9d142eb9e2e9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1410,7 +1410,6 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
 
VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-   put_dev_pagemap(pgmap);
SetPageReferenced(page);
pages[*nr] = page;
(*nr)++;
@@ -1420,6 +1419,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
ret = 1;
 
 pte_unmap:
+   if (pgmap)
+   put_dev_pagemap(pgmap);
pte_unmap(ptem);
return ret;
 }
@@ -1459,10 +1460,12 @@ static int __gup_device_huge(unsigned long pfn, 
unsigned long addr,
SetPageReferenced(page);
pages[*nr] = page;
get_page(page);
-   put_dev_pagemap(pgmap);
(*nr)++;
pfn++;
} while (addr += PAGE_SIZE, addr != end);
+
+   if (pgmap)
+   put_dev_pagemap(pgmap);
return 1;
 }
 
-- 
2.14.2



[PATCH 11/17] mm: move get_dev_pagemap out of line

2017-12-15 Thread Christoph Hellwig
This is a pretty big function, which should be out of line in general,
and a no-op stub if CONFIG_ZONE_DEVICЕ is not set.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Logan Gunthorpe 
---
 include/linux/memremap.h | 39 ---
 kernel/memremap.c| 36 ++--
 2 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index d5a6736d9737..26e8aaba27d5 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -133,7 +133,8 @@ struct dev_pagemap {
 #ifdef CONFIG_ZONE_DEVICE
 void *devm_memremap_pages(struct device *dev, struct resource *res,
struct percpu_ref *ref, struct vmem_altmap *altmap);
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+   struct dev_pagemap *pgmap);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
@@ -153,7 +154,8 @@ static inline void *devm_memremap_pages(struct device *dev,
return ERR_PTR(-ENXIO);
 }
 
-static inline struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+   struct dev_pagemap *pgmap)
 {
return NULL;
 }
@@ -183,39 +185,6 @@ static inline bool is_device_public_page(const struct page 
*page)
 }
 #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
 
-/**
- * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
- * @pfn: page frame number to lookup page_map
- * @pgmap: optional known pgmap that already has a reference
- *
- * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
- * same mapping.
- */
-static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
-   struct dev_pagemap *pgmap)
-{
-   const struct resource *res = pgmap ? pgmap->res : NULL;
-   resource_size_t phys = PFN_PHYS(pfn);
-
-   /*
-* In the cached case we're already holding a live reference so
-* we can simply do a blind increment
-*/
-   if (res && phys >= res->start && phys <= res->end) {
-   percpu_ref_get(pgmap->ref);
-   return pgmap;
-   }
-
-   /* fall back to slow path lookup */
-   rcu_read_lock();
-   pgmap = find_dev_pagemap(phys);
-   if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
-   pgmap = NULL;
-   rcu_read_unlock();
-
-   return pgmap;
-}
-
 static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 {
if (pgmap)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 8e85803b6b0e..43d94db97ff4 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -314,7 +314,7 @@ static void devm_memremap_pages_release(struct device *dev, 
void *data)
 }
 
 /* assumes rcu_read_lock() held at entry */
-struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
+static struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 {
struct page_map *page_map;
 
@@ -500,8 +500,40 @@ struct vmem_altmap *to_vmem_altmap(unsigned long 
memmap_start)
 
return pgmap ? pgmap->altmap : NULL;
 }
-#endif /* CONFIG_ZONE_DEVICE */
 
+/**
+ * get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
+ * @pfn: page frame number to lookup page_map
+ * @pgmap: optional known pgmap that already has a reference
+ *
+ * @pgmap allows the overhead of a lookup to be bypassed when @pfn lands in the
+ * same mapping.
+ */
+struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
+   struct dev_pagemap *pgmap)
+{
+   const struct resource *res = pgmap ? pgmap->res : NULL;
+   resource_size_t phys = PFN_PHYS(pfn);
+
+   /*
+* In the cached case we're already holding a live reference so
+* we can simply do a blind increment
+*/
+   if (res && phys >= res->start && phys <= res->end) {
+   percpu_ref_get(pgmap->ref);
+   return pgmap;
+   }
+
+   /* fall back to slow path lookup */
+   rcu_read_lock();
+   pgmap = find_dev_pagemap(phys);
+   if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
+   pgmap = NULL;
+   rcu_read_unlock();
+
+   return pgmap;
+}
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) ||  IS_ENABLED(CONFIG_DEVICE_PUBLIC)
 void put_zone_device_private_or_public_page(struct page *page)
-- 
2.14.2



[PATCH 10/17] mm: merge vmem_altmap_alloc into altmap_alloc_block_buf

2017-12-15 Thread Christoph Hellwig
There is no clear separation between the two, so merge them.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Logan Gunthorpe 
---
 mm/sparse-vmemmap.c | 45 -
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d012c9e2811b..bd0276d5f66b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -107,33 +107,16 @@ static unsigned long __meminit vmem_altmap_nr_free(struct 
vmem_altmap *altmap)
 }
 
 /**
- * vmem_altmap_alloc - allocate pages from the vmem_altmap reservation
- * @altmap - reserved page pool for the allocation
- * @nr_pfns - size (in pages) of the allocation
+ * altmap_alloc_block_buf - allocate pages from the device page map
+ * @altmap:device page map
+ * @size:  size (in bytes) of the allocation
  *
- * Allocations are aligned to the size of the request
+ * Allocations are aligned to the size of the request.
  */
-static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,
-   unsigned long nr_pfns)
-{
-   unsigned long pfn = vmem_altmap_next_pfn(altmap);
-   unsigned long nr_align;
-
-   nr_align = 1UL << find_first_bit(&nr_pfns, BITS_PER_LONG);
-   nr_align = ALIGN(pfn, nr_align) - pfn;
-
-   if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap))
-   return ULONG_MAX;
-   altmap->alloc += nr_pfns;
-   altmap->align += nr_align;
-   return pfn + nr_align;
-}
-
 void * __meminit altmap_alloc_block_buf(unsigned long size,
struct vmem_altmap *altmap)
 {
-   unsigned long pfn, nr_pfns;
-   void *ptr;
+   unsigned long pfn, nr_pfns, nr_align;
 
if (size & ~PAGE_MASK) {
pr_warn_once("%s: allocations must be multiple of PAGE_SIZE 
(%ld)\n",
@@ -141,16 +124,20 @@ void * __meminit altmap_alloc_block_buf(unsigned long 
size,
return NULL;
}
 
+   pfn = vmem_altmap_next_pfn(altmap);
nr_pfns = size >> PAGE_SHIFT;
-   pfn = vmem_altmap_alloc(altmap, nr_pfns);
-   if (pfn < ULONG_MAX)
-   ptr = __va(__pfn_to_phys(pfn));
-   else
-   ptr = NULL;
+   nr_align = 1UL << find_first_bit(&nr_pfns, BITS_PER_LONG);
+   nr_align = ALIGN(pfn, nr_align) - pfn;
+   if (nr_pfns + nr_align > vmem_altmap_nr_free(altmap))
+   return NULL;
+
+   altmap->alloc += nr_pfns;
+   altmap->align += nr_align;
+   pfn += nr_align;
+
pr_debug("%s: pfn: %#lx alloc: %ld align: %ld nr: %#lx\n",
__func__, pfn, altmap->alloc, altmap->align, nr_pfns);
-
-   return ptr;
+   return __va(__pfn_to_phys(pfn));
 }
 
 void __meminit vmemmap_verify(pte_t *pte, int node,
-- 
2.14.2



[PATCH 09/17] mm: split altmap memory map allocation from normal case

2017-12-15 Thread Christoph Hellwig
No functional changes, just untangling the call chain.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Logan Gunthorpe 
---
 arch/powerpc/mm/init_64.c |  5 -
 arch/x86/mm/init_64.c |  5 -
 include/linux/mm.h|  9 ++---
 mm/sparse-vmemmap.c   | 15 +++
 4 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index db7d4e092157..7a2251d99ed3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -200,7 +200,10 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
if (vmemmap_populated(start, page_size))
continue;
 
-   p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
+   if (altmap)
+   p = altmap_alloc_block_buf(page_size, altmap);
+   else
+   p = vmemmap_alloc_block_buf(page_size, node);
if (!p)
return -ENOMEM;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 37dd79646a8b..39c5051cf7c2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1385,7 +1385,10 @@ static int __meminit vmemmap_populate_hugepages(unsigned 
long start,
if (pmd_none(*pmd)) {
void *p;
 
-   p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+   if (altmap)
+   p = altmap_alloc_block_buf(PMD_SIZE, altmap);
+   else
+   p = vmemmap_alloc_block_buf(PMD_SIZE, node);
if (p) {
pte_t entry;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd01135324b6..09637c353de0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2547,13 +2547,8 @@ pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long 
addr, int node);
 pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
 void *vmemmap_alloc_block(unsigned long size, int node);
 struct vmem_altmap;
-void *__vmemmap_alloc_block_buf(unsigned long size, int node,
-   struct vmem_altmap *altmap);
-static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
-{
-   return __vmemmap_alloc_block_buf(size, node, NULL);
-}
-
+void *vmemmap_alloc_block_buf(unsigned long size, int node);
+void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap);
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
   int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 376dcf05a39c..d012c9e2811b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -74,7 +74,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int 
node)
 }
 
 /* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
 {
void *ptr;
 
@@ -129,7 +129,7 @@ static unsigned long __meminit vmem_altmap_alloc(struct 
vmem_altmap *altmap,
return pfn + nr_align;
 }
 
-static void * __meminit altmap_alloc_block_buf(unsigned long size,
+void * __meminit altmap_alloc_block_buf(unsigned long size,
struct vmem_altmap *altmap)
 {
unsigned long pfn, nr_pfns;
@@ -153,15 +153,6 @@ static void * __meminit altmap_alloc_block_buf(unsigned 
long size,
return ptr;
 }
 
-/* need to make sure size is all the same during early stage */
-void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
-   struct vmem_altmap *altmap)
-{
-   if (altmap)
-   return altmap_alloc_block_buf(size, altmap);
-   return alloc_block_buf(size, node);
-}
-
 void __meminit vmemmap_verify(pte_t *pte, int node,
unsigned long start, unsigned long end)
 {
@@ -178,7 +169,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned 
long addr, int node)
pte_t *pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte)) {
pte_t entry;
-   void *p = alloc_block_buf(PAGE_SIZE, node);
+   void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
if (!p)
return NULL;
entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
-- 
2.14.2



[PATCH 08/17] mm: pass the vmem_altmap to memmap_init_zone

2017-12-15 Thread Christoph Hellwig
Pass the vmem_altmap two levels down instead of needing a lookup.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/mm/init.c| 9 +
 include/linux/memory_hotplug.h | 2 +-
 include/linux/mm.h | 4 ++--
 kernel/memremap.c  | 2 +-
 mm/hmm.c   | 2 +-
 mm/memory_hotplug.c| 9 +
 mm/page_alloc.c| 6 +++---
 7 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 6a8ce9e1536e..18278b448530 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -501,7 +501,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
 args->nid, args->zone, page_to_pfn(map_start),
-MEMMAP_EARLY);
+MEMMAP_EARLY, NULL);
return 0;
 }
 
@@ -509,9 +509,10 @@ void __meminit
 memmap_init (unsigned long size, int nid, unsigned long zone,
 unsigned long start_pfn)
 {
-   if (!vmem_map)
-   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY);
-   else {
+   if (!vmem_map) {
+   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY,
+   NULL);
+   } else {
struct page *start;
struct memmap_init_callback_data args;
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20dd98ad44a0..aba5f86eb038 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,7 +324,7 @@ extern int add_memory_resource(int nid, struct resource 
*resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap, bool want_memblock);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+   unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern void remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d4cd4c1dc6d..fd01135324b6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2069,8 +2069,8 @@ static inline void zero_resv_unavail(void) {}
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long,
-   unsigned long, enum memmap_context);
+extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
+   enum memmap_context, struct vmem_altmap *);
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index b707ac60d13c..8e85803b6b0e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -431,7 +431,7 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res,
if (!error)
move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
align_start >> PAGE_SHIFT,
-   align_size >> PAGE_SHIFT);
+   align_size >> PAGE_SHIFT, altmap);
mem_hotplug_done();
if (error)
goto err_add_memory;
diff --git a/mm/hmm.c b/mm/hmm.c
index b08105e2cd3b..5d0f488e66bc 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -942,7 +942,7 @@ static int hmm_devmem_pages_create(struct hmm_devmem 
*devmem)
}
move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
align_start >> PAGE_SHIFT,
-   align_size >> PAGE_SHIFT);
+   align_size >> PAGE_SHIFT, NULL);
mem_hotplug_done();
 
for (pfn = devmem->pfn_first; pfn < devmem->pfn_last; pfn++) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a8dde9734120..12df8a5fadcc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -798,8 +798,8 @@ static void __meminit resize_pgdat_range(struct pglist_data 
*pgdat, unsigned lon
pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - 
pgdat->node_start_pfn;
 }
 
-void __ref move_pfn_range_to_zone(struct zone *zone,
-   unsigned long start_pfn, unsigned long nr_pages)
+void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
+   unsigned long nr_pages, struct vmem_altmap *altmap)
 {
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
@@ -824,7 +824,8 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
 * expects the zone spans the pfn range. All the pages in the range
 

[PATCH 07/17] mm: pass the vmem_altmap to vmemmap_free

2017-12-15 Thread Christoph Hellwig
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/mmu.c|  3 +-
 arch/ia64/mm/discontig.c   |  3 +-
 arch/powerpc/mm/init_64.c  |  5 ++--
 arch/s390/mm/vmem.c|  3 +-
 arch/sparc/mm/init_64.c|  3 +-
 arch/x86/mm/init_64.c  | 67 --
 include/linux/memory_hotplug.h |  2 +-
 include/linux/mm.h |  3 +-
 mm/memory_hotplug.c|  7 +++--
 mm/sparse.c| 23 ---
 10 files changed, 68 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ec8952ff13be..0b1f13e0b4b3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -696,7 +696,8 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
return 0;
 }
 #endif /* CONFIG_ARM64_64K_PAGES */
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+   struct vmem_altmap *altmap)
 {
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 1555aecaaf85..5ea0d8d0968b 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -760,7 +760,8 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
return vmemmap_populate_basepages(start, end, node);
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+   struct vmem_altmap *altmap)
 {
 }
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 779b74a96b8f..db7d4e092157 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -254,7 +254,8 @@ static unsigned long vmemmap_list_free(unsigned long start)
return vmem_back->phys;
 }
 
-void __ref vmemmap_free(unsigned long start, unsigned long end)
+void __ref vmemmap_free(unsigned long start, unsigned long end,
+   struct vmem_altmap *altmap)
 {
unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
unsigned long page_order = get_order(page_size);
@@ -265,7 +266,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long 
end)
 
for (; start < end; start += page_size) {
unsigned long nr_pages, addr;
-   struct vmem_altmap *altmap;
struct page *section_base;
struct page *page;
 
@@ -285,7 +285,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long 
end)
section_base = pfn_to_page(vmemmap_section_start(start));
nr_pages = 1 << page_order;
 
-   altmap = to_vmem_altmap((unsigned long) section_base);
if (altmap) {
vmem_altmap_free(altmap, nr_pages);
} else if (PageReserved(page)) {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index c44ef0e7c466..db55561c5981 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -297,7 +297,8 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
return ret;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+   struct vmem_altmap *altmap)
 {
 }
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 42d27a1a042a..995f9490334d 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2671,7 +2671,8 @@ int __meminit vmemmap_populate(unsigned long vstart, 
unsigned long vend,
return 0;
 }
 
-void vmemmap_free(unsigned long start, unsigned long end)
+void vmemmap_free(unsigned long start, unsigned long end,
+   struct vmem_altmap *altmap)
 {
 }
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index c5bba00fe71f..37dd79646a8b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -800,11 +800,11 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 
 #define PAGE_INUSE 0xFD
 
-static void __meminit free_pagetable(struct page *page, int order)
+static void __meminit free_pagetable(struct page *page, int order,
+   struct vmem_altmap *altmap)
 {
unsigned long magic;
unsigned int nr_pages = 1 << order;
-   struct vmem_altmap *altmap = to_vmem_altmap((unsigned long) page);
 
if (altmap) {
vmem_altmap_free(altmap, nr_pages);
@@ -826,7 +826,8 @@ static void __meminit free_pagetable(struct page *page, int 
order)
free_pages((unsigned long)page_address(page), order);
 }
 
-static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd,
+   struct vmem_altmap *altmap)
 {
   

[PATCH 06/17] mm: pass the vmem_altmap to arch_remove_memory and __remove_pages

2017-12-15 Thread Christoph Hellwig
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig wip
---
 arch/ia64/mm/init.c| 4 ++--
 arch/powerpc/mm/mem.c  | 6 ++
 arch/s390/mm/init.c| 2 +-
 arch/sh/mm/init.c  | 4 ++--
 arch/x86/mm/init_32.c  | 4 ++--
 arch/x86/mm/init_64.c  | 6 ++
 include/linux/memory_hotplug.h | 5 +++--
 kernel/memremap.c  | 2 +-
 mm/hmm.c   | 4 ++--
 mm/memory_hotplug.c| 8 ++--
 10 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 2e2e4f532204..6a8ce9e1536e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -663,7 +663,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -671,7 +671,7 @@ int arch_remove_memory(u64 start, u64 size)
int ret;
 
zone = page_zone(pfn_to_page(start_pfn));
-   ret = __remove_pages(zone, start_pfn, nr_pages);
+   ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
if (ret)
pr_warn("%s: Problem encountered in __remove_pages() as"
" ret=%d\n", __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e670cfc2766e..22aa528b78a2 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -149,11 +149,10 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
-   struct vmem_altmap *altmap;
struct page *page;
int ret;
 
@@ -162,11 +161,10 @@ int arch_remove_memory(u64 start, u64 size)
 * when querying the zone.
 */
page = pfn_to_page(start_pfn);
-   altmap = to_vmem_altmap((unsigned long) page);
if (altmap)
page += vmem_altmap_offset(altmap);
 
-   ret = __remove_pages(page_zone(page), start_pfn, nr_pages);
+   ret = __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
if (ret)
return ret;
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e12c5af50cd7..3fa3e5323612 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -240,7 +240,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
/*
 * There is no hardware or firmware interface which could trigger a
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 552afbf55bad..ce0bbaa7e404 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -510,7 +510,7 @@ EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -518,7 +518,7 @@ int arch_remove_memory(u64 start, u64 size)
int ret;
 
zone = page_zone(pfn_to_page(start_pfn));
-   ret = __remove_pages(zone, start_pfn, nr_pages);
+   ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
if (unlikely(ret))
pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
ret);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index cdf19ec6460c..c3bf36fc78d5 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -833,14 +833,14 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
 
zone = page_zone(pfn_to_page(start_pfn));
-   return __remove_pages(zone, start_pfn, nr_pages);
+   return __remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0c898098feaf..c5bba00fe71f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1132,21 +1132,19 @@ kernel_physical_mapping_remove(unsigned long start, 
unsigned long end)
remove_pagetable(start, end

[PATCH 05/17] mm: pass the vmem_altmap to vmemmap_populate

2017-12-15 Thread Christoph Hellwig
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/mmu.c|  6 --
 arch/ia64/mm/discontig.c   |  3 ++-
 arch/powerpc/mm/init_64.c  |  7 ++-
 arch/s390/mm/vmem.c|  3 ++-
 arch/sparc/mm/init_64.c|  2 +-
 arch/x86/mm/init_64.c  |  4 ++--
 include/linux/memory_hotplug.h |  3 ++-
 include/linux/mm.h |  6 --
 mm/memory_hotplug.c|  7 ---
 mm/sparse-vmemmap.c|  7 ---
 mm/sparse.c| 20 
 11 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 267d2b79d52d..ec8952ff13be 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -654,12 +654,14 @@ int kern_addr_valid(unsigned long addr)
 }
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 #if !ARM64_SWAPPER_USES_SECTION_MAPS
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altmap)
 {
return vmemmap_populate_basepages(start, end, node);
 }
 #else  /* !ARM64_SWAPPER_USES_SECTION_MAPS */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altmap)
 {
unsigned long addr = start;
unsigned long next;
diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 9b2d994cddf6..1555aecaaf85 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -754,7 +754,8 @@ void arch_refresh_nodedata(int update_node, pg_data_t 
*update_pgdat)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altmap)
 {
return vmemmap_populate_basepages(start, end, node);
 }
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a07722531b32..779b74a96b8f 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -183,7 +183,8 @@ static __meminit void vmemmap_list_populate(unsigned long 
phys,
vmemmap_list = vmem_back;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altmap)
 {
unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 
@@ -193,16 +194,12 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node)
pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
for (; start < end; start += page_size) {
-   struct vmem_altmap *altmap;
void *p;
int rc;
 
if (vmemmap_populated(start, page_size))
continue;
 
-   /* altmap lookups only work at section boundaries */
-   altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
-
p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
if (!p)
return -ENOMEM;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 3316d463fc29..c44ef0e7c466 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -211,7 +211,8 @@ static void vmem_remove_range(unsigned long start, unsigned 
long size)
 /*
  * Add a backed mem_map array to the virtual mem_map array.
  */
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altmap)
 {
unsigned long pgt_prot, sgt_prot;
unsigned long address = start;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 55ba62957e64..42d27a1a042a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2628,7 +2628,7 @@ EXPORT_SYMBOL(_PAGE_CACHE);
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
-  int node)
+  int node, struct vmem_altmap *altmap)
 {
unsigned long pte_base;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e26ade50ae18..0c898098feaf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1411,9 +1411,9 @@ static int __meminit vmemmap_populate_hugepages(unsigned 
long start,
return 0;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node,
+   struct vmem_altmap *altm

[PATCH 04/17] mm: pass the vmem_altmap to arch_add_memory and __add_pages

2017-12-15 Thread Christoph Hellwig
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/mm/init.c|  5 +++--
 arch/powerpc/mm/mem.c  |  5 +++--
 arch/s390/mm/init.c|  5 +++--
 arch/sh/mm/init.c  |  5 +++--
 arch/x86/mm/init_32.c  |  5 +++--
 arch/x86/mm/init_64.c  | 11 ++-
 include/linux/memory_hotplug.h | 17 ++---
 kernel/memremap.c  |  2 +-
 mm/hmm.c   |  5 +++--
 mm/memory_hotplug.c|  7 +++
 10 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7af4e05bb61e..2e2e4f532204 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -647,13 +647,14 @@ mem_init (void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
 
-   ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+   ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
if (ret)
printk("%s: Problem encountered in __add_pages() as ret=%d\n",
   __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 4362b86ef84c..e670cfc2766e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,7 +127,8 @@ int __weak remove_section_mapping(unsigned long start, 
unsigned long end)
return -ENODEV;
 }
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -144,7 +145,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
want_memblock)
return -EFAULT;
}
 
-   return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+   return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 671535e64aba..e12c5af50cd7 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -222,7 +222,8 @@ device_initcall(s390_cma_mem_init);
 
 #endif /* CONFIG_CMA */
 
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long size_pages = PFN_DOWN(size);
@@ -232,7 +233,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
want_memblock)
if (rc)
return rc;
 
-   rc = __add_pages(nid, start_pfn, size_pages, want_memblock);
+   rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
if (rc)
vmem_remove_mapping(start, size);
return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index afc54d593a26..552afbf55bad 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -485,14 +485,15 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
 
/* We only have ZONE_NORMAL, so this is easy.. */
-   ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
+   ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
if (unlikely(ret))
printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8a64a6f2848d..cdf19ec6460c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -823,12 +823,13 @@ void __init mem_init(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+   bool want_memblock)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
 
-   return __add_pages(nid, start_pfn, nr_pages, want_memblock);
+   return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8acdc35c2dfa..e26ade50ae18 100644
--- a/arch/x86/mm/init_64.c
+++ b/ar

revamp vmem_altmap / dev_pagemap handling V2

2017-12-15 Thread Christoph Hellwig
Hi all,

this series started with two patches from Logan that now are in the
middle of the series to kill the memremap-internal pgmap structure
and to redo the dev_memreamp_pages interface to be better suitable
for future PCI P2P uses.  I reviewed them and noticed that there
isn't really any good reason to keep struct vmem_altmap either,
and that a lot of these alternative device page map access should
be better abstracted out instead of being sprinkled all over the
mm code.  But when we got the RCU warnings in V1 I went for yet
another approach, and now struct vmem_altmap is kept for now,
but passed explicitly through the memory hotplug code instead of
having to do unprotected lookups through the radix tree.  The
end result is that only the get_user_pages path ever looks up
struct dev_pagemap, and struct vmem_altmap is now always embedded
into struct dev_pagemap, and explicitly passed where needed.

Please review carefully, this has only been tested with my legacy
e820 NVDIMM system.


[PATCH 01/17] memremap: provide stubs for vmem_altmap_offset and vmem_altmap_free

2017-12-15 Thread Christoph Hellwig
Currently all calls to those functions are eliminated by the compiler when
CONFIG_ZONE_DEVICE is not set, but this soon won't be the case.

Signed-off-by: Christoph Hellwig 
---
 include/linux/memremap.h | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 10d23c367048..d5a6736d9737 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -26,9 +26,6 @@ struct vmem_altmap {
unsigned long alloc;
 };
 
-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
-
 #ifdef CONFIG_ZONE_DEVICE
 struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
 #else
@@ -138,6 +135,9 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res,
struct percpu_ref *ref, struct vmem_altmap *altmap);
 struct dev_pagemap *find_dev_pagemap(resource_size_t phys);
 
+unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
+
 static inline bool is_zone_device_page(const struct page *page);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
@@ -157,7 +157,17 @@ static inline struct dev_pagemap 
*find_dev_pagemap(resource_size_t phys)
 {
return NULL;
 }
-#endif
+
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+   return 0;
+}
+
+static inline void vmem_altmap_free(struct vmem_altmap *altmap,
+   unsigned long nr_pfns)
+{
+}
+#endif /* CONFIG_ZONE_DEVICE */
 
 #if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC)
 static inline bool is_device_private_page(const struct page *page)
-- 
2.14.2



[PATCH 02/17] mm: don't export arch_add_memory

2017-12-15 Thread Christoph Hellwig
Only x86_64 and sh export this symbol, and it is not used by any
modular code.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/mm/init.c | 1 -
 arch/x86/mm/init_64.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index bf726af5f1a5..afc54d593a26 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -498,7 +498,6 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
want_memblock)
 
return ret;
 }
-EXPORT_SYMBOL_GPL(arch_add_memory);
 
 #ifdef CONFIG_NUMA
 int memory_add_physaddr_to_nid(u64 addr)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4a837289f2ad..8acdc35c2dfa 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -796,7 +796,6 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
want_memblock)
 
return add_pages(nid, start_pfn, nr_pages, want_memblock);
 }
-EXPORT_SYMBOL_GPL(arch_add_memory);
 
 #define PAGE_INUSE 0xFD
 
-- 
2.14.2



[PATCH 03/17] mm: don't export __add_pages

2017-12-15 Thread Christoph Hellwig
This function isn't used by any modules, and is only to be called
from core MM code.  This includes the calls for the add_pages wrapper
that might be inlined.

Signed-off-by: Christoph Hellwig 
---
 mm/memory_hotplug.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c52aa05b106c..5c6f96e6b334 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -334,7 +334,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 out:
return err;
 }
-EXPORT_SYMBOL_GPL(__add_pages);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
-- 
2.14.2



[PATCH v4 2/2] cxl: read PHB indications from the device tree

2017-12-15 Thread Philippe Bergheaud
Configure the P9 XSL_DSNCTL register with PHB indications found
in the device tree, or else use legacy hard-coded values.

Signed-off-by: Philippe Bergheaud 
---
Changelog:

v2: New patch. Use the new device tree property "ibm,phb-indications".

v3: No change.

v4: No functional change.
Drop cosmetic fix in comment.

This patch depends on the following skiboot prerequisite:

https://patchwork.ozlabs.org/patch/849162/
---
 drivers/misc/cxl/cxl.h|  2 +-
 drivers/misc/cxl/cxllib.c |  2 +-
 drivers/misc/cxl/pci.c| 40 +++-
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index e46a4062904a..5a6e9a921c2b 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -1062,7 +1062,7 @@ int cxl_psl_purge(struct cxl_afu *afu);
 int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid,
  u32 *phb_index, u64 *capp_unit_id);
 int cxl_slot_is_switched(struct pci_dev *dev);
-int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg);
+int cxl_get_xsl9_dsnctl(struct pci_dev *dev, u64 capp_unit_id, u64 *reg);
 u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9);
 
 void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
index dc9bc1807fdf..61f80d586279 100644
--- a/drivers/misc/cxl/cxllib.c
+++ b/drivers/misc/cxl/cxllib.c
@@ -99,7 +99,7 @@ int cxllib_get_xsl_config(struct pci_dev *dev, struct 
cxllib_xsl_config *cfg)
if (rc)
return rc;
 
-   rc = cxl_get_xsl9_dsnctl(capp_unit_id, &cfg->dsnctl);
+   rc = cxl_get_xsl9_dsnctl(dev, capp_unit_id, &cfg->dsnctl);
if (rc)
return rc;
if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 19969ee86d6f..c58fb28685af 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -409,7 +409,36 @@ int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid,
return 0;
 }
 
-int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg)
+static u64 nbwind = 0;
+static u64 asnind = 0;
+static u64 capiind = 0;
+
+static int get_phb_indications(struct pci_dev *dev)
+{
+   struct device_node *np;
+   const __be32 *prop;
+
+   if (capiind)
+   return 0;
+
+   if (!(np = pnv_pci_get_phb_node(dev)))
+   return -1;
+
+   prop = of_get_property(np, "ibm,phb-indications", NULL);
+   if (!prop) {
+   nbwind = 0x0300UL; /* legacy values */
+   asnind = 0x0400UL;
+   capiind = 0x0200UL;
+   } else {
+   nbwind = (u64)be32_to_cpu(prop[2]);
+   asnind = (u64)be32_to_cpu(prop[1]);
+   capiind = (u64)be32_to_cpu(prop[0]);
+   }
+   of_node_put(np);
+   return 0;
+}
+
+int cxl_get_xsl9_dsnctl(struct pci_dev *dev, u64 capp_unit_id, u64 *reg)
 {
u64 xsl_dsnctl;
 
@@ -423,7 +452,8 @@ int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg)
 * Tell XSL where to route data to.
 * The field chipid should match the PHB CAPI_CMPM register
 */
-   xsl_dsnctl = ((u64)0x2 << (63-7)); /* Bit 57 */
+   get_phb_indications(dev);
+   xsl_dsnctl = (capiind << (63-15)); /* Bit 57 */
xsl_dsnctl |= (capp_unit_id << (63-15));
 
/* nMMU_ID Defaults to: b’01001’*/
@@ -437,14 +467,14 @@ int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg)
 * nbwind=0x03, bits [57:58], must include capi indicator.
 * Not supported on P9 DD1.
 */
-   xsl_dsnctl |= ((u64)0x03 << (63-47));
+   xsl_dsnctl |= (nbwind << (63-55));
 
/*
 * Upper 16b address bits of ASB_Notify messages sent to the
 * system. Need to match the PHB’s ASN Compare/Mask Register.
 * Not supported on P9 DD1.
 */
-   xsl_dsnctl |= ((u64)0x04 << (63-55));
+   xsl_dsnctl |= asnind;
}
 
*reg = xsl_dsnctl;
@@ -464,7 +494,7 @@ static int init_implementation_adapter_regs_psl9(struct cxl 
*adapter,
if (rc)
return rc;
 
-   rc = cxl_get_xsl9_dsnctl(capp_unit_id, &xsl_dsnctl);
+   rc = cxl_get_xsl9_dsnctl(dev, capp_unit_id, &xsl_dsnctl);
if (rc)
return rc;
 
-- 
2.15.0



[PATCH v4 1/2] powerpc/powernv: Enable tunneled operations

2017-12-15 Thread Philippe Bergheaud
P9 supports PCI tunneled operations (atomics and as_notify). This
patch adds support for tunneled operations on powernv, with a new
API, to be called by device drivers:

pnv_pci_get_tunnel_ind()
   Tell driver the 16-bit ASN indication used by kernel.

pnv_pci_set_tunnel_bar()
   Tell kernel the Tunnel BAR Response address used by driver.
   This function uses two new OPAL calls, as the PBCQ Tunnel BAR
   register is configured by skiboot.

void pnv_pci_get_as_notify_info()
   Return the ASN info of the thread to be woken up.

Signed-off-by: Philippe Bergheaud 
---
Changelog:

v2: Do not set the ASN indication. Get it from the device tree.

v3: Make pnv_pci_get_phb_node() available when compiling without cxl.

v4: Add pnv_pci_get_as_notify_info().
Rebase opal call numbers on skiboot 5.9.6.

This patch depends on the following skiboot prerequisites:

https://patchwork.ozlabs.org/patch/849162/
https://patchwork.ozlabs.org/patch/849163/
---
 arch/powerpc/include/asm/opal-api.h|  4 +-
 arch/powerpc/include/asm/opal.h|  2 +
 arch/powerpc/include/asm/pnv-pci.h |  5 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  2 +
 arch/powerpc/platforms/powernv/pci-cxl.c   |  8 ---
 arch/powerpc/platforms/powernv/pci.c   | 93 ++
 6 files changed, 105 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 233c7504b1f2..b901f4d9f009 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -201,7 +201,9 @@
 #define OPAL_SET_POWER_SHIFT_RATIO 155
 #define OPAL_SENSOR_GROUP_CLEAR156
 #define OPAL_PCI_SET_P2P   157
-#define OPAL_LAST  157
+#define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   159
+#define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   160
+#define OPAL_LAST  160
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0c545f7fc77b..8705e422b893 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -198,6 +198,8 @@ int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
pe_number);
+int64_t opal_pci_get_pbcq_tunnel_bar(uint64_t phb_id, uint64_t *addr);
+int64_t opal_pci_set_pbcq_tunnel_bar(uint64_t phb_id, uint64_t addr);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
 int64_t opal_ipmi_recv(uint64_t interface, struct opal_ipmi_msg *msg,
diff --git a/arch/powerpc/include/asm/pnv-pci.h 
b/arch/powerpc/include/asm/pnv-pci.h
index 3e5cf251ad9a..4839e09663f2 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -29,6 +29,11 @@ extern int pnv_pci_set_power_state(uint64_t id, uint8_t 
state,
 extern int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
   u64 desc);
 
+extern int pnv_pci_get_tunnel_ind(struct pci_dev *dev, uint64_t *ind);
+extern int pnv_pci_set_tunnel_bar(struct pci_dev *dev, uint64_t addr,
+ int enable);
+extern void pnv_pci_get_as_notify_info(struct task_struct *task, u32 *lpid,
+  u32 *pid, u32 *tid);
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6f4b00a2ac46..5da790fb7fef 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -320,3 +320,5 @@ OPAL_CALL(opal_set_powercap,
OPAL_SET_POWERCAP);
 OPAL_CALL(opal_get_power_shift_ratio,  OPAL_GET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_set_power_shift_ratio,  OPAL_SET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_sensor_group_clear, OPAL_SENSOR_GROUP_CLEAR);
+OPAL_CALL(opal_pci_get_pbcq_tunnel_bar,
OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
+OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,
OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
diff --git a/arch/powerpc/platforms/powernv/pci-cxl.c 
b/arch/powerpc/platforms/powernv/pci-cxl.c
index 94498a04558b..cee003de63af 100644
--- a/arch/powerpc/platforms/powernv/pci-cxl.c
+++ b/arch/powerpc/platforms/powernv/pci-cxl.c
@@ -16,14 +16,6 @@
 
 #include "pci.h"
 
-struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
-{
-   struct pci_controller *hose = pci_bus_to_host(dev->bus);
-
-   return of_node_get(hose->dn);
-}
-EXPORT_SYMBOL(pnv_pci_get_phb_node);
-
 int pnv_phb_to_cxl_mode(st

Re: [mainline] rcu stalls on CPU when unbinding mpt3sas driver

2017-12-15 Thread Hannes Reinecke
On 12/12/2017 11:38 AM, Abdul Haleem wrote:
> Hi,
> 
> Off late we are seeing cpu stalls messages while mpt3sas driver unbind
> on powerpc machine for both mainline and linux-next kernels
> 
> Machine Type: Power 8 Bare-metal
> Kernel version: 4.15.0-rc2
> config: attached.
> test: driver unbind
> 
> $ echo -n 0001:03:00.0 > /sys/bus/pci/drivers/mpt3sas/unbind
> mpt3sas_cm0: removing handle(0x000a), sas_addr(0x500304801f080d00)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(0)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> mpt3sas_cm0: removing handle(0x000b), sas_addr(0x500304801f080d01)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(1)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> mpt3sas_cm0: removing handle(0x000c), sas_addr(0x500304801f080d02)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(2)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> mpt3sas_cm0: removing handle(0x000d), sas_addr(0x500304801f080d03)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(3)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> mpt3sas_cm0: removing handle(0x000e), sas_addr(0x500304801f080d04)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(4)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> mpt3sas_cm0: removing handle(0x000f), sas_addr(0x500304801f080d3d)
> mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(12)
> mpt3sas_cm0: removing enclosure level(0x), connector name( )
> sd 16:0:0:0: [sdb] Synchronizing SCSI cache
> sd 16:0:0:0: [sdb] Synchronize Cache(10) failed: Result: 
> hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> sd 16:0:1:0: [sdc] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT 
> driverbyte=DRIVER_OK
> sd 16:0:1:0: [sdc] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 
> 00 00 00 00 00 00 00 00 e5 00
> sd 16:0:2:0: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT 
> driverbyte=DRIVER_OK
> sd 16:0:2:0: [sdd] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 
> 00 00 00 00 00 00 00 00 e5 00
> sd 16:0:3:0: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT 
> driverbyte=DRIVER_OK
> sd 16:0:3:0: [sde] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 
> 00 00 00 00 00 00 00 00 e5 00
> sd 16:0:4:0: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT 
> driverbyte=DRIVER_OK
> sd 16:0:4:0: [sdf] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 
> 00 00 00 00 00 00 00 00 e5 00
> 
> few minutes after above command was executed, machine is flooded with rcu 
> stalls messages.
> 
> INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 86-... } 44191221 
> jiffies s: 3445 root: 0x20/.
> blocking rcu_node structures: l=1:80-95:0x40/.
> Task dump for CPU 86:
> sh  R  running task10384 18136  1 0x00042086
> Call Trace:
> [c07792d47370] [c07933667200] 0xc07933667200 (unreliable)
> INFO: rcu_sched self-detected stall on CPU
>   86-: (50420459 ticks this GP) idle=0ae/141/0 
> softirq=11962/11962 fqs=24724293 
>(t=50420460 jiffies g=80217 c=80216 q=36817447)
> NMI backtrace for cpu 86
> CPU: 86 PID: 18136 Comm: sh Not tainted 4.15.0-rc2-autotest #1
> Call Trace:
> [c07792d46f20] [c099b83c] dump_stack+0xb0/0xf4 (unreliable)
> [c07792d46f60] [c09a43e4] nmi_cpu_backtrace+0x1a4/0x210
> [c07792d46ff0] [c09a462c] 
> nmi_trigger_cpumask_backtrace+0x1dc/0x220
> [c07792d47090] [c002c7d0] arch_trigger_cpumask_backtrace+0x20/0x40
> [c07792d470b0] [c017496c] rcu_dump_cpu_stacks+0xf4/0x158
> [c07792d47100] [c0173cb0] rcu_check_callbacks+0x8f0/0xb00
> [c07792d47230] [c017c25c] update_process_times+0x3c/0x90
> [c07792d47260] [c01921e4] tick_sched_handle.isra.13+0x44/0x80
> [c07792d47280] [c0192278] tick_sched_timer+0x58/0xb0
> [c07792d472c0] [c017cd58] __hrtimer_run_queues+0xf8/0x330
> [c07792d47340] [c017da74] hrtimer_interrupt+0xe4/0x280
> [c07792d47400] [c0022660] __timer_interrupt+0x90/0x270
> [c07792d47450] [c0022d30] timer_interrupt+0xa0/0xe0
> [c07792d47480] [c0009238] decrementer_common+0x158/0x160
> --- interrupt: 901 at replay_interrupt_return+0x0/0x4
> LR = arch_local_irq_restore+0x74/0x90
> [c07792d47770] [c03fb3185000] 0xc03fb3185000 (unreliable)
> [c07792d47790] [c09bb658] _raw_spin_unlock_irqrestore+0x38/0x60
> [c07792d477b0] [c066f274] scsi_remove_target+0x204/0x270
> [c07792d47820] [dfc72604] sas_rphy_remove+0x94/0xa0 
> [scsi_transport_sas]
> [c07792d47850] [dfc745bc] sas_port_delete+0x4c/0x238 
> [scsi_transport_sas]
> [c07792d478b0] [d00010e82990] 
> mpt3sas_transport_port_remove+0x2d0/0x310 [mpt3sas]
> [c07792d

Re: [PATCH] On ppc64le we HAVE_RELIABLE_STACKTRACE

2017-12-15 Thread Nicholas Piggin
On Tue, 12 Dec 2017 08:05:01 -0600
Josh Poimboeuf  wrote:

> On Tue, Dec 12, 2017 at 12:39:12PM +0100, Torsten Duwe wrote:
> > Hi all,
> > 
> > The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:
> > 
> > [...] There are several rules that must be adhered to in order to ensure
> > reliable and consistent call chain backtracing:
> > 
> > * Before a function calls any other function, it shall establish its
> >   own stack frame, whose size shall be a multiple of 16 bytes.  
> 
> What about leaf functions?  If a leaf function doesn't establish a stack
> frame, and it has inline asm which contains a blr to another function,
> this ABI is broken.
> 
> Also, even for non-leaf functions, is it possible for GCC to insert the
> inline asm before it sets up the stack frame?  (This is an occasional
> problem on x86.)

Inline asm must not have control transfer out of the statement unless
it is asm goto.

> 
> Also, what about hand-coded asm?

Should follow the same rules if it uses the stack.

> 
> > To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE.
> > This patch may be unneccessarily limited to ppc64le, but OTOH the only
> > user of this flag so far is livepatching, which is only implemented on
> > PPCs with 64-LE, a.k.a. ELF ABI v2.  
> 
> In addition to fixing the above issues, the unwinder also needs to
> detect interrupts (i.e., preemption) and page faults on the stack of a
> blocked task.  If a function were preempted before it created a stack
> frame, or if a leaf function blocked on a page fault, the stack trace
> will skip the function's caller, so such a trace will need to be
> reported to livepatch as unreliable.

I don't think there is much problem there for powerpc. Stack frame
creation and function call with return pointer are each atomic.

> 
> Furthermore, the "reliable" unwinder needs to have a way to report an
> error if it doesn't reach the end.  This probably just means ensuring
> that it reaches the user mode registers on the stack.
> 
> And as Miroslav mentioned, once that's all done, implement
> save_stack_trace_tsk_reliable().
> 
> I don't think the above is documented anywhere, it would be good to put
> it in the livepatch doc.
> 

Thanks,
Nick


Re: [PATCH v4 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's

2017-12-15 Thread Nicholas Piggin
On Fri, 15 Dec 2017 19:14:55 +1100
Balbir Singh  wrote:

> Certain HMI's such as malfunction error propagate through
> all threads/core on the system. If a thread was offline
> prior to us crashing the system and jumping to the kdump
> kernel, bad things happen when it wakes up due to an HMI
> in the kdump kernel.
> 
> There are several possible ways to solve this problem
> 
> 1. Put the offline cores in a state such that they are
> not woken up for machine check and HMI errors. This
> does not work, since we might need to wake up offline
> threads to handle TB errors
> 2. Ignore HMI errors, setup HMEER to mask HMI errors,
> but this still leads the window open for any MCEs
> and masking them for the duration of the dump might
> be a concern
> 3. Wake up offline CPUs, as in send them to
> crash_ipi_callback (not wake them up as in mark them
> online as seen by the hotplug). kexec does a
> wake_online_cpus() call, this patch does something
> similar, but instead sends an IPI and forces them to
> crash_ipi_callback()
> 
> This patch takes approach #3.
> 
> Care is taken to enable this only for powenv platforms
> via crash_wake_offline (a global value set at setup
> time). The crash code sends out IPI's to all CPU's
> which then move to crash_ipi_callback and kexec_smp_wait().
> 

Reviewed-by: Nicholas Piggin 

> Signed-off-by: Balbir Singh 
> ---
> 
> Changelog v4
>  - Handle the case for crash IPI's sent via non NMI
>  - Drop system reset via SCOM in non-power saving
>mode
> 
>  arch/powerpc/include/asm/kexec.h |  2 ++
>  arch/powerpc/kernel/crash.c  | 13 -
>  arch/powerpc/kernel/smp.c| 18 ++
>  arch/powerpc/platforms/powernv/smp.c | 28 
>  4 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 4419d435639a..9dcbfa6bbb91 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -73,6 +73,8 @@ extern void kexec_smp_wait(void);   /* get and clear naca 
> physid, wait for
> master to copy new code to 0 */
>  extern int crashing_cpu;
>  extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
> +extern void crash_ipi_callback(struct pt_regs *);
> +extern int crash_wake_offline;
>  
>  struct kimage;
>  struct pt_regs;
> diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
> index 29c56ca2ddfd..00b215125d3e 100644
> --- a/arch/powerpc/kernel/crash.c
> +++ b/arch/powerpc/kernel/crash.c
> @@ -44,6 +44,14 @@
>  #define REAL_MODE_TIMEOUT1
>  
>  static int time_to_dump;
> +/*
> + * crash_wake_offline should be set to 1 by platforms that intend to wake
> + * up offline cpus prior to jumping to a kdump kernel. Currently powernv
> + * sets it to 1, since we want to avoid things from happening when an
> + * offline CPU wakes up due to something like an HMI (malfunction error),
> + * which propagates to all threads.
> + */
> +int crash_wake_offline;
>  
>  #define CRASH_HANDLER_MAX 3
>  /* List of shutdown handles */
> @@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs)
>  #ifdef CONFIG_SMP
>  
>  static atomic_t cpus_in_crash;
> -static void crash_ipi_callback(struct pt_regs *regs)
> +void crash_ipi_callback(struct pt_regs *regs)
>  {
>   static cpumask_t cpus_state_saved = CPU_MASK_NONE;
>  
> @@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu)
>  
>   printk(KERN_EMERG "Sending IPI to other CPUs\n");
>  
> + if (crash_wake_offline)
> + ncpus = num_present_cpus() - 1;
> +
>   crash_send_ipi(crash_ipi_callback);
>   smp_wmb();
>  
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index e0a4c1f82e25..bbe7634b3a43 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
>  #ifdef CONFIG_KEXEC_CORE
>  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
>  {
> + int cpu;
> +
>   smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100);
> + if (kdump_in_progress() && crash_wake_offline) {
> + for_each_present_cpu(cpu) {
> + if (cpu_online(cpu))
> + continue;
> + /*
> +  * crash_ipi_callback will wait for
> +  * all cpus, including offline CPUs.
> +  * We don't care about nmi_ipi_function.
> +  * Offline cpus will jump straight into
> +  * crash_ipi_callback, we can skip the
> +  * entire NMI dance and waiting for
> +  * cpus to clear pending mask, etc.
> +  */
> + do_smp_send_nmi_ipi(cpu);
> + }
> + }
>  }
>  #endif
>  
> diff --git a/arch/powerpc/platforms/powernv/smp.c 
> b/arc

Re: [PATCH v4 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback

2017-12-15 Thread Nicholas Piggin
On Fri, 15 Dec 2017 19:14:54 +1100
Balbir Singh  wrote:

> Our check was extra cautious, we've audited crash_send_ipi
> and it sends an IPI only to online CPU's. Removal of this
> check should have not functional impact on crash kdump.
> 

Reviewed-by: Nicholas Piggin 

> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/kernel/crash.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
> index cbabb5adccd9..29c56ca2ddfd 100644
> --- a/arch/powerpc/kernel/crash.c
> +++ b/arch/powerpc/kernel/crash.c
> @@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs)
>  
>   int cpu = smp_processor_id();
>  
> - if (!cpu_online(cpu))
> - return;
> -
>   hard_irq_disable();
>   if (!cpumask_test_cpu(cpu, &cpus_state_saved)) {
>   crash_save_cpu(regs, cpu);



[PATCH v4 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's

2017-12-15 Thread Balbir Singh
Certain HMI's such as malfunction error propagate through
all threads/core on the system. If a thread was offline
prior to us crashing the system and jumping to the kdump
kernel, bad things happen when it wakes up due to an HMI
in the kdump kernel.

There are several possible ways to solve this problem

1. Put the offline cores in a state such that they are
not woken up for machine check and HMI errors. This
does not work, since we might need to wake up offline
threads to handle TB errors
2. Ignore HMI errors, setup HMEER to mask HMI errors,
but this still leads the window open for any MCEs
and masking them for the duration of the dump might
be a concern
3. Wake up offline CPUs, as in send them to
crash_ipi_callback (not wake them up as in mark them
online as seen by the hotplug). kexec does a
wake_online_cpus() call, this patch does something
similar, but instead sends an IPI and forces them to
crash_ipi_callback()

This patch takes approach #3.

Care is taken to enable this only for powenv platforms
via crash_wake_offline (a global value set at setup
time). The crash code sends out IPI's to all CPU's
which then move to crash_ipi_callback and kexec_smp_wait().

Signed-off-by: Balbir Singh 
---

Changelog v4
 - Handle the case for crash IPI's sent via non NMI
 - Drop system reset via SCOM in non-power saving
   mode

 arch/powerpc/include/asm/kexec.h |  2 ++
 arch/powerpc/kernel/crash.c  | 13 -
 arch/powerpc/kernel/smp.c| 18 ++
 arch/powerpc/platforms/powernv/smp.c | 28 
 4 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 4419d435639a..9dcbfa6bbb91 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -73,6 +73,8 @@ extern void kexec_smp_wait(void); /* get and clear naca 
physid, wait for
  master to copy new code to 0 */
 extern int crashing_cpu;
 extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
+extern void crash_ipi_callback(struct pt_regs *);
+extern int crash_wake_offline;
 
 struct kimage;
 struct pt_regs;
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 29c56ca2ddfd..00b215125d3e 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -44,6 +44,14 @@
 #define REAL_MODE_TIMEOUT  1
 
 static int time_to_dump;
+/*
+ * crash_wake_offline should be set to 1 by platforms that intend to wake
+ * up offline cpus prior to jumping to a kdump kernel. Currently powernv
+ * sets it to 1, since we want to avoid things from happening when an
+ * offline CPU wakes up due to something like an HMI (malfunction error),
+ * which propagates to all threads.
+ */
+int crash_wake_offline;
 
 #define CRASH_HANDLER_MAX 3
 /* List of shutdown handles */
@@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs)
 #ifdef CONFIG_SMP
 
 static atomic_t cpus_in_crash;
-static void crash_ipi_callback(struct pt_regs *regs)
+void crash_ipi_callback(struct pt_regs *regs)
 {
static cpumask_t cpus_state_saved = CPU_MASK_NONE;
 
@@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu)
 
printk(KERN_EMERG "Sending IPI to other CPUs\n");
 
+   if (crash_wake_offline)
+   ncpus = num_present_cpus() - 1;
+
crash_send_ipi(crash_ipi_callback);
smp_wmb();
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e0a4c1f82e25..bbe7634b3a43 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
 #ifdef CONFIG_KEXEC_CORE
 void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 {
+   int cpu;
+
smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 100);
+   if (kdump_in_progress() && crash_wake_offline) {
+   for_each_present_cpu(cpu) {
+   if (cpu_online(cpu))
+   continue;
+   /*
+* crash_ipi_callback will wait for
+* all cpus, including offline CPUs.
+* We don't care about nmi_ipi_function.
+* Offline cpus will jump straight into
+* crash_ipi_callback, we can skip the
+* entire NMI dance and waiting for
+* cpus to clear pending mask, etc.
+*/
+   do_smp_send_nmi_ipi(cpu);
+   }
+   }
 }
 #endif
 
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index ba030669eca1..9664c8461f03 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -37,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "powernv.h"
 
@@ -209,9 +211,32 @@ static void pn

[PATCH v4 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback

2017-12-15 Thread Balbir Singh
Our check was extra cautious, we've audited crash_send_ipi
and it sends an IPI only to online CPU's. Removal of this
check should have not functional impact on crash kdump.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/kernel/crash.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index cbabb5adccd9..29c56ca2ddfd 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs)
 
int cpu = smp_processor_id();
 
-   if (!cpu_online(cpu))
-   return;
-
hard_irq_disable();
if (!cpumask_test_cpu(cpu, &cpus_state_saved)) {
crash_save_cpu(regs, cpu);
-- 
2.13.6



[PATCH] SB600 for the Nemo board has non-zero devices on non-root bus

2017-12-15 Thread Christian Zigotzky

On 09 December 2017 at 7:03PM, Christian Zigotzky wrote:
> On 08 December 2017 at 12:59PM, Michael Ellerman wrote:
> >
> >> Darren's idea of doing it at the same time you tweak the SB600 "relax
> >> pci-e" bit is ideal because then the two pieces are obviously
> >> connected and it wouldn't affect any other systems at all.
> >
> > Yes that would be ideal. That patch is currently out-of-tree I gather,
> > but I guess everyone who's using these machines must have that patch
> > anyway.
> >
> > Darren what does that code look like? Can we get it upstream and close
> > the loop on this?
> >
> > cheers
> >
>
> Hi Michael,
>
> Please find attached the code.
>
> Thanks,
> Christian

Hi All,

I haven't received any response yet. Is this the correct patch you are 
looking for?


Thanks,
Christian
diff -rupN a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
--- a/arch/powerpc/platforms/pasemi/pci.c	2017-11-16 08:18:35.078874462 +0100
+++ b/arch/powerpc/platforms/pasemi/pci.c	2017-11-16 08:17:22.034367975 +0100
@@ -27,6 +27,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include 
@@ -108,6 +109,69 @@ static int workaround_5945(struct pci_bu
 	return 1;
 }
 
+#ifdef CONFIG_PPC_PASEMI_NEMO
+static int sb600_bus = 5;
+static void __iomem *iob_mapbase = NULL;
+
+static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
+ int offset, int len, u32 *val);
+
+static void sb600_set_flag(int bus)
+{
+struct resource res;
+struct device_node *dn;
+   struct pci_bus *busp;
+   u32 val;
+   int err;
+
+   if (sb600_bus == -1)
+   {
+   busp = pci_find_bus(0, 0);
+   pa_pxp_read_config(busp, PCI_DEVFN(17,0), PCI_SECONDARY_BUS, 1, &val);
+
+   sb600_bus = val;
+
+   printk(KERN_CRIT "NEMO SB600 on bus %d.\n",sb600_bus);
+   }
+
+   if (iob_mapbase == NULL)
+   {
+dn = of_find_compatible_node(NULL, "isa", "pasemi,1682m-iob");
+if (!dn)
+{
+   printk(KERN_CRIT "NEMO SB600 missing iob node\n");
+   return;
+   }
+
+   err = of_address_to_resource(dn, 0, &res);
+of_node_put(dn);
+
+   if (err)
+   {
+   printk(KERN_CRIT "NEMO SB600 missing resource\n");
+   return;
+   }
+
+   printk(KERN_CRIT "NEMO SB600 IOB base %08lx\n",res.start);
+
+   iob_mapbase = ioremap(res.start + 0x100, 0x94);
+   }
+
+   if (iob_mapbase != NULL)
+   {
+   if (bus == sb600_bus)
+   {
+   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) | 0x800);
+   }
+   else
+   {
+   out_le32(iob_mapbase + 4, in_le32(iob_mapbase + 4) & ~0x800);
+   }
+   }
+}
+#endif
+
+
 static int pa_pxp_read_config(struct pci_bus *bus, unsigned int devfn,
 			  int offset, int len, u32 *val)
 {
@@ -126,6 +190,10 @@ static int pa_pxp_read_config(struct pci
 
 	addr = pa_pxp_cfg_addr(hose, bus->number, devfn, offset);
 
+#ifdef CONFIG_PPC_PASEMI_NEMO
+   sb600_set_flag(bus->number);
+#endif
+
 	/*
 	 * Note: the caller has already checked that offset is
 	 * suitably aligned and that len is 1, 2 or 4.
@@ -210,6 +278,9 @@ static int __init pas_add_bridge(struct
 	/* Interpret the "ranges" property */
 	pci_process_bridge_OF_ranges(hose, dev, 1);
 
+	/* Scan for an isa bridge. */
+	isa_bridge_find_early(hose);
+
 	return 0;
 }