[PATCH] ASoC: fsl_spdif: Remove superfluous error message around platform_get_irq()

2021-06-09 Thread Zhongjun Tan
From: Tan Zhongjun 

platform_get_irq() already prints an error message when the interrupt is
missing, so there is no need to duplicate that message.

Signed-off-by: Tan Zhongjun 
---
 sound/soc/fsl/fsl_spdif.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
index 2a76714eb8e6..29cefd459241 100644
--- a/sound/soc/fsl/fsl_spdif.c
+++ b/sound/soc/fsl/fsl_spdif.c
@@ -1368,10 +1368,8 @@ static int fsl_spdif_probe(struct platform_device *pdev)
 
for (i = 0; i < spdif_priv->soc->interrupts; i++) {
irq = platform_get_irq(pdev, i);
-   if (irq < 0) {
-   dev_err(&pdev->dev, "no irq for node %s\n", pdev->name);
+   if (irq < 0)
return irq;
-   }
 
ret = devm_request_irq(&pdev->dev, irq, spdif_isr, 0,
   dev_name(&pdev->dev), spdif_priv);
-- 
2.17.1



[PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2021-06-09 Thread Christophe Leroy
With a config having PAGE_SIZE set to 256K, the BTRFS build fails
with the following message:

 include/linux/compiler_types.h:326:38: error: call to 
'__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON failed: 
(BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0

Since BTRFS_MAX_COMPRESSED is 128K, BTRFS cannot support platforms with
256K pages for the time being.

There are two platforms that can select 256K pages:
 - hexagon
 - powerpc

Disable BTRFS when 256K page size is selected.

Reported-by: kernel test robot 
Signed-off-by: Christophe Leroy 
---
 fs/btrfs/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 68b95ad82126..520a0f6a7d9e 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -18,6 +18,8 @@ config BTRFS_FS
select RAID6_PQ
select XOR_BLOCKS
select SRCU
+   depends on !PPC_256K_PAGES  # powerpc
+   depends on !PAGE_SIZE_256KB # hexagon
 
help
  Btrfs is a general purpose copy-on-write filesystem with extents,
-- 
2.25.0



Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-06-09 Thread Fabiano Rosas
Pratik Sampat  writes:

>>> 3. version info  - 1 byte
>>> 4. A data array of size num attributes, which contains the following:
>>>a. attribute ID  - 8 bytes
>>>b. attribute value in number - 8 bytes
>>>c. attribute name in string  - 64 bytes
>>>d. attribute value in string - 64 bytes
>> Is this new hypercall already present in the spec? These seem a bit
>> underspecified to me.
>
> Yes, it is present in the spec. I probably summarized a little more than
> needed here; I can expand below.
>
> The input buffer receives the following data:
>
> 1. “flags”:
>   a. Bit 0: singleAttribute
>   If set to 1, only return the single attribute matching 
> firstAttributeId.
>   b. Bits 1-63: Reserved
> 2. “firstAttributeId”: The first attribute to retrieve
> 3. “bufferAddress”: The logical real address of the start of the output buffer
> 4. “bufferSize”: The size in bytes of the output buffer
>   
>
>  From the document, the format of the output buffer is as follows:
>
> Table 1 --> output buffer
> 
> | Field Name   | Byte   | Length   |  Description
> |  | Offset | in Bytes |
> 
> | NumberOf ||  | Number of Attributes in Buffer
> | AttributesInBuffer   | 0x000  | 0x08 |
> 
> | AttributeArrayOffset | 0x008  | 0x08 | Byte offset to start of Array
> |  ||  | of Attributes
> |  ||  |
> 
> | OutputBufferData ||  | Version of the Header.
> | HeaderVersion| 0x010  | 0x01 | The header will be always
> |  ||  | backward compatible, and changes
> |  ||  | will not impact the Array of
> |  ||  | attributes.
> |  ||  | Current version = 0x01

This is not clear to me. In the event of a header version change, is the
total set of attributes guaranteed to remain the same? Or only the array
layout? We might not need to expose the version information after all.

> 
> | ArrayOfAttributes||  | The array will contain
> |  ||  | "NumberOfAttributesInBuffer"
> |  ||  | array elements not to exceed
> |  ||  | the size of the buffer.
> |  ||  | Layout of the array is
> |  ||  | detailed in Table 2.
> 
>
>
> Table 2 --> Array of attributes
> 
> | Field Name   | Byte   | Length   |  Description
> |  | Offset | in Bytes |
> 
> | 1st AttributeId  | 0x000  | 0x08 | The ID of the Attribute
> 
> | 1st AttributeValue   | 0x008  | 0x08 | The numerical value of
> |  ||  | the attribute
> 
> | 1st AttributeString  | 0x010  | 0x40 | The ASCII string
> | Description  ||  | description of the
> |  ||  | attribute, up to 63
> |  ||  | characters plus a NULL
> |  ||  | terminator.

There is a slight disconnect in that this is called "description" by the
spec, which makes me think they could eventually have something more
verbose than what you'd expect from "name".

So they could give us either: "Frequency" or "The Frequency in GigaHertz".

> 
> | 1st AttributeValue   | 0x050  | 0x40 | The ASCII string
> | StringDescription||  | description of the
> |  ||  | attribute value, up to 63
> |  ||  | characters plus a NULL
> |  ||  | terminator. If this
> |  ||  | contains only a NULL
> |  ||  | terminator, then there is
> |  ||  | no ASCII string
> |  | 

Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-06-09 Thread Bjorn Helgaas
On Thu, Apr 15, 2021 at 01:59:52PM -0500, Rob Herring wrote:
> On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  wrote:
> >
> > Many other resource flag parsers already add this flag when the input
> > has bits 24 & 25 set, so update this one to do the same.

[Adding this to the thread for archaeological purposes since it didn't
make it to the commit log]

The other resource flag parsers appear to be:

  pci_parse_of_flags(u32 addr0, ...)# powerpc/kernel/pci_of_scan.c
unsigned int as = addr0 & OF_PCI_ADDR0_SPACE_MASK;
if (as == OF_PCI_ADDR0_SPACE_MMIO32 || as == OF_PCI_ADDR0_SPACE_MMIO64)
  flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
  if (as == OF_PCI_ADDR0_SPACE_MMIO64)
flags |= PCI_BASE_ADDRESS_MEM_TYPE_64 | IORESOURCE_MEM_64;

  pci_parse_of_flags(u32 addr0) # sparc/kernel/pci.c
if (addr0 & 0x02000000) {
  flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
  if (addr0 & 0x01000000)
flags |= IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64;

  of_bus_pci_get_flags(... addr)# drivers/of/address.c (this one)
u32 w = be32_to_cpup(addr);
switch((w >> 24) & 0x03) {
case 0x02: /* 32 bits */
  flags |= IORESOURCE_MEM;
  break;
case 0x03: /* 64 bits */
  flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
  break;

It's super annoying to have three copies of essentially the same
thing.  Even more annoying that they test the same things in three
completely different ways.  But I remember looking at this several
years ago, and it wasn't as simple to unify these as I had hoped.

> Many others? Looks like sparc and powerpc to me. Those would be the
> ones I worry about breaking. Sparc doesn't use of/address.c so it's
> fine. Powerpc version of the flags code was only fixed in 2019, so I
> don't think powerpc will care either.

I'm guessing you're referring to df5be5be8735 ("powerpc/pci/of: Fix OF
flags parsing for 64bit BARs").

> I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> the flags. AFAICT, that's not set anywhere outside of arch code. So
> never for riscv, arm and arm64 at least. That leads me to
> pci_std_update_resource() which is where the PCI code sets BARs and
> just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> neither is prefetch.
> 
> > Some devices (like virtio-net) have more than one memory resource
> > (like MMIO32 and MMIO64) and without this flag it would be needed to
> > verify the address range to know which one is which.
> >
> > Signed-off-by: Leonardo Bras 
> > ---
> >  drivers/of/address.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index 73ddf2540f3f..dc7147843783 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -116,9 +116,12 @@ static unsigned int of_bus_pci_get_flags(const __be32 *addr)
> > flags |= IORESOURCE_IO;
> > break;
> > case 0x02: /* 32 bits */
> > -   case 0x03: /* 64 bits */
> > flags |= IORESOURCE_MEM;
> > break;
> > +
> > +   case 0x03: /* 64 bits */
> > +   flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
> > +   break;
> > }
> > if (w & 0x40000000)
> > flags |= IORESOURCE_PREFETCH;
> > --
> > 2.30.2
> >


Re: [PATCH v2 0/9] Remove DISCONTIGMEM memory model

2021-06-09 Thread Mike Rapoport
Hi Arnd,

On Wed, Jun 09, 2021 at 01:30:39PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 4, 2021 at 8:49 AM Mike Rapoport  wrote:
> >
> > From: Mike Rapoport 
> >
> > Hi,
> >
> > SPARSEMEM memory model was supposed to entirely replace DISCONTIGMEM a
> > (long) while ago. The last architectures that used DISCONTIGMEM were
> > updated to use other memory models in v5.11 and it is about the time to
> > entirely remove DISCONTIGMEM from the kernel.
> >
> > This set removes DISCONTIGMEM from alpha, arc and m68k, simplifies memory
> > model selection in mm/Kconfig and replaces usage of redundant
> > CONFIG_NEED_MULTIPLE_NODES and CONFIG_FLAT_NODE_MEM_MAP with CONFIG_NUMA
> > and CONFIG_FLATMEM respectively.
> >
> > I've also removed NUMA support on alpha that was BROKEN for more than 15
> > years.
> >
> > There were also minor updates all over arch/ to remove mentions of
> > DISCONTIGMEM in comments and #ifdefs.
> 
> Hi Mike and Andrew,
> 
> It looks like everyone is happy with this version so far. How should we
> merge it for linux-next? I'm happy to take it through the asm-generic
> tree, but linux-mm would fit at least as well. In case we go for
> linux-mm, feel free to add

Andrew already took it into mmotm.
 
> Acked-by: Arnd Bergmann 

Thanks!

> for the whole series.

-- 
Sincerely yours,
Mike.


Re: [PATCH] powerpc/bpf: Use bctrl for making function calls

2021-06-09 Thread Naveen N. Rao

Christophe Leroy wrote:



On 09/06/2021 at 11:00, Naveen N. Rao wrote:

blrl corrupts the link stack. Instead use bctrl when making function
calls from BPF programs.


What's the link stack? Is it the PPC64 branch predictor stack?


c974809a26a13e ("powerpc/vdso: Avoid link stack corruption in 
__get_datapage()") has a good write up on the link stack.






Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/include/asm/ppc-opcode.h |  1 +
  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
  3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index ac41776661e963..1abacb8417d562 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -451,6 +451,7 @@
  #define PPC_RAW_MTLR(r)   (0x7c0803a6 | ___PPC_RT(r))
  #define PPC_RAW_MFLR(t)   (PPC_INST_MFLR | ___PPC_RT(t))
  #define PPC_RAW_BCTR()(PPC_INST_BCTR)
+#define PPC_RAW_BCTRL()(PPC_INST_BCTRL)


Can you use the numeric value instead of PPC_INST_BCTRL, to avoid a conflict with 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4ca2bfdca2f47a293d05f61eb3c4e487ee170f1f.1621506159.git.christophe.le...@csgroup.eu/


Sure. I'll post a v2.

- Naveen



Re: [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-06-09 Thread kajoljain



On 6/8/21 11:06 PM, Peter Zijlstra wrote:
> On Tue, Jun 08, 2021 at 05:26:58PM +0530, Kajol Jain wrote:
>> +static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>> +{
>> +struct nvdimm_pmu *nd_pmu;
>> +u32 target;
>> +int nodeid;
>> +const struct cpumask *cpumask;
>> +
>> +nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
>> +
>> +/* Clear it, in case given cpu is set in nd_pmu->arch_cpumask */
>> +cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
>> +
>> +/*
>> + * If given cpu is not same as current designated cpu for
>> + * counter access, just return.
>> + */
>> +if (cpu != nd_pmu->cpu)
>> +return 0;
>> +
>> +/* Check for any active cpu in nd_pmu->arch_cpumask */
>> +target = cpumask_any(&nd_pmu->arch_cpumask);
>> +nd_pmu->cpu = target;
>> +
>> +/*
>> + * In case we don't have any active cpu in nd_pmu->arch_cpumask,
>> + * check in given cpu's numa node list.
>> + */
>> +if (target >= nr_cpu_ids) {
>> +nodeid = cpu_to_node(cpu);
>> +cpumask = cpumask_of_node(nodeid);
>> +target = cpumask_any_but(cpumask, cpu);
>> +nd_pmu->cpu = target;
>> +
>> +if (target >= nr_cpu_ids)
>> +return -1;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>> +{
>> +struct nvdimm_pmu *nd_pmu;
>> +
>> +nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
>> +
>> +if (nd_pmu->cpu >= nr_cpu_ids)
>> +nd_pmu->cpu = cpu;
>> +
>> +return 0;
>> +}
> 
>> +static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
>> +{
>> +int nodeid, rc;
>> +const struct cpumask *cpumask;
>> +
>> +/*
>> + * Incase cpu hotplug is not handled by arch specific code
>> + * they can still provide required cpumask which can be used
>> + * to get designatd cpu for counter access.
>> + * Check for any active cpu in nd_pmu->arch_cpumask.
>> + */
>> +if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
>> +nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
>> +} else {
>> +/* pick active cpu from the cpumask of device numa node. */
>> +nodeid = dev_to_node(nd_pmu->dev);
>> +cpumask = cpumask_of_node(nodeid);
>> +nd_pmu->cpu = cpumask_any(cpumask);
>> +}
>> +
>> +rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
>> + nvdimm_pmu_cpu_online, 
>> nvdimm_pmu_cpu_offline);
>> +
> 
> Did you actually test this hotplug stuff?
> 
> That is, create a counter, unplug the CPU the counter was on, and
> continue counting? "perf stat -I" is a good option for this, concurrent
> with a hotplug.
>
> Because I don't think it's actually correct. The thing is perf core is
> strictly per-cpu, and it will place the event on a specific CPU context.
> If you then unplug that CPU, nothing will touch the events on that CPU
> anymore.
> 
> What drivers that span CPUs need to do is call
> perf_pmu_migrate_context() whenever the CPU they were assigned to goes
> away. Please have a look at arch/x86/events/rapl.c or
> arch/x86/events/amd/power.c for relatively simple drivers that have this
> property.
> 


Hi Peter,
Primarily I tested off-lining multiple cpus and checking that the cpumask
file updates as expected, followed by perf stat commands. But I missed the
scenario where we off-line a CPU while perf stat is running. My bad, thanks
for pointing it out.
I will fix this issue and send a new version of the patchset.

Thanks,
Kajol Jain
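For reference, the offline path Peter describes would look roughly like the following. This is an untested sketch modeled on arch/x86/events/rapl.c, and it assumes struct nvdimm_pmu embeds a struct pmu member named pmu:

```c
static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
	struct nvdimm_pmu *nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
	u32 target;

	if (cpu != nd_pmu->cpu)
		return 0;

	/* Pick a new designated CPU, preferring the same NUMA node. */
	target = cpumask_any_but(cpumask_of_node(cpu_to_node(cpu)), cpu);
	if (target >= nr_cpu_ids)
		target = cpumask_any_but(cpu_online_mask, cpu);
	if (target >= nr_cpu_ids)
		return 0;

	nd_pmu->cpu = target;

	/* Hand the events on the dying CPU over so counting continues. */
	perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
	return 0;
}
```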


Re: [PATCH v2 0/9] Remove DISCONTIGMEM memory model

2021-06-09 Thread Arnd Bergmann
On Fri, Jun 4, 2021 at 8:49 AM Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> Hi,
>
> SPARSEMEM memory model was supposed to entirely replace DISCONTIGMEM a
> (long) while ago. The last architectures that used DISCONTIGMEM were
> updated to use other memory models in v5.11 and it is about the time to
> entirely remove DISCONTIGMEM from the kernel.
>
> This set removes DISCONTIGMEM from alpha, arc and m68k, simplifies memory
> model selection in mm/Kconfig and replaces usage of redundant
> CONFIG_NEED_MULTIPLE_NODES and CONFIG_FLAT_NODE_MEM_MAP with CONFIG_NUMA
> and CONFIG_FLATMEM respectively.
>
> I've also removed NUMA support on alpha that was BROKEN for more than 15
> years.
>
> There were also minor updates all over arch/ to remove mentions of
> DISCONTIGMEM in comments and #ifdefs.

Hi Mike and Andrew,

It looks like everyone is happy with this version so far. How should we merge it
for linux-next? I'm happy to take it through the asm-generic tree, but linux-mm
would fit at least as well. In case we go for linux-mm, feel free to add

Acked-by: Arnd Bergmann 

for the whole series.


Re: [PATCH 9/9] mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

After removal of the DISCONTIGMEM memory model the FLAT_NODE_MEM_MAP
configuration option is equivalent to FLATMEM.

Drop CONFIG_FLAT_NODE_MEM_MAP and use CONFIG_FLATMEM instead.

Signed-off-by: Mike Rapoport 
---
  include/linux/mmzone.h | 4 ++--
  kernel/crash_core.c| 2 +-
  mm/Kconfig | 4 
  mm/page_alloc.c| 6 +++---
  mm/page_ext.c  | 2 +-
  5 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ad42f440c704..2698cdbfbf75 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -775,7 +775,7 @@ typedef struct pglist_data {
struct zonelist node_zonelists[MAX_ZONELISTS];
  
  	int nr_zones; /* number of populated zones in this node */

-#ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
+#ifdef CONFIG_FLATMEM  /* means !SPARSEMEM */
struct page *node_mem_map;
  #ifdef CONFIG_PAGE_EXTENSION
struct page_ext *node_page_ext;
@@ -865,7 +865,7 @@ typedef struct pglist_data {
  
  #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)

  #define node_spanned_pages(nid)   (NODE_DATA(nid)->node_spanned_pages)
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
+#ifdef CONFIG_FLATMEM
  #define pgdat_page_nr(pgdat, pagenr)  ((pgdat)->node_mem_map + (pagenr))
  #else
  #define pgdat_page_nr(pgdat, pagenr)  pfn_to_page((pgdat)->node_start_pfn + (pagenr))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 53eb8bc6026d..2b8446ea7105 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -483,7 +483,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_OFFSET(page, compound_head);
VMCOREINFO_OFFSET(pglist_data, node_zones);
VMCOREINFO_OFFSET(pglist_data, nr_zones);
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
+#ifdef CONFIG_FLATMEM
VMCOREINFO_OFFSET(pglist_data, node_mem_map);
  #endif
VMCOREINFO_OFFSET(pglist_data, node_start_pfn);
diff --git a/mm/Kconfig b/mm/Kconfig
index bffe4bd859f3..ded98fb859ab 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -55,10 +55,6 @@ config FLATMEM
def_bool y
depends on !SPARSEMEM || FLATMEM_MANUAL
  
-config FLAT_NODE_MEM_MAP
-   def_bool y
-   depends on !SPARSEMEM
-
  #
  # SPARSEMEM_EXTREME (which is the default) does some bootmem
  # allocations when sparse_init() is called.  If this cannot
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8f08135d3eb4..f039736541eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6444,7 +6444,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
  }
  
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+#if !defined(CONFIG_FLATMEM)
  /*
   * Only struct pages that correspond to ranges defined by memblock.memory
   * are zeroed and initialized by going through __init_single_page() during
@@ -7241,7 +7241,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
}
  }
  
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
+#ifdef CONFIG_FLATMEM
  static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
  {
unsigned long __maybe_unused start = 0;
@@ -7289,7 +7289,7 @@ static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
  }
  #else
  static void __ref alloc_node_mem_map(struct pglist_data *pgdat) { }
-#endif /* CONFIG_FLAT_NODE_MEM_MAP */
+#endif /* CONFIG_FLATMEM */
  
  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT

  static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
diff --git a/mm/page_ext.c b/mm/page_ext.c
index df6f74aac8e1..293b2685fc48 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -191,7 +191,7 @@ void __init page_ext_init_flatmem(void)
panic("Out of memory");
  }
  
-#else /* CONFIG_FLAT_NODE_MEM_MAP */
+#else /* CONFIG_FLATMEM */
  
  struct page_ext *lookup_page_ext(const struct page *page)
  {



Acked-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH 8/9] mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

After removal of DISCONTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.

Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.

Done with

$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
$(git grep -wl NEED_MULTIPLE_NODES)

with manual tweaks afterwards.

Signed-off-by: Mike Rapoport 
---
  arch/arm64/Kconfig|  2 +-
  arch/ia64/Kconfig |  2 +-
  arch/mips/Kconfig |  2 +-
  arch/mips/include/asm/mmzone.h|  2 +-
  arch/mips/include/asm/page.h  |  2 +-
  arch/mips/mm/init.c   |  4 ++--
  arch/powerpc/Kconfig  |  2 +-
  arch/powerpc/include/asm/mmzone.h |  4 ++--
  arch/powerpc/kernel/setup_64.c|  2 +-
  arch/powerpc/kernel/smp.c |  2 +-
  arch/powerpc/kexec/core.c |  4 ++--
  arch/powerpc/mm/Makefile  |  2 +-
  arch/powerpc/mm/mem.c |  4 ++--
  arch/riscv/Kconfig|  2 +-
  arch/s390/Kconfig |  2 +-
  arch/sh/include/asm/mmzone.h  |  4 ++--
  arch/sh/kernel/topology.c |  2 +-
  arch/sh/mm/Kconfig|  2 +-
  arch/sh/mm/init.c |  2 +-
  arch/sparc/Kconfig|  2 +-
  arch/sparc/include/asm/mmzone.h   |  4 ++--
  arch/sparc/kernel/smp_64.c|  2 +-
  arch/sparc/mm/init_64.c   | 12 ++--
  arch/x86/Kconfig  |  2 +-
  arch/x86/kernel/setup_percpu.c|  6 +++---
  arch/x86/mm/init_32.c |  4 ++--
  include/asm-generic/topology.h|  2 +-
  include/linux/memblock.h  |  6 +++---
  include/linux/mm.h|  4 ++--
  include/linux/mmzone.h|  8 
  kernel/crash_core.c   |  2 +-
  mm/Kconfig|  9 -
  mm/memblock.c |  8 
  mm/page_alloc.c   |  6 +++---
  34 files changed, 58 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9f1d8566bbf9..d01a1545ab8f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1035,7 +1035,7 @@ config NODES_SHIFT
int "Maximum NUMA Nodes (as a power of 2)"
range 1 10
default "4"
-   depends on NEED_MULTIPLE_NODES
+   depends on NUMA
help
  Specify the maximum number of NUMA Nodes available on the target
  system.  Increases memory reserved to accommodate various tables.
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 279252e3e0f7..da22a35e6f03 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -302,7 +302,7 @@ config NODES_SHIFT
int "Max num nodes shift(3-10)"
range 3 10
default "10"
-   depends on NEED_MULTIPLE_NODES
+   depends on NUMA
help
  This option specifies the maximum number of nodes in your SSI system.
  MAX_NUMNODES will be 2^(This value).
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index ed51970c08e7..4704a16c2e44 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2867,7 +2867,7 @@ config RANDOMIZE_BASE_MAX_OFFSET
  config NODES_SHIFT
int
default "6"
-   depends on NEED_MULTIPLE_NODES
+   depends on NUMA
  
  config HW_PERF_EVENTS
bool "Enable hardware performance counter support for perf events"
diff --git a/arch/mips/include/asm/mmzone.h b/arch/mips/include/asm/mmzone.h
index 7649ab45e80c..602a21aee9d4 100644
--- a/arch/mips/include/asm/mmzone.h
+++ b/arch/mips/include/asm/mmzone.h
@@ -8,7 +8,7 @@
  
  #include 
  
-#ifdef CONFIG_NEED_MULTIPLE_NODES

+#ifdef CONFIG_NUMA
  # include 
  #endif
  
diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 195ff4e9771f..96bc798c1ec1 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -239,7 +239,7 @@ static inline int pfn_valid(unsigned long pfn)
  
  /* pfn_valid is defined in linux/mmzone.h */
  
-#elif defined(CONFIG_NEED_MULTIPLE_NODES)
+#elif defined(CONFIG_NUMA)
  
  #define pfn_valid(pfn)			\
  ({\
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 97f6ca341448..19347dc6bbf8 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -394,7 +394,7 @@ void maar_init(void)
}
  }
  
-#ifndef CONFIG_NEED_MULTIPLE_NODES
+#ifndef CONFIG_NUMA
  void __init paging_init(void)
  {
unsigned long max_zone_pfns[MAX_NR_ZONES];
@@ -473,7 +473,7 @@ void __init mem_init(void)
0x8000 - 4, KCORE_TEXT);
  #endif
  }
-#endif /* !CONFIG_NEED_MULTIPLE_NODES */
+#endif /* !CONFIG_NUMA */
  
  void free_init_pages(const char *what, unsigned long begin, unsigned long end)
  {
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 

Re: [PATCH 7/9] docs: remove description of DISCONTIGMEM

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

Remove description of DISCONTIGMEM from the "Memory Models" document and
update VM sysctl description so that it won't mention DISCONTIGMEM.

Signed-off-by: Mike Rapoport 
---
  Documentation/admin-guide/sysctl/vm.rst | 12 +++
  Documentation/vm/memory-model.rst   | 45 ++---
  2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 586cd4b86428..ddbd71d592e0 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -936,12 +936,12 @@ allocations, THP and hugetlbfs pages.
  
  To make it sensible with respect to the watermark_scale_factor
  parameter, the unit is in fractions of 10,000. The default value of
-15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
-watermark will be reclaimed in the event of a pageblock being mixed due
-to fragmentation. The level of reclaim is determined by the number of
-fragmentation events that occurred in the recent past. If this value is
-smaller than a pageblock then a pageblocks worth of pages will be reclaimed
-(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
+15,000 means that up to 150% of the high watermark will be reclaimed in the
+event of a pageblock being mixed due to fragmentation. The level of reclaim
+is determined by the number of fragmentation events that occurred in the
+recent past. If this value is smaller than a pageblock then a pageblocks
+worth of pages will be reclaimed (e.g.  2MB on 64-bit x86). A boost factor
+of 0 will disable the feature.
  
  
  watermark_scale_factor
  ======================
diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index ce398a7dc6cd..30e8fbed6914 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -14,15 +14,11 @@ for the CPU. Then there could be several contiguous ranges at
  completely distinct addresses. And, don't forget about NUMA, where
  different memory banks are attached to different CPUs.
  
-Linux abstracts this diversity using one of the three memory models:
-FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
+Linux abstracts this diversity using one of the two memory models:
+FLATMEM and SPARSEMEM. Each architecture defines what
  memory models it supports, what the default memory model is and
  whether it is possible to manually override that default.
  
-.. note::
-   At time of this writing, DISCONTIGMEM is considered deprecated,
-   although it is still in use by several architectures.
-
  All the memory models track the status of physical page frames using
  struct page arranged in one or more arrays.
  
@@ -63,43 +59,6 @@ straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the

  The `ARCH_PFN_OFFSET` defines the first page frame number for
  systems with physical memory starting at address different from 0.
  
-DISCONTIGMEM
-============
-
-The DISCONTIGMEM model treats the physical memory as a collection of
-`nodes` similarly to how Linux NUMA support does. For each node Linux
-constructs an independent memory management subsystem represented by
-`struct pglist_data` (or `pg_data_t` for short). Among other
-things, `pg_data_t` holds the `node_mem_map` array that maps
-physical pages belonging to that node. The `node_start_pfn` field of
-`pg_data_t` is the number of the first page frame belonging to that
-node.
-
-The architecture setup code should call :c:func:`free_area_init_node` for
-each node in the system to initialize the `pg_data_t` object and its
-`node_mem_map`.
-
-Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
-every physical page frame in a node has a `struct page` entry in the
-`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
-`flags` field of the `struct page` encodes the node number of the
-node hosting that page.
-
-The conversion between a PFN and the `struct page` in the
-DISCONTIGMEM model became slightly more complex as it has to determine
-which node hosts the physical page and which `pg_data_t` object
-holds the `struct page`.
-
-Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
-to convert PFN to the node number. The opposite conversion helper
-:c:func:`page_to_nid` is generic as it uses the node number encoded in
-page->flags.
-
-Once the node number is known, the PFN can be used to index
-appropriate `node_mem_map` array to access the `struct page` and
-the offset of the `struct page` from the `node_mem_map` plus
-`node_start_pfn` is the PFN of that page.
-
  SPARSEMEM
  =
  



Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH 6/9] arch, mm: remove stale mentions of DISCONTIGMEM

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

There are several places that mention DISCONTIGMEM in comments or have stale
code guarded by CONFIG_DISCONTIGMEM.

Remove the dead code and update the comments.

Signed-off-by: Mike Rapoport 
---
  arch/ia64/kernel/topology.c | 5 ++---
  arch/ia64/mm/numa.c | 5 ++---
  arch/mips/include/asm/mmzone.h  | 6 --
  arch/mips/mm/init.c | 3 ---
  arch/nds32/include/asm/memory.h | 6 --
  arch/xtensa/include/asm/page.h  | 4 
  include/linux/gfp.h | 4 ++--
  7 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kernel/topology.c b/arch/ia64/kernel/topology.c
index 09fc385c2acd..3639e0a7cb3b 100644
--- a/arch/ia64/kernel/topology.c
+++ b/arch/ia64/kernel/topology.c
@@ -3,9 +3,8 @@
   * License.  See the file "COPYING" in the main directory of this archive
   * for more details.
   *
- * This file contains NUMA specific variables and functions which can
- * be split away from DISCONTIGMEM and are used on NUMA machines with
- * contiguous memory.
+ * This file contains NUMA specific variables and functions which are used on
+ * NUMA machines with contiguous memory.
   *2002/08/07 Erich Focht 
   * Populate cpu entries in sysfs for non-numa systems as well
   *Intel Corporation - Ashok Raj
diff --git a/arch/ia64/mm/numa.c b/arch/ia64/mm/numa.c
index 46b6e5f3a40f..d6579ec3ea32 100644
--- a/arch/ia64/mm/numa.c
+++ b/arch/ia64/mm/numa.c
@@ -3,9 +3,8 @@
   * License.  See the file "COPYING" in the main directory of this archive
   * for more details.
   *
- * This file contains NUMA specific variables and functions which can
- * be split away from DISCONTIGMEM and are used on NUMA machines with
- * contiguous memory.
+ * This file contains NUMA specific variables and functions which are used on
+ * NUMA machines with contiguous memory.
   *
   * 2002/08/07 Erich Focht 
   */
diff --git a/arch/mips/include/asm/mmzone.h b/arch/mips/include/asm/mmzone.h
index b826b8473e95..7649ab45e80c 100644
--- a/arch/mips/include/asm/mmzone.h
+++ b/arch/mips/include/asm/mmzone.h
@@ -20,10 +20,4 @@
  #define nid_to_addrbase(nid) 0
  #endif
  
-#ifdef CONFIG_DISCONTIGMEM
-
-#define pfn_to_nid(pfn)		pa_to_nid((pfn) << PAGE_SHIFT)
-
-#endif /* CONFIG_DISCONTIGMEM */
-
  #endif /* _ASM_MMZONE_H_ */
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index c36358758969..97f6ca341448 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -454,9 +454,6 @@ void __init mem_init(void)
BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (_PFN_SHIFT > PAGE_SHIFT));
  
  #ifdef CONFIG_HIGHMEM
-#ifdef CONFIG_DISCONTIGMEM
-#error "CONFIG_HIGHMEM and CONFIG_DISCONTIGMEM dont work together yet"
-#endif
max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
  #else
max_mapnr = max_low_pfn;
diff --git a/arch/nds32/include/asm/memory.h b/arch/nds32/include/asm/memory.h
index 940d32842793..62faafbc28e4 100644
--- a/arch/nds32/include/asm/memory.h
+++ b/arch/nds32/include/asm/memory.h
@@ -76,18 +76,12 @@
   *  virt_to_page(k)   convert a _valid_ virtual address to struct page *
   *  virt_addr_valid(k)indicates whether a virtual address is valid
   */
-#ifndef CONFIG_DISCONTIGMEM
-
   #define ARCH_PFN_OFFSET	PHYS_PFN_OFFSET
   #define pfn_valid(pfn)	((pfn) >= PHYS_PFN_OFFSET && (pfn) < (PHYS_PFN_OFFSET + max_mapnr))
   
   #define virt_to_page(kaddr)	(pfn_to_page(__pa(kaddr) >> PAGE_SHIFT))
   #define virt_addr_valid(kaddr)	((unsigned long)(kaddr) >= PAGE_OFFSET && (unsigned long)(kaddr) < (unsigned long)high_memory)
   
-#else /* CONFIG_DISCONTIGMEM */
-
-#error CONFIG_DISCONTIGMEM is not supported yet.
-#endif /* !CONFIG_DISCONTIGMEM */
-
   #define page_to_phys(page)	(page_to_pfn(page) << PAGE_SHIFT)
  
  #endif

diff --git a/arch/xtensa/include/asm/page.h b/arch/xtensa/include/asm/page.h
index 37ce25ef92d6..493eb7083b1a 100644
--- a/arch/xtensa/include/asm/page.h
+++ b/arch/xtensa/include/asm/page.h
@@ -192,10 +192,6 @@ static inline unsigned long ___pa(unsigned long va)
  #define pfn_valid(pfn) \
((pfn) >= ARCH_PFN_OFFSET && ((pfn) - ARCH_PFN_OFFSET) < max_mapnr)
  
-#ifdef CONFIG_DISCONTIGMEM
-# error CONFIG_DISCONTIGMEM not supported
-#endif
-
  #define virt_to_page(kaddr)   pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
  #define page_to_virt(page)__va(page_to_pfn(page) << PAGE_SHIFT)
  #define virt_addr_valid(kaddr)pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 11da8af06704..dbe1f5fc901d 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -494,8 +494,8 @@ static inline int gfp_zonelist(gfp_t flags)
   * There are two zonelists per node, one for all zones with memory and
   * one containing just zones from the node the zonelist belongs to.
   *
- * For the normal case of non-DISCONTIGMEM systems the NODE_DATA() gets
- * optimized to 

Re: [PATCH 5/9] mm: remove CONFIG_DISCONTIGMEM

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

There are no architectures that support DISCONTIGMEM left.

Remove the configuration option and the dead code it was guarding in the
generic memory management code.

Signed-off-by: Mike Rapoport 
---
  include/asm-generic/memory_model.h | 37 --
  include/linux/mmzone.h |  4 ++--
  mm/Kconfig | 25 +++-
  mm/memory.c|  3 +--
  mm/page_alloc.c| 13 ---
  5 files changed, 10 insertions(+), 72 deletions(-)

diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index 7637fb46ba4f..a2c8ed60233a 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -6,47 +6,18 @@
  
  #ifndef __ASSEMBLY__
  
+/*
+ * supports 3 memory models.
+ */
  #if defined(CONFIG_FLATMEM)
  
   #ifndef ARCH_PFN_OFFSET
   #define ARCH_PFN_OFFSET	(0UL)
   #endif
  
-#elif defined(CONFIG_DISCONTIGMEM)
-
-#ifndef arch_pfn_to_nid
-#define arch_pfn_to_nid(pfn)   pfn_to_nid(pfn)
-#endif
-
-#ifndef arch_local_page_offset
-#define arch_local_page_offset(pfn, nid)   \
-   ((pfn) - NODE_DATA(nid)->node_start_pfn)
-#endif
-
-#endif /* CONFIG_DISCONTIGMEM */
-
-/*
- * supports 3 memory models.
- */
-#if defined(CONFIG_FLATMEM)
-
  #define __pfn_to_page(pfn)(mem_map + ((pfn) - ARCH_PFN_OFFSET))
  #define __page_to_pfn(page)   ((unsigned long)((page) - mem_map) + \
 ARCH_PFN_OFFSET)
-#elif defined(CONFIG_DISCONTIGMEM)
-
-#define __pfn_to_page(pfn) \
-({ unsigned long __pfn = (pfn);\
-   unsigned long __nid = arch_pfn_to_nid(__pfn);  \
-   NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
-})
-
-#define __page_to_pfn(pg)  \
-({ const struct page *__pg = (pg); \
-   struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg)); \
-   (unsigned long)(__pg - __pgdat->node_mem_map) +  \
-__pgdat->node_start_pfn;\
-})
  
  #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
  
@@ -70,7 +41,7 @@

struct mem_section *__sec = __pfn_to_section(__pfn);\
__section_mem_map_addr(__sec) + __pfn;  \
  })
-#endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
+#endif /* CONFIG_FLATMEM/SPARSEMEM */
  
   /*
    * Convert a physical address to a Page Frame Number and back
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0d53eba1c383..2b41e252a995 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -738,8 +738,8 @@ struct zonelist {
struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
  };
  
-#ifndef CONFIG_DISCONTIGMEM
-/* The array of struct pages - for discontigmem use pgdat->lmem_map */
+#ifdef CONFIG_FLATMEM
+/* The array of struct pages for flatmem */
  extern struct page *mem_map;
  #endif
  
diff --git a/mm/Kconfig b/mm/Kconfig
index 02d44e3420f5..218b96ccc84a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -19,7 +19,7 @@ choice
  
   config FLATMEM_MANUAL
 	bool "Flat Memory"
-	depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) || ARCH_FLATMEM_ENABLE
+	depends on !ARCH_SPARSEMEM_ENABLE || ARCH_FLATMEM_ENABLE
help
  This option is best suited for non-NUMA systems with
  flat address space. The FLATMEM is the most efficient
@@ -32,21 +32,6 @@ config FLATMEM_MANUAL
  
  	  If unsure, choose this option (Flat Memory) over any other.
  
-config DISCONTIGMEM_MANUAL
-   bool "Discontiguous Memory"
-   depends on ARCH_DISCONTIGMEM_ENABLE
-   help
- This option provides enhanced support for discontiguous
- memory systems, over FLATMEM.  These systems have holes
- in their physical address spaces, and this option provides
- more efficient handling of these holes.
-
- Although "Discontiguous Memory" is still used by several
- architectures, it is considered deprecated in favor of
- "Sparse Memory".
-
- If unsure, choose "Sparse Memory" over this option.
-
  config SPARSEMEM_MANUAL
bool "Sparse Memory"
depends on ARCH_SPARSEMEM_ENABLE
@@ -62,17 +47,13 @@ config SPARSEMEM_MANUAL
  
  endchoice
  
-config DISCONTIGMEM
-	def_bool y
-	depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
-
  config SPARSEMEM
def_bool y
	depends on (!SELECT_MEMORY_MODEL && ARCH_SPARSEMEM_ENABLE) || SPARSEMEM_MANUAL
  
   config FLATMEM
 	def_bool y
-   depends on (!DISCONTIGMEM && !SPARSEMEM) || FLATMEM_MANUAL
+   depends on !SPARSEMEM || FLATMEM_MANUAL
  
   config FLAT_NODE_MEM_MAP
 	def_bool y
@@ -85,7 +66,7 @@ config FLAT_NODE_MEM_MAP
  #
  config NEED_MULTIPLE_NODES
def_bool y
- 

Re: [PATCH 3/9] arc: remove support for DISCONTIGMEM

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

DISCONTIGMEM was replaced by FLATMEM with freeing of the unused memory map
in v5.11.

Remove the support for DISCONTIGMEM entirely.

Signed-off-by: Mike Rapoport 


Acked-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH 2/9] arc: update comment about HIGHMEM implementation

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

Arc does not use DISCONTIGMEM to implement high memory, update the comment
describing how high memory works to reflect this.

Signed-off-by: Mike Rapoport 
---
  arch/arc/mm/init.c | 13 +
  1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index e2ed355438c9..397a201adfe3 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -139,16 +139,13 @@ void __init setup_arch_memory(void)
  
  #ifdef CONFIG_HIGHMEM

/*
-* Populate a new node with highmem
-*
 * On ARC (w/o PAE) HIGHMEM addresses are actually smaller (0 based)
-* than addresses in normal ala low memory (0x8000_ based).
+* than addresses in normal aka low memory (0x8000_ based).
 * Even with PAE, the huge peripheral space hole would waste a lot of
-* mem with single mem_map[]. This warrants a mem_map per region design.
-* Thus HIGHMEM on ARC is imlemented with DISCONTIGMEM.
-*
-* DISCONTIGMEM in turns requires multiple nodes. node 0 above is
-* populated with normal memory zone while node 1 only has highmem
+* mem with single contiguous mem_map[].
+* Thus when HIGHMEM on ARC is enabled the memory map corresponding
+* to the hole is freed and ARC specific version of pfn_valid()
+* handles the hole in the memory map.
 */
  #ifdef CONFIG_DISCONTIGMEM
node_set_online(1);



Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH 1/9] alpha: remove DISCONTIGMEM and NUMA

2021-06-09 Thread David Hildenbrand

On 02.06.21 12:53, Mike Rapoport wrote:

From: Mike Rapoport 

NUMA is marked broken on alpha for more than 15 years and DISCONTIGMEM was
replaced with SPARSEMEM in v5.11.

Remove both NUMA and DISCONTIGMEM support from alpha.

Signed-off-by: Mike Rapoport 
---
  arch/alpha/Kconfig|  22 ---
  arch/alpha/include/asm/machvec.h  |   6 -
  arch/alpha/include/asm/mmzone.h   | 100 --
  arch/alpha/include/asm/pgtable.h  |   4 -
  arch/alpha/include/asm/topology.h |  39 --
  arch/alpha/kernel/core_marvel.c   |  53 +--
  arch/alpha/kernel/core_wildfire.c |  29 +---
  arch/alpha/kernel/pci_iommu.c |  29 
  arch/alpha/kernel/proto.h |   8 --
  arch/alpha/kernel/setup.c |  16 ---
  arch/alpha/kernel/sys_marvel.c|   5 -
  arch/alpha/kernel/sys_wildfire.c  |   5 -
  arch/alpha/mm/Makefile|   2 -
  arch/alpha/mm/init.c  |   3 -
  arch/alpha/mm/numa.c  | 223 --
  15 files changed, 4 insertions(+), 540 deletions(-)
  delete mode 100644 arch/alpha/include/asm/mmzone.h
  delete mode 100644 arch/alpha/mm/numa.c

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 5998106faa60..8954216b9956 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -549,29 +549,12 @@ config NR_CPUS
  MARVEL support can handle a maximum of 32 CPUs, all the others
  with working support have a maximum of 4 CPUs.
  
-config ARCH_DISCONTIGMEM_ENABLE
-   bool "Discontiguous Memory Support"
-   depends on BROKEN
-   help
- Say Y to support efficient handling of discontiguous physical memory,
- for architectures which are either NUMA (Non-Uniform Memory Access)
- or have huge holes in the physical address space for other reasons.
- See  for more.
-
  config ARCH_SPARSEMEM_ENABLE
bool "Sparse Memory Support"
help
  Say Y to support efficient handling of discontiguous physical memory,
  for systems that have huge holes in the physical address space.
  
-config NUMA
-   bool "NUMA Support (EXPERIMENTAL)"
-   depends on DISCONTIGMEM && BROKEN
-   help
- Say Y to compile the kernel to support NUMA (Non-Uniform Memory
- Access).  This option is for configuring high-end multiprocessor
- server machines.  If in doubt, say N.
-
  config ALPHA_WTINT
bool "Use WTINT" if ALPHA_SRM || ALPHA_GENERIC
default y if ALPHA_QEMU
@@ -596,11 +579,6 @@ config ALPHA_WTINT
  
  	  If unsure, say N.
  
-config NODES_SHIFT
-   int
-   default "7"
-   depends on NEED_MULTIPLE_NODES
-
  # LARGE_VMALLOC is racy, if you *really* need it then fix it first
  config ALPHA_LARGE_VMALLOC
bool
diff --git a/arch/alpha/include/asm/machvec.h b/arch/alpha/include/asm/machvec.h
index a4e96e2bec74..e49fabce7b33 100644
--- a/arch/alpha/include/asm/machvec.h
+++ b/arch/alpha/include/asm/machvec.h
@@ -99,12 +99,6 @@ struct alpha_machine_vector
  
  	const char *vector_name;
  
-	/* NUMA information */
-   int (*pa_to_nid)(unsigned long);
-   int (*cpuid_to_nid)(int);
-   unsigned long (*node_mem_start)(int);
-   unsigned long (*node_mem_size)(int);
-
/* System specific parameters.  */
union {
struct {
diff --git a/arch/alpha/include/asm/mmzone.h b/arch/alpha/include/asm/mmzone.h
deleted file mode 100644
index 86644604d977..
--- a/arch/alpha/include/asm/mmzone.h
+++ /dev/null
@@ -1,100 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Written by Kanoj Sarcar (ka...@sgi.com) Aug 99
- * Adapted for the alpha wildfire architecture Jan 2001.
- */
-#ifndef _ASM_MMZONE_H_
-#define _ASM_MMZONE_H_
-
-#ifdef CONFIG_DISCONTIGMEM
-
-#include 
-
-/*
- * Following are macros that are specific to this numa platform.
- */
-
-extern pg_data_t node_data[];
-
-#define alpha_pa_to_nid(pa)\
-(alpha_mv.pa_to_nid\
-? alpha_mv.pa_to_nid(pa)   \
-: (0))
-#define node_mem_start(nid)\
-(alpha_mv.node_mem_start   \
-? alpha_mv.node_mem_start(nid) \
-: (0UL))
-#define node_mem_size(nid) \
-(alpha_mv.node_mem_size\
-? alpha_mv.node_mem_size(nid)  \
-: ((nid) ? (0UL) : (~0UL)))
-
-#define pa_to_nid(pa)  alpha_pa_to_nid(pa)
-#define NODE_DATA(nid)	(&node_data[(nid)])
-
-#define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn)
-
-#if 1
-#define PLAT_NODE_DATA_LOCALNR(p, n)   \
-   (((p) >> PAGE_SHIFT) - PLAT_NODE_DATA(n)->gendata.node_start_pfn)
-#else
-static inline unsigned long
-PLAT_NODE_DATA_LOCALNR(unsigned long p, int n)
-{
-   unsigned long temp;
-   temp = p >> PAGE_SHIFT;
-   return temp - PLAT_NODE_DATA(n)->gendata.node_start_pfn;
-}
-#endif
-
-/*
- * Following are macros that each numa implementation must define.
- */
-
-/*
- * Given a kernel address, find the home node of the 

Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-06-09 Thread Pratik Sampat

Hello,
Thank you for your comments on the design.

On 09/06/21 3:43 am, Fabiano Rosas wrote:

"Pratik R. Sampat"  writes:

Hi, I have some general comments and questions, mostly trying to
understand design of the hcall and use cases of the sysfs data:


Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
   uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
   uint64 flags,   // Per the flag request
   uint64 firstAttributeId,// The attribute id
   uint64 bufferAddress,   // The logical address of the output buffer

Instead of logical address, guest address or guest physical address
would be more precise.


Yes, the name guest physical address makes more sense for this attribute.
The term logical address had me confused too when I first read it in the ACR,
however that isn't the case.

I'll change it to guest physical address here. Thanks for pointing out.




   uint64 bufferSize   // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0 as well as query only one attribute
at a time with firstAttributeId = id

The output buffer consists of the following
1. number of attributes  - 8 bytes
2. array offset to the data location - 8 bytes

The offset is from the start of the buffer, isn't it? So not the array
offset.


Yes, the offset is from the start of the buffer to the start of the data.


3. version info  - 1 byte
4. A data array of size num attributes, which contains the following:
   a. attribute ID  - 8 bytes
   b. attribute value in number - 8 bytes
   c. attribute name in string  - 64 bytes
   d. attribute value in string - 64 bytes

Is this new hypercall already present in the spec? These seem a bit
underspecified to me.


Yes, it is present in the spec. I probably summarized a little more than needed
here; let me expand on it below.

The hcall receives the following input parameters:

1. “flags”:
a. Bit 0: singleAttribute
If set to 1, only return the single attribute matching 
firstAttributeId.
b. Bits 1-63: Reserved
2. “firstAttributeId”: The first attribute to retrieve
3. “bufferAddress”: The logical real address of the start of the output buffer
4. “bufferSize”: The size in bytes of the output buffer


From the document, the format of the output buffer is as follows:

Table 1 --> output buffer

| Field Name   | Byte   | Length   |  Description
|  | Offset | in Bytes |

| NumberOf ||  | Number of Attributes in Buffer
| AttributesInBuffer   | 0x000  | 0x08 |

| AttributeArrayOffset | 0x008  | 0x08 | Byte offset to start of Array
|  ||  | of Attributes
|  ||  |

| OutputBufferData     |        |          | Version of the Header.
| HeaderVersion        | 0x010  | 0x01     | The header will be always
|                      |        |          | backward compatible, and changes
|                      |        |          | will not impact the Array of
|                      |        |          | attributes.
|                      |        |          | Current version = 0x01

| ArrayOfAttributes||  | The array will contain
|  ||  | "NumberOfAttributesInBuffer"
|  ||  | array elements not to exceed
|  ||  | the size of the buffer.
|  ||  | Layout of the array is
|  ||  | detailed in Table 2.



Table 2 --> Array of attributes

| Field Name   | Byte   | Length   |  Description
|  | Offset | in Bytes |

| 1st AttributeId  | 0x000  | 0x08 | The ID of the Attribute

| 1st AttributeValue   | 0x008  | 0x08 | The numerical 

Re: [PATCH v1 05/12] mm/memory_hotplug: remove nid parameter from remove_memory() and friends

2021-06-09 Thread David Hildenbrand

On 08.06.21 13:18, David Hildenbrand wrote:

On 08.06.21 13:11, Michael Ellerman wrote:

David Hildenbrand  writes:

There is only a single user remaining. We can simply try to offline all
online nodes - which is fast, because we usually span pages and can skip
such nodes right away.


That makes me slightly nervous, because our big powerpc boxes tend to
trip on these scaling issues before others.

But the spanned pages check is just:

void try_offline_node(int nid)
{
pg_data_t *pgdat = NODE_DATA(nid);
  ...
if (pgdat->node_spanned_pages)
return;

So I guess that's pretty cheap, and it's only O(nodes), which should
never get that big.


Exactly. And if it does turn out to be a problem, we can walk all memory
blocks before removing them, collecting the nid(s).



I might just do the following on top:

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61bff8f3bfb1..bbc26fdac364 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2176,7 +2176,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned 
long nr_pages,
 static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 {
int ret = !is_memblock_offlined(mem);
+   int *nid = arg;
 
+   *nid = mem->nid;

if (unlikely(ret)) {
phys_addr_t beginpa, endpa;
 
@@ -2271,10 +2273,10 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)

 {
-   int rc = 0, nid;
struct vmem_altmap mhp_altmap = {};
struct vmem_altmap *altmap = NULL;
unsigned long nr_vmemmap_pages;
+   int rc = 0, nid = NUMA_NO_NODE;
 
BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -2282,8 +2284,12 @@ static int __ref try_remove_memory(u64 start, u64 size)

 * All memory blocks must be offlined before removing memory.  Check
 * whether all memory blocks in question are offline and return error
 * if this is not the case.
+*
+* While at it, determine the nid. Note that if we'd have mixed nodes,
+* we'd only try to offline the last determined one -- which is good
+* enough for the cases we care about.
 */
-   rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
+   rc = walk_memory_blocks(start, size, &nid, check_memblock_offlined_cb);
if (rc)
return rc;
 
@@ -2332,7 +2338,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
release_mem_region_adjustable(start, size);
 
-   for_each_online_node(nid)

+   if (nid != NUMA_NO_NODE)
try_offline_node(nid);
 
mem_hotplug_done();




--
Thanks,

David / dhildenb



Re: [PATCH] powerpc/bpf: Use bctrl for making function calls

2021-06-09 Thread Christophe Leroy




On 09/06/2021 at 11:00, Naveen N. Rao wrote:

blrl corrupts the link stack. Instead use bctrl when making function
calls from BPF programs.


What's the link stack? Is it the PPC64 branch predictor stack?



Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/include/asm/ppc-opcode.h |  1 +
  arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
  arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
  3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index ac41776661e963..1abacb8417d562 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -451,6 +451,7 @@
  #define PPC_RAW_MTLR(r)   (0x7c0803a6 | ___PPC_RT(r))
  #define PPC_RAW_MFLR(t)   (PPC_INST_MFLR | ___PPC_RT(t))
  #define PPC_RAW_BCTR()(PPC_INST_BCTR)
+#define PPC_RAW_BCTRL()			(PPC_INST_BCTRL)


Can you use the numeric value instead of the PPC_INST_BCTRL, to avoid conflict with https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4ca2bfdca2f47a293d05f61eb3c4e487ee170f1f.1621506159.git.christophe.le...@csgroup.eu/



  #define PPC_RAW_MTCTR(r)  (PPC_INST_MTCTR | ___PPC_RT(r))
  #define PPC_RAW_ADDI(d, a, i) (PPC_INST_ADDI | ___PPC_RT(d) | 
___PPC_RA(a) | IMM_L(i))
  #define PPC_RAW_LI(r, i)  PPC_RAW_ADDI(r, 0, i)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index bbb16099e8c7fa..40ab50bea61c02 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -195,8 +195,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
/* Load function address into r0 */
EMIT(PPC_RAW_LIS(__REG_R0, IMM_H(func)));
EMIT(PPC_RAW_ORI(__REG_R0, __REG_R0, IMM_L(func)));
-   EMIT(PPC_RAW_MTLR(__REG_R0));
-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_MTCTR(__REG_R0));
+   EMIT(PPC_RAW_BCTRL());
}
  }
  
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c

index 57a8c1153851a0..ae9a6532be6ad4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -153,8 +153,8 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
PPC_LI64(b2p[TMP_REG_2], func);
/* Load actual entry point from function descriptor */
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
-   /* ... and move it to LR */
-   EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
+   /* ... and move it to CTR */
+   EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
/*
 * Load TOC from function descriptor at offset 8.
 * We can clobber r2 since we get called through a
@@ -165,9 +165,9 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
  #else
/* We can clobber r12 */
PPC_FUNC_ADDR(12, func);
-   EMIT(PPC_RAW_MTLR(12));
+   EMIT(PPC_RAW_MTCTR(12));
  #endif
-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_BCTRL());
  }
  
  void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)

@@ -202,8 +202,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
PPC_BPF_LL(12, 12, 0);
  #endif
  
-	EMIT(PPC_RAW_MTLR(12));

-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_MTCTR(12));
+   EMIT(PPC_RAW_BCTRL());
  }
  
  static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)


base-commit: 112f47a1484ddca610b70cbe4a99f0d0f1701daf



Re: [PATCH 07/16] rbd: use memzero_bvec

2021-06-09 Thread Ilya Dryomov
On Tue, Jun 8, 2021 at 6:06 PM Christoph Hellwig  wrote:
>
> Use memzero_bvec instead of reimplementing it.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/rbd.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index bbb88eb009e0..eb243fc4d108 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1219,24 +1219,13 @@ static void rbd_dev_mapping_clear(struct rbd_device 
> *rbd_dev)
> rbd_dev->mapping.size = 0;
>  }
>
> -static void zero_bvec(struct bio_vec *bv)
> -{
> -   void *buf;
> -   unsigned long flags;
> -
> -   buf = bvec_kmap_irq(bv, &flags);
> -   memset(buf, 0, bv->bv_len);
> -   flush_dcache_page(bv->bv_page);
> -   bvec_kunmap_irq(buf, &flags);
> -}
> -
>  static void zero_bios(struct ceph_bio_iter *bio_pos, u32 off, u32 bytes)
>  {
> struct ceph_bio_iter it = *bio_pos;
>
> ceph_bio_iter_advance(&it, off);
> ceph_bio_iter_advance_step(&it, bytes, ({
> -   zero_bvec(&bv);
> +   memzero_bvec(&bv);
> }));
>  }
>
> @@ -1246,7 +1235,7 @@ static void zero_bvecs(struct ceph_bvec_iter *bvec_pos, 
> u32 off, u32 bytes)
>
> ceph_bvec_iter_advance(&it, off);
> ceph_bvec_iter_advance_step(&it, bytes, ({
> -   zero_bvec(&bv);
> +   memzero_bvec(&bv);
> }));
>  }
>

Ira already brought up the fact that this conversion drops
flush_dcache_page() calls throughout.  Other than that:

Acked-by: Ilya Dryomov 

Thanks,

Ilya


Re: [PATCH 04/16] bvec: add a bvec_kmap_local helper

2021-06-09 Thread Ilya Dryomov
On Tue, Jun 8, 2021 at 6:06 PM Christoph Hellwig  wrote:
>
> Add a helper to call kmap_local_page on a bvec.  There is no need for
> an unmap helper given that kunmap_local accept any address in the mapped
> page.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/bvec.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 883faf5f1523..d64d6c0ceb77 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -7,6 +7,7 @@
>  #ifndef __LINUX_BVEC_H
>  #define __LINUX_BVEC_H
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -183,4 +184,9 @@ static inline void bvec_advance(const struct bio_vec 
> *bvec,
> }
>  }
>
> +static inline void *bvec_kmap_local(struct bio_vec *bvec)
> +{
> +   return kmap_local_page(bvec->bv_page) + bvec->bv_offset;
> +}
> +
>  #endif /* __LINUX_BVEC_H */

Might be useful to add the second sentence of the commit message as
a comment for bvec_kmap_local().  It could be expanded to mention the
single-page bvec caveat too.

Thanks,

Ilya


[PATCH] powerpc/bpf: Use bctrl for making function calls

2021-06-09 Thread Naveen N. Rao
blrl corrupts the link stack. Instead use bctrl when making function
calls from BPF programs.

Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index ac41776661e963..1abacb8417d562 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -451,6 +451,7 @@
 #define PPC_RAW_MTLR(r)(0x7c0803a6 | ___PPC_RT(r))
 #define PPC_RAW_MFLR(t)(PPC_INST_MFLR | ___PPC_RT(t))
 #define PPC_RAW_BCTR() (PPC_INST_BCTR)
+#define PPC_RAW_BCTRL()			(PPC_INST_BCTRL)
 #define PPC_RAW_MTCTR(r)   (PPC_INST_MTCTR | ___PPC_RT(r))
 #define PPC_RAW_ADDI(d, a, i)  (PPC_INST_ADDI | ___PPC_RT(d) | 
___PPC_RA(a) | IMM_L(i))
 #define PPC_RAW_LI(r, i)   PPC_RAW_ADDI(r, 0, i)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index bbb16099e8c7fa..40ab50bea61c02 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -195,8 +195,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
/* Load function address into r0 */
EMIT(PPC_RAW_LIS(__REG_R0, IMM_H(func)));
EMIT(PPC_RAW_ORI(__REG_R0, __REG_R0, IMM_L(func)));
-   EMIT(PPC_RAW_MTLR(__REG_R0));
-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_MTCTR(__REG_R0));
+   EMIT(PPC_RAW_BCTRL());
}
 }
 
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 57a8c1153851a0..ae9a6532be6ad4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -153,8 +153,8 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
PPC_LI64(b2p[TMP_REG_2], func);
/* Load actual entry point from function descriptor */
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
-   /* ... and move it to LR */
-   EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
+   /* ... and move it to CTR */
+   EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
/*
 * Load TOC from function descriptor at offset 8.
 * We can clobber r2 since we get called through a
@@ -165,9 +165,9 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
 #else
/* We can clobber r12 */
PPC_FUNC_ADDR(12, func);
-   EMIT(PPC_RAW_MTLR(12));
+   EMIT(PPC_RAW_MTCTR(12));
 #endif
-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_BCTRL());
 }
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func)
@@ -202,8 +202,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
PPC_BPF_LL(12, 12, 0);
 #endif
 
-   EMIT(PPC_RAW_MTLR(12));
-   EMIT(PPC_RAW_BLRL());
+   EMIT(PPC_RAW_MTCTR(12));
+   EMIT(PPC_RAW_BCTRL());
 }
 
 static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, 
u32 out)

base-commit: 112f47a1484ddca610b70cbe4a99f0d0f1701daf
-- 
2.31.1



Re: [PATCH v2 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug

2021-06-09 Thread David Gibson
On Wed, Jun 09, 2021 at 02:51:49AM -0300, Leonardo Brás wrote:
> On Wed, 2021-06-09 at 14:40 +1000, David Gibson wrote:
> > On Tue, Jun 08, 2021 at 09:52:10PM -0300, Leonardo Brás wrote:
> > > On Mon, 2021-06-07 at 15:02 +1000, David Gibson wrote:
> > > > On Fri, Apr 30, 2021 at 11:36:06AM -0300, Leonardo Bras wrote:
> > > > > Because hypervisors may need to create HPTs without knowing the
> > > > > guest
> > > > > page size, the smallest used page-size (4k) may be chosen,
> > > > > resulting in
> > > > > a HPT that is possibly bigger than needed.
> > > > > 
> > > > > On a guest with bigger page-sizes, the amount of entries for
> > > > > HTP
> > > > > may be
> > > > > too high, causing the guest to ask for a HPT resize-down on the
> > > > > first
> > > > > hotplug.
> > > > > 
> > > > > This becomes a problem when HPT resize-down fails, and causes
> > > > > the
> > > > > HPT resize to be performed on every LMB added, until HPT size
> > > > > is
> > > > > compatible to guest memory size, causing a major slowdown.
> > > > > 
> > > > > So, avoiding HPT resizing-down on hot-add significantly
> > > > > improves
> > > > > memory
> > > > > hotplug times.
> > > > > 
> > > > > As an example, hotplugging 256GB on a 129GB guest took 710s
> > > > > without
> > > > > this
> > > > > patch, and 21s after applied.
> > > > > 
> > > > > Signed-off-by: Leonardo Bras 
> > > > 
> > > > Sorry it's taken me so long to look at these
> > > > 
> > > > I don't love the extra statefulness that the 'shrinking'
> > > > parameter
> > > > adds, but I can't see an elegant way to avoid it, so:
> > > > 
> > > > Reviewed-by: David Gibson 
> > > 
> > > np, thanks for reviewing!
> > 
> > Actually... I take that back.  With the subsequent patches my
> > discomfort with the complexity of implementing the batching grew.
> > 
> > I think I can see a simpler way - although it wasn't as clear as I
> > thought it might be, without some deep history on this feature.
> > 
> > What's going on here is pretty hard to follow, because it starts in
> > arch-specific code (arch/powerpc/platforms/pseries/hotplug-memory.c)
> > where it processes the add/remove requests, then goes into generic
> > code __add_memory() which eventually emerges back in arch specific
> > code (hash__create_section_mapping()).
> > 
> > The HPT resizing calls are in the "inner" arch specific section,
> > whereas it's only the outer arch section that has the information to
> > batch properly.  The mutex and 'shrinking' parameter in Leonardo's
> > code are all about conveying information from the outer to inner
> > section.
> > 
> > Now, I think the reason I had the resize calls in the inner section
> > was to accomodate the notion that a) pHyp might support resizing in
> > future, and it could come in through a different path with its drmgr
> > thingy and/or b) bare metal hash architectures might want to
> > implement
> > hash resizing, and this would make at least part of the path common.
> > 
> > Given the decreasing relevance of hash MMUs, I think we can now
> > safely
> > say neither of these is ever going to happen.
> > 
> > Therefore, we can simplify things by moving the HPT resize calls into
> > the pseries LMB code, instead of create/remove_section_mapping.  Then
> > to do batching without extra complications we just need this logic
> > for
> > all resizes (both add and remove):
> > 
> > let new_hpt_order = expected HPT size for new mem size;
> > 
> > if (new_hpt_order > current_hpt_order)
> > resize to new_hpt_order
> > 
> > add/remove memory
> > 
> > if (new_hpt_order < current_hpt_order - 1)
> > resize to new_hpt_order
> > 
> > 
> 
> 
> Ok, that really does seem to simplify a lot the batching.
> 
> Question:
> by LMB code, you mean dlpar_memory_{add,remove}_by_* ?
> (dealing only with dlpar_{add,remove}_lmb() would not be enough to deal
> with batching)

I was thinking of a two stage process.  First moving the resizes to
dlpar_{add,remove}_lmb() (not changing behaviour for the pseries dlpar
path), then implementing the batching by moving to the {add,remove}_by
functions.

But..

> In my 3/3 reply I sent you some other examples of functions that
> currently end up calling resize_hpt_for_hotplug() without coming from 
> hotplug-memory.c. Is it ok that they no longer call it?

..as I replied there, I was wrong about it being safe to move the
resizes all to the pseries dlpar code.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v1 1/1] powerpc/prom_init: Move custom isspace() to its own namespace

2021-06-09 Thread Michael Ellerman
Andy Shevchenko  writes:
> On Mon, May 10, 2021 at 05:49:25PM +0300, Andy Shevchenko wrote:
>> If for some reason any of the headers ends up including ctype.h,
>> we will have a name collision. Avoid this by moving isspace()
>> to a dedicated namespace.
>> 
>> First appearance of the code is in the commit cf68787b68a2
>> ("powerpc/prom_init: Evaluate mem kernel parameter for early allocation").
>
> Any comments on this?

Looks fine. Thanks.

I just missed it because it came in a bit early, I tend not to pick
things up until rc2.

I tweaked the formatting of prom_isxdigit() slightly now that we allow
100 column lines.

Have put it in my next-test now.

cheers

>> diff --git a/arch/powerpc/kernel/prom_init.c 
>> b/arch/powerpc/kernel/prom_init.c
>> index 41ed7e33d897..6845cbbc0cd4 100644
>> --- a/arch/powerpc/kernel/prom_init.c
>> +++ b/arch/powerpc/kernel/prom_init.c
>> @@ -701,13 +701,13 @@ static int __init prom_setprop(phandle node, const 
>> char *nodename,
>>  }
>>  
>>  /* We can't use the standard versions because of relocation headaches. */
>> -#define isxdigit(c) (('0' <= (c) && (c) <= '9') \
>> - || ('a' <= (c) && (c) <= 'f') \
>> - || ('A' <= (c) && (c) <= 'F'))
>> +#define prom_isxdigit(c)(('0' <= (c) && (c) <= '9') \
>> + || ('a' <= (c) && (c) <= 'f') \
>> + || ('A' <= (c) && (c) <= 'F'))
>>  
>> -#define isdigit(c)  ('0' <= (c) && (c) <= '9')
>> -#define islower(c)  ('a' <= (c) && (c) <= 'z')
>> -#define toupper(c)  (islower(c) ? ((c) - 'a' + 'A') : (c))
>> +#define prom_isdigit(c) ('0' <= (c) && (c) <= '9')
>> +#define prom_islower(c) ('a' <= (c) && (c) <= 'z')
>> +#define prom_toupper(c) (prom_islower(c) ? ((c) - 'a' + 'A') : 
>> (c))
>>  
>>  static unsigned long prom_strtoul(const char *cp, const char **endp)
>>  {
>> @@ -716,14 +716,14 @@ static unsigned long prom_strtoul(const char *cp, 
>> const char **endp)
>>  if (*cp == '0') {
>>  base = 8;
>>  cp++;
>> -if (toupper(*cp) == 'X') {
>> +if (prom_toupper(*cp) == 'X') {
>>  cp++;
>>  base = 16;
>>  }
>>  }
>>  
>> -while (isxdigit(*cp) &&
>> -   (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < 
>> base) {
>> +while (prom_isxdigit(*cp) &&
>> +   (value = prom_isdigit(*cp) ? *cp - '0' : prom_toupper(*cp) - 'A' 
>> + 10) < base) {
>>  result = result * base + value;
>>  cp++;
>>  }
>> -- 
>> 2.30.2
>> 
>
> -- 
> With Best Regards,
> Andy Shevchenko


Re: [PATCH v3 8/9] mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA

2021-06-09 Thread Mike Rapoport
On Tue, Jun 08, 2021 at 05:25:44PM -0700, Andrew Morton wrote:
> On Tue,  8 Jun 2021 12:13:15 +0300 Mike Rapoport  wrote:
> 
> > From: Mike Rapoport 
> > 
> > After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
> > configuration options are equivalent.
> > 
> > Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.
> > 
> > Done with
> > 
> > $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
> > $(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
> > $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
> > $(git grep -wl NEED_MULTIPLE_NODES)
> > 
> > with manual tweaks afterwards.
> > 
> > ...
> >
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -987,7 +987,7 @@ extern int movable_zone;
> >  #ifdef CONFIG_HIGHMEM
> >  static inline int zone_movable_is_highmem(void)
> >  {
> > -#ifdef CONFIG_NEED_MULTIPLE_NODES
> > +#ifdef CONFIG_NUMA
> > return movable_zone == ZONE_HIGHMEM;
> >  #else
> > return (ZONE_MOVABLE - 1) == ZONE_HIGHMEM;
> 
> I dropped this hunk - your "mm/mmzone.h: simplify is_highmem_idx()"
> removed zone_movable_is_highmem().  

Ah, right.
Thanks!

-- 
Sincerely yours,
Mike.


[PATCH] powerpc: Move update_power8_hid0() into its only user

2021-06-09 Thread Christophe Leroy
update_power8_hid0() is used only by powernv platform subcore.c

Move it there.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h   | 10 --
 arch/powerpc/platforms/powernv/subcore.c | 10 ++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 3bb01a8779c9..c5a527489ba5 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1445,16 +1445,6 @@ extern void scom970_write(unsigned int address, unsigned 
long value);
 struct pt_regs;
 
 extern void ppc_save_regs(struct pt_regs *regs);
-
-static inline void update_power8_hid0(unsigned long hid0)
-{
-   /*
-*  The HID0 update on Power8 should at the very least be
-*  preceded by a SYNC instruction followed by an ISYNC
-*  instruction
-*/
-   asm volatile("sync; mtspr %0,%1; isync":: "i"(SPRN_HID0), "r"(hid0));
-}
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_REG_H */
diff --git a/arch/powerpc/platforms/powernv/subcore.c 
b/arch/powerpc/platforms/powernv/subcore.c
index 73207b53dc2b..4fe0594c3f4d 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -169,6 +169,16 @@ static void update_hid_in_slw(u64 hid0)
}
 }
 
+static void update_power8_hid0(unsigned long hid0)
+{
+   /*
+*  The HID0 update on Power8 should at the very least be
+*  preceded by a SYNC instruction followed by an ISYNC
+*  instruction
+*/
+   asm volatile("sync; mtspr %0,%1; isync":: "i"(SPRN_HID0), "r"(hid0));
+}
+
 static void unsplit_core(void)
 {
u64 hid0, mask;
-- 
2.25.0



Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

2021-06-09 Thread David Gibson
On Wed, Jun 09, 2021 at 02:30:36AM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > > During memory hotunplug, after each LMB is removed, the HPT may be
> > > resized-down if it would map a max of 4 times the current amount of
> > > memory.
> > > (2 shifts, due to the introduced hysteresis)
> > > 
> > > It usually is not an issue, but it can take a lot of time if HPT
> > > resizing-down fails. This happens because resize-down failures
> > > usually repeat at each LMB removal, until there are no more bolted
> > > entry conflicts, which can take a while to happen.
> > > 
> > > This can be solved by doing a single HPT resize at the end of
> > > memory
> > > hotunplug, after all requested entries are removed.
> > > 
> > > To make this happen, it's necessary to temporarily disable all HPT
> > > resize-downs before hotunplug, re-enable them after hotunplug ends,
> > > and then resize-down HPT to the current memory size.
> > > 
> > > As an example, hotunplugging 256GB from a 385GB guest took 621s
> > > without
> > > this patch, and 100s after applied.
> > > 
> > > Signed-off-by: Leonardo Bras 
> > 
> > Hrm.  This looks correct, but it seems overly complicated.
> > 
> > AFAICT, the resize calls that this adds should in practice be the
> > *only* times we call resize, all the calls from the lower level code
> > should be suppressed. 
> 
> That's correct.
> 
> >  In which case can't we just remove those calls
> > entirely, and not deal with the clunky locking and exclusion here.
> > That should also remove the need for the 'shrinking' parameter in
> > 1/3.
> 
> 
> If I get your suggestion correctly, you suggest something like:
> 1 - Never calling resize_hpt_for_hotplug() in
> hash__remove_section_mapping(), thus not needing the shrinking
> parameter.
> 2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
> fact call another function to do the batch resize_hpt_for_hotplug() for
> them

Basically, yes.

> If so, that assumes that no other function that currently calls
> resize_hpt_for_hotplug() under another path, or if they do, it does not
> need to actually resize the HPT.
> 
> Is the above correct?
> 
> There are some examples of functions that currently call
> resize_hpt_for_hotplug() by another path:
> 
> add_memory_driver_managed
>   virtio_mem_add_memory
>   dev_dax_kmem_probe

Oh... virtio-mem.  I didn't think of that.


> reserve_additional_memory
>   balloon_process
>   add_ballooned_pages

AFAICT this comes from drivers/xen, and Xen has never been a thing on
POWER.

> __add_memory
>   probe_store

So this is a sysfs triggered memory add.  If the user is doing this
manually, then I think it's reasonable for them to manually manage the
HPT size as well, which they can do through debugfs.  I think it might
also be used by drmgr under pHyp, but pHyp doesn't support HPT
resizing.

> __remove_memory
>   pseries_remove_memblock

Huh, this one comes through OF_RECONFIG_DETACH_NODE.  I don't really
know when those happen, but I strongly suspect it's only under pHyp
again.

> remove_memory
>   dev_dax_kmem_remove
>   virtio_mem_remove_memory

virtio-mem again.

> memunmap_pages
>   pci_p2pdma_add_resource
>   virtio_fs_setup_dax

And virtio-fs in dax mode.  Didn't think of that either.


Ugh, yeah, I'm used to the world where the platform provides the only
way of hotplugging memory, but virtio-mem does indeed provide another
one, and we could indeed need to manage the HPT size based on that.
Drat, so moving all the HPT resizing handling up into
pseries/hotplug-memory.c won't work.

I still think we can simplify the communication between the stuff in
the pseries hotplug code and the actual hash resizing.  In your draft
there are kind of 3 ways the information is conveyed: the mutex
suppresses HPT shrinks, pre-growing past what we need prevents HPT
grows, and the 'shrinking' flag handles some edge cases.

I suggest instead a single flag that will suppress all the current
resizes.  Not sure it technically has to be an atomic mutex, but
that's probably the obvious safe choice.  Then have a "resize up to
target" and "resize down to target" that ignore that suppression and
are no-ops if the target is in the other direction.
Then you should be able to make the path for pseries hotplugs be:

suppress other resizes

resize up to target

do the actual adds or removes

resize down to target

unsuppress other resizes
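
[A minimal user-space sketch of that scheme follows. Every name here is invented for illustration; in-kernel the suppression flag would of course be protected by a real mutex rather than a bare int, and the resize helpers would issue the actual H_RESIZE_HPT_* hypercalls.]

```c
#include <assert.h>

/* Invented names throughout -- this only models the control flow:
 * one suppression flag, plus directional "resize to target" helpers
 * that ignore it.  In the kernel the flag would be mutex-protected.
 */
static int current_hpt_order = 20;
static int hpt_resize_suppressed;

/* Per-LMB path: becomes a no-op while a batched request is in flight. */
static void resize_hpt_for_hotplug(int wanted_order)
{
	if (!hpt_resize_suppressed)
		current_hpt_order = wanted_order;
}

/* Batch paths: ignore suppression, but each moves in one direction only. */
static void resize_hpt_up_to(int target)
{
	if (target > current_hpt_order)
		current_hpt_order = target;
}

static void resize_hpt_down_to(int target)
{
	if (target < current_hpt_order)
		current_hpt_order = target;
}

static void pseries_hotplug(int target_order, void (*change_memory)(void))
{
	hpt_resize_suppressed = 1;

	resize_hpt_up_to(target_order);		/* no-op on a remove */
	change_memory();			/* per-LMB resizes suppressed */
	resize_hpt_down_to(target_order);	/* no-op on an add */

	hpt_resize_suppressed = 0;
}

static void demo_change_memory(void)
{
	resize_hpt_for_hotplug(5);	/* ignored during the batch */
}
```

[Because the up/down helpers are no-ops when the target lies in the other direction, the same wrapper serves both adds and removes, and other hotplug paths (virtio-mem, dax) keep working through the unsuppressed per-LMB call.]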


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

