[PATCH v3] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-16 Thread Sourabh Jain
kexec_file_load uses initial_boot_params in setting up the device-tree
for the kernel to be loaded. Though initial_boot_params holds info
about CPUs at the time of boot, it doesn't account for hot added CPUs.

So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
kexec_file_load syscall and the system crashes on a hot added CPU,
capture kernel hangs failing to identify the boot CPU.

 Kernel panic - not syncing: sysrq triggered crash
 CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
 Call Trace:
 [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
 [c000e590fb00] [c0145290] panic+0x16c/0x41c
 [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
 --- interrupt: c00 at 0x7fff905b9664
 NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
 REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
 MSR:  8280f033   CR: 28000242
   XER: 
 IRQMASK: 0
 GPR00: 0004 75fedf30 7fff906a7300 0001
 GPR04: 01002a7355b0 0002 0001 75fef616
 GPR08: 0001   
 GPR12:  7fff9073a160  
 GPR16:    
 GPR20:  7fff906a4ee0 0002 0001
 GPR24: 7fff906a0898  0002 01002a7355b0
 GPR28: 0002 7fff906a1790 01002a7355b0 0002
 NIP [7fff905b9664] 0x7fff905b9664
 LR [7fff905320c4] 0x7fff905320c4
 --- interrupt: c00

To avoid this from happening, extract current CPU info from of_root
device node and use it for setting up the fdt in kexec_file_load case.

Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory 
reserve map")

Signed-off-by: Sourabh Jain 
Cc: 
---
 arch/powerpc/kexec/file_load_64.c | 98 +++
 1 file changed, 98 insertions(+)

 ---
Changelog:

v1 -> v2
  - fdt should be updated regardless of kexec type
  - updated commit message and title

v2 -> v3
  - Fixed warnings reported by patchwork

(https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210416124658.718860-1-sourabhj...@linux.ibm.com/)
 - argument aligned to open parenthesis
 - declared add_node_prop and update_cpus_node function static
 ---

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 02b9e4d0dc40..878f8297fbed 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
*image)
return fdt_size;
 }
 
+/**
+ * add_node_prop - Read property from device node structure and add
+ * them to fdt.
+ * @fdt:   Flattened device tree of the kernel
+ * @node_offset:   offset of the node to add a property at
+ * np: device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_prop(void *fdt, int node_offset, const struct device_node 
*np)
+{
+   int ret = 0;
+   struct property *pp;
+   unsigned long flags;
+
+   if (!np)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&devtree_lock, flags);
+   for (pp = np->properties; pp; pp = pp->next) {
+   ret = fdt_setprop(fdt, node_offset, pp->name,
+ pp->value, pp->length);
+   if (ret < 0) {
+   pr_err("Unable to add %s property: %s\n",
+  pp->name, fdt_strerror(ret));
+   goto out;
+   }
+   }
+out:
+   raw_spin_unlock_irqrestore(&devtree_lock, flags);
+   return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device-tree using of_root
+ * device node.
+ * @fdt:   Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int update_cpus_node(void *fdt)
+{
+   struct device_node *cpus_node, *dn;
+   int cpus_offset, cpus_subnode_off, ret = 0;
+
+   cpus_offset = fdt_path_offset(fdt, "/cpus");
+   if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) {
+   if (cpus_offset > 0) {
+   ret = fdt_del_n

Re: [PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Yonghong Song




On 4/16/21 6:00 AM, Masahiro Yamada wrote:

Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
only if supported") included scripts/Kbuild.include from
tools/thermal/tmon/Makefile to import the cc-option macro.

  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
not support -no-pie") included scripts/Kbuild.include from
tools/testing/selftests/kvm/Makefile to import the try-run macro.

  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
failures") included scripts/Kbuild.include from
tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
target.

  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
unrecognized option") included scripts/Kbuild.include from
tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
try-run macro.

Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.

Link: 
https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
scripts/Makefile.compiler")
Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 


LGTM although I see some tools Makefile directly added 
".DELETE_ON_ERROR:" in their Makefile.


Acked-by: Yonghong Song 


Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Matthew Wilcox
On Fri, Apr 16, 2021 at 07:08:23PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 16 Apr 2021 16:27:55 +0100
> Matthew Wilcox  wrote:
> 
> > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> > > See below patch.  Where I swap32 the dma address to satisfy
> > > page->compound having bit zero cleared. (It is the simplest fix I could
> > > come up with).  
> > 
> > I think this is slightly simpler, and as a bonus code that assumes the
> > old layout won't compile.
> 
> This is clever, I like it!  When reading the code one just have to
> remember 'unsigned long' size difference between 64-bit vs 32-bit.
> And I assume compiler can optimize the sizeof check out then doable.

I checked before/after with the replacement patch that doesn't
have compiler warnings.  On x86, there is zero codegen difference
(objdump -dr before/after matches exactly) for both x86-32 with 32-bit
dma_addr_t and x86-64.  For x86-32 with 64-bit dma_addr_t, the compiler
makes some different inlining decisions in page_pool_empty_ring(),
page_pool_put_page() and page_pool_put_page_bulk(), but it's not clear
to me that they're wrong.

Function old new   delta
page_pool_empty_ring 387 307 -80
page_pool_put_page   604 516 -88
page_pool_put_page_bulk  690 517-173



Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Matthew Wilcox


Replacement patch to fix compiler warning.

From: "Matthew Wilcox (Oracle)" 
Date: Fri, 16 Apr 2021 16:34:55 -0400
Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
To: bro...@redhat.com
Cc: linux-ker...@vger.kernel.org,
linux...@kvack.org,
net...@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
linux-arm-ker...@lists.infradead.org,
linux-m...@vger.kernel.org,
ilias.apalodi...@linaro.org,
mcr...@linux.microsoft.com,
grygorii.stras...@ti.com,
a...@kernel.org,
h...@lst.de,
linux-snps-...@lists.infradead.org,
mho...@kernel.org,
mgor...@suse.de

32-bit architectures which expect 8-byte alignment for 8-byte integers
and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
page inadvertently expanded in 2019.  When the dma_addr_t was added,
it forced the alignment of the union to 8 bytes, which inserted a 4 byte
gap between 'flags' and the union.

Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long, and also fixes a
potential problem where (on a big endian platform), the bit used to denote
PageTail could inadvertently get set, and a racing get_user_pages_fast()
could dereference a bogus compound_head().

Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/mm_types.h |  4 ++--
 include/net/page_pool.h  | 12 +++-
 net/core/page_pool.c | 12 +++-
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6613b26a8894..5aacc1c10a45 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
};
struct {/* page_pool used by netstack */
/**
-* @dma_addr: might require a 64-bit value even on
+* @dma_addr: might require a 64-bit value on
 * 32-bit architectures.
 */
-   dma_addr_t dma_addr;
+   unsigned long dma_addr[2];
};
struct {/* slab, slob and slub */
union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b5b195305346..ad6154dc206c 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
page_pool *pool,
 
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-   return page->dma_addr;
+   dma_addr_t ret = page->dma_addr[0];
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+   return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+   page->dma_addr[0] = addr;
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   page->dma_addr[1] = addr >> 16 >> 16;
 }
 
 static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ad8b0707af04..f014fd8c19a6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool 
*pool,
  struct page *page,
  unsigned int dma_sync_size)
 {
+   dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
dma_sync_size = min(dma_sync_size, pool->p.max_len);
-   dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+   dma_sync_single_range_for_device(pool->p.dev, dma_addr,
 pool->p.offset, dma_sync_size,
 pool->p.dma_dir);
 }
@@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
page_pool *pool,
put_page(page);
return NULL;
}
-   page->dma_addr = dma;
+   page_pool_set_dma_addr(page, dma);
 
if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
struct page *page)
 */
goto skip_dma_unmap;
 
-   dma = page->dma_addr;
+   dma = page_pool_get_dma_addr(page);
 
-   /* When page is unmapped, it cannot be returned our pool */
+   /* When page is unmapped, it cannot be returned to our pool */
dma_unmap_page_attrs(pool->p.dev, dma,
 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 DMA_ATTR_SKIP_CPU_SYNC);
-   page->dma_addr = 0;
+   page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
/* This may be the last page returned, releasing the pool, so
 * it is not safe to reference pool aft

Re: [PATCH v13 14/14] powerpc/64s/radix: Enable huge vmalloc mappings

2021-04-16 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of April 16, 2021 4:55 am:
> On Thu, 15 Apr 2021 12:23:55 +0200 Christophe Leroy 
>  wrote:
>> > +   * is done. STRICT_MODULE_RWX may require extra work to support this
>> > +   * too.
>> > +   */
>> >   
>> > -  return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, 
>> > GFP_KERNEL,
>> > -  PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, 
>> > NUMA_NO_NODE,
>> 
>> 
>> I think you should add the following in 
>> 
>> #ifndef MODULES_VADDR
>> #define MODULES_VADDR VMALLOC_START
>> #define MODULES_END VMALLOC_END
>> #endif
>> 
>> And leave module_alloc() as is (just removing the enclosing #ifdef 
>> MODULES_VADDR and adding the 
>> VM_NO_HUGE_VMAP  flag)
>> 
>> This would minimise the conflits with the changes I did in powerpc/next 
>> reported by Stephen R.
>> 
> 
> I'll drop powerpc-64s-radix-enable-huge-vmalloc-mappings.patch for now,
> make life simpler.

Yeah that's fine.

> Nick, a redo on top of Christophe's changes in linux-next would be best
> please.

Will do.

Thanks,
Nick


Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread kernel test robot
Hi "Matthew,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc7]
[cannot apply to hnaz-linux-mm/master next-20210416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Matthew-Wilcox-Oracle/Change-struct-page-layout-for-page_pool/20210417-070951
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: parisc-randconfig-s031-20210416 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-280-g2cd6d34e-dirty
# 
https://github.com/0day-ci/linux/commit/898e155048088be20b2606575a24108eacc4c91b
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Matthew-Wilcox-Oracle/Change-struct-page-layout-for-page_pool/20210417-070951
git checkout 898e155048088be20b2606575a24108eacc4c91b
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from net/core/xdp.c:15:
   include/net/page_pool.h: In function 'page_pool_get_dma_addr':
>> include/net/page_pool.h:203:40: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
 203 |   ret |= (dma_addr_t)page->dma_addr[1] << 32;
 |^~
   include/net/page_pool.h: In function 'page_pool_set_dma_addr':
>> include/net/page_pool.h:211:28: warning: right shift count >= width of type 
>> [-Wshift-count-overflow]
 211 |   page->dma_addr[1] = addr >> 32;
 |^~


vim +203 include/net/page_pool.h

   198  
   199  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
   200  {
   201  dma_addr_t ret = page->dma_addr[0];
   202  if (sizeof(dma_addr_t) > sizeof(unsigned long))
 > 203  ret |= (dma_addr_t)page->dma_addr[1] << 32;
   204  return ret;
   205  }
   206  
   207  static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t 
addr)
   208  {
   209  page->dma_addr[0] = addr;
   210  if (sizeof(dma_addr_t) > sizeof(unsigned long))
 > 211  page->dma_addr[1] = addr >> 32;
   212  }
   213  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v3] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h

2021-04-16 Thread Tony Ambardar
On Fri, 16 Apr 2021 at 03:41, Michael Ellerman  wrote:
>
> Tony Ambardar  writes:
> > Hello Michael,
> >
> > The latest version of this patch addressed all feedback I'm aware of
> > when submitted last September, and I've seen no further comments from
> > reviewers since then.
> >
> > Could you please let me know where this stands and if anything further
> > is needed?
>
> Sorry, it's still sitting in my inbox :/
>
> I was going to reply to suggest we split the tools change out. The
> headers under tools are usually updated by another maintainer, I think
> it might even be scripted.
>
> Anyway I've applied your patch and done that (dropped the change to
> tools/.../errno.h), which should also mean the stable backport is more
> likely to work automatically.
>
> It will hit mainline in v5.13-rc1 and then be backported to the stable
> trees.
>
> I don't think you actually need the tools version of the header updated
> to fix your bug? In which case we can probably just wait for it to be
> updated automatically when the tools headers are sync'ed with the kernel
> versions.
>
> cheers

I appreciate the follow up. My original bug was indeed with the tools
header but is being patched locally, so waiting for those headers to
sync with the kernel versions is fine if it simplifies things overall.

Thanks,
Tony


Re: [PATCH net-next v4 2/2] of: net: fix of_get_mac_addr_nvmem() for non-platform devices

2021-04-16 Thread Rob Herring
On Fri, Apr 16, 2021 at 2:30 AM Michael Walle  wrote:
>
> Am 2021-04-16 05:24, schrieb Benjamin Herrenschmidt:
> > On Mon, 2021-04-12 at 19:47 +0200, Michael Walle wrote:
> >>
> >>  /**
> >>   * of_get_phy_mode - Get phy mode for given device_node
> >> @@ -59,15 +60,39 @@ static int of_get_mac_addr(struct device_node *np,
> >> const char *name, u8 *addr)
> >>  static int of_get_mac_addr_nvmem(struct device_node *np, u8 *addr)
> >>  {
> >> struct platform_device *pdev = of_find_device_by_node(np);
> >> +   struct nvmem_cell *cell;
> >> +   const void *mac;
> >> +   size_t len;
> >> int ret;
> >>
> >> -   if (!pdev)
> >> -   return -ENODEV;
> >> +   /* Try lookup by device first, there might be a
> >> nvmem_cell_lookup
> >> +* associated with a given device.
> >> +*/
> >> +   if (pdev) {
> >> +   ret = nvmem_get_mac_address(&pdev->dev, addr);
> >> +   put_device(&pdev->dev);
> >> +   return ret;
> >> +   }
> >> +
> >
> > This smells like the wrong band aid :)
> >
> > Any struct device can contain an OF node pointer these days.
>
> But not all nodes might have an associated device, see DSA for example.

I believe what Ben is saying and what I said earlier is going from dev
-> OF node is right and OF node -> dev is wrong. If you only have an
OF node, then use an of_* function.

> And as the name suggests of_get_mac_address() operates on a node. So
> if a driver calls of_get_mac_address() it should work on the node. What
> is wrong IMHO, is that the ethernet drivers where the corresponding
> board
> has a nvmem_cell_lookup registered is calling of_get_mac_address(node).
> It should rather call eth_get_mac_address(dev) in the first place.
>
> One would need to figure out if there is an actual device (with an
> assiciated of_node), then call eth_get_mac_address(dev) and if there
> isn't a device call of_get_mac_address(node).

Yes, I think we're all in agreement.

> But I don't know if that is easy to figure out. Well, one could start
> with just the device where nvmem_cell_lookup is used. Then we could
> drop the workaround above.

Start with the ones just passing dev.of_node directly:

$ git grep 'of_get_mac_address(.*of_node)'
drivers/net/ethernet/aeroflex/greth.c:  addr =
of_get_mac_address(ofdev->dev.of_node);
drivers/net/ethernet/altera/altera_tse_main.c:  macaddr =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/arc/emac_main.c:   mac_addr =
of_get_mac_address(dev->of_node);
drivers/net/ethernet/broadcom/bgmac-bcma.c: mac =
of_get_mac_address(bgmac->dev->of_node);
drivers/net/ethernet/cavium/octeon/octeon_mgmt.c:   mac =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/ethoc.c:   mac =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/ezchip/nps_enet.c: mac_addr =
of_get_mac_address(dev->of_node);
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c:  mac_addr =
of_get_mac_address(ofdev->dev.of_node);
drivers/net/ethernet/marvell/pxa168_eth.c:  mac_addr =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/marvell/sky2.c:iap =
of_get_mac_address(hw->pdev->dev.of_node);
drivers/net/ethernet/mediatek/mtk_eth_soc.c:mac_addr =
of_get_mac_address(mac->of_node);
drivers/net/ethernet/microchip/lan743x_main.c:  mac_addr =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/qualcomm/qca_spi.c:mac =
of_get_mac_address(spi->dev.of_node);
drivers/net/ethernet/qualcomm/qca_uart.c:   mac =
of_get_mac_address(serdev->dev.of_node);
drivers/net/ethernet/wiznet/w5100-spi.c:const void *mac =
of_get_mac_address(spi->dev.of_node);
drivers/net/ethernet/xilinx/xilinx_axienet_main.c:  mac_addr =
of_get_mac_address(pdev->dev.of_node);
drivers/net/ethernet/xilinx/xilinx_emaclite.c:  mac_address =
of_get_mac_address(ofdev->dev.of_node);
drivers/net/wireless/ralink/rt2x00/rt2x00dev.c: mac_addr =
of_get_mac_address(rt2x00dev->dev->of_node);
drivers/staging/octeon/ethernet.c:  mac =
of_get_mac_address(priv->of_node);
drivers/staging/wfx/main.c: macaddr =
of_get_mac_address(wdev->dev->of_node);
net/ethernet/eth.c: addr = of_get_mac_address(dev->of_node);

Then this will find most of the rest:
git grep -W 'of_get_mac_address([a-z]*)'| grep -E '(node|np)'

Rob


Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-16 Thread Jianlin Lv
On Thu, Apr 15, 2021 at 10:38 PM Daniel Borkmann  wrote:
>
> On 4/15/21 11:32 AM, Jianlin Lv wrote:
> > For debugging JITs, dumping the JITed image to kernel log is discouraged,
> > "bpftool prog dump jited" is much better way to examine JITed dumps.
> > This patch get rid of the code related to bpf_jit_enable=2 mode and
> > update the proc handler of bpf_jit_enable, also added auxiliary
> > information to explain how to use bpf_jit_disasm tool after this change.
> >
> > Signed-off-by: Jianlin Lv 
> [...]
> > diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
> > index 0a7a2870f111..8d36b4658076 100644
> > --- a/arch/x86/net/bpf_jit_comp32.c
> > +++ b/arch/x86/net/bpf_jit_comp32.c
> > @@ -2566,9 +2566,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> > *prog)
> >   cond_resched();
> >   }
> >
> > - if (bpf_jit_enable > 1)
> > - bpf_jit_dump(prog->len, proglen, pass + 1, image);
> > -
> >   if (image) {
> >   bpf_jit_binary_lock_ro(header);
> >   prog->bpf_func = (void *)image;
> > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> > index c8496c1142c9..990b1720c7a4 100644
> > --- a/net/core/sysctl_net_core.c
> > +++ b/net/core/sysctl_net_core.c
> > @@ -273,16 +273,8 @@ static int proc_dointvec_minmax_bpf_enable(struct 
> > ctl_table *table, int write,
> >
> >   tmp.data = &jit_enable;
> >   ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> > - if (write && !ret) {
> > - if (jit_enable < 2 ||
> > - (jit_enable == 2 && bpf_dump_raw_ok(current_cred( {
> > - *(int *)table->data = jit_enable;
> > - if (jit_enable == 2)
> > - pr_warn("bpf_jit_enable = 2 was set! NEVER 
> > use this in production, only for JIT debugging!\n");
> > - } else {
> > - ret = -EPERM;
> > - }
> > - }
> > + if (write && !ret)
> > + *(int *)table->data = jit_enable;
> >   return ret;
> >   }
> >
> > @@ -389,7 +381,7 @@ static struct ctl_table net_core_table[] = {
> >   .extra2 = SYSCTL_ONE,
> >   # else
> >   .extra1 = SYSCTL_ZERO,
> > - .extra2 = &two,
> > + .extra2 = SYSCTL_ONE,
> >   # endif
> >   },
> >   # ifdef CONFIG_HAVE_EBPF_JIT
> > diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> > index c8ae95804728..efa4b17ae016 100644
> > --- a/tools/bpf/bpf_jit_disasm.c
> > +++ b/tools/bpf/bpf_jit_disasm.c
> > @@ -7,7 +7,7 @@
> >*
> >* To get the disassembly of the JIT code, do the following:
> >*
> > - *  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
> > + *  1) Insert bpf_jit_dump() and recompile the kernel to output JITed 
> > image into log
>
> Hmm, if we remove bpf_jit_dump(), the next drive-by cleanup patch will be 
> thrown
> at bpf@vger stating that bpf_jit_dump() has no in-tree users and should be 
> removed.
> Maybe we should be removing bpf_jit_disasm.c along with it as well as 
> bpf_jit_dump()
> itself ... I guess if it's ever needed in those rare occasions for JIT 
> debugging we
> can resurrect it from old kernels just locally. But yeah, bpftool's jit dump 
> should
> suffice for vast majority of use cases.
>
> There was a recent set for ppc32 jit which was merged into ppc tree which 
> will create
> a merge conflict with this one [0]. So we would need a rebase and take it 
> maybe during
> merge win once the ppc32 landed..
>
>[0] 
> https://lore.kernel.org/bpf/cover.1616430991.git.christophe.le...@csgroup.eu/
>
> >*  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 
> > 192.168.20.0/24`)
> >*  3) Run e.g. `bpf_jit_disasm -o` to read out the last JIT code
> >*
> > diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
> > index 40a88df275f9..98c7eec2923f 100644
> > --- a/tools/bpf/bpftool/feature.c
> > +++ b/tools/bpf/bpftool/feature.c
> > @@ -203,9 +203,6 @@ static void probe_jit_enable(void)
> >   case 1:
> >   printf("JIT compiler is enabled\n");
> >   break;
> > - case 2:
> > - printf("JIT compiler is enabled with debugging traces 
> > in kernel logs\n");
> > - break;
>
> This would still need to be there for older kernels ...

I will submit another version after ppc32 landed to remove
bpf_jit_disasm.c and restore bpftool/feature.c

Jianlin


>
> >   case -1:
> >   printf("Unable to retrieve JIT-compiler status\n");
> >   break;
> >
>


Re: [PATCH net-next 0/2] net: gianfar: Drop GFAR_MQ_POLLING support

2021-04-16 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Fri, 16 Apr 2021 20:11:21 +0300 you wrote:
> Drop long time obsolete "per NAPI multi-queue" support in gianfar,
> and related (and undocumented) device tree properties.
> 
> Claudiu Manoil (2):
>   gianfar: Drop GFAR_MQ_POLLING support
>   powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map
> properties
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] gianfar: Drop GFAR_MQ_POLLING support
https://git.kernel.org/netdev/net-next/c/8eda54c5e6c4
  - [net-next,2/2] powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and 
fsl,tx-bit-map properties
https://git.kernel.org/netdev/net-next/c/221e8c126b78

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




[PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head

2021-04-16 Thread Matthew Wilcox (Oracle)
The net page_pool wants to use a magic value to identify page pool pages.
The best place to put it is in the first word where it can be clearly a
non-pointer value.  That means shifting dma_addr up to alias with ->index,
which means we need to find another way to indicate page_is_pfmemalloc().
Since page_pool doesn't want to set its magic value on pages which are
pfmemalloc, we can use bit 1 of compound_head to indicate that the page
came from the memory reserves.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/mm.h   | 12 +++-
 include/linux/mm_types.h |  7 +++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ba434287387..44eab3f6d5ae 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1629,10 +1629,12 @@ struct address_space *page_mapping_file(struct page 
*page);
 static inline bool page_is_pfmemalloc(const struct page *page)
 {
/*
-* Page index cannot be this large so this must be
-* a pfmemalloc page.
+* This is not a tail page; compound_head of a head page is unused
+* at return from the page allocator, and will be overwritten
+* by callers who do not care whether the page came from the
+* reserves.
 */
-   return page->index == -1UL;
+   return page->compound_head & 2;
 }
 
 /*
@@ -1641,12 +1643,12 @@ static inline bool page_is_pfmemalloc(const struct page 
*page)
  */
 static inline void set_page_pfmemalloc(struct page *page)
 {
-   page->index = -1UL;
+   page->compound_head = 2;
 }
 
 static inline void clear_page_pfmemalloc(struct page *page)
 {
-   page->index = 0;
+   page->compound_head = 0;
 }
 
 /*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5aacc1c10a45..39f7163dcace 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -96,10 +96,9 @@ struct page {
unsigned long private;
};
struct {/* page_pool used by netstack */
-   /**
-* @dma_addr: might require a 64-bit value on
-* 32-bit architectures.
-*/
+   unsigned long pp_magic;
+   unsigned long xmi;
+   unsigned long _pp_mapping_pad;
unsigned long dma_addr[2];
};
struct {/* slab, slob and slub */
-- 
2.30.2



[PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Matthew Wilcox (Oracle)
32-bit architectures which expect 8-byte alignment for 8-byte integers
and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
page inadvertently expanded in 2019.  When the dma_addr_t was added,
it forced the alignment of the union to 8 bytes, which inserted a 4 byte
gap between 'flags' and the union.

Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long, and also fixes a
potential problem where (on a big endian platform), the bit used to denote
PageTail could inadvertently get set, and a racing get_user_pages_fast()
could dereference a bogus compound_head().

Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/mm_types.h |  4 ++--
 include/net/page_pool.h  | 12 +++-
 net/core/page_pool.c | 12 +++-
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6613b26a8894..5aacc1c10a45 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
};
struct {/* page_pool used by netstack */
/**
-* @dma_addr: might require a 64-bit value even on
+* @dma_addr: might require a 64-bit value on
 * 32-bit architectures.
 */
-   dma_addr_t dma_addr;
+   unsigned long dma_addr[2];
};
struct {/* slab, slob and slub */
union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b5b195305346..db7c7020746a 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
page_pool *pool,
 
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-   return page->dma_addr;
+   dma_addr_t ret = page->dma_addr[0];
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   ret |= (dma_addr_t)page->dma_addr[1] << 32;
+   return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+   page->dma_addr[0] = addr;
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   page->dma_addr[1] = addr >> 32;
 }
 
 static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ad8b0707af04..f014fd8c19a6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool 
*pool,
  struct page *page,
  unsigned int dma_sync_size)
 {
+   dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
dma_sync_size = min(dma_sync_size, pool->p.max_len);
-   dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+   dma_sync_single_range_for_device(pool->p.dev, dma_addr,
 pool->p.offset, dma_sync_size,
 pool->p.dma_dir);
 }
@@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
page_pool *pool,
put_page(page);
return NULL;
}
-   page->dma_addr = dma;
+   page_pool_set_dma_addr(page, dma);
 
if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
struct page *page)
 */
goto skip_dma_unmap;
 
-   dma = page->dma_addr;
+   dma = page_pool_get_dma_addr(page);
 
-   /* When page is unmapped, it cannot be returned our pool */
+   /* When page is unmapped, it cannot be returned to our pool */
dma_unmap_page_attrs(pool->p.dev, dma,
 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 DMA_ATTR_SKIP_CPU_SYNC);
-   page->dma_addr = 0;
+   page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
/* This may be the last page returned, releasing the pool, so
 * it is not safe to reference pool afterwards.
-- 
2.30.2



[PATCH 0/2] Change struct page layout for page_pool

2021-04-16 Thread Matthew Wilcox (Oracle)
The first patch here fixes two bugs on ppc32, and mips32.  It fixes one
bug on arc and arm32 (in certain configurations).  It probably makes
sense to get it in ASAP through the networking tree.  I'd like to see
testing on those four architectures if possible?

The second patch enables new functionality.  It is much less urgent.
I'd really like to see Mel & Michal's thoughts on it.

I have only compile-tested these patches.

Matthew Wilcox (Oracle) (2):
  mm: Fix struct page layout on 32-bit systems
  mm: Indicate pfmemalloc pages in compound_head

 include/linux/mm.h   | 12 +++-
 include/linux/mm_types.h |  9 -
 include/net/page_pool.h  | 12 +++-
 net/core/page_pool.c | 12 +++-
 4 files changed, 29 insertions(+), 16 deletions(-)

-- 
2.30.2



Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-04-16 Thread Leonardo Bras
Hello Rob, thanks for this feedback!

On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> +PPC and PCI lists
> 
> On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  wrote:
> > 
> > Many other resource flag parsers already add this flag when the input
> > has bits 24 & 25 set, so update this one to do the same.
> 
> Many others? Looks like sparc and powerpc to me. 
> 

s390 also does that, but it look like it comes from a device-tree.

> Those would be the
> ones I worry about breaking. Sparc doesn't use of/address.c so it's
> fine. Powerpc version of the flags code was only fixed in 2019, so I
> don't think powerpc will care either.

In powerpc I reach this function with this stack, while configuring a
virtio-net device for a qemu/KVM pseries guest:

pci_process_bridge_OF_ranges+0xac/0x2d4
pSeries_discover_phbs+0xc4/0x158
discover_phbs+0x40/0x60
do_one_initcall+0x60/0x2d0
kernel_init_freeable+0x308/0x3a8
kernel_init+0x2c/0x168
ret_from_kernel_thread+0x5c/0x70

For this, both MMIO32 and MMIO64 resources will have flags 0x200.

> 
> I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> the flags. AFAICT, that's not set anywhere outside of arch code. So
> never for riscv, arm and arm64 at least. That leads me to
> pci_std_update_resource() which is where the PCI code sets BARs and
> just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> neither is prefetch.
> 

I am not sure if you mean here:
a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect
anything else, or
b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64 
(or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since
it's how it's added in powerpc/sparc, and else there is no point.

Again, thanks for helping!

Best regards,
Leonardo Bras



[PATCH 2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

2021-04-16 Thread Daniel Henrique Barboza
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
for both QEMU and phyp. This gives us an opportunity to use this
behavior to signal the hypervisor layer when an error during device
removal happens, allowing it to do a proper error handling, while not
breaking QEMU/phyp implementations that don't have this support.

This patch introduces this idea by unisolating all CPU DRCs that failed
to be removed by dlpar_cpu_remove_by_index(), when handling the
PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
only because its the only CPU removal event QEMU uses, and there's no
need at this moment to add this mechanism for phyp only code.

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..ed66895c2f51 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
case PSERIES_HP_ELOG_ACTION_REMOVE:
if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
rc = dlpar_cpu_remove_by_count(count);
-   else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
+   else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
rc = dlpar_cpu_remove_by_index(drc_index);
+   /* Setting the isolation state of an 
UNISOLATED/CONFIGURED
+* device to UNISOLATE is a no-op, but the hypervison 
can
+* use it as a hint that the cpu removal failed.
+*/
+   if (rc)
+   dlpar_unisolate_drc(drc_index);
+   }
else
rc = -EINVAL;
break;
-- 
2.30.2



[PATCH 1/2] dlpar.c: introduce dlpar_unisolate_drc()

2021-04-16 Thread Daniel Henrique Barboza
Next patch will execute a set-indicator call in hotplug-cpu.c.

Create a dlpar_unisolate_drc() helper to avoid spreading more
rtas_set_indicator() calls outside of dlpar.c.

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/dlpar.c   | 14 ++
 arch/powerpc/platforms/pseries/pseries.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 233503fcf8f0..3ac70790ec7a 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -329,6 +329,20 @@ int dlpar_release_drc(u32 drc_index)
return 0;
 }
 
+int dlpar_unisolate_drc(u32 drc_index)
+{
+   int dr_status, rc;
+
+   rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status,
+   DR_ENTITY_SENSE, drc_index);
+   if (rc || dr_status != DR_ENTITY_PRESENT)
+   return -1;
+
+   rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE);
+
+   return 0;
+}
+
 int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
 {
int rc;
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 4fe48c04c6c2..4ea12037c920 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -55,6 +55,7 @@ extern int dlpar_attach_node(struct device_node *, struct 
device_node *);
 extern int dlpar_detach_node(struct device_node *);
 extern int dlpar_acquire_drc(u32 drc_index);
 extern int dlpar_release_drc(u32 drc_index);
+extern int dlpar_unisolate_drc(u32 drc_index);
 
 void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog);
 int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_errlog);
-- 
2.30.2



[PATCH 0/2] pseries: UNISOLATE DRCs to signal device removal error

2021-04-16 Thread Daniel Henrique Barboza
At this moment, PAPR [1] does not have a way to report errors during a device
removal operation. This puts a strain in the hypervisor, which needs extra
mechanisms to try to fallback and recover from an error that might have
happened during the removal. The QEMU community has dealt with it during these
years by either trying to preempt the error before sending the HP event or, in
case of a guest side failure, reboot the guest to complete the removal process.

This started to change with QEMU commit fe1831eff8a4 ("spapr_drc.c: use DRC
reconfiguration to cleanup DIMM unplug state"), where a way to fallback from a
memory removal error was introduced. In this case, when QEMU detects that the
kernel is reconfiguring LMBs DRCs that were marked as pending removal, the
entire process is reverted from the QEMU side as well. Around the same time,
other discussions in the QEMU mailing discussed an alternative for other device
as well.

In [2] the idea of using RTAS set-indicator for this role was first introduced.
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already
UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU
and phyp. This gives us an opportunity to use this behavior to signal the
hypervisor layer when a device removal happens, allowing it to do a proper
error handling knowing for sure that the removal failed in the kernel. Using
set-indicator to report HP errors isn't strange to PAPR, as per R1-13.5.3.4-4.
of table 13.7 of [1]:

"For all DR options: If this is a DR operation that involves the user insert-
ing a DR entity, then if the firmware can determine that the inserted entity
would cause a system disturbance, then the set-indicator RTAS call must not
unisolate the entity and must return an error status which is unique to the
particular error."

PAPR does not make any restrictions or considerations about setting an already
Unisolated/Configured DRC to 'unisolate', meaning we have a chance to use it
for this purpose - signal an OS side error when attempting to remove a DR
entity.  To validate the design, this is being implemented only for CPUs.

QEMU will use this mechanism to rollback the device removal (hotunplug) state,
allowing for a better error handling mechanism. A implementation of how QEMU
can do it is in [3]. When using a kernel with this series applied, together
with this QEMU build, this is what happens in a common CPU removal/hotunplug
error scenario (trying to remove the last online CPU):

( QEMU command line: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off
-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 ... )

[root@localhost ~]# QEMU 5.2.92 monitor - type 'help' for more information
(qemu) device_add host-spapr-cpu-core,core-id=1,id=core1
(qemu) 

[root@localhost ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
[   77.548442][   T13] IRQ 19: no longer affine to CPU0
[   77.548452][   T13] IRQ 20: no longer affine to CPU0
[   77.548458][   T13] IRQ 256: no longer affine to CPU0
[   77.548465][   T13] IRQ 258: no longer affine to CPU0
[   77.548472][   T13] IRQ 259: no longer affine to CPU0
[   77.548479][   T13] IRQ 260: no longer affine to CPU0
[   77.548485][   T13] IRQ 261: no longer affine to CPU0
[   77.548590][T0] cpu 0 (hwid 0) Ready to die...
[root@localhost ~]# (qemu) 
(qemu) device_del core1
(qemu) [   83.214073][  T100] pseries-hotplug-cpu: Failed to offline CPU 
PowerPC,POWER9, rc: -16
qemu-system-ppc64: Device hotunplug rejected by the guest for device core1

(qemu) 

As soon as the CPU removal fails in dlpar_cpu(), QEMU becames aware of
it and is able to do error recovery.

If this solution is well received, I'll push for an architecture change
request internally at IBM to make this mechanism PAPR official.


[1] 
https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html
[3] https://github.com/danielhb/qemu/tree/unisolate_drc_callback_v1

Daniel Henrique Barboza (2):
  dlpar.c: introduce dlpar_unisolate_drc()
  hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

 arch/powerpc/platforms/pseries/dlpar.c   | 14 ++
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  9 -
 arch/powerpc/platforms/pseries/pseries.h |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

-- 
2.30.2



Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-04-16 Thread Leonardo Bras
Hello Rob, thanks for this feedback!

On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> +PPC and PCI lists
> 
> On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  wrote:
> > 
> > Many other resource flag parsers already add this flag when the input
> > has bits 24 & 25 set, so update this one to do the same.
> 
> Many others? Looks like sparc and powerpc to me. 
> 

s390 also does that, but it look like it comes from a device-tree.

> Those would be the
> ones I worry about breaking. Sparc doesn't use of/address.c so it's
> fine. Powerpc version of the flags code was only fixed in 2019, so I
> don't think powerpc will care either.

In powerpc I reach this function with this stack, while configuring a
virtio-net device for a qemu/KVM pseries guest:

pci_process_bridge_OF_ranges+0xac/0x2d4
pSeries_discover_phbs+0xc4/0x158
discover_phbs+0x40/0x60
do_one_initcall+0x60/0x2d0
kernel_init_freeable+0x308/0x3a8
kernel_init+0x2c/0x168
ret_from_kernel_thread+0x5c/0x70

For this, both MMIO32 and MMIO64 resources will have flags 0x200.

> 
> I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in
> the flags. AFAICT, that's not set anywhere outside of arch code. So
> never for riscv, arm and arm64 at least. That leads me to
> pci_std_update_resource() which is where the PCI code sets BARs and
> just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring
> IORESOURCE_* flags. So it seems like 64-bit is still not handled and
> neither is prefetch.
> 

I am not sure if you mean here:
a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect
anything else, or
b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64 
(or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since
it's how it's added in powerpc/sparc, and else there is no point.

Again, thanks for helping!

Best regards,
Leonardo Bras



Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-16 Thread Tyrel Datwyler
On 4/16/21 12:15 AM, Daniel Axtens wrote:
> Hi Tyrel,
> 
>> The pci_bus->bridge reference may no longer be valid after
>> pci_bus_remove() resulting in passing a bad value to device_unregister()
>> for the associated bridge device.
>>
>> Store the host_bridge reference in a separate variable prior to
>> pci_bus_remove().
>>
> The patch certainly seems to do what you say. I'm not really up on the
> innards of PCI, so I'm struggling to figure out by what code path
> pci_bus_remove() might invalidate pci_bus->bridge? A quick look at
> pci_remove_bus was not very illuminating but I didn't chase down every
> call it made.

remove_phb_dynamic()
|--> pci_remove_bus(bus)
 |--> device_unregister(&bus->dev)
  |--> put_device(dev)
   |--> device_release(kobj)
|--> dev->class->dev_release(dev) == release_pci_bus(dev)
 |--> kfree(bus)

We have the above call chain that takes place in the when put_device() triggers
the kobject ref count to go zero. The kobject_release function in this case is
device_release() which in turn calls dev->class->dev_release(dev). For a pci_bus
the class is appropriately pcibus_class whose dev_release() callback points to
release_pci_bus(). This in turn calls kfree() on the bus. Which means we can no
longer safely dereference any fields of the pci_bus struct.

-Tyrel

> 
> Kind regards,
> Daniel
> 
>> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration 
>> during PHB removal")
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c 
>> b/arch/powerpc/platforms/pseries/pci_dlpar.c
>> index f9ae17e8a0f4..a8f9140a24fa 100644
>> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
>> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic);
>>  int remove_phb_dynamic(struct pci_controller *phb)
>>  {
>>  struct pci_bus *b = phb->bus;
>> +struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge);
>>  struct resource *res;
>>  int rc, i;
>>  
>> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
>>  /* Remove the PCI bus and unregister the bridge device from sysfs */
>>  phb->bus = NULL;
>>  pci_remove_bus(b);
>> -device_unregister(b->bridge);
>> +host_bridge->bus = NULL;
>> +device_unregister(&host_bridge->dev);
>>  
>>  /* Now release the IO resource */
>>  if (res->flags & IORESOURCE_IO)
>> -- 
>> 2.27.0



Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-16 Thread Tyrel Datwyler
On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
> Currently, neither the vio_bus or vio_driver structures provide support
> for a shutdown() routine.
> 
> Add support for shutdown() by allowing drivers to provide a
> implementation via function pointer in their vio_driver struct and
> provide a proper implementation in the driver template for the vio_bus
> that calls a vio drivers shutdown() if defined.
> 
> In the case that no shutdown() is defined by a vio driver and a kexec is
> in progress we implement a big hammer that calls remove() to ensure no
> further DMA for the devices is possible.
> 
> Signed-off-by: Tyrel Datwyler 
> ---

Ping... any comments, problems with this approach?

-Tyrel


Re: [PATCH v1 12/12] KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR

2021-04-16 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Rather than clear the HV bit from the MSR at guest entry, make it clear
> that the hypervisor does not allow the guest to set the bit.
>
> The HV clear is kept in guest entry for now, but a future patch will
> warn if it is set.
>
> Acked-by: Paul Mackerras 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++--
>  arch/powerpc/kvm/book3s_hv_nested.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 41cb03d0bde4..7a0e33a9c980 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -662,8 +662,8 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
>
>  void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
>  {
> - /* Guest must always run with ME enabled. */
> - msr = msr | MSR_ME;
> + /* Guest must always run with ME enabled, HV disabled. */
> + msr = (msr | MSR_ME) & ~MSR_HV;
>
>   /*
>* Check for illegal transactional state bit combination
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index fb03085c902b..60724f674421 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -344,8 +344,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token;
>   vcpu->arch.regs = l2_regs;
>
> - /* Guest must always run with ME enabled. */
> - vcpu->arch.shregs.msr = vcpu->arch.regs.msr | MSR_ME;
> + /* Guest must always run with ME enabled, HV disabled. */
> + vcpu->arch.shregs.msr = (vcpu->arch.regs.msr | MSR_ME) & ~MSR_HV;
>
>   sanitise_hv_regs(vcpu, &l2_hv);
>   restore_hv_regs(vcpu, &l2_hv);


Re: [PATCH v1 4/7] KVM: PPC: Book3S 64: Move hcall early register setup to KVM

2021-04-16 Thread Fabiano Rosas
Nicholas Piggin  writes:

> System calls / hcalls have a different calling convention than
> other interrupts, so there is code in the KVMTEST to massage these
> into the same form as other interrupt handlers.
>
> Move this work into the KVM hcall handler. This means teaching KVM
> a little more about the low level interrupt handler setup, PACA save
> areas, etc., although that's not obviously worse than the current
> approach of coming up with an entirely different interrupt register
> / save convention.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/exception-64s.h | 13 
>  arch/powerpc/kernel/exceptions-64s.S | 42 +---
>  arch/powerpc/kvm/book3s_64_entry.S   | 30 +
>  3 files changed, 44 insertions(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/exception-64s.h 
> b/arch/powerpc/include/asm/exception-64s.h
> index c1a8aac01cf9..bb6f78fcf981 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -35,6 +35,19 @@
>  /* PACA save area size in u64 units (exgen, exmc, etc) */
>  #define EX_SIZE  10
>
> +/* PACA save area offsets */
> +#define EX_R90
> +#define EX_R10   8
> +#define EX_R11   16
> +#define EX_R12   24
> +#define EX_R13   32
> +#define EX_DAR   40
> +#define EX_DSISR 48
> +#define EX_CCR   52
> +#define EX_CFAR  56
> +#define EX_PPR   64
> +#define EX_CTR   72
> +
>  /*
>   * maximum recursive depth of MCE exceptions
>   */
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 9467fd1038f9..1bfd0d7af09e 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -21,22 +21,6 @@
>  #include 
>  #include 
>
> -/* PACA save area offsets (exgen, exmc, etc) */
> -#define EX_R90
> -#define EX_R10   8
> -#define EX_R11   16
> -#define EX_R12   24
> -#define EX_R13   32
> -#define EX_DAR   40
> -#define EX_DSISR 48
> -#define EX_CCR   52
> -#define EX_CFAR  56
> -#define EX_PPR   64
> -#define EX_CTR   72
> -.if EX_SIZE != 10
> - .error "EX_SIZE is wrong"
> -.endif
> -
>  /*
>   * Following are fixed section helper macros.
>   *
> @@ -1964,29 +1948,8 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100)
>
>  #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
>  TRAMP_REAL_BEGIN(system_call_kvm)
> - /*
> -  * This is a hcall, so register convention is as above, with these
> -  * differences:
> -  * r13 = PACA
> -  * ctr = orig r13
> -  * orig r10 saved in PACA
> -  */
> -  /*
> -   * Save the PPR (on systems that support it) before changing to
> -   * HMT_MEDIUM. That allows the KVM code to save that value into the
> -   * guest state (it is the guest's PPR value).
> -   */
> -BEGIN_FTR_SECTION
> - mfspr   r10,SPRN_PPR
> - std r10,HSTATE_PPR(r13)
> -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> - HMT_MEDIUM
>   mfctr   r10
> - SET_SCRATCH0(r10)
> - mfcrr10
> - std r12,HSTATE_SCRATCH0(r13)
> - sldir12,r10,32
> - ori r12,r12,0xc00
> + SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */
>  #ifdef CONFIG_RELOCATABLE
>   /*
>* Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives
> @@ -1994,15 +1957,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>*/
>   __LOAD_FAR_HANDLER(r10, kvmppc_hcall)
>   mtctr   r10
> - ld  r10,PACA_EXGEN+EX_R10(r13)
>   bctr
>  #else
> - ld  r10,PACA_EXGEN+EX_R10(r13)
>   b   kvmppc_hcall
>  #endif
>  #endif
>
> -
>  /**
>   * Interrupt 0xd00 - Trace Interrupt.
>   * This is a synchronous interrupt in response to instruction step or
> diff --git a/arch/powerpc/kvm/book3s_64_entry.S 
> b/arch/powerpc/kvm/book3s_64_entry.S
> index c21fa64059ef..f527e16707db 100644
> --- a/arch/powerpc/kvm/book3s_64_entry.S
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -14,6 +14,36 @@
>  .global  kvmppc_hcall
>  .balign IFETCH_ALIGN_BYTES
>  kvmppc_hcall:
> + /*
> +  * This is a hcall, so register convention is as
> +  * Documentation/powerpc/papr_hcalls.rst, with these additions:
> +  * R13  = PACA
> +  * guest R13 saved in SPRN_SCRATCH0
> +  * R10  = free
> +  * guest r10 saved in PACA_EXGEN
> +  *
> +  * This may also be a syscall from PR-KVM userspace that is to be
> +  * reflected to the PR guest kernel, so registers may be set up for
> +  * a system call rather than hcall. We don't currently clobber
> +  * anything here, but the 0xc00 handler has already clobbered CTR
> +  * and CR0, so PR-KVM can not support a guest kernel that preserves
> +  * those registers across its s

Re: [PATCH v1 5/7] KVM: PPC: Book3S 64: Move interrupt early register setup to KVM

2021-04-16 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Like the earlier patch for hcalls, KVM interrupt entry requires a
> different calling convention than the Linux interrupt handlers
> set up. Move the code that converts from one to the other into KVM.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 131 +--
>  arch/powerpc/kvm/book3s_64_entry.S   |  50 +-
>  2 files changed, 71 insertions(+), 110 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 1bfd0d7af09e..cd1731642b12 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -187,7 +187,6 @@ do_define_int n
>   .endif
>  .endm
>
> -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
>  /*
>   * All interrupts which set HSRR registers, as well as SRESET and MCE and
>   * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
> @@ -220,54 +219,25 @@ do_define_int n
>   * to KVM to handle.
>   */
>
> -.macro KVMTEST name
> +.macro KVMTEST name handler
> +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
>   lbz r10,HSTATE_IN_GUEST(r13)
>   cmpwi   r10,0
> - bne \name\()_kvm
> -.endm
> -
> -.macro GEN_KVM name
> - .balign IFETCH_ALIGN_BYTES
> -\name\()_kvm:
> -
> -BEGIN_FTR_SECTION
> - ld  r10,IAREA+EX_CFAR(r13)
> - std r10,HSTATE_CFAR(r13)
> -END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> -
> - ld  r10,IAREA+EX_CTR(r13)
> - mtctr   r10
> -BEGIN_FTR_SECTION
> - ld  r10,IAREA+EX_PPR(r13)
> - std r10,HSTATE_PPR(r13)
> -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> - ld  r11,IAREA+EX_R11(r13)
> - ld  r12,IAREA+EX_R12(r13)
> - std r12,HSTATE_SCRATCH0(r13)
> - sldir12,r9,32
> - ld  r9,IAREA+EX_R9(r13)
> - ld  r10,IAREA+EX_R10(r13)
>   /* HSRR variants have the 0x2 bit added to their trap number */
>   .if IHSRR_IF_HVMODE
>   BEGIN_FTR_SECTION
> - ori r12,r12,(IVEC + 0x2)
> + li  r10,(IVEC + 0x2)
>   FTR_SECTION_ELSE
> - ori r12,r12,(IVEC)
> + li  r10,(IVEC)
>   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
>   .elseif IHSRR
> - ori r12,r12,(IVEC+ 0x2)
> + li  r10,(IVEC + 0x2)
>   .else
> - ori r12,r12,(IVEC)
> + li  r10,(IVEC)
>   .endif
> - b   kvmppc_interrupt
> -.endm
> -
> -#else
> -.macro KVMTEST name
> -.endm
> -.macro GEN_KVM name
> -.endm
> + bne \handler
>  #endif
> +.endm
>
>  /*
>   * This is the BOOK3S interrupt entry code macro.
> @@ -409,7 +379,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>  DEFINE_FIXED_SYMBOL(\name\()_common_real)
>  \name\()_common_real:
>   .if IKVM_REAL
> - KVMTEST \name
> + KVMTEST \name kvm_interrupt
>   .endif
>
>   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
> @@ -432,7 +402,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
>  DEFINE_FIXED_SYMBOL(\name\()_common_virt)
>  \name\()_common_virt:
>   .if IKVM_VIRT
> - KVMTEST \name
> + KVMTEST \name kvm_interrupt
>  1:
>   .endif
>   .endif /* IVIRT */
> @@ -446,7 +416,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
>  DEFINE_FIXED_SYMBOL(\name\()_common_real)
>  \name\()_common_real:
>   .if IKVM_REAL
> - KVMTEST \name
> + KVMTEST \name kvm_interrupt
>   .endif
>  .endm
>
> @@ -967,8 +937,6 @@ EXC_COMMON_BEGIN(system_reset_common)
>   EXCEPTION_RESTORE_REGS
>   RFI_TO_USER_OR_KERNEL
>
> - GEN_KVM system_reset
> -
>
>  /**
>   * Interrupt 0x200 - Machine Check Interrupt (MCE).
> @@ -1132,7 +1100,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
>   /*
>* Check if we are coming from guest. If yes, then run the normal
>* exception handler which will take the
> -  * machine_check_kvm->kvmppc_interrupt branch to deliver the MC event
> +  * machine_check_kvm->kvm_interrupt branch to deliver the MC event
>* to guest.
>*/
>   lbz r11,HSTATE_IN_GUEST(r13)
> @@ -1203,8 +1171,6 @@ EXC_COMMON_BEGIN(machine_check_common)
>   bl  machine_check_exception
>   b   interrupt_return
>
> - GEN_KVM machine_check
> -
>
>  #ifdef CONFIG_PPC_P7_NAP
>  /*
> @@ -1339,8 +1305,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   REST_NVGPRS(r1)
>   b   interrupt_return
>
> - GEN_KVM data_access
> -
>
>  /**
>   * Interrupt 0x380 - Data Segment Interrupt (DSLB).
> @@ -1390,8 +1354,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   bl  do_bad_slb_fault
>   b   interrupt_return
>
> - GEN_KVM data_access_slb
> -
>
>  /**
>   * Interrupt 0x400 - Instruction Storage Interrupt (ISI).
> @@ -1428,8 +1390,6 @@ MMU_FTR_SECTION_ELSE
>  ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   b   interrupt_return
>
> - GEN_KVM instruction_access
> -

Re: [PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-16 Thread Hari Bathini




On 16/04/21 6:16 pm, Sourabh Jain wrote:

kexec_file_load uses initial_boot_params in setting up the device-tree
for the kernel to be loaded. Though initial_boot_params holds info
about CPUs at the time of boot, it doesn't account for hot added CPUs.

So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
kexec_file_load syscall and the system crashes on a hot added CPU,
capture kernel hangs failing to identify the boot CPU.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
  Call Trace:
  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
  [c000e590fb00] [c0145290] panic+0x16c/0x41c
  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
  [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
  REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
  MSR:  8280f033   CR: 28000242
XER: 
  IRQMASK: 0
  GPR00: 0004 75fedf30 7fff906a7300 0001
  GPR04: 01002a7355b0 0002 0001 75fef616
  GPR08: 0001   
  GPR12:  7fff9073a160  
  GPR16:    
  GPR20:  7fff906a4ee0 0002 0001
  GPR24: 7fff906a0898  0002 01002a7355b0
  GPR28: 0002 7fff906a1790 01002a7355b0 0002
  NIP [7fff905b9664] 0x7fff905b9664
  LR [7fff905320c4] 0x7fff905320c4
  --- interrupt: c00

To avoid this from happening, extract current CPU info from of_root
device node and use it for setting up the fdt in kexec_file_load case.

Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve 
map")


Missed marking sta...@vger.kernel.org on Cc for this fix..



+int add_node_prop(void *fdt, int node_offset, const struct device_node *np)
+{





+int update_cpus_node(void *fdt)
+{


I think the above two new functions should be marked 'static'...

Thanks
Hari


[PATCH net-next 2/2] powerpc: dts: fsl: Drop obsolete fsl, rx-bit-map and fsl, tx-bit-map properties

2021-04-16 Thread Claudiu Manoil
These are very old properties that were used by the "gianfar" ethernet
driver.  They don't have documented bindings and are obsolete.

Signed-off-by: Claudiu Manoil 
---
 arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi |  4 
 arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi |  4 
 arch/powerpc/boot/dts/fsl/c293si-post.dtsi|  4 
 arch/powerpc/boot/dts/fsl/p1010si-post.dtsi   | 21 ---
 4 files changed, 33 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi
index 0c0efa94cfb4..2a677fd323eb 100644
--- a/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi
@@ -170,8 +170,6 @@ timer@41100 {
 /include/ "pq3-etsec2-0.dtsi"
 enet0: ethernet@b {
queue-group@b {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
interrupts = <26 2 0 0 27 2 0 0 28 2 0 0>;
};
 };
@@ -179,8 +177,6 @@ queue-group@b {
 /include/ "pq3-etsec2-1.dtsi"
 enet1: ethernet@b1000 {
queue-group@b1000 {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
interrupts = <33 2 0 0 34 2 0 0 35 2 0 0>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
index b5f071574e83..b8e0edd1ac69 100644
--- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
@@ -190,8 +190,6 @@ sec_jr3: jr@4000 {
 /include/ "pq3-etsec2-0.dtsi"
 enet0: ethernet@b {
queue-group@b {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
interrupts = <26 2 0 0 27 2 0 0 28 2 0 0>;
};
 };
@@ -199,8 +197,6 @@ queue-group@b {
 /include/ "pq3-etsec2-1.dtsi"
 enet1: ethernet@b1000 {
queue-group@b1000 {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
interrupts = <33 2 0 0 34 2 0 0 35 2 0 0>;
};
 };
diff --git a/arch/powerpc/boot/dts/fsl/c293si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/c293si-post.dtsi
index bd208320bff5..bec0fc36849d 100644
--- a/arch/powerpc/boot/dts/fsl/c293si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/c293si-post.dtsi
@@ -171,8 +171,6 @@ jr@2000{
enet0: ethernet@b {
queue-group@b {
reg = <0x1 0x1000>;
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
};
};
 
@@ -180,8 +178,6 @@ queue-group@b {
enet1: ethernet@b1000 {
queue-group@b1000 {
reg = <0x11000 0x1000>;
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
};
};
 
diff --git a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi
index 1b4aafc1f6a2..c2717f31925a 100644
--- a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi
@@ -172,29 +172,8 @@ sdhc@2e000 {
 /include/ "pq3-mpic-timer-B.dtsi"
 
 /include/ "pq3-etsec2-0.dtsi"
-   enet0: ethernet@b {
-   queue-group@b {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
-   };
-   };
-
 /include/ "pq3-etsec2-1.dtsi"
-   enet1: ethernet@b1000 {
-   queue-group@b1000 {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
-   };
-   };
-
 /include/ "pq3-etsec2-2.dtsi"
-   enet2: ethernet@b2000 {
-   queue-group@b2000 {
-   fsl,rx-bit-map = <0xff>;
-   fsl,tx-bit-map = <0xff>;
-   };
-
-   };
 
global-utilities@e {
compatible = "fsl,p1010-guts";
-- 
2.25.1



[PATCH net-next 1/2] gianfar: Drop GFAR_MQ_POLLING support

2021-04-16 Thread Claudiu Manoil
Gianfar used to enable all 8 Rx queues (DMA rings) per
ethernet device, even though the controller can only
support 2 interrupt lines at most.  This meant that
multiple Rx queues would have to be grouped per NAPI poll
routine, and the CPU would have to split the budget and
service them in a round robin manner.  The overhead of
this scheme proved to outweight the potential benefits.
The alternative was to introduce the "Single Queue" polling
mode, supporting one Rx queue per NAPI, which became the
default packet processing option and helped improve the
performance of the driver.
MQ_POLLING also relies on undocumeted device tree properties
to specify how to map the 8 Rx and Tx queues to a given
interrupt line (aka "interrupt group").  Using module parameters
to enable this mode wasn't an option either.  Long story short,
MQ_POLLING became obsolete, now it is just dead code, and no
one asked for it so far.
For the Tx queues, multi-queue support (more than 1 Tx queue
per CPU) could be revisited by adding tc MQPRIO support, but
again, one has to consider that there are only 2 interrupt lines.
So the NAPI poll routine would have to service multiple Tx rings.

Signed-off-by: Claudiu Manoil 
---
 drivers/net/ethernet/freescale/gianfar.c | 170 ++-
 drivers/net/ethernet/freescale/gianfar.h |  17 ---
 2 files changed, 11 insertions(+), 176 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index 3ec4d9fddd52..4e4c62d4061e 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -175,10 +175,7 @@ static void gfar_mac_rx_config(struct gfar_private *priv)
if (priv->rx_filer_enable) {
rctrl |= RCTRL_FILREN | RCTRL_PRSDEP_INIT;
/* Program the RIR0 reg with the required distribution */
-   if (priv->poll_mode == GFAR_SQ_POLLING)
-   gfar_write(®s->rir0, DEFAULT_2RXQ_RIR0);
-   else /* GFAR_MQ_POLLING */
-   gfar_write(®s->rir0, DEFAULT_8RXQ_RIR0);
+   gfar_write(®s->rir0, DEFAULT_2RXQ_RIR0);
}
 
/* Restore PROMISC mode */
@@ -521,29 +518,9 @@ static int gfar_parse_group(struct device_node *np,
grp->priv = priv;
spin_lock_init(&grp->grplock);
if (priv->mode == MQ_MG_MODE) {
-   u32 rxq_mask, txq_mask;
-   int ret;
-
+   /* One Q per interrupt group: Q0 to G0, Q1 to G1 */
grp->rx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
grp->tx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
-
-   ret = of_property_read_u32(np, "fsl,rx-bit-map", &rxq_mask);
-   if (!ret) {
-   grp->rx_bit_map = rxq_mask ?
-   rxq_mask : (DEFAULT_MAPPING >> priv->num_grps);
-   }
-
-   ret = of_property_read_u32(np, "fsl,tx-bit-map", &txq_mask);
-   if (!ret) {
-   grp->tx_bit_map = txq_mask ?
-   txq_mask : (DEFAULT_MAPPING >> priv->num_grps);
-   }
-
-   if (priv->poll_mode == GFAR_SQ_POLLING) {
-   /* One Q per interrupt group: Q0 to G0, Q1 to G1 */
-   grp->rx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
-   grp->tx_bit_map = (DEFAULT_MAPPING >> priv->num_grps);
-   }
} else {
grp->rx_bit_map = 0xFF;
grp->tx_bit_map = 0xFF;
@@ -650,18 +627,15 @@ static int gfar_of_init(struct platform_device *ofdev, 
struct net_device **pdev)
u32 stash_len = 0;
u32 stash_idx = 0;
unsigned int num_tx_qs, num_rx_qs;
-   unsigned short mode, poll_mode;
+   unsigned short mode;
 
if (!np)
return -ENODEV;
 
-   if (of_device_is_compatible(np, "fsl,etsec2")) {
+   if (of_device_is_compatible(np, "fsl,etsec2"))
mode = MQ_MG_MODE;
-   poll_mode = GFAR_SQ_POLLING;
-   } else {
+   else
mode = SQ_SG_MODE;
-   poll_mode = GFAR_SQ_POLLING;
-   }
 
if (mode == SQ_SG_MODE) {
num_tx_qs = 1;
@@ -677,22 +651,8 @@ static int gfar_of_init(struct platform_device *ofdev, 
struct net_device **pdev)
return -EINVAL;
}
 
-   if (poll_mode == GFAR_SQ_POLLING) {
-   num_tx_qs = num_grps; /* one txq per int group */
-   num_rx_qs = num_grps; /* one rxq per int group */
-   } else { /* GFAR_MQ_POLLING */
-   u32 tx_queues, rx_queues;
-   int ret;
-
-   /* parse the num of HW tx and rx queues */
-   ret = of_property_read_u32(np, "fsl,num_tx_queues",
-  &tx_queues);
-   num_tx_q

[PATCH net-next 0/2] net: gianfar: Drop GFAR_MQ_POLLING support

2021-04-16 Thread Claudiu Manoil
Drop long time obsolete "per NAPI multi-queue" support in gianfar,
and related (and undocumented) device tree properties.

Claudiu Manoil (2):
  gianfar: Drop GFAR_MQ_POLLING support
  powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map
properties

 arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi |   4 -
 arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi |   4 -
 arch/powerpc/boot/dts/fsl/c293si-post.dtsi|   4 -
 arch/powerpc/boot/dts/fsl/p1010si-post.dtsi   |  21 ---
 drivers/net/ethernet/freescale/gianfar.c  | 170 ++
 drivers/net/ethernet/freescale/gianfar.h  |  17 --
 6 files changed, 11 insertions(+), 209 deletions(-)

-- 
2.25.1



Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Jesper Dangaard Brouer
On Fri, 16 Apr 2021 16:27:55 +0100
Matthew Wilcox  wrote:

> On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> > See below patch.  Where I swap32 the dma address to satisfy
> > page->compound having bit zero cleared. (It is the simplest fix I could
> > come up with).  
> 
> I think this is slightly simpler, and as a bonus code that assumes the
> old layout won't compile.

This is clever, I like it!  When reading the code one just have to
remember 'unsigned long' size difference between 64-bit vs 32-bit.
And I assume compiler can optimize the sizeof check out then doable.

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..5aacc1c10a45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -97,10 +97,10 @@ struct page {
>   };
>   struct {/* page_pool used by netstack */
>   /**
> -  * @dma_addr: might require a 64-bit value even on
> +  * @dma_addr: might require a 64-bit value on
>* 32-bit architectures.
>*/
> - dma_addr_t dma_addr;
> + unsigned long dma_addr[2];
>   };
>   struct {/* slab, slob and slub */
>   union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index b5b195305346..db7c7020746a 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
> page_pool *pool,
>  
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> - return page->dma_addr;
> + dma_addr_t ret = page->dma_addr[0];
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + ret |= (dma_addr_t)page->dma_addr[1] << 32;
> + return ret;
> +}
> +
> +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +{
> + page->dma_addr[0] = addr;
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + page->dma_addr[1] = addr >> 32;
>  }
>  
>  static inline bool is_page_pool_compiled_in(void)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ad8b0707af04..f014fd8c19a6 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct 
> page_pool *pool,
> struct page *page,
> unsigned int dma_sync_size)
>  {
> + dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> +
>   dma_sync_size = min(dma_sync_size, pool->p.max_len);
> - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
> + dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>pool->p.offset, dma_sync_size,
>pool->p.dma_dir);
>  }
> @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
> page_pool *pool,
>   put_page(page);
>   return NULL;
>   }
> - page->dma_addr = dma;
> + page_pool_set_dma_addr(page, dma);
>  
>   if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>   page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
> struct page *page)
>*/
>   goto skip_dma_unmap;
>  
> - dma = page->dma_addr;
> + dma = page_pool_get_dma_addr(page);
>  
> - /* When page is unmapped, it cannot be returned our pool */
> + /* When page is unmapped, it cannot be returned to our pool */
>   dma_unmap_page_attrs(pool->p.dev, dma,
>PAGE_SIZE << pool->p.order, pool->p.dma_dir,
>DMA_ATTR_SKIP_CPU_SYNC);
> - page->dma_addr = 0;
> + page_pool_set_dma_addr(page, 0);
>  skip_dma_unmap:
>   /* This may be the last page returned, releasing the pool, so
>* it is not safe to reference pool afterwards.
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: [PATCH 3/3] powerpc/smp: Cache CPU to chip lookup

2021-04-16 Thread Srikar Dronamraju
* Gautham R Shenoy  [2021-04-16 21:27:48]:

> On Thu, Apr 15, 2021 at 11:21:10PM +0530, Srikar Dronamraju wrote:
> > * Gautham R Shenoy  [2021-04-15 22:49:21]:
> > 
> > > > 
> > > > +int *chip_id_lookup_table;
> > > > +
> > > >  #ifdef CONFIG_PPC64
> > > >  int __initdata iommu_is_off;
> > > >  int __initdata iommu_force_on;
> > > > @@ -914,13 +916,22 @@ EXPORT_SYMBOL(of_get_ibm_chip_id);
> > > >  int cpu_to_chip_id(int cpu)
> > > >  {
> > > > struct device_node *np;
> > > > +   int ret = -1, idx;
> > > > +
> > > > +   idx = cpu / threads_per_core;
> > > > +   if (chip_id_lookup_table && chip_id_lookup_table[idx] != -1)
> > > 
> > 
> > > The value -1 is ambiguous since we won't be able to determine if
> > > it is because we haven't yet made a of_get_ibm_chip_id() call
> > > or if of_get_ibm_chip_id() call was made and it returned a -1.
> > > 
> > 
> > We don't allocate chip_id_lookup_table unless cpu_to_chip_id() return
> > !-1 value for the boot-cpuid. So this ensures that we dont
> > unnecessarily allocate chip_id_lookup_table. Also I check for
> > chip_id_lookup_table before calling cpu_to_chip_id() for other CPUs.
> > So this avoids overhead of calling cpu_to_chip_id() for platforms that
> > dont support it.  Also its most likely that if the
> > chip_id_lookup_table is initialized then of_get_ibm_chip_id() call
> > would return a valid value.
> > 
> > + Below we are only populating the lookup table, only when the
> > of_get_cpu_node is valid.
> > 
> > So I dont see any drawbacks of initializing it to -1. Do you see
> any?
> 
> 
> Only if other callers of cpu_to_chip_id() don't check for whether the
> chip_id_lookup_table() has been allocated or not. From a code
> readability point of view, it is easier to have that check  this inside
> cpu_to_chip_id() instead of requiring all its callers to make that
> check.
> 

I didn't understand your comment. However let me reiterate what I said
earlier. We don't have control over who and when cpu_to_chip_id() gets
called. If the cpu_to_chip_id() might be called for non present CPU,
in which case it will return -1, Should we cache it or not?

If we cache it, we will return wrong value when the CPU may turn out
to be present. If we cache and retry it then having one value for
initializing and another for invalid is all the same as having just 1
value for initializing and invalid. Just that we end up adding more
confusing code. Atleast to me, code isnt readable if I say retry for
-1 and -2 too. After few years, we ourselves will wonder why we have
two values if we are checking and performing same actions.

> > 
> > > Thus, perhaps we can initialize chip_id_lookup_table[idx] with a
> > > different unique negative value. How about S32_MIN ? and check
> > > chip_id_lookup_table[idx] is different here ?
> > > 
> > 
> > I had initially initialized to -2, But then I thought we adding in
> > more confusion than necessary and it was not solving any issues.
> > 
> > 
> > -- 
> > Thanks and Regards
> > Srikar Dronamraju

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Steven Price

On 16/04/2021 16:15, Christophe Leroy wrote:



Le 16/04/2021 à 17:04, Christophe Leroy a écrit :



Le 16/04/2021 à 16:40, Christophe Leroy a écrit :



Le 16/04/2021 à 15:00, Steven Price a écrit :

On 16/04/2021 12:08, Christophe Leroy wrote:



Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :
To be honest I don't fully understand why powerpc requires the 
page_size - it appears to be using it purely to find "holes" in 
the calls to note_page(), but I haven't worked out why such 
holes would occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page 
size to detect whether it is a KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a 
fix. I can't remember what the problem was exactly, something 
around the use of hugepages for kernel memory, came as part of 
the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 








Ah, that's useful context. So it looks like powerpc took a 
different route to reducing the KASAN output to x86.


Given the generic ptdump code has handling for KASAN already it 
should be possible to drop that from the powerpc arch code, which 
I think means we don't actually need to provide page size to 
notepage(). Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several 
pgdir entries points to the same kasan_early_shadow_pte. But it 
doesn't take into account the powerpc case where we have regular 
page tables where several (if not all) PTEs are pointing to the 
kasan_early_shadow_page .


I'm not sure I follow quite how powerpc is different here. But could 
you have a similar check for PTEs against kasan_early_shadow_pte as 
the other levels already have?


I'm just worried that page_size isn't well defined in this interface 
and it's going to cause problems in the future.




I'm trying. I reverted the two commits b00ff6d8c and cabe8138.

At the moment, I don't get exactly what I expect: For linear memory I 
get one line for each 8M page whereas before reverting the patches I 
got one 16M line and one 112M line.


And for KASAN shadow area I get two lines for the 2x 8M pages 
shadowing linear mem then I get one 4M line for each PGDIR entry 
pointing to kasan_early_shadow_pte.


0xf800-0xf87f 0x0700 8M   huge    rw   
present
0xf880-0xf8ff 0x0780 8M   huge    rw   
present
0xf900-0xf93f 0x0143 4M   r
present

...
0xfec0-0xfeff 0x0143 4M   r
present


Any idea ?




I think the different with other architectures is here:

 } else if (flag != st->current_flags || level != st->level ||
    addr >= st->marker[1].start_address ||
    pa != st->last_pa + PAGE_SIZE) {


In addition to the checks everyone do, powerpc also checks "pa != 
st->last_pa + PAGE_SIZE".
And it is definitely for that test that page_size argument add been 
added.


By replacing that test by (pa - st->start_pa != addr - 
st->start_address) it works again. So we definitely don't need the real 
page size.


Yes that should work. Thanks for figuring it out!





I see that other architectures except RISCV don't dump the physical 
address. But even RISCV doesn't include that check.


Yes not having the physical address certainly simplifies things - 
although I can see why that can be handy to see. The disadvantage is 
that user space or vmalloc()'d memory will produce a lot of output 
because the physical addresses are unlikely to be contiguous. And for 
most uses you don't need the information.


That physical address dump was added by commit aaa229529244 
("powerpc/mm: Add physical address to Linux page table dump") 
[https://github.com/torvalds/linux/commit/aaa2295]


How do other architectures deal with the problem described by the 
commit log of that patch ?


AFAIK other architectures are "broken" in this regard. In practice I 
don't think it often causes an issue though.


Steve


Re: [PATCH 3/3] powerpc/smp: Cache CPU to chip lookup

2021-04-16 Thread Gautham R Shenoy
On Thu, Apr 15, 2021 at 11:21:10PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy  [2021-04-15 22:49:21]:
> 
> > > 
> > > +int *chip_id_lookup_table;
> > > +
> > >  #ifdef CONFIG_PPC64
> > >  int __initdata iommu_is_off;
> > >  int __initdata iommu_force_on;
> > > @@ -914,13 +916,22 @@ EXPORT_SYMBOL(of_get_ibm_chip_id);
> > >  int cpu_to_chip_id(int cpu)
> > >  {
> > >   struct device_node *np;
> > > + int ret = -1, idx;
> > > +
> > > + idx = cpu / threads_per_core;
> > > + if (chip_id_lookup_table && chip_id_lookup_table[idx] != -1)
> > 
> 
> > The value -1 is ambiguous since we won't be able to determine if
> > it is because we haven't yet made a of_get_ibm_chip_id() call
> > or if of_get_ibm_chip_id() call was made and it returned a -1.
> > 
> 
> We don't allocate chip_id_lookup_table unless cpu_to_chip_id() return
> !-1 value for the boot-cpuid. So this ensures that we dont
> unnecessarily allocate chip_id_lookup_table. Also I check for
> chip_id_lookup_table before calling cpu_to_chip_id() for other CPUs.
> So this avoids overhead of calling cpu_to_chip_id() for platforms that
> dont support it.  Also its most likely that if the
> chip_id_lookup_table is initialized then of_get_ibm_chip_id() call
> would return a valid value.
> 
> + Below we are only populating the lookup table, only when the
> of_get_cpu_node is valid.
> 
> So I dont see any drawbacks of initializing it to -1. Do you see
any?


Only if other callers of cpu_to_chip_id() don't check for whether the
chip_id_lookup_table() has been allocated or not. From a code
readability point of view, it is easier to have that check  this inside
cpu_to_chip_id() instead of requiring all its callers to make that
check.

> 
> > Thus, perhaps we can initialize chip_id_lookup_table[idx] with a
> > different unique negative value. How about S32_MIN ? and check
> > chip_id_lookup_table[idx] is different here ?
> > 
> 
> I had initially initialized to -2, But then I thought we adding in
> more confusion than necessary and it was not solving any issues.
> 
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju


Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Matthew Wilcox
On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote:
> See below patch.  Where I swap32 the dma address to satisfy
> page->compound having bit zero cleared. (It is the simplest fix I could
> come up with).

I think this is slightly simpler, and as a bonus code that assumes the
old layout won't compile.

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6613b26a8894..5aacc1c10a45 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
};
struct {/* page_pool used by netstack */
/**
-* @dma_addr: might require a 64-bit value even on
+* @dma_addr: might require a 64-bit value on
 * 32-bit architectures.
 */
-   dma_addr_t dma_addr;
+   unsigned long dma_addr[2];
};
struct {/* slab, slob and slub */
union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b5b195305346..db7c7020746a 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
page_pool *pool,
 
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-   return page->dma_addr;
+   dma_addr_t ret = page->dma_addr[0];
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   ret |= (dma_addr_t)page->dma_addr[1] << 32;
+   return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+   page->dma_addr[0] = addr;
+   if (sizeof(dma_addr_t) > sizeof(unsigned long))
+   page->dma_addr[1] = addr >> 32;
 }
 
 static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ad8b0707af04..f014fd8c19a6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool 
*pool,
  struct page *page,
  unsigned int dma_sync_size)
 {
+   dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
dma_sync_size = min(dma_sync_size, pool->p.max_len);
-   dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+   dma_sync_single_range_for_device(pool->p.dev, dma_addr,
 pool->p.offset, dma_sync_size,
 pool->p.dma_dir);
 }
@@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
page_pool *pool,
put_page(page);
return NULL;
}
-   page->dma_addr = dma;
+   page_pool_set_dma_addr(page, dma);
 
if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
struct page *page)
 */
goto skip_dma_unmap;
 
-   dma = page->dma_addr;
+   dma = page_pool_get_dma_addr(page);
 
-   /* When page is unmapped, it cannot be returned our pool */
+   /* When page is unmapped, it cannot be returned to our pool */
dma_unmap_page_attrs(pool->p.dev, dma,
 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 DMA_ATTR_SKIP_CPU_SYNC);
-   page->dma_addr = 0;
+   page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
/* This may be the last page returned, releasing the pool, so
 * it is not safe to reference pool afterwards.


Re: [PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-16 Thread kernel test robot
Hi Sourabh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on linus/master v5.12-rc7 next-20210416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Sourabh-Jain/powerpc-kexec_file-use-current-CPU-info-while-setting-up-FDT/20210416-204821
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/c4f40243a6928fb16798b2b98c5371815b49e4cc
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Sourabh-Jain/powerpc-kexec_file-use-current-CPU-info-while-setting-up-FDT/20210416-204821
git checkout c4f40243a6928fb16798b2b98c5371815b49e4cc
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> arch/powerpc/kexec/file_load_64.c:972:5: warning: no previous prototype for 
>> 'add_node_prop' [-Wmissing-prototypes]
 972 | int add_node_prop(void *fdt, int node_offset, const struct 
device_node *np)
 | ^
>> arch/powerpc/kexec/file_load_64.c:1003:5: warning: no previous prototype for 
>> 'update_cpus_node' [-Wmissing-prototypes]
1003 | int update_cpus_node(void *fdt)
 | ^~~~


vim +/add_node_prop +972 arch/powerpc/kexec/file_load_64.c

   962  
   963  /**
   964   * add_node_prop - Read property from device node structure and add
   965   *  them to fdt.
   966   * @fdt:Flattened device tree of the kernel
   967   * @node_offset:offset of the node to add a property at
   968   * np:  device node pointer
   969   *
   970   * Returns 0 on success, negative errno on error.
   971   */
 > 972  int add_node_prop(void *fdt, int node_offset, const struct device_node 
 > *np)
   973  {
   974  int ret = 0;
   975  struct property *pp;
   976  unsigned long flags;
   977  
   978  if (!np)
   979  return -EINVAL;
   980  
   981  raw_spin_lock_irqsave(&devtree_lock, flags);
   982  for (pp = np->properties; pp; pp = pp->next) {
   983  ret = fdt_setprop(fdt, node_offset, pp->name,
   984pp->value, pp->length);
   985  if (ret < 0) {
   986  pr_err("Unable to add %s property: %s\n",
   987  pp->name, fdt_strerror(ret));
   988  goto out;
   989  }
   990  }
   991  out:
   992  raw_spin_unlock_irqrestore(&devtree_lock, flags);
   993  return ret;
   994  }
   995  
   996  /**
   997   * update_cpus_node - Update cpus node of flattened device-tree using 
of_root
   998   *  device node.
   999   * @fdt:Flattened device tree of the kernel.
  1000   *
  1001   * Returns 0 on success, negative errno on error.
  1002   */
> 1003  int update_cpus_node(void *fdt)
  1004  {
  1005  struct device_node *cpus_node, *dn;
  1006  int cpus_offset, cpus_subnode_off, ret = 0;
  1007  
  1008  cpus_offset = fdt_path_offset(fdt, "/cpus");
  1009  if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) {
  1010  if (cpus_offset > 0) {
  1011  ret = fdt_del_node(fdt, cpus_offset);
  1012  if (ret < 0) {
  1013  pr_err("Error deleting /cpus node: 
%s\n",
  1014 fdt_strerror(ret));
  1015  return -EINVAL;
  1016  }
  1017  }
  1018  
  1019  /* Add cpus node to fdt */
  1020  cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, 
"/"),
  1021"cpus");
  1022  if (cpus_offset < 0) {
  1023  pr_err("Error creating /cpus node: %s\n",
  1024 fdt_strerror(cpus_offset));
  1025  return -E

Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 17:04, Christophe Leroy a écrit :



Le 16/04/2021 à 16:40, Christophe Leroy a écrit :



Le 16/04/2021 à 15:00, Steven Price a écrit :

On 16/04/2021 12:08, Christophe Leroy wrote:



Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :
To be honest I don't fully understand why powerpc requires the page_size - it appears to be 
using it purely to find "holes" in the calls to note_page(), but I haven't worked out why 
such holes would occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is 
a KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what 
the problem was exactly, something around the use of hugepages for kernel memory, came as part 
of the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 







Ah, that's useful context. So it looks like powerpc took a different route to reducing the 
KASAN output to x86.


Given the generic ptdump code has handling for KASAN already it should be possible to drop that 
from the powerpc arch code, which I think means we don't actually need to provide page size to 
notepage(). Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several pgdir entries points to the same 
kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular 
page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page .


I'm not sure I follow quite how powerpc is different here. But could you have a similar check for 
PTEs against kasan_early_shadow_pte as the other levels already have?


I'm just worried that page_size isn't well defined in this interface and it's going to cause 
problems in the future.




I'm trying. I reverted the two commits b00ff6d8c and cabe8138.

At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M 
page whereas before reverting the patches I got one 16M line and one 112M line.


And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 
4M line for each PGDIR entry pointing to kasan_early_shadow_pte.


0xf800-0xf87f 0x0700 8M   huge    rw   present
0xf880-0xf8ff 0x0780 8M   huge    rw   present
0xf900-0xf93f 0x0143 4M   r    present

...

0xfec0-0xfeff 0x0143 4M   r    present

Any idea ?




I think the different with other architectures is here:

 } else if (flag != st->current_flags || level != st->level ||
    addr >= st->marker[1].start_address ||
    pa != st->last_pa + PAGE_SIZE) {


In addition to the checks everyone do, powerpc also checks "pa != st->last_pa + 
PAGE_SIZE".
And it is definitely for that test that page_size argument add been added.


By replacing that test by (pa - st->start_pa != addr - st->start_address) it works again. So we 
definitely don't need the real page size.





I see that other architectures except RISCV don't dump the physical address. But even RISCV doesn't 
include that check.


That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to 
Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295]


How do other architectures deal with the problem described by the commit log of 
that patch ?

Christophe


Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-04-16 Thread Vaibhav Jain
Andy Shevchenko  writes:

> On Thu, Apr 15, 2021 at 8:10 PM Vaibhav Jain  wrote:
>>
>>
>> Thanks for the patch Andy,
>>
>> Unfortunately ran into a compilation issue due to missing "#include
>> " that provides definition for
>> get_unaligned_le64(). Gcc reported following error:
>>
>> error: implicit declaration of function ‘get_unaligned_le64’
>
> Right, I have not tested it (as mentioned in the comments to the patch)
>
>> After including the necessary header file, kernel compiled fine and I
>> was able to test & verify the patch.
>
> Thank you very much for the testing.
>
> I'm not sure what the coverage of your test is.

Your patch updates the way the interleaved set-cookies are populated in
papr_scm which are then used to populate label entry for a namespace. I
verified that the reported region setcookie hasnt changed for an nvdimm
region before and after applying your patch for both BE and LE variants:

# 64-bit Little endian kernel before applying the patch
$ sudo cat /sys/devices/ndbus0/region0/set_cookie
0x8b6b26cbc930e2b5

# 64-bit Little endian kernel after applying your patch
$ sudo cat /sys/devices/ndbus0/region0/set_cookie
0x8b6b26cbc930e2b5

# 64-bit Big endian kernel before applying your patch
$ sudo cat /sys/devices/ndbus0/region0/set_cookie
0x8b6b26cbc930e2b5

# 64-bit Big endian kernel after applying your patch
$ sudo cat /sys/devices/ndbus0/region0/set_cookie
0x8b6b26cbc930e2b5

> That's why I have an
> additional question below. Is the byte ordering kept the same in BE
> (32- and 64-bit) cases? Because I'm worrying that I might have missed
> something.
Libnvdimm store these cookies in label area as little endian values and
based on the results above I think we are good.
>
>
> -- 
> With Best Regards,
> Andy Shevchenko

-- 
Cheers
~ Vaibhav


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 16:40, Christophe Leroy a écrit :



Le 16/04/2021 à 15:00, Steven Price a écrit :

On 16/04/2021 12:08, Christophe Leroy wrote:



Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :
To be honest I don't fully understand why powerpc requires the page_size - it appears to be 
using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such 
holes would occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is 
a KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what 
the problem was exactly, something around the use of hugepages for kernel memory, came as part 
of the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 






Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN 
output to x86.


Given the generic ptdump code has handling for KASAN already it should be possible to drop that 
from the powerpc arch code, which I think means we don't actually need to provide page size to 
notepage(). Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several pgdir entries points to the same 
kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular 
page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page .


I'm not sure I follow quite how powerpc is different here. But could you have a similar check for 
PTEs against kasan_early_shadow_pte as the other levels already have?


I'm just worried that page_size isn't well defined in this interface and it's going to cause 
problems in the future.




I'm trying. I reverted the two commits b00ff6d8c and cabe8138.

At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page 
whereas before reverting the patches I got one 16M line and one 112M line.


And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M 
line for each PGDIR entry pointing to kasan_early_shadow_pte.


0xf800-0xf87f 0x0700 8M   huge    rw   present
0xf880-0xf8ff 0x0780 8M   huge    rw   present
0xf900-0xf93f 0x0143 4M   r    present

...

0xfec0-0xfeff 0x0143 4M   r    present

Any idea ?




I think the different with other architectures is here:

} else if (flag != st->current_flags || level != st->level ||
   addr >= st->marker[1].start_address ||
   pa != st->last_pa + PAGE_SIZE) {


In addition to the checks everyone do, powerpc also checks "pa != st->last_pa + 
PAGE_SIZE".
And it is definitely for that test that page_size argument add been added.

I see that other architectures except RISCV don't dump the physical address. But even RISCV doesn't 
include that check.


That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to 
Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295]


How do other architectures deal with the problem described by the commit log of 
that patch ?

Christophe


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 15:00, Steven Price a écrit :

On 16/04/2021 12:08, Christophe Leroy wrote:



Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c    | 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
  {
  struct ptdump_state *st = walk->private;
-    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
+    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, but note that KASAN 
presents an interesting case here. We short-cut by detecting it's a KASAN region at a high 
level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but 
with level==4 because we know KASAN sets up the page table like that.


However the one call actually covers a much larger region - so while PAGE_SIZE matches the 
level it doesn't match the region covered. AFAICT this will lead to odd results if you enable 
KASAN on powerpc.


Hum  I successfully tested it with KASAN, I now realise that I tested it with 
CONFIG_KASAN_VMALLOC selected. In this situation, since 
https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table 
anymore.


I'll test again without CONFIG_KASAN_VMALLOC.



To be honest I don't fully understand why powerpc requires the page_size - it appears to be 
using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such 
holes would occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a 
KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the 
problem was exactly, something around the use of hugepages for kernel memory, came as part of 
the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 





Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN 
output to x86.


Given the generic ptdump code has handling for KASAN already it should be possible to drop that 
from the powerpc arch code, which I think means we don't actually need to provide page size to 
notepage(). Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several pgdir entries points to the same 
kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular 
page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page .


I'm not sure I follow quite how powerpc is different here. But could you have a similar check for 
PTEs against kasan_early_shadow_pte as the other levels already have?


I'm just worried that page_size isn't well defined in this interface and it's going to cause 
problems in the future.




I'm trying. I reverted the two commits b00ff6d8c and cabe8138.

At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page 
whereas before reverting the patches I got one 16M line and one 112M line.


And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M 
line for each PGDIR entry pointing to kasan_early_shadow_pte.


0xf800-0xf87f 0x0700 8M   hugerw   present
0xf880-0xf8ff 0x0780 8M   hugerw   present
0xf900-0xf93f 0x0143 4M   rpresent
0xf940-0xf97f 0x0143 4M   rpresent
0xf980-0xf9bf 0x0143 4M   rpresent
0xf9c0-0xf9ff 0x0143 4M   rpresent
0xfa00-0xfa3f 0x0143 4M   rpresent
0xfa40-0xfa7f 0x0143 4M   rpresent
0xfa80-0xfabf 0x0143 4M   rpresent
0xfac0-0xfaff 0x0143 4M   rpresent
0xfb00-0xfb3f 0x0143 4M   rpresent
0xfb40-0xfb7f 0x0143 4M   r   

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Lakshmi Ramasubramanian

On 4/16/21 2:05 AM, Michael Ellerman wrote:


Daniel Axtens  writes:

On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:

Sorry - missed copying device-tree and powerpc mailing lists.


There are a few "goto out;" statements before the local variable "fdt"
is initialized through the call to of_kexec_alloc_and_setup_fdt() in
elf64_load(). This will result in an uninitialized "fdt" being passed
to kvfree() in this function if there is an error before the call to
of_kexec_alloc_and_setup_fdt().

Initialize the local variable "fdt" to NULL.


I'm a huge fan of initialising local variables! But I'm struggling to
find the code path that will lead to an uninit fdt being returned...

The out label reads in part:

/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
return ret ? ERR_PTR(ret) : fdt;

As far as I can tell, any time we get a non-zero ret, we're going to
return an error pointer rather than the uninitialised value...


As Dan pointed out, the new code is in linux-next.

I have copied the new one below - the function doesn't return fdt, but 
instead sets it in the arch specific field (please see the link to the 
updated elf_64.c below).


https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next



(btw, it does look like we might leak fdt if we have an error after we
successfully kmalloc it.)

Am I missing something? Can you link to the report for the kernel test
robot or from Dan?


/*
 * Once FDT buffer has been successfully passed to 
kexec_add_buffer(),
 * the FDT buffer address is saved in image->arch.fdt. In that 
case,

 * the memory cannot be freed here in case of any other error.
 */
if (ret && !image->arch.fdt)
kvfree(fdt);

return ret ? ERR_PTR(ret) : NULL;

In case of an error, the memory allocated for fdt is freed unless it has 
already been passed to kexec_add_buffer().


thanks,
 -lakshmi



FWIW, I think it's worth including this patch _anyway_ because initing
local variables is good practice, but I'm just not sure on the
justification.


Why is it good practice?

It defeats -Wuninitialized. So you're guaranteed to be returning
something initialised, but not necessarily initialised to the right
value.

In a case like this NULL seems like a safe choice, but it's still wrong.
The function is meant to return a pointer to the successfully allocated
fdt, or an ERR_PTR() value. NULL is neither of those.

I agree there are security reasons that initialising stack variables is
desirable, but I think that should be handled by the compiler, not at
the source level.

cheers





Re: Bogus struct page layout on 32-bit

2021-04-16 Thread Arnd Bergmann
On Fri, Apr 16, 2021 at 11:27 AM 'Grygorii Strashko' via Clang Built
Linux  wrote:
> On 10/04/2021 11:52, Ilias Apalodimas wrote:
> > +CC Grygorii for the cpsw part as Ivan's email is not valid anymore
> The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA 
> even in case of LPAE (dma-ranges are used).
> Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT has not been selected 
> for the LPAE case
> on TI platforms and the fact that it became set is the result of 
> multi-paltform/allXXXconfig/DMA
> optimizations and unification.
> (just checked - not set in 4.14)
>
> Probable commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config 
> symbol in lib/Kconfig").

I completely missed this change in the past, and I don't really agree
with it either.

Most 32-bit Arm platforms are in fact limited to 32-bit DMA, even when they have
MMIO or RAM areas above the 4GB boundary that require LPAE.

> The TI drivers have been updated, finally to accept ARCH_DMA_ADDR_T_64BIT=y 
> by using
> things like (__force u32) for example.
>
> Honestly, I've done sanity check of CPSW with LPAE=y 
> (ARCH_DMA_ADDR_T_64BIT=y) very long time ago.

This is of course a good idea, drivers should work with any
combination of 32-bit
or 64-bit phys_addr_t and dma_addr_t.

Arnd


Re: [PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Paolo Bonzini

On 16/04/21 15:26, Christian Borntraeger wrote:



On 16.04.21 15:00, Masahiro Yamada wrote:

Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
    only if supported") included scripts/Kbuild.include from
    tools/thermal/tmon/Makefile to import the cc-option macro.

  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
    not support -no-pie") included scripts/Kbuild.include from
    tools/testing/selftests/kvm/Makefile to import the try-run macro.

  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
    failures") included scripts/Kbuild.include from
    tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
    target.

  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
    unrecognized option") included scripts/Kbuild.include from
    tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
    try-run macro.

Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.

Link: 
https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ 

Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
scripts/Makefile.compiler")

Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 


looks better.
Tested-by: Christian Borntraeger 



Thank you very much Masahiro, this look great.

Paolo



Re: [PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Christian Borntraeger




On 16.04.21 15:00, Masahiro Yamada wrote:

Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

  - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
only if supported") included scripts/Kbuild.include from
tools/thermal/tmon/Makefile to import the cc-option macro.

  - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
not support -no-pie") included scripts/Kbuild.include from
tools/testing/selftests/kvm/Makefile to import the try-run macro.

  - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
failures") included scripts/Kbuild.include from
tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
target.

  - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
unrecognized option") included scripts/Kbuild.include from
tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
try-run macro.

Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.

Link: 
https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
scripts/Makefile.compiler")
Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 


looks better.
Tested-by: Christian Borntraeger 


[PATCH] powerpc/pseries/mce: Fix a typo in error type assignment

2021-04-16 Thread Ganesh Goudar
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.

Signed-off-by: Ganesh Goudar 
---
 arch/powerpc/platforms/pseries/ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index f8b390a9d9fb..9d4ef65da7f3 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -699,7 +699,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
break;
case MC_ERROR_TYPE_I_CACHE:
-   mce_err.error_type = MCE_ERROR_TYPE_DCACHE;
+   mce_err.error_type = MCE_ERROR_TYPE_ICACHE;
break;
case MC_ERROR_TYPE_UNKNOWN:
default:
-- 
2.26.2



[PATCH v2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Masahiro Yamada
Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.

The tools/ directory opted out Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.

tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:

 - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
   only if supported") included scripts/Kbuild.include from
   tools/thermal/tmon/Makefile to import the cc-option macro.

 - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
   not support -no-pie") included scripts/Kbuild.include from
   tools/testing/selftests/kvm/Makefile to import the try-run macro.

 - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
   failures") included scripts/Kbuild.include from
   tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
   target.

 - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
   unrecognized option") included scripts/Kbuild.include from
   tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
   try-run macro.

Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.

Link: 
https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
scripts/Makefile.compiler")
Reported-by: Janosch Frank 
Reported-by: Christian Borntraeger 
Signed-off-by: Masahiro Yamada 
---

Changes in v2:
  - copy macros to tools/build/BUild.include

 tools/build/Build.include | 24 +++
 tools/testing/selftests/bpf/Makefile  |  2 +-
 tools/testing/selftests/kvm/Makefile  |  2 +-
 .../selftests/powerpc/pmu/ebb/Makefile|  2 +-
 tools/thermal/tmon/Makefile   |  2 +-
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/tools/build/Build.include b/tools/build/Build.include
index 585486e40995..2cf3b1bde86e 100644
--- a/tools/build/Build.include
+++ b/tools/build/Build.include
@@ -100,3 +100,27 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) 
-D"BUILD_STR(s)=\#s" $(CXX
 ## HOSTCC C flags
 
 host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) 
-D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
+
+# output directory for tests below
+TMPOUT = .tmp_
+
+# try-run
+# Usage: option = $(call try-run, $(CC)...-o "$$TMP",option-ok,otherwise)
+# Exit code chooses option. "$$TMP" serves as a temporary file and is
+# automatically cleaned up.
+try-run = $(shell set -e;  \
+   TMP=$(TMPOUT)/tmp;  \
+   mkdir -p $(TMPOUT); \
+   trap "rm -rf $(TMPOUT)" EXIT;   \
+   if ($(1)) >/dev/null 2>&1;  \
+   then echo "$(2)";   \
+   else echo "$(3)";   \
+   fi)
+
+# cc-option
+# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
+cc-option = $(call try-run, \
+   $(CC) -Werror $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2))
+
+# delete partially updated (i.e. corrupted) files on error
+.DELETE_ON_ERROR:
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 044bfdcf5b74..17a5cdf48d37 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-include ../../../../scripts/Kbuild.include
+include ../../../build/Build.include
 include ../../../scripts/Makefile.arch
 include ../../../scripts/Makefile.include
 
diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index a6d61f451f88..5ef141f265bd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-include ../../../../scripts/Kbuild.include
+include ../../../build/Build.include
 
 all:
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile 
b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
index af3df79d8163..c5ecb4634094 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-include ../../../../../../scripts/Kbuild.include
+include ../../../../../build/Build.include
 
 noarg:
$(MAKE) -C ../../
diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile
index 59e417ec3e13..9db867df7679 100644
--- a/tools/thermal/tmon/Makefile
+++ b/tools/thermal/tmon/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # We need this for the "cc-option" macro.
-include ../../../scripts/Kbuild.include
+include ../../build/Build.include
 
 VERSION = 1.0
 
-- 
2.27.0



Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Steven Price

On 16/04/2021 12:08, Christophe Leroy wrote:



Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c    | 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct 
mm_walk *walk,

  {
  struct ptdump_state *st = walk->private;
-    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
+    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), 
PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, 
but note that KASAN presents an interesting case here. We short-cut 
by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) 
and instead of walking the tree down just call note_page() *once* 
but with level==4 because we know KASAN sets up the page table like 
that.


However the one call actually covers a much larger region - so while 
PAGE_SIZE matches the level it doesn't match the region covered. 
AFAICT this will lead to odd results if you enable KASAN on powerpc.


Hum  I successfully tested it with KASAN, I now realise that I 
tested it with CONFIG_KASAN_VMALLOC selected. In this situation, 
since https://github.com/torvalds/linux/commit/af3d0a686 we don't 
have any common shadow page table anymore.


I'll test again without CONFIG_KASAN_VMALLOC.



To be honest I don't fully understand why powerpc requires the 
page_size - it appears to be using it purely to find "holes" in the 
calls to note_page(), but I haven't worked out why such holes would 
occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page 
size to detect whether it is a KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a 
fix. I can't remember what the problem was exactly, something around 
the use of hugepages for kernel memory, came as part of the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 




Ah, that's useful context. So it looks like powerpc took a different 
route to reducing the KASAN output to x86.


Given the generic ptdump code has handling for KASAN already it should 
be possible to drop that from the powerpc arch code, which I think 
means we don't actually need to provide page size to notepage(). 
Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several pgdir 
entries points to the same kasan_early_shadow_pte. But it doesn't take 
into account the powerpc case where we have regular page tables where 
several (if not all) PTEs are pointing to the kasan_early_shadow_page .


I'm not sure I follow quite how powerpc is different here. But could you 
have a similar check for PTEs against kasan_early_shadow_pte as the 
other levels already have?


I'm just worried that page_size isn't well defined in this interface and 
it's going to cause problems in the future.


Steve


Re: [PATCH 2/2] tools: do not include scripts/Kbuild.include

2021-04-16 Thread Masahiro Yamada
On Fri, Apr 16, 2021 at 2:56 PM Christian Borntraeger
 wrote:
>
>
> On 15.04.21 10:06, Christian Borntraeger wrote:
> >
> > On 15.04.21 09:27, Masahiro Yamada wrote:
> >> Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
> >> scripts/Makefile.compiler"), some kselftests fail to build.
> >>
> >> The tools/ directory opted out Kbuild, and went in a different
> >> direction. They copy any kind of files to the tools/ directory
> >> in order to do whatever they want to do in their world.
> >>
> >> tools/build/Build.include mimics scripts/Kbuild.include, but some
> >> tool Makefiles included the Kbuild one to import a feature that is
> >> missing in tools/build/Build.include:
> >>
> >>   - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
> >> only if supported") included scripts/Kbuild.include from
> >> tools/thermal/tmon/Makefile to import the cc-option macro.
> >>
> >>   - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
> >> not support -no-pie") included scripts/Kbuild.include from
> >> tools/testing/selftests/kvm/Makefile to import the try-run macro.
> >>
> >>   - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
> >> failures") included scripts/Kbuild.include from
> >> tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
> >> target.
> >>
> >>   - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
> >> unrecognized option") included scripts/Kbuild.include from
> >> tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
> >> try-run macro.
> >>
> >> Copy what they want there, and stop including scripts/Kbuild.include
> >> from the tool Makefiles.
> >>
> >> Link: 
> >> https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/
> >> Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to 
> >> scripts/Makefile.compiler")
> >> Reported-by: Janosch Frank 
> >> Reported-by: Christian Borntraeger 
> >> Signed-off-by: Masahiro Yamada 
> >
> > When applying this on top of d9f4ff50d2aa ("kbuild: spilt cc-option and 
> > friends to scripts/Makefile.compiler")
> >
> > I still do get
> >
> > #  Test Assertion Failure 
> > #   lib/kvm_util.c:142: vm->fd >= 0
> > #   pid=315635 tid=315635 - Invalid argument
> > #  10x01002f4b: vm_open at kvm_util.c:142
> > #  2 (inlined by) vm_create at kvm_util.c:258
> > #  30x010015ef: test_add_max_memory_regions at 
> > set_memory_region_test.c:351
> > #  4 (inlined by) main at set_memory_region_test.c:397
> > #  50x03ff971abb89: ?? ??:0
> > #  60x010017ad: .annobin_abi_note.c.hot at crt1.o:?
> > #   KVM_CREATE_VM ioctl failed, rc: -1 errno: 22
> > not ok 7 selftests: kvm: set_memory_region_test # exit=254
> >
> > and the testcase compilation does not pickup the pgste option.
>
> What does work is the following:
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index a6d61f451f88..d9c6d9c2069e 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -1,5 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   include ../../../../scripts/Kbuild.include
> +include ../../../../scripts/Makefile.compiler
>
>   all:
>
>
> as it does pickup the linker option handling.


Kbuild and the tools are divorced.

They cannot be married unless the tools/
build system is largely refactored.
That will be a tons of works (and
I am not sure if it is welcome).

The Kbuild refactoring should not be bothered by
the tools.
For now, I want them separated from each other.



--
Best Regards

Masahiro Yamada


[PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-16 Thread Sourabh Jain
kexec_file_load uses initial_boot_params in setting up the device-tree
for the kernel to be loaded. Though initial_boot_params holds info
about CPUs at the time of boot, it doesn't account for hot added CPUs.

So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
kexec_file_load syscall and the system crashes on a hot added CPU,
capture kernel hangs failing to identify the boot CPU.

 Kernel panic - not syncing: sysrq triggered crash
 CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
 Call Trace:
 [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
 [c000e590fb00] [c0145290] panic+0x16c/0x41c
 [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
 --- interrupt: c00 at 0x7fff905b9664
 NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
 REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
 MSR:  8280f033   CR: 28000242
   XER: 
 IRQMASK: 0
 GPR00: 0004 75fedf30 7fff906a7300 0001
 GPR04: 01002a7355b0 0002 0001 75fef616
 GPR08: 0001   
 GPR12:  7fff9073a160  
 GPR16:    
 GPR20:  7fff906a4ee0 0002 0001
 GPR24: 7fff906a0898  0002 01002a7355b0
 GPR28: 0002 7fff906a1790 01002a7355b0 0002
 NIP [7fff905b9664] 0x7fff905b9664
 LR [7fff905320c4] 0x7fff905320c4
 --- interrupt: c00

To avoid this from happening, extract current CPU info from of_root
device node and use it for setting up the fdt in kexec_file_load case.

Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory 
reserve map")

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kexec/file_load_64.c | 98 +++
 1 file changed, 98 insertions(+)

 ---
Changelog:

v1 -> v2
  - fdt should be updated regardless of kexec type
  - updated commit message and title

 ---

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 02b9e4d0dc40..21d2f0e172ed 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
*image)
return fdt_size;
 }
 
+/**
+ * add_node_prop - Read property from device node structure and add
+ * them to fdt.
+ * @fdt:   Flattened device tree of the kernel
+ * @node_offset:   offset of the node to add a property at
+ * np: device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int add_node_prop(void *fdt, int node_offset, const struct device_node *np)
+{
+   int ret = 0;
+   struct property *pp;
+   unsigned long flags;
+
+   if (!np)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&devtree_lock, flags);
+   for (pp = np->properties; pp; pp = pp->next) {
+   ret = fdt_setprop(fdt, node_offset, pp->name,
+ pp->value, pp->length);
+   if (ret < 0) {
+   pr_err("Unable to add %s property: %s\n",
+   pp->name, fdt_strerror(ret));
+   goto out;
+   }
+   }
+out:
+   raw_spin_unlock_irqrestore(&devtree_lock, flags);
+   return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device-tree using of_root
+ * device node.
+ * @fdt:   Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+   struct device_node *cpus_node, *dn;
+   int cpus_offset, cpus_subnode_off, ret = 0;
+
+   cpus_offset = fdt_path_offset(fdt, "/cpus");
+   if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) {
+   if (cpus_offset > 0) {
+   ret = fdt_del_node(fdt, cpus_offset);
+   if (ret < 0) {
+   pr_err("Error deleting /cpus node: %s\n",
+  fdt_strerror(ret));
+   return -EINVAL;
+   }
+   }
+
+

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Michael Ellerman
Dan Carpenter  writes:
> On Fri, Apr 16, 2021 at 09:00:12AM +0200, Christophe Leroy wrote:
>> Le 16/04/2021 à 08:44, Daniel Axtens a écrit :
>> > > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
>> > > 
>> > > > There are a few "goto out;" statements before the local variable "fdt"
>> > > > is initialized through the call to of_kexec_alloc_and_setup_fdt() in
>> > > > elf64_load(). This will result in an uninitialized "fdt" being passed
>> > > > to kvfree() in this function if there is an error before the call to
>> > > > of_kexec_alloc_and_setup_fdt().
>> > > > 
>> > > > Initialize the local variable "fdt" to NULL.
>> > > > 
>> > I'm a huge fan of initialising local variables! But I'm struggling to
>> > find the code path that will lead to an uninit fdt being returned...
>> > 
>> > The out label reads in part:
>> > 
>> >/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
>> >return ret ? ERR_PTR(ret) : fdt;
>> > 
>> > As far as I can tell, any time we get a non-zero ret, we're going to
>> > return an error pointer rather than the uninitialised value...
>> 
>> I don't think GCC is smart enough to detect that.
>> 
>
> We disabled uninitialized variable checking for GCC.

We disabled -Wmaybe-uninitialized, but that doesn't disable *all*
uninitialized warnings does it?

I wish we hadn't disabled it, it's already led to bugs slipping through.

> But actually is something that has been on my mind recently.  Smatch is
> supposed to parse this correctly but there is a bug that affects powerpc
> and I don't know how to debug it.  The kbuild bot is doing cross
> platform compiles but I don't have one set up on myself.  Could someone
> with Smatch installed test something for me?
>
> Or if you don't have Smatch installed then you should definitely install
> it.  :P
> https://www.spinics.net/lists/smatch/msg00568.html

I have smatch installed, and even run it sometimes ;)


> Apply the patch from below and edit the path to point to the correct
> directory.  Then run kchecker and email me the output?

That gave me:

CC  arch/powerpc/kernel/hw_breakpoint.o
  /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c: In function 
‘task_bps_add’:
  /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:176:16: error: 
passing argument 1 of ‘__smatch_about’ makes integer from pointer without a 
cast [-Werror=int-conversion]
176 | __smatch_about(tmp);
|^~~
||
|struct breakpoint *
  In file included from 
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:170:
  /home/michael/smatch/check_debug.h:4:40: note: expected ‘long int’ but 
argument is of type ‘struct breakpoint *’
  4 | static inline void __smatch_about(long var){}
|   ~^~~
  /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:180:21: error: 
passing argument 1 of ‘__smatch_about’ makes integer from pointer without a 
cast [-Werror=int-conversion]
180 |  __smatch_about(tmp);
| ^~~
| |
| struct breakpoint *
  In file included from 
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:170:
  /home/michael/smatch/check_debug.h:4:40: note: expected ‘long int’ but 
argument is of type ‘struct breakpoint *’
  4 | static inline void __smatch_about(long var){}
|   ~^~~
  cc1: all warnings being treated as errors


Which looks like it didn't work.

Right, needs tmp cast to long.

Output below, hope it helps. Happy to test other things.

cheers


  GEN Makefile
  CHECK   /home/michael/linux/scripts/mod/empty.c
  CALL/home/michael/linux/scripts/checksyscalls.sh
  CALL/home/michael/linux/scripts/atomic/check-atomics.sh
  CHECK   /home/michael/linux/arch/powerpc/kernel/vdso64/vgettimeofday.c
  CC  arch/powerpc/kernel/hw_breakpoint.o
  CHECK   /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add()  
about 
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() 
implied: tmp = 's64min-(-4096),(-12),4096-s64max'
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() buf 
size: 'tmp' 0 elements, 0 bytes (rl = (-1),32)
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() 
strlen: 'tmp'  characters
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() real 
absolute: tmp = 's64min-(-4096),(-12),4096-s64max'
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() mtag 
= 0 offset = 0 rl = ''
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() 
[register_smatch_extra] tmp = '4096-ptr_max,(-12)' [merged] 
(4096-ptr_max,(-12), 4096-ptr_max, (-12), 4096-ptr_max, (-12))
/home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() 
[register_modific

[PATCH] macintosh/via-pmu: Fix build warning

2021-04-16 Thread Michael Ellerman
Now that __fake_sleep is static, we get a warning about it being unused
in some configurations:

  drivers/macintosh/via-pmu.c:190:12: warning: '__fake_sleep' defined but not 
used
190 | static int __fake_sleep;

Move it inside the ifdef where it's used to avoid the warning.

Fixes: 95d143923379 ("macintosh/via-pmu: Make some symbols static")
Reported-by: Stephen Rothwell 
Signed-off-by: Michael Ellerman 
---
 drivers/macintosh/via-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index 478766434919..4bdd4c45e7a7 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -187,7 +187,6 @@ static int query_batt_timer = BATTERY_POLLING_COUNT;
 static struct adb_request batt_req;
 static struct proc_dir_entry *proc_pmu_batt[PMU_MAX_BATTERIES];
 
-static int __fake_sleep;
 int asleep;
 
 #ifdef CONFIG_ADB
@@ -1833,6 +1832,7 @@ pmu_present(void)
  */
  
 static u32 save_via[8];
+static int __fake_sleep;
 
 static void
 save_via_state(void)
-- 
2.25.1



[PATCH 1/2] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h

2021-04-16 Thread Michael Ellerman
From: Tony Ambardar 

A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").

Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:

  In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
   from ../../include/linux/err.h:8,
   from libbpf.c:29:
  ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined 
[-Werror]
   #define EDEADLOCK EDEADLK

  In file included from 
toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
   from libbpf.c:26:
  toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is 
the location of the previous definition
   #define EDEADLOCK   58

  cc1: all warnings being treated as errors

Cc: Stable 
Reported-by: Rosen Penev 
Signed-off-by: Tony Ambardar 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200917135437.1238787-1-tony.ambar...@gmail.com
---
 arch/powerpc/include/uapi/asm/errno.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/uapi/asm/errno.h 
b/arch/powerpc/include/uapi/asm/errno.h
index cc79856896a1..4ba87de32be0 100644
--- a/arch/powerpc/include/uapi/asm/errno.h
+++ b/arch/powerpc/include/uapi/asm/errno.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_POWERPC_ERRNO_H
 #define _ASM_POWERPC_ERRNO_H
 
+#undef EDEADLOCK
 #include 
 
 #undef EDEADLOCK
-- 
2.25.1



Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs

2021-04-16 Thread Sourabh Jain



On 16/04/21 3:03 pm, Hari Bathini wrote:



On 16/04/21 12:17 pm, Sourabh Jain wrote:

With the kexec_file_load system call when system crashes on the hot add
CPU the capture kernel hangs and failed to collect the vmcore.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 
5.12.0-rc5upstream #54

  Call Trace:
  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 
(unreliable)

  [c000e590fb00] [c0145290] panic+0x16c/0x41c
  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
  [c000e590fdb0] [c0031ee0] 
system_call_exception+0x150/0x290

  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
  REGS: c000e590fe80 TRAP: 0c00   Not tainted (5.12.0-rc5upstream)
  MSR:  8280f033   CR: 
28000242

    XER: 
  IRQMASK: 0
  GPR00: 0004 75fedf30 7fff906a7300 
0001
  GPR04: 01002a7355b0 0002 0001 
75fef616
  GPR08: 0001   

  GPR12:  7fff9073a160  

  GPR16:    

  GPR20:  7fff906a4ee0 0002 
0001
  GPR24: 7fff906a0898  0002 
01002a7355b0
  GPR28: 0002 7fff906a1790 01002a7355b0 
0002

  NIP [7fff905b9664] 0x7fff905b9664
  LR [7fff905320c4] 0x7fff905320c4
  --- interrupt: c00




I will update the commit message.


  /**
   * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
   *   being loaded.
@@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage 
*image, void *fdt,

  }
  }
  +    /* Update cpus nodes information to account hotplug CPUs. */
+    if (image->type == KEXEC_TYPE_CRASH) {


Shouldn't this apply to regular kexec_file_load case as well? Yeah, 
there won't be a hang in regular kexec_file_load case but for 
correctness, that kernel should also not see stale CPU info in FDT?


Yes better to update the fdt for both kexec and kdump.

Thanks for the review Hari.

- Sourabh Jain


[PATCH 2/2] powerpc/papr_scm: Fix build error due to wrong printf specifier

2021-04-16 Thread Michael Ellerman
When I changed the rc variable to be long rather than int64_t I
neglected to update the printk(), leading to a build break:

  arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmem_flush':
  arch/powerpc/platforms/pseries/papr_scm.c:144:26: warning: format
'%lld' expects argument of type 'long long int', but argument 3 has
type 'long int' [-Wformat=]

Fixes: 75b7c05ebf90 ("powerpc/papr_scm: Implement support for H_SCM_FLUSH 
hcall")
Reported-by: Stephen Rothwell 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ae6f5d80d5ce..48de21902116 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -141,7 +141,7 @@ static int papr_scm_pmem_flush(struct nd_region *nd_region,
} while (rc == H_BUSY);
 
if (rc) {
-   dev_err(&p->pdev->dev, "flush error: %lld", rc);
+   dev_err(&p->pdev->dev, "flush error: %ld", rc);
rc = -EIO;
} else {
dev_dbg(&p->pdev->dev, "flush drc 0x%x complete", p->drc_index);
-- 
2.25.1



[PATCH 1/2] powerpc/configs: Add PAPR_SCM to pseries_defconfig

2021-04-16 Thread Michael Ellerman
This is a pseries only driver, it should be built by default as part of
pseries_defconfig to get some build coverage.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/pseries_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 777221775c83..968095d7682c 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -41,6 +41,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PAPR_SCM=m
 CONFIG_PPC_SVM=y
 # CONFIG_PPC_PMAC is not set
 CONFIG_RTAS_FLASH=m
-- 
2.25.1



Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 12:51, Steven Price a écrit :

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c    | 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
  {
  struct ptdump_state *st = walk->private;
-    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
+    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, but note that KASAN presents 
an interesting case here. We short-cut by detecting it's a KASAN region at a high level 
(PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with 
level==4 because we know KASAN sets up the page table like that.


However the one call actually covers a much larger region - so while PAGE_SIZE matches the level 
it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on 
powerpc.


Hum  I successfully tested it with KASAN, I now realise that I tested it with 
CONFIG_KASAN_VMALLOC selected. In this situation, since 
https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table 
anymore.


I'll test again without CONFIG_KASAN_VMALLOC.



To be honest I don't fully understand why powerpc requires the page_size - it appears to be using 
it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes 
would occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a 
KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the 
problem was exactly, something around the use of hugepages for kernel memory, came as part of the 
series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 



Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN 
output to x86.


Given the generic ptdump code has handling for KASAN already it should be possible to drop that from 
the powerpc arch code, which I think means we don't actually need to provide page size to 
notepage(). Hopefully that means more code to delete ;)




Yes ... and no.

It looks like the generic ptdump handles the case when several pgdir entries points to the same 
kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page 
tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page .


Christophe


Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread Matthew Wilcox
On Fri, Apr 16, 2021 at 07:32:35AM +, David Laight wrote:
> From: Matthew Wilcox 
> > Sent: 15 April 2021 23:22
> > 
> > On Thu, Apr 15, 2021 at 09:11:56PM +, David Laight wrote:
> > > Isn't it possible to move the field down one long?
> > > This might require an explicit zero - but this is not a common
> > > code path - the extra write will be noise.
> > 
> > Then it overlaps page->mapping.  See emails passim.
> 
> The rules on overlaps make be wonder if every 'long'
> should be in its own union.

That was what we used to have.  It was worse.

> The comments would need to say when each field is used.
> It would, at least, make these errors less common.
> 
> That doesn't solve the 64bit dma_addr though.
> 
> Actually rather that word-swapping dma_addr on 32bit BE
> could you swap over the two fields it overlays with.
> That might look messy in the .h, but it doesn't require
> an accessor function to do the swap - easily missed.

No.


Re: linux-next: build warning after merge of the powerpc tree

2021-04-16 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi all,
>
> After merging the powerpc tree, today's linux-next build (powerpc
> allyesconfig) produced this warning:
>
> In file included from include/linux/device.h:15,
>  from arch/powerpc/include/asm/io.h:27,
>  from include/linux/io.h:13,
>  from include/linux/irq.h:20,
>  from arch/powerpc/include/asm/hardirq.h:6,
>  from include/linux/hardirq.h:11,
>  from include/linux/highmem.h:10,
>  from include/linux/bio.h:8,
>  from include/linux/libnvdimm.h:14,
>  from arch/powerpc/platforms/pseries/papr_scm.c:12:
> arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmem_flush':
> arch/powerpc/platforms/pseries/papr_scm.c:144:26: warning: format '%lld' 
> expects argument of type 'long long int', but argument 3 has type 'long int' 
> [-Wformat=]
>   144 |   dev_err(&p->pdev->dev, "flush error: %lld", rc);
>   |  ^~~
> include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
>19 | #define dev_fmt(fmt) fmt
>   |  ^~~
> arch/powerpc/platforms/pseries/papr_scm.c:144:3: note: in expansion of macro 
> 'dev_err'
>   144 |   dev_err(&p->pdev->dev, "flush error: %lld", rc);
>   |   ^~~
> arch/powerpc/platforms/pseries/papr_scm.c:144:43: note: format string is 
> defined here
>   144 |   dev_err(&p->pdev->dev, "flush error: %lld", rc);
>   |~~~^
>   |   |
>   |   long long int
>   |%ld
>
> Introduced by commit
>
>   75b7c05ebf90 ("powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall")

My bad.

cheers


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Steven Price

On 16/04/2021 11:38, Christophe Leroy wrote:



Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c    | 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct 
mm_walk *walk,

  {
  struct ptdump_state *st = walk->private;
-    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
+    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), 
PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, 
but note that KASAN presents an interesting case here. We short-cut by 
detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and 
instead of walking the tree down just call note_page() *once* but with 
level==4 because we know KASAN sets up the page table like that.


However the one call actually covers a much larger region - so while 
PAGE_SIZE matches the level it doesn't match the region covered. 
AFAICT this will lead to odd results if you enable KASAN on powerpc.


Hum  I successfully tested it with KASAN, I now realise that I 
tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since 
https://github.com/torvalds/linux/commit/af3d0a686 we don't have any 
common shadow page table anymore.


I'll test again without CONFIG_KASAN_VMALLOC.



To be honest I don't fully understand why powerpc requires the 
page_size - it appears to be using it purely to find "holes" in the 
calls to note_page(), but I haven't worked out why such holes would 
occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size 
to detect whether it is a KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I 
can't remember what the problem was exactly, something around the use of 
hugepages for kernel memory, came as part of the series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ 


Ah, that's useful context. So it looks like powerpc took a different 
route to reducing the KASAN output to x86.


Given the generic ptdump code has handling for KASAN already it should 
be possible to drop that from the powerpc arch code, which I think means 
we don't actually need to provide page size to notepage(). Hopefully 
that means more code to delete ;)


Steve


Re: [PATCH v3] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h

2021-04-16 Thread Michael Ellerman
Tony Ambardar  writes:
> Hello Michael,
>
> The latest version of this patch addressed all feedback I'm aware of
> when submitted last September, and I've seen no further comments from
> reviewers since then.
>
> Could you please let me know where this stands and if anything further
> is needed?

Sorry, it's still sitting in my inbox :/

I was going to reply to suggest we split the tools change out. The
headers under tools are usually updated by another maintainer, I think
it might even be scripted.

Anyway I've applied your patch and done that (dropped the change to
tools/.../errno.h), which should also mean the stable backport is more
likely to work automatically.

It will hit mainline in v5.13-rc1 and then be backported to the stable
trees.

I don't think you actually need the tools version of the header updated
to fix your bug? In which case we can probably just wait for it to be
updated automatically when the tools headers are sync'ed with the kernel
versions.

cheers


> On Thu, 17 Sept 2020 at 06:54, Tony Ambardar  wrote:
>>
>> A few archs like powerpc have different errno.h values for macros
>> EDEADLOCK and EDEADLK. In code including both libc and linux versions of
>> errno.h, this can result in multiple definitions of EDEADLOCK in the
>> include chain. Definitions to the same value (e.g. seen with mips) do
>> not raise warnings, but on powerpc there are redefinitions changing the
>> value, which raise warnings and errors (if using "-Werror").
>>
>> Guard against these redefinitions to avoid build errors like the following,
>> first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
>> musl 1.1.24:
>>
>>   In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
>>from ../../include/linux/err.h:8,
>>from libbpf.c:29:
>>   ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined 
>> [-Werror]
>>#define EDEADLOCK EDEADLK
>>
>>   In file included from 
>> toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
>>from libbpf.c:26:
>>   toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this 
>> is the location of the previous definition
>>#define EDEADLOCK   58
>>
>>   cc1: all warnings being treated as errors
>>
>> CC: Stable 
>> Reported-by: Rosen Penev 
>> Signed-off-by: Tony Ambardar 
>> ---
>> v1 -> v2:
>>  * clean up commit description formatting
>>
>> v2 -> v3: (per Michael Ellerman)
>>  * drop indeterminate 'Fixes' tags, request stable backports instead
>> ---
>>  arch/powerpc/include/uapi/asm/errno.h   | 1 +
>>  tools/arch/powerpc/include/uapi/asm/errno.h | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/errno.h 
>> b/arch/powerpc/include/uapi/asm/errno.h
>> index cc79856896a1..4ba87de32be0 100644
>> --- a/arch/powerpc/include/uapi/asm/errno.h
>> +++ b/arch/powerpc/include/uapi/asm/errno.h
>> @@ -2,6 +2,7 @@
>>  #ifndef _ASM_POWERPC_ERRNO_H
>>  #define _ASM_POWERPC_ERRNO_H
>>
>> +#undef EDEADLOCK
>>  #include 
>>
>>  #undef EDEADLOCK
>> diff --git a/tools/arch/powerpc/include/uapi/asm/errno.h 
>> b/tools/arch/powerpc/include/uapi/asm/errno.h
>> index cc79856896a1..4ba87de32be0 100644
>> --- a/tools/arch/powerpc/include/uapi/asm/errno.h
>> +++ b/tools/arch/powerpc/include/uapi/asm/errno.h
>> @@ -2,6 +2,7 @@
>>  #ifndef _ASM_POWERPC_ERRNO_H
>>  #define _ASM_POWERPC_ERRNO_H
>>
>> +#undef EDEADLOCK
>>  #include 
>>
>>  #undef EDEADLOCK
>> --
>> 2.25.1
>>


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 11:28, Steven Price a écrit :

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c    | 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
  {
  struct ptdump_state *st = walk->private;
-    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
+    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an 
interesting case here. We short-cut by detecting it's a KASAN region at a high level 
(PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with 
level==4 because we know KASAN sets up the page table like that.


However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it 
doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc.


Hum  I successfully tested it with KASAN, I now realise that I tested it with 
CONFIG_KASAN_VMALLOC selected. In this situation, since 
https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore.


I'll test again without CONFIG_KASAN_VMALLOC.



To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it 
purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would 
occur.


I was indeed introduced for KASAN. We have a first commit 
https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a 
KASAN like stuff.


Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the 
problem was exactly, something around the use of hugepages for kernel memory, came as part of the 
series 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/



Christophe


Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-04-16 Thread Aneesh Kumar K.V

On 4/16/21 2:39 PM, Andy Shevchenko wrote:

On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote:

On 4/15/21 7:16 PM, Andy Shevchenko wrote:

Parse to and export from UUID own type, before dereferencing.
This also fixes wrong comment (Little Endian UUID is something else)
and should fix Sparse warnings about assigning strict types to POD.

Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing 
uuid from the device tree")
Cc: Oliver O'Halloran 
Cc: Aneesh Kumar K.V 
Signed-off-by: Andy Shevchenko 
---
Not tested
   arch/powerpc/platforms/pseries/papr_scm.c | 13 -
   1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ae6f5d80d5ce..4366e1902890 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device *pdev)
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
+   u8 uuid_raw[UUID_SIZE];
const char *uuid_str;
-   u64 uuid[2];
+   uuid_t uuid;
int rc;
/* check we have all the required DT properties */
@@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device *pdev)
p->hcall_flush_required = of_property_read_bool(dn, 
"ibm,hcall-flush-required");
/* We just need to ensure that set cookies are unique across */
-   uuid_parse(uuid_str, (uuid_t *) uuid);
+   uuid_parse(uuid_str, &uuid);
+
/*
 * cookie1 and cookie2 are not really little endian
-* we store a little endian representation of the
+* we store a raw buffer representation of the
 * uuid str so that we can compare this with the label
 * area cookie irrespective of the endian config with which
 * the kernel is built.
 */
-   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
-   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+   export_uuid(uuid_raw, &uuid);
+   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
+   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);


ok that does the equivalent of cpu_to_le64 there. So we are good. But the
comment update is missing the details why we did that get_unaligned_le64.
Maybe raw buffer representation is the correct term?
Should we add an example in the comment. ie,



/*
  * Historically we stored the cookie in the below format.
for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
cookie1 was  0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d
*/


I'm fine with the comment. At least it will shed a light on the byte ordering
we are expecting.



Will you be sending an update? Also it will be good to list the sparse 
warning in the commit message?


-aneesh



Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs

2021-04-16 Thread Hari Bathini




On 16/04/21 12:17 pm, Sourabh Jain wrote:

With the kexec_file_load system call when system crashes on the hot add
CPU the capture kernel hangs and failed to collect the vmcore.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
  Call Trace:
  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
  [c000e590fb00] [c0145290] panic+0x16c/0x41c
  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
  [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
  REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
  MSR:  8280f033   CR: 28000242
XER: 
  IRQMASK: 0
  GPR00: 0004 75fedf30 7fff906a7300 0001
  GPR04: 01002a7355b0 0002 0001 75fef616
  GPR08: 0001   
  GPR12:  7fff9073a160  
  GPR16:    
  GPR20:  7fff906a4ee0 0002 0001
  GPR24: 7fff906a0898  0002 01002a7355b0
  GPR28: 0002 7fff906a1790 01002a7355b0 0002
  NIP [7fff905b9664] 0x7fff905b9664
  LR [7fff905320c4] 0x7fff905320c4
  --- interrupt: c00





  /**
   * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
   *   being loaded.
@@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
*fdt,
}
}
  
+	/* Update cpus nodes information to account hotplug CPUs. */

+   if (image->type == KEXEC_TYPE_CRASH) {


Shouldn't this apply to regular kexec_file_load case as well? Yeah, 
there won't be a hang in regular kexec_file_load case but for 
correctness, that kernel should also not see stale CPU info in FDT?



Thanks
Hari


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-16 Thread Steven Price

On 15/04/2021 18:18, Christophe Leroy wrote:

In order to support large pages on powerpc, notepage()
needs to know the page size of the page.

Add a page_size argument to notepage().

Signed-off-by: Christophe Leroy 
---
  arch/arm64/mm/ptdump.c |  2 +-
  arch/riscv/mm/ptdump.c |  2 +-
  arch/s390/mm/dump_pagetables.c |  3 ++-
  arch/x86/mm/dump_pagetables.c  |  2 +-
  include/linux/ptdump.h |  2 +-
  mm/ptdump.c| 16 
  6 files changed, 14 insertions(+), 13 deletions(-)


[...]

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..61cd16afb1c8 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
  {
struct ptdump_state *st = walk->private;
  
-	st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));

+   st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), 
PAGE_SIZE);


I'm not completely sure what the page_size is going to be used for, but 
note that KASAN presents an interesting case here. We short-cut by 
detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and 
instead of walking the tree down just call note_page() *once* but with 
level==4 because we know KASAN sets up the page table like that.


However the one call actually covers a much larger region - so while 
PAGE_SIZE matches the level it doesn't match the region covered. AFAICT 
this will lead to odd results if you enable KASAN on powerpc.


To be honest I don't fully understand why powerpc requires the page_size 
- it appears to be using it purely to find "holes" in the calls to 
note_page(), but I haven't worked out why such holes would occur.


Steve

  
  	walk->action = ACTION_CONTINUE;
  
@@ -41,7 +41,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,

st->effective_prot(st, 0, pgd_val(val));
  
  	if (pgd_leaf(val))

-   st->note_page(st, addr, 0, pgd_val(val));
+   st->note_page(st, addr, 0, pgd_val(val), PGDIR_SIZE);
  
  	return 0;

  }
@@ -62,7 +62,7 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
st->effective_prot(st, 1, p4d_val(val));
  
  	if (p4d_leaf(val))

-   st->note_page(st, addr, 1, p4d_val(val));
+   st->note_page(st, addr, 1, p4d_val(val), P4D_SIZE);
  
  	return 0;

  }
@@ -83,7 +83,7 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
st->effective_prot(st, 2, pud_val(val));
  
  	if (pud_leaf(val))

-   st->note_page(st, addr, 2, pud_val(val));
+   st->note_page(st, addr, 2, pud_val(val), PUD_SIZE);
  
  	return 0;

  }
@@ -102,7 +102,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
if (st->effective_prot)
st->effective_prot(st, 3, pmd_val(val));
if (pmd_leaf(val))
-   st->note_page(st, addr, 3, pmd_val(val));
+   st->note_page(st, addr, 3, pmd_val(val), PMD_SIZE);
  
  	return 0;

  }
@@ -116,7 +116,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
if (st->effective_prot)
st->effective_prot(st, 4, pte_val(val));
  
-	st->note_page(st, addr, 4, pte_val(val));

+   st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE);
  
  	return 0;

  }
@@ -126,7 +126,7 @@ static int ptdump_hole(unsigned long addr, unsigned long 
next,
  {
struct ptdump_state *st = walk->private;
  
-	st->note_page(st, addr, depth, 0);

+   st->note_page(st, addr, depth, 0, 0);
  
  	return 0;

  }
@@ -153,5 +153,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct 
mm_struct *mm, pgd_t *pgd)
mmap_read_unlock(mm);
  
  	/* Flush out the last page */

-   st->note_page(st, 0, -1, 0);
+   st->note_page(st, 0, -1, 0, 0);
  }





Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-04-16 Thread Andy Shevchenko
On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote:
> On 4/15/21 7:16 PM, Andy Shevchenko wrote:
> > Parse to and export from UUID own type, before dereferencing.
> > This also fixes wrong comment (Little Endian UUID is something else)
> > and should fix Sparse warnings about assigning strict types to POD.
> > 
> > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset 
> > cookie")
> > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for 
> > storing uuid from the device tree")
> > Cc: Oliver O'Halloran 
> > Cc: Aneesh Kumar K.V 
> > Signed-off-by: Andy Shevchenko 
> > ---
> > Not tested
> >   arch/powerpc/platforms/pseries/papr_scm.c | 13 -
> >   1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> > b/arch/powerpc/platforms/pseries/papr_scm.c
> > index ae6f5d80d5ce..4366e1902890 100644
> > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device 
> > *pdev)
> > u32 drc_index, metadata_size;
> > u64 blocks, block_size;
> > struct papr_scm_priv *p;
> > +   u8 uuid_raw[UUID_SIZE];
> > const char *uuid_str;
> > -   u64 uuid[2];
> > +   uuid_t uuid;
> > int rc;
> > /* check we have all the required DT properties */
> > @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device 
> > *pdev)
> > p->hcall_flush_required = of_property_read_bool(dn, 
> > "ibm,hcall-flush-required");
> > /* We just need to ensure that set cookies are unique across */
> > -   uuid_parse(uuid_str, (uuid_t *) uuid);
> > +   uuid_parse(uuid_str, &uuid);
> > +
> > /*
> >  * cookie1 and cookie2 are not really little endian
> > -* we store a little endian representation of the
> > +* we store a raw buffer representation of the
> >  * uuid str so that we can compare this with the label
> >  * area cookie irrespective of the endian config with which
> >  * the kernel is built.
> >  */
> > -   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
> > -   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
> > +   export_uuid(uuid_raw, &uuid);
> > +   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
> > +   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
> 
> ok that does the equivalent of cpu_to_le64 there. So we are good. But the
> comment update is missing the details why we did that get_unaligned_le64.
> Maybe raw buffer representation is the correct term?
> Should we add an example in the comment. ie,

> /*
>  * Historically we stored the cookie in the below format.
> for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
> cookie1 was  0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d
> */

I'm fine with the comment. At least it will shed a light on the byte ordering
we are expecting.

> > /* might be zero */
> > p->metadata_size = metadata_size;

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Michael Ellerman
Daniel Axtens  writes:
>> On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
>>
>> Sorry - missed copying device-tree and powerpc mailing lists.
>>
>>> There are a few "goto out;" statements before the local variable "fdt"
>>> is initialized through the call to of_kexec_alloc_and_setup_fdt() in
>>> elf64_load(). This will result in an uninitialized "fdt" being passed
>>> to kvfree() in this function if there is an error before the call to
>>> of_kexec_alloc_and_setup_fdt().
>>> 
>>> Initialize the local variable "fdt" to NULL.
>>>
> I'm a huge fan of initialising local variables! But I'm struggling to
> find the code path that will lead to an uninit fdt being returned...
>
> The out label reads in part:
>
>   /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
>   return ret ? ERR_PTR(ret) : fdt;
>
> As far as I can tell, any time we get a non-zero ret, we're going to
> return an error pointer rather than the uninitialised value...
>
> (btw, it does look like we might leak fdt if we have an error after we
> successfully kmalloc it.)
>
> Am I missing something? Can you link to the report for the kernel test
> robot or from Dan? 
>
> FWIW, I think it's worth including this patch _anyway_ because initing
> local variables is good practice, but I'm just not sure on the
> justification.

Why is it good practice?

It defeats -Wuninitialized. So you're guaranteed to be returning
something initialised, but not necessarily initialised to the right
value.

In a case like this NULL seems like a safe choice, but it's still wrong.
The function is meant to return a pointer to the successfully allocated
fdt, or an ERR_PTR() value. NULL is neither of those.

I agree there are security reasons that initialising stack variables is
desirable, but I think that should be handled by the compiler, not at
the source level.

cheers


Re: [PATCH net-next v4 2/2] of: net: fix of_get_mac_addr_nvmem() for non-platform devices

2021-04-16 Thread Michael Walle

Am 2021-04-16 05:24, schrieb Benjamin Herrenschmidt:

On Mon, 2021-04-12 at 19:47 +0200, Michael Walle wrote:


 /**
  * of_get_phy_mode - Get phy mode for given device_node
@@ -59,15 +60,39 @@ static int of_get_mac_addr(struct device_node *np, 
const char *name, u8 *addr)

 static int of_get_mac_addr_nvmem(struct device_node *np, u8 *addr)
 {
struct platform_device *pdev = of_find_device_by_node(np);
+   struct nvmem_cell *cell;
+   const void *mac;
+   size_t len;
int ret;

-   if (!pdev)
-   return -ENODEV;
+   /* Try lookup by device first, there might be a 
nvmem_cell_lookup

+* associated with a given device.
+*/
+   if (pdev) {
+   ret = nvmem_get_mac_address(&pdev->dev, addr);
+   put_device(&pdev->dev);
+   return ret;
+   }
+


This smells like the wrong band aid :)

Any struct device can contain an OF node pointer these days.


But not all nodes might have an associated device, see DSA for example.
And as the name suggests of_get_mac_address() operates on a node. So
if a driver calls of_get_mac_address() it should work on the node. What
is wrong IMHO, is that the ethernet drivers where the corresponding 
board

has a nvmem_cell_lookup registered is calling of_get_mac_address(node).
It should rather call eth_get_mac_address(dev) in the first place.

One would need to figure out if there is an actual device (with an
assiciated of_node), then call eth_get_mac_address(dev) and if there
isn't a device call of_get_mac_address(node).

But I don't know if that is easy to figure out. Well, one could start
with just the device where nvmem_cell_lookup is used. Then we could
drop the workaround above.


This seems all backwards. I think we are dealing with bad evolution.

We need to do a lookup for the device because we get passed an of_node.
We should just get passed a device here... or rather stop calling
of_get_mac_addr() from all those drivers and instead call
eth_platform_get_mac_address() which in turns calls of_get_mac_addr().

Then the nvmem stuff gets put in eth_platform_get_mac_address().

of_get_mac_addr() becomes a low-level thingy that most drivers don't
care about.


The NVMEM thing is just another (optional) way how the MAC address
is fetched from the device tree. Thus, if the drivers have the
of_get_mac_address() call they should automatically get the NVMEM
method, too.

-michael


Re: [PATCH] mm: ptdump: Fix build failure

2021-04-16 Thread Steven Price

On 15/04/2021 10:31, Christophe Leroy wrote:

  CC  mm/ptdump.o
In file included from :
mm/ptdump.c: In function 'ptdump_pte_entry':
././include/linux/compiler_types.h:320:38: error: call to 
'__compiletime_assert_207' declared with attribute error: Unsupported access 
size for {READ,WRITE}_ONCE().
  320 |  _compiletime_assert(condition, msg, __compiletime_assert_, 
__COUNTER__)
  |  ^
././include/linux/compiler_types.h:301:4: note: in definition of macro 
'__compiletime_assert'
  301 |prefix ## suffix();\
  |^~
././include/linux/compiler_types.h:320:2: note: in expansion of macro 
'_compiletime_assert'
  320 |  _compiletime_assert(condition, msg, __compiletime_assert_, 
__COUNTER__)
  |  ^~~
./include/asm-generic/rwonce.h:36:2: note: in expansion of macro 
'compiletime_assert'
   36 |  compiletime_assert(__native_word(t) || sizeof(t) == 
sizeof(long long), \
  |  ^~
./include/asm-generic/rwonce.h:49:2: note: in expansion of macro 
'compiletime_assert_rwonce_type'
   49 |  compiletime_assert_rwonce_type(x);\
  |  ^~
mm/ptdump.c:114:14: note: in expansion of macro 'READ_ONCE'
  114 |  pte_t val = READ_ONCE(*pte);
  |  ^
make[2]: *** [mm/ptdump.o] Error 1

READ_ONCE() cannot be used for reading PTEs. Use ptep_get()
instead. See commit 481e980a7c19 ("mm: Allow arches to provide ptep_get()")
and commit c0e1c8c22beb ("powerpc/8xx: Provide ptep_get() with 16k pages")
for details.


It was cargo-culted from the arm64/x86 implementations (where this 
happens to be safe).



Fixes: 30d621f6723b ("mm: add generic ptdump")
Cc: Steven Price 
Signed-off-by: Christophe Leroy 


Reviewed-by: Steven Price 

Thanks,

Steve


---
  mm/ptdump.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ptdump.c b/mm/ptdump.c
index 4354c1422d57..da751448d0e4 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -111,7 +111,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
unsigned long next, struct mm_walk *walk)
  {
struct ptdump_state *st = walk->private;
-   pte_t val = READ_ONCE(*pte);
+   pte_t val = ptep_get(pte);
  
  	if (st->effective_prot)

st->effective_prot(st, 4, pte_val(val));





Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Dan Carpenter
On Fri, Apr 16, 2021 at 09:00:12AM +0200, Christophe Leroy wrote:
> 
> 
> Le 16/04/2021 à 08:44, Daniel Axtens a écrit :
> > Hi Lakshmi,
> > 
> > > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
> > > 
> > > Sorry - missed copying device-tree and powerpc mailing lists.
> > > 
> > > > There are a few "goto out;" statements before the local variable "fdt"
> > > > is initialized through the call to of_kexec_alloc_and_setup_fdt() in
> > > > elf64_load(). This will result in an uninitialized "fdt" being passed
> > > > to kvfree() in this function if there is an error before the call to
> > > > of_kexec_alloc_and_setup_fdt().
> > > > 
> > > > Initialize the local variable "fdt" to NULL.
> > > > 
> > I'm a huge fan of initialising local variables! But I'm struggling to
> > find the code path that will lead to an uninit fdt being returned...
> > 
> > The out label reads in part:
> > 
> > /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
> > return ret ? ERR_PTR(ret) : fdt;
> > 
> > As far as I can tell, any time we get a non-zero ret, we're going to
> > return an error pointer rather than the uninitialised value...
> 
> I don't think GCC is smart enough to detect that.
> 

We disabled uninitialized variable checking for GCC.

But actually is something that has been on my mind recently.  Smatch is
supposed to parse this correctly but there is a bug that affects powerpc
and I don't know how to debug it.  The kbuild bot is doing cross
platform compiles but I don't have one set up on myself.  Could someone
with Smatch installed test something for me?

Or if you don't have Smatch installed then you should definitely install
it.  :P
https://www.spinics.net/lists/smatch/msg00568.html

Apply the patch from below and edit the path to point to the correct
directory.  Then run kchecker and email me the output?

~/path/to/smatch_scripts/kchecker arch/powerpc/kernel/hw_breakpoint.c

regads,
dan carpenter

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 8fc7a14e4d71..f2dfba54e14d 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -167,13 +167,19 @@ static bool can_co_exist(struct breakpoint *b, struct 
perf_event *bp)
return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp));
 }
 
+#include "/home/XXX/path/to/smatch/check_debug.h"
 static int task_bps_add(struct perf_event *bp)
 {
struct breakpoint *tmp;
 
tmp = alloc_breakpoint(bp);
-   if (IS_ERR(tmp))
+   __smatch_about(tmp);
+   __smatch_debug_on();
+   if (IS_ERR(tmp)) {
+   __smatch_debug_off();
+   __smatch_about(tmp);
return PTR_ERR(tmp);
+   }
 
list_add(&tmp->list, &task_bps);
return 0;


Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-04-16 Thread Aneesh Kumar K.V

On 4/15/21 7:16 PM, Andy Shevchenko wrote:

Parse to and export from UUID own type, before dereferencing.
This also fixes wrong comment (Little Endian UUID is something else)
and should fix Sparse warnings about assigning strict types to POD.

Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing 
uuid from the device tree")
Cc: Oliver O'Halloran 
Cc: Aneesh Kumar K.V 
Signed-off-by: Andy Shevchenko 
---
Not tested
  arch/powerpc/platforms/pseries/papr_scm.c | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ae6f5d80d5ce..4366e1902890 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device *pdev)
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
+   u8 uuid_raw[UUID_SIZE];
const char *uuid_str;
-   u64 uuid[2];
+   uuid_t uuid;
int rc;
  
  	/* check we have all the required DT properties */

@@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device *pdev)
p->hcall_flush_required = of_property_read_bool(dn, 
"ibm,hcall-flush-required");
  
  	/* We just need to ensure that set cookies are unique across */

-   uuid_parse(uuid_str, (uuid_t *) uuid);
+   uuid_parse(uuid_str, &uuid);
+
/*
 * cookie1 and cookie2 are not really little endian
-* we store a little endian representation of the
+* we store a raw buffer representation of the
 * uuid str so that we can compare this with the label
 * area cookie irrespective of the endian config with which
 * the kernel is built.
 */
-   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
-   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+   export_uuid(uuid_raw, &uuid);
+   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
+   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
  


ok that does the equivalent of cpu_to_le64 there. So we are good. But 
the comment update is missing the details why we did that 
get_unaligned_le64. Maybe raw buffer representation is the correct term?

Should we add an example in the comment. ie,

/*
 * Historically we stored the cookie in the below format.
for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
cookie1 was  0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d
*/




/* might be zero */
p->metadata_size = metadata_size;





[PATCH] symbol : Make the size of the compile-related array fixed

2021-04-16 Thread Han Dapeng


OPPO

本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人使用(包含个人及群组)。禁止任何人在未经授权的情况下以任何形式使用。如果您错收了本邮件,请立即以电子邮件通知发件人并删除本邮件及其附件。

This e-mail and its attachments contain confidential information from OPPO, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!


0001-symbol-Make-the-size-of-the-compile-related-array-fi.patch
Description: 0001-symbol-Make-the-size-of-the-compile-related-array-fi.patch


Re: [PATCH 1/2] powerpc/sstep: Add emulation support for ‘setb’ instruction

2021-04-16 Thread Daniel Axtens
Sathvika Vasireddy  writes:

> This adds emulation support for the following instruction:
>* Set Boolean (setb)
>
> Signed-off-by: Sathvika Vasireddy 
> ---
>  arch/powerpc/lib/sstep.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index c6aebc149d14..263c613d7490 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1964,6 +1964,18 @@ int analyse_instr(struct instruction_op *op, const 
> struct pt_regs *regs,
>   op->val = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
>  
> + case 128:   /* setb */
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + goto unknown_opcode;

Ok, if I've understood correctly...

> + ra = ra & ~0x3;

This masks off the bits of RA that are not part of BTF:

ra is in [0, 31] which is [0b0, 0b1]
Then ~0x3 = ~0b00011
ra = ra & 0b11100

This gives us then,
ra = btf << 2; or
btf = ra >> 2;

Let's then check to see if your calculations read the right fields.

> + if ((regs->ccr) & (1 << (31 - ra)))
> + op->val = -1;
> + else if ((regs->ccr) & (1 << (30 - ra)))
> + op->val = 1;
> + else
> + op->val = 0;


CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
normal bit #: 0.31
ibm bit #:   31.0

If btf = 0, ra = 0, check normal bits 31 and 30, which are both in CR0.
CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
   ^^

If btf = 7, ra = 0b11100 = 28, so check normal bits 31-28 and 30-28,
which are 3 and 2.

CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
^^

If btf = 3, ra = 0b01100 = 12, for normal bits 19 and 18:

CR field:  76543210
bit:  0123 0123 0123 0123 0123 0123 0123 0123
^^

So yes, your calculations, while I struggle to follow _how_ they work,
do in fact seem to work.

Checkpatch does have one complaint:

CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'regs->ccr'
#30: FILE: arch/powerpc/lib/sstep.c:1971:
+   if ((regs->ccr) & (1 << (31 - ra)))

I don't really mind the parenteses: I think you are safe to ignore
checkpatch here unless someone else complains :)

If you do end up respinning the patch, I think it would be good to make
the maths a bit clearer. I think it works because a left shift of 2 is
the same as multiplying by 4, but it would be easier to follow if you
used a temporary variable for btf.

However, I do think this is still worth adding to the kernel either way,
so:

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> + goto compute_done;
> +
>   case 154:   /* prtyw */
>   do_prty(regs, op, regs->gpr[rd], 32);
>   goto logical_done_nocc;
> -- 
> 2.16.4


Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Dan Carpenter
On Fri, Apr 16, 2021 at 04:44:30PM +1000, Daniel Axtens wrote:
> Hi Lakshmi,
> 
> > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:
> >
> > Sorry - missed copying device-tree and powerpc mailing lists.
> >
> >> There are a few "goto out;" statements before the local variable "fdt"
> >> is initialized through the call to of_kexec_alloc_and_setup_fdt() in
> >> elf64_load(). This will result in an uninitialized "fdt" being passed
> >> to kvfree() in this function if there is an error before the call to
> >> of_kexec_alloc_and_setup_fdt().
> >> 
> >> Initialize the local variable "fdt" to NULL.
> >>
> I'm a huge fan of initialising local variables!

Don't be!  It just disables static checker warnings and hides bugs.
The kbuild emails are archived but the email is mangled and unreadable.
https://www.mail-archive.com/kbuild@lists.01.org/msg06371.html

I think maybe you're not on the most recent code.  In linux-next this
code looks like:

arch/powerpc/kexec/elf_64.c
27  static void *elf64_load(struct kimage *image, char *kernel_buf,
28  unsigned long kernel_len, char *initrd,
29  unsigned long initrd_len, char *cmdline,
30  unsigned long cmdline_len)
31  {
32  int ret;
33  unsigned long kernel_load_addr;
34  unsigned long initrd_load_addr = 0, fdt_load_addr;
35  void *fdt;
36  const void *slave_code;
37  struct elfhdr ehdr;
38  char *modified_cmdline = NULL;
39  struct kexec_elf_info elf_info;
40  struct kexec_buf kbuf = { .image = image, .buf_min = 0,
41.buf_max = ppc64_rma_size };
42  struct kexec_buf pbuf = { .image = image, .buf_min = 0,
43.buf_max = ppc64_rma_size, .top_down 
= true,
44.mem = KEXEC_BUF_MEM_UNKNOWN };
45  
46  ret = kexec_build_elf_info(kernel_buf, kernel_len, &ehdr, 
&elf_info);
47  if (ret)
48  goto out;

I really despise "goto out;" because freeing things which haven't been
allocated is always dangerous.

[ snip ].


   143  out:
   144  kfree(modified_cmdline);
   145  kexec_free_elf_info(&elf_info);
 
There is a possibility that "elf_info" has holds uninitialized stack
data if elf_read_ehdr() fails so that's probably fixing as well.  kexec()
is root only so this can't be exploited.

   146  
   147  /*
   148   * Once FDT buffer has been successfully passed to 
kexec_add_buffer(),
   149   * the FDT buffer address is saved in image->arch.fdt. In that 
case,
   150   * the memory cannot be freed here in case of any other error.
   151   */
   152  if (ret && !image->arch.fdt)
   153  kvfree(fdt);
   ^^^
Uninitialized.

   154  
   155  return ret ? ERR_PTR(ret) : NULL;
   156  }

regards,
dan carpenter


RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread David Laight
From: Matthew Wilcox 
> Sent: 15 April 2021 23:22
> 
> On Thu, Apr 15, 2021 at 09:11:56PM +, David Laight wrote:
> > Isn't it possible to move the field down one long?
> > This might require an explicit zero - but this is not a common
> > code path - the extra write will be noise.
> 
> Then it overlaps page->mapping.  See emails passim.

The rules on overlaps make be wonder if every 'long'
should be in its own union.
The comments would need to say when each field is used.
It would, at least, make these errors less common.

That doesn't solve the 64bit dma_addr though.

Actually rather that word-swapping dma_addr on 32bit BE
could you swap over the two fields it overlays with.
That might look messy in the .h, but it doesn't require
an accessor function to do the swap - easily missed.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-16 Thread Daniel Axtens
Hi Tyrel,

> The pci_bus->bridge reference may no longer be valid after
> pci_bus_remove() resulting in passing a bad value to device_unregister()
> for the associated bridge device.
>
> Store the host_bridge reference in a separate variable prior to
> pci_bus_remove().
>
The patch certainly seems to do what you say. I'm not really up on the
innards of PCI, so I'm struggling to figure out by what code path
pci_bus_remove() might invalidate pci_bus->bridge? A quick look at
pci_remove_bus was not very illuminating but I didn't chase down every
call it made.

Kind regards,
Daniel

> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration 
> during PHB removal")
> Signed-off-by: Tyrel Datwyler 
> ---
>  arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c 
> b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index f9ae17e8a0f4..a8f9140a24fa 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic);
>  int remove_phb_dynamic(struct pci_controller *phb)
>  {
>   struct pci_bus *b = phb->bus;
> + struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge);
>   struct resource *res;
>   int rc, i;
>  
> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
>   /* Remove the PCI bus and unregister the bridge device from sysfs */
>   phb->bus = NULL;
>   pci_remove_bus(b);
> - device_unregister(b->bridge);
> + host_bridge->bus = NULL;
> + device_unregister(&host_bridge->dev);
>  
>   /* Now release the IO resource */
>   if (res->flags & IORESOURCE_IO)
> -- 
> 2.27.0


Re: [PATCH] soc: fsl: qe: remove unused function

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 08:57, Daniel Axtens a écrit :

Hi Jiapeng,


Fix the following clang warning:


You are not fixing a warning, you are removing a function in order to fix a 
warning ...



drivers/soc/fsl/qe/qe_ic.c:234:29: warning: unused function
'qe_ic_from_irq' [-Wunused-function].


Would be wise to tell that the last users of the function where removed by commit d7c2878cfcfa 
("soc: fsl: qe: remove unused qe_ic_set_* functions")


https://github.com/torvalds/linux/commit/d7c2878cfcfa



Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
  drivers/soc/fsl/qe/qe_ic.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 0390af9..b573712 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -231,11 +231,6 @@ static inline void qe_ic_write(__be32  __iomem *base, 
unsigned int reg,
qe_iowrite32be(value, base + (reg >> 2));
  }
  
-static inline struct qe_ic *qe_ic_from_irq(unsigned int virq)

-{
-   return irq_get_chip_data(virq);
-}


This seems good to me.

  * We know that this function can't be called directly from outside the
   file, because it is static.

  * The function address isn't used as a function pointer anywhere, so
that means it can't be called from outside the file that way (also
it's inline, which would make using a function pointer unwise!)

  * There's no obvious macros in that file that might construct the name
of the function in a way that is hidden from grep.

All in all, I am fairly confident that the function is indeed not used.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


-
  static inline struct qe_ic *qe_ic_from_irq_data(struct irq_data *d)
  {
return irq_data_get_irq_chip_data(d);
--
1.8.3.1


答复: [PATCH] symbol : Make the size of the compile-related array fixed

2021-04-16 Thread Han Dapeng
Thank you!Got it.
I tried to use Git, but it didn't work.
I'll do it next time.

-邮件原件-
发件人: Daniel Axtens 
发送时间: 2021年4月16日 14:19
收件人: 韩大鹏(Han Dapeng) ; Michael Ellerman 
; Benjamin Herrenschmidt ; Paul 
Mackerras ; Heiko Carstens ; Vasily 
Gorbik ; Christian Borntraeger ; 
Thomas Gleixner ; Ingo Molnar ; Borislav 
Petkov ; x...@kernel.org; H. Peter Anvin ; 
Masahiro Yamada ; Michal Marek ; 
Mike Rapoport ; Pekka Enberg ; Andrew 
Morton ; Arseny Solokha ; 
韩大鹏(Han Dapeng) ; Arvind Sankar ; 
Kees Cook ; Joerg Roedel ; Christian 
Brauner ; Kirill Tkhai ; 
linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; 
linux-s...@vger.kernel.org; linux-kbu...@vger.kernel.org
抄送: 陈安庆(Anqing) 
主题: Re: [PATCH] symbol : Make the size of the compile-related array fixed

Hi,

Thanks for your contribution to the kernel!

I notice that your patch is sumbitted as an attachment. In future, please could 
you submit your patch inline, rather than as an attachment?
See 
https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Fv4.15%2Fprocess%2F5.Posting.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8Sncu0Yyqvv3AtEZiL73vexVEpwmxmwrUjJ124ez%2BPs%3D&reserved=0
I'd recommend you use git send-email if possible: see e.g.
https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Fv4.15%2Fprocess%2Femail-clients.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Ny2BFB4AE05nHHqITJkixtOwfPrB8s1KboWVLTFVXF8%3D&reserved=0

> Subject: [PATCH] symbol : Make the size of the compile-related array
> fixed
>
> For the same code, the machine's user name, hostname, or compilation
> time may cause the kernel symbol address to be inconsistent, which is
> not friendly to some symbol-dependent software, such as Crash.

If I understand correctly, this patch makes it easier to recompile the kernel 
from the same source but at a different time or on a different machine or with 
a different user, but still get the same symbols.
Is that right?

I wonder if there are other reproducible build techniques that might be simpler 
to apply? There is a kernel documentation page at
https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Fkbuild%2Freproducible-builds.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=s54GLV2Pue3ArPgX4OcEyhAGsanJyBy1LRESkVMOGts%3D&reserved=0
which gives exisiting techniques to override the date, user and host.
Would they be sufficient to address your use case?

Kind regards,
Daniel

>
> Signed-off-by: Han Dapeng 
> ---
>  arch/powerpc/mm/nohash/kaslr_booke.c | 2 +-
>  arch/s390/boot/version.c | 2 +-
>  arch/x86/boot/compressed/kaslr.c | 2 +-
>  arch/x86/boot/version.c  | 2 +-
>  init/version.c   | 4 ++--
>  scripts/mkcompile_h  | 2 ++
>  6 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c
> b/arch/powerpc/mm/nohash/kaslr_booke.c
> index 4c74e8a5482b..494ef408e60c 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -37,7 +37,7 @@ struct regions {
>  };
>
>  /* Simplified build-specific string for starting entropy. */ -static
> const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
> LINUX_COMPILE_BY "@"
>  LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>
>  struct regions __initdata regions;
> diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c index
> d32e58bdda6a..627416a27d74 100644
> --- a/arch/s390/boot/version.c
> +++ b/arch/s390/boot/version.c
> @@ -3,5 +3,5 @@
>  #include 
>  #include "boot.h"
>
> -const char kernel_version[] = UTS_RELEASE
> +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE
>  " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION; diff
> --git a/arch/x86/boot/compressed/kaslr.c
> b/arch/x86/boot/compressed/kaslr.c
> index b92fffbe761f..7b72b518a4c8 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -43,7 +43,7 @@
>  extern unsigned long get_cmd_line_ptr(void);
>
>  /* Simplified build-specific string for starting entropy. */ -static
> const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
> LINUX_COMPILE_BY "@

答复: [PATCH] symbol : Make the size of the compile-related array fixed

2021-04-16 Thread Han Dapeng
The raw text is shown below:

From 540e6a6c36e6372d4f99eeb4a50c8eaa6d7989b3 Mon Sep 17 00:00:00 2001
From: Han Dapeng 
Date: Fri, 16 Apr 2021 10:36:38 +0800
Subject: [PATCH] symbol : Make the size of the compile-related array fixed

For the same code, the machine's user name, hostname, or compilation time
may cause the kernel symbol address to be inconsistent, which is not
friendly to some symbol-dependent software, such as Crash.

Signed-off-by: Han Dapeng 
---
 arch/powerpc/mm/nohash/kaslr_booke.c | 2 +-
 arch/s390/boot/version.c | 2 +-
 arch/x86/boot/compressed/kaslr.c | 2 +-
 arch/x86/boot/version.c  | 2 +-
 init/version.c   | 4 ++--
 scripts/mkcompile_h  | 2 ++
 6 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
b/arch/powerpc/mm/nohash/kaslr_booke.c
index 4c74e8a5482b..494ef408e60c 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -37,7 +37,7 @@ struct regions {
 };
 
 /* Simplified build-specific string for starting entropy. */
-static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
 struct regions __initdata regions;
diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c
index d32e58bdda6a..627416a27d74 100644
--- a/arch/s390/boot/version.c
+++ b/arch/s390/boot/version.c
@@ -3,5 +3,5 @@
 #include 
 #include "boot.h"
 
-const char kernel_version[] = UTS_RELEASE
+const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE
" (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION;
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b92fffbe761f..7b72b518a4c8 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -43,7 +43,7 @@
 extern unsigned long get_cmd_line_ptr(void);
 
 /* Simplified build-specific string for starting entropy. */
-static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
 static unsigned long rotate_xor(unsigned long hash, const void *area,
diff --git a/arch/x86/boot/version.c b/arch/x86/boot/version.c
index a1aaaf6c06a6..08feaa2d7a10 100644
--- a/arch/x86/boot/version.c
+++ b/arch/x86/boot/version.c
@@ -14,6 +14,6 @@
 #include 
 #include 
 
-const char kernel_version[] =
+const char kernel_version[COMPILE_STR_MAX] =
UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") "
UTS_VERSION;
diff --git a/init/version.c b/init/version.c
index 92afc782b043..adfc9e91b56b 100644
--- a/init/version.c
+++ b/init/version.c
@@ -35,11 +35,11 @@ struct uts_namespace init_uts_ns = {
 EXPORT_SYMBOL_GPL(init_uts_ns);
 
 /* FIXED STRINGS! Don't touch! */
-const char linux_banner[] =
+const char linux_banner[COMPILE_STR_MAX] =
"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
 
-const char linux_proc_banner[] =
+const char linux_proc_banner[COMPILE_STR_MAX] =
"%s version %s"
" (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
" (" LINUX_COMPILER ") %s\n";
diff --git a/scripts/mkcompile_h b/scripts/mkcompile_h
index 4ae735039daf..02b9d9d54da9 100755
--- a/scripts/mkcompile_h
+++ b/scripts/mkcompile_h
@@ -65,6 +65,8 @@ UTS_VERSION="$(echo $UTS_VERSION $CONFIG_FLAGS $TIMESTAMP | 
cut -b -$UTS_LEN)"
   LD_VERSION=$($LD -v | head -n1 | sed 's/(compatible with [^)]*)//' \
  | sed 's/[[:space:]]*$//')
   printf '#define LINUX_COMPILER "%s"\n' "$CC_VERSION, $LD_VERSION"
+
+  echo \#define COMPILE_STR_MAX 512
 } > .tmpcompile
 
 # Only replace the real compile.h if the new one is different,
-- 
2.27.0

=
This statement is automatically attached to our company's mail client.
Our company encourages giving code back to the community, no problem, I will 
take responsibility for this.

Thank you!

-邮件原件-
发件人: Christophe Leroy  
发送时间: 2021年4月16日 14:13
收件人: 韩大鹏(Han Dapeng) ; Michael Ellerman 
; Benjamin Herrenschmidt ; Paul 
Mackerras ; Heiko Carstens ; Vasily 
Gorbik ; Christian Borntraeger ; 
Thomas Gleixner ; Ingo Molnar ; Borislav 
Petkov ; x...@kernel.org; H. Peter Anvin ; 
Masahiro Yamada ; Michal Marek ; 
Mike Rapoport ; Pekka Enberg ; Andrew 
Morton ; Arseny Solokha ; 
Arvind Sankar ; Kees Cook ; Joerg 
Roedel ; Christian Brauner ; 
Kirill Tkhai ; linuxppc-dev@lists.ozlabs.org; 
linux-ker...@vger.kernel.org; linux-s...@vger.kernel.org; 
linux-kbu...@vger.kernel.org
抄送: 陈安庆(Anqing) 
主题: Re: [PATCH] symbol : Make the size

Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()

2021-04-16 Thread Christophe Leroy




Le 16/04/2021 à 08:44, Daniel Axtens a écrit :

Hi Lakshmi,


On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote:

Sorry - missed copying device-tree and powerpc mailing lists.


There are a few "goto out;" statements before the local variable "fdt"
is initialized through the call to of_kexec_alloc_and_setup_fdt() in
elf64_load(). This will result in an uninitialized "fdt" being passed
to kvfree() in this function if there is an error before the call to
of_kexec_alloc_and_setup_fdt().

Initialize the local variable "fdt" to NULL.


I'm a huge fan of initialising local variables! But I'm struggling to
find the code path that will lead to an uninit fdt being returned...

The out label reads in part:

/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */
return ret ? ERR_PTR(ret) : fdt;

As far as I can tell, any time we get a non-zero ret, we're going to
return an error pointer rather than the uninitialised value...


I don't think GCC is smart enough to detect that.



(btw, it does look like we might leak fdt if we have an error after we
successfully kmalloc it.)

Am I missing something? Can you link to the report for the kernel test
robot or from Dan?

FWIW, I think it's worth including this patch _anyway_ because initing
local variables is good practice, but I'm just not sure on the
justification.


I don't think local systematically initing local variables is a good practice at all, as it leads to 
bugs where you get a wrong value because of pathes where you forgot to set the correct value.


If you don't init local variable at declaration and forget to set it in some pathes, the compiler 
will detect it and warn you.
If you init the local variable with an arbitrary value at declaration and forget to set it later in 
some pathes, the compiler won't be able to detect it and you will go with the wrong value.


Christophe



Kind regards,
Daniel


Signed-off-by: Lakshmi Ramasubramanian 
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
---
   arch/powerpc/kexec/elf_64.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 5a569bb51349..0051440c1f77 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -32,7 +32,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
int ret;
unsigned long kernel_load_addr;
unsigned long initrd_load_addr = 0, fdt_load_addr;
-   void *fdt;
+   void *fdt = NULL;
const void *slave_code;
struct elfhdr ehdr;
char *modified_cmdline = NULL;



thanks,
   -lakshmi