Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()

2020-05-09 Thread Andy Lutomirski
On Sat, May 9, 2020 at 10:52 AM Joerg Roedel  wrote:
>
> On Fri, May 08, 2020 at 04:49:17PM -0700, Andy Lutomirski wrote:
> > On Fri, May 8, 2020 at 2:36 PM Joerg Roedel  wrote:
> > >
> > > On Fri, May 08, 2020 at 02:33:19PM -0700, Andy Lutomirski wrote:
> > > > On Fri, May 8, 2020 at 7:40 AM Joerg Roedel  wrote:
> > >
> > > > What's the maximum on other system types?  It might make more sense to
> > > > take the memory hit and pre-populate all the tables at boot so we
> > > > never have to sync them.
> > >
> > > Need to look it up for 5-level paging, with 4-level paging its 64 pages
> > > to pre-populate the vmalloc area.
> > >
> > > But that would not solve the problem on x86-32, which needs to
> > > synchronize unmappings on the PMD level.
> >
> > What changes in this series with x86-32?
>
> This series sets ARCH_PAGE_TABLE_SYNC_MASK to PGTBL_PMD_MODIFIED, so
> that the synchronization happens every time PMD(s) in the vmalloc areas
> are changed. Before this series this synchronization only happened at
> arbitrary places calling vmalloc_sync_(un)mappings().
>
> > We already do that synchronization, right?  IOW, in the cases where
> > the vmalloc *fault* code does anything at all, we should have a small
> > bound for how much memory to preallocate and, if we preallocate it,
> > then there is nothing to sync and nothing to fault.  And we have the
> > benefit that we never need to sync anything on 64-bit, which is kind
> > of nice.
>
> Don't really get you here, what is pre-allocated and why is there no
> need to sync and fault then?
>
> > Do we actually need PMD-level things for 32-bit?  What if we just
> > outlawed huge pages in the vmalloc space on 32-bit non-PAE?
>
> Disallowing huge-pages would at least remove the need to sync
> unmappings, but we still need to sync new PMD entries. Remember that the
> size of the vmalloc area on 32 bit is dynamic and depends on the VM-split
> and the actual amount of RAM on the system.
>
> A machine wit 512MB of RAM and a 1G/3G split will have around 2.5G of
> VMALLOC address space. And if we want to avoid vmalloc-faults there, we
> need to pre-allocate all PTE pages for that area (and the amount of PTE
> pages needed increases when RAM decreases).
>
> On a machine with 512M of RAM we would need ca. 1270+ PTE pages, which
> is around 5M (or 1% of total system memory).

I can never remember which P?D name goes with which level and which
machine type, but I don't think I agree with your math regardless.  On
x86, there are two fundamental situations that can occur:

1. Non-PAE.  There is a single 4k top-level page table per mm, and
this table contains either 512 or 1024 entries total. Of those
entries, some fraction (half or less) control the kernel address
space, and some fraction of *that* is for vmalloc space.  Those
entries are the *only* thing that needs syncing -- all mms will either
have null (not present) in those slots or will have pointers to the
*same* next-level-down directories.

2. PAE.  Depending on your perspective, there could be a grand total
of four top-level paging pointers, of which one (IIRC) is for the
kernel.  That points to the same place for all mms.  Or, if you look
at it the other way, PAE is just like #1 except that the top-level
table has only four entries and only one points to VMALLOC space.

So, unless I'm missing something here, there is an absolute maximum of
512 top-level entries that ever need to be synchronized.

Now, there's an additional complication.  On x86_64, we have a rule:
those entries that need to be synced start out null and may, during
the lifetime of the system, change *once*.  They are never unmapped or
modified after being allocated.  This means that those entries can
only ever point to a page *table* and not to a ginormous page.  So,
even if the hardware were to support ginormous pages (which, IIRC, it
doesn't), we would be limited to merely immense and not ginormous
pages in the vmalloc range.  On x86_32, I don't think we have this
rule right now.  And this means that it's possible for one of these
pages to be unmapped or modified.

So my suggestion is that just apply the x86_64 rule to x86_32 as well.
The practical effect will be that 2-level-paging systems will not be
able to use huge pages in the vmalloc range, since the rule will be
that the vmalloc-relevant entries in the top-level table must point to
page *tables* instead of huge pages.

On top of this, if we preallocate these entries, then the maximum
amount of memory we can possibly waste is 4k * (entries pointing to
vmalloc space - entries actually used for vmalloc space).  I don't
know what this number typically is, but I don't think it's very large.
Preallocating means that vmalloc faults *and* synchronization go away
entirely.  All of the page tables used for vmalloc will be entirely
shared by all mms, so all that's needed to modify vmalloc mappings is
to update init_mm and, if needed, flush TLBs.  No other page tables
will need modification at all.

On 

Re: [PATCH 2/2] fpga: zynq: Add AFI config driver

2020-05-09 Thread Moritz Fischer
Hi Nava,

On Mon, May 04, 2020 at 11:55:23AM +, Nava kishore Manne wrote:
> Hi Mortiz,
> 
> Thanks for proving the comments.
> Please find my response inline.
> 
> > -Original Message-
> > From: Moritz Fischer [mailto:m...@kernel.org]
> > Sent: Thursday, April 23, 2020 8:59 AM
> > To: Nava kishore Manne 
> > Cc: m...@kernel.org; Michal Simek ; linux-
> > f...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> > ker...@vger.kernel.org; chinnikishore...@gmail.com
> > Subject: Re: [PATCH 2/2] fpga: zynq: Add AFI config driver
> > 
> > Hi Nava,
> > 
> > On Wed, Apr 15, 2020 at 03:54:50PM +0530, Nava kishore Manne wrote:
> > > This patch Adds AFI config driver. This is useful for the PS to PL
> > > configuration for the fpga manager On zynq platform.
> > >
> > > Signed-off-by: Nava kishore Manne 
> > > ---
> > >  drivers/fpga/Kconfig|  8 +
> > >  drivers/fpga/Makefile   |  1 +
> > >  drivers/fpga/zynq-afi.c | 81
> > > +
> > >  3 files changed, 90 insertions(+)
> > >  create mode 100644 drivers/fpga/zynq-afi.c
> > >
> > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig index
> > > 474f304e..60982a0 100644
> > > --- a/drivers/fpga/Kconfig
> > > +++ b/drivers/fpga/Kconfig
> > > @@ -214,4 +214,12 @@ config FPGA_MGR_ZYNQMP_FPGA
> > > to configure the programmable logic(PL) through PS
> > > on ZynqMP SoC.
> > >
> > > +config FPGA_MGR_ZYNQ_AFI_FPGA
> > > + bool "Xilinx AFI FPGA"
> > > + depends on FPGA_MGR_ZYNQ_FPGA
> > Curious. How does this dependency play in here?
> > > + help
> > > +   Zynq AFI driver support for writing to the AFI registers
> > > +   for configuring the PS_PL interface. For some of the bitstream
> > > +   or designs to work the PS to PL interfaces need to be configured
> > > +   like the data bus-width etc.
> > >  endif # FPGA
> > > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index
> > > 312b937..d115e29 100644
> > > --- a/drivers/fpga/Makefile
> > > +++ b/drivers/fpga/Makefile
> > > @@ -26,6 +26,7 @@ obj-$(CONFIG_FPGA_BRIDGE)   += fpga-
> > bridge.o
> > >  obj-$(CONFIG_SOCFPGA_FPGA_BRIDGE)+= altera-hps2fpga.o altera-
> > fpga2sdram.o
> > >  obj-$(CONFIG_ALTERA_FREEZE_BRIDGE)   += altera-freeze-bridge.o
> > >  obj-$(CONFIG_XILINX_PR_DECOUPLER)+= xilinx-pr-decoupler.o
> > > +obj-$(CONFIG_FPGA_MGR_ZYNQ_AFI_FPGA) += zynq-afi.o
> > >
> > >  # High Level Interfaces
> > >  obj-$(CONFIG_FPGA_REGION)+= fpga-region.o
> > > diff --git a/drivers/fpga/zynq-afi.c b/drivers/fpga/zynq-afi.c new
> > > file mode 100644 index 000..7ce0d08
> > > --- /dev/null
> > > +++ b/drivers/fpga/zynq-afi.c
> > > @@ -0,0 +1,81 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Xilinx FPGA AFI driver.
> > > + * Copyright (c) 2018 Xilinx Inc.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +/* Registers and special values for doing register-based operations */
> > > +#define AFI_RDCHAN_CTRL_OFFSET   0x00
> > > +#define AFI_WRCHAN_CTRL_OFFSET   0x14
> > > +
> > > +#define AFI_BUSWIDTH_MASK0x01
> > > +
> > > +/**
> > > + * struct afi_fpga - AFI register description
> > > + * @membase: pointer to register struct
> > > + * @afi_width:   AFI bus width to be written
> > > + */
> > > +struct zynq_afi_fpga {
> > > + void __iomem*membase;
> > > + u32 afi_width;
> > > +};
> > > +
> > > +static int zynq_afi_fpga_probe(struct platform_device *pdev) {
> > > + struct zynq_afi_fpga *afi_fpga;
> > > + struct resource *res;
> > > + u32 reg_val;
> > > + u32 val;
> > > +
> > > + afi_fpga = devm_kzalloc(>dev, sizeof(*afi_fpga), GFP_KERNEL);
> > > + if (!afi_fpga)
> > > + return -ENOMEM;
> > > +
> > > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > > + afi_fpga->membase = devm_ioremap_resource(>dev, res);
> > > + if (IS_ERR(afi_fpga->membase))
> > > + return PTR_ERR(afi_fpga->membase);
> > > +
> > > + val = device_property_read_u32(>dev, "xlnx,afi-width",
> > > +_fpga->afi_width);
> > > + if (val) {
> > > + dev_err(>dev, "Fail to get the afi bus width\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + reg_val = readl(afi_fpga->membase + AFI_RDCHAN_CTRL_OFFSET);
> > > + reg_val &= ~AFI_BUSWIDTH_MASK;
> > > + writel(reg_val | afi_fpga->afi_width,
> > > +afi_fpga->membase + AFI_RDCHAN_CTRL_OFFSET);
> > > + reg_val = readl(afi_fpga->membase + AFI_WRCHAN_CTRL_OFFSET);
> > > + reg_val &= ~AFI_BUSWIDTH_MASK;
> > > + writel(reg_val | afi_fpga->afi_width,
> > > +afi_fpga->membase + AFI_WRCHAN_CTRL_OFFSET);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static const struct of_device_id zynq_afi_fpga_ids[] = {
> > > + { .compatible = "xlnx,zynq-afi-fpga" },
> > > + { },
> > > +};
> > > +MODULE_DEVICE_TABLE(of, zynq_afi_fpga_ids);
> > > +
> > > +static struct platform_driver 

Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

2020-05-09 Thread Jaegeuk Kim
On 05/09, Chao Yu wrote:
> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> > Hi Sayali,
> > 
> > In order to address the perf regression, how about this?
> > 
> >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim 
> > Date: Fri, 8 May 2020 09:08:37 -0700
> > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> > 
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> > 
> > Call stack :
> > 
> > Thread AThread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> >  - f2fs_lock_all(sbi);
> >   - down_write(>cp_rwsem);
> > 
> > - open()
> >  - igrab()
> > - write() write inline data
> > - unlink()
> > - f2fs_sync_node_pages()
> >  - if (is_inline_node(page))
> >   - flush_inline_data()
> >- ilookup()
> >  page = f2fs_pagecache_get_page()
> >  if (!page)
> >   goto iput_out;
> >  iput_out:
> > -close()
> > -iput()
> >iput(inode);
> >- f2fs_evict_inode()
> > - f2fs_truncate_blocks()
> >  - f2fs_lock_op()
> >- down_read(>cp_rwsem);
> > 
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to 
> > inline_data")
> > Signed-off-by: Sayali Lokhande 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/node.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 1db8cabf727ef..626d7daca09de 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> > goto continue_unlock;
> > }
> >  
> > -   /* flush inline_data */
> > -   if (is_inline_node(page)) {
> > +   /* flush inline_data, if it's not sync path. */
> > +   if (do_balance && is_inline_node(page)) {
> 
> IIRC, this flow was designed to avoid running out of free space issue
> during checkpoint:
> 
> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> 
> The sceanrio is:
> 1. create fully node blocks
> 2. flush node blocks
> 3. write inline_data for all the node blocks again
> 4. flush node blocks redundantly
> 
> I guess this may cause failing one case of fstest.

Yeah, actually I was hitting 204 failure, and thus, revised like this.
Now, I don't see any regression in fstest.

>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
From: Sayali Lokhande 
Date: Thu, 30 Apr 2020 16:28:29 +0530
Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread AThread B
f2fs_write_checkpoint()
- block_operations(sbi)
 - f2fs_lock_all(sbi);
  - down_write(>cp_rwsem);

- open()
 - igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
 - if (is_inline_node(page))
  - flush_inline_data()
   - ilookup()
 page = f2fs_pagecache_get_page()
 if (!page)
  goto iput_out;
 iput_out:
-close()
-iput()
   iput(inode);
   - f2fs_evict_inode()
- f2fs_truncate_blocks()
 - f2fs_lock_op()
   - down_read(>cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  9 -
 fs/f2fs/f2fs.h   |  4 ++--
 fs/f2fs/node.c   | 10 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d49f7a01d8a26..928aea4ff663d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
};
int err = 0, cnt = 0;
 
+   /*
+* Let's flush node pages first to flush inline_data.
+* We'll actually guarantee everything below under f2fs_lock_all.
+*/
+   f2fs_sync_node_pages(sbi, , false, false, FS_CP_NODE_IO);
+
 retry_flush_quotas:
f2fs_lock_all(sbi);
if (__need_flush_quota(sbi)) {
@@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
if (get_pages(sbi, F2FS_DIRTY_NODES)) {

Re: [PATCH v3] kernel: add panic_on_taint

2020-05-09 Thread Kees Cook
On Sat, May 09, 2020 at 09:57:37AM -0400, Rafael Aquini wrote:
> Analogously to the introduction of panic_on_warn, this patch
> introduces a kernel option named panic_on_taint in order to
> provide a simple and generic way to stop execution and catch
> a coredump when the kernel gets tainted by any given taint flag.
> 
> This is useful for debugging sessions as it avoids rebuilding
> the kernel to explicitly add calls to panic() or BUG() into
> code sites that introduce the taint flags of interest.
> Another, perhaps less frequent, use for this option would be
> as a mean for assuring a security policy (in paranoid mode)
> case where no single taint is allowed for the running system.
> 
> Suggested-by: Qian Cai 
> Signed-off-by: Rafael Aquini 

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH] fpga: zynqmp: fix modular build

2020-05-09 Thread Moritz Fischer
On Tue, May 05, 2020 at 04:00:11PM +0200, Arnd Bergmann wrote:
> Two symbols need to be exported to allow the zynqmp-fpga module
> to get loaded dynamically:
> 
> ERROR: modpost: "zynqmp_pm_fpga_load" [drivers/fpga/zynqmp-fpga.ko] undefined!
> ERROR: modpost: "zynqmp_pm_fpga_get_status" [drivers/fpga/zynqmp-fpga.ko] 
> undefined!
> 
> To ensure this is done correctly, also fix the Kconfig dependency
> to only allow building the fpga driver when the firmware driver is
> either disabled, or when it is reachable. With that, the dependency
> on the SoC itself can be removed, and there are no surprises when
> the fpga driver is built-in but the firmware a module.
> 
> Fixes: 4db8180ffe7c ("firmware: xilinx: Remove eemi ops for fpga related 
> APIs")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/fpga/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> index b2408a710662..7cd5a29fc437 100644
> --- a/drivers/fpga/Kconfig
> +++ b/drivers/fpga/Kconfig
> @@ -208,7 +208,7 @@ config FPGA_DFL_PCI
>  
>  config FPGA_MGR_ZYNQMP_FPGA
>   tristate "Xilinx ZynqMP FPGA"
> - depends on ARCH_ZYNQMP || COMPILE_TEST
> + depends on ZYNQMP_FIRMWARE || (!ZYNQMP_FIRMWARE && COMPILE_TEST)
>   help
> FPGA manager driver support for Xilinx ZynqMP FPGAs.
> This driver uses the processor configuration port(PCAP)
> -- 
> 2.26.0
> 
Applied to fixes,

Thanks


My Dear in the lord

2020-05-09 Thread Mrs. Mina A. Brunel



My Dear in the lord


My name is Mrs. Mina A. Brunel I am a Norway Citizen who is living in Burkina 
Faso, I am married to Mr. Brunel Patrice, a politician who owns a small gold 
company in Burkina Faso; He died of Leprosy and Radesyge, in the year February 
2010, During his lifetime he deposited the sum of € 8.5 Million Euro) Eight 
million, Five hundred thousand Euros in a bank in Rome the capital city of 
Italy in Southern Europe. The money was from the sale of his company and death 
benefits payment and entitlements of my deceased husband by his company.

I am sending you this message with heavy tears in my eyes and great sorrow in 
my heart, and also praying that it will reach you in good health because I am 
not in good health, I sleep every night without knowing if I may be alive to 
see the next day. I am suffering from long time cancer and presently I am 
partially suffering from Leprosy, which has become difficult for me to move 
around. I was married to my late husband for more than 6 years without having a 
child and my doctor confided that I have less chance to live, having to know 
when the cup of death will come, I decided to contact you to claim the fund 
since I don't have any relation I grew up from an orphanage home.

I have decided to donate this money for the support of helping Motherless 
babies/Less privileged/Widows and churches also to build the house of God 
because I am dying and diagnosed with cancer for about 3 years ago. I have 
decided to donate from what I have inherited from my late husband to you for 
the good work of Almighty God; I will be going in for an operation surgery soon.

Now I want you to stand as my next of kin to claim the funds for charity 
purposes. Because of this money remains unclaimed after my death, the bank 
executives or the government will take the money as unclaimed fund and maybe 
use it for selfishness and worthless ventures, I need a very honest person who 
can claim this money and use it for Charity works, for orphanages, widows and 
also build schools and churches for less privilege that will be named after my 
late husband and my name.

I need your urgent answer to know if you will be able to execute this project, 
and I will give you more information on how the fund will be transferred to 
your bank account or online banking.

Thanks
Mrs. Mina A. Brunel


Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-05-09 Thread Christophe Leroy




Le 09/05/2020 à 17:54, Christophe Leroy a écrit :



Le 28/04/2020 à 18:05, Arnd Bergmann a écrit :

On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy
 wrote:


Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy 


Looks good to me

Reviewed-by: Arnd Bergmann 

There was a bug on ARM for the corresponding function, so far it is 
unclear
if this was a problem related to particular hardware, the 32-bit 
kernel code,

or the common implementation of clock_gettime64 in the vdso library,
see https://github.com/richfelker/musl-cross-make/issues/96

Just to be sure that powerpc is not affected by the same issue, can you
confirm that repeatedly calling clock_gettime64 on powerpc32, alternating
between vdso and syscall, results in monotically increasing times?



I think that's one of the things vdsotest checks, so yes that's ok I think.



Here is the full result with vdsotest:

gettimeofday: syscall: 3715 nsec/call
gettimeofday:libc: 794 nsec/call
gettimeofday:vdso: 947 nsec/call
getcpu: syscall: 1614 nsec/call
getcpu:libc: 484 nsec/call
getcpu:vdso: 184 nsec/call
clock-gettime64-realtime-coarse: syscall: 3152 nsec/call
clock-gettime64-realtime-coarse:libc: not tested
clock-gettime64-realtime-coarse:vdso: 653 nsec/call
clock-getres-realtime-coarse: syscall: 2963 nsec/call
clock-getres-realtime-coarse:libc: 745 nsec/call
clock-getres-realtime-coarse:vdso: 553 nsec/call
clock-gettime-realtime-coarse: syscall: 5120 nsec/call
clock-gettime-realtime-coarse:libc: 731 nsec/call
clock-gettime-realtime-coarse:vdso: 577 nsec/call
clock-gettime64-realtime: syscall: 3719 nsec/call
clock-gettime64-realtime:libc: not tested
clock-gettime64-realtime:vdso: 976 nsec/call
clock-getres-realtime: syscall: 2496 nsec/call
clock-getres-realtime:libc: 745 nsec/call
clock-getres-realtime:vdso: 546 nsec/call
clock-gettime-realtime: syscall: 4800 nsec/call
clock-gettime-realtime:libc: 1080 nsec/call
clock-gettime-realtime:vdso: 1798 nsec/call
clock-gettime64-boottime: syscall: 4132 nsec/call
clock-gettime64-boottime:libc: not tested
clock-gettime64-boottime:vdso: 975 nsec/call
clock-getres-boottime: syscall: 2497 nsec/call
clock-getres-boottime:libc: 730 nsec/call
clock-getres-boottime:vdso: 546 nsec/call
clock-gettime-boottime: syscall: 3728 nsec/call
clock-gettime-boottime:libc: 1079 nsec/call
clock-gettime-boottime:vdso: 941 nsec/call
clock-gettime64-tai: syscall: 4148 nsec/call
clock-gettime64-tai:libc: not tested
clock-gettime64-tai:vdso: 955 nsec/call
clock-getres-tai: syscall: 2494 nsec/call
clock-getres-tai:libc: 730 nsec/call
clock-getres-tai:vdso: 545 nsec/call
clock-gettime-tai: syscall: 3729 nsec/call
clock-gettime-tai:libc: 1079 nsec/call
clock-gettime-tai:vdso: 927 nsec/call
clock-gettime64-monotonic-raw: syscall: 3677 nsec/call
clock-gettime64-monotonic-raw:libc: not tested
clock-gettime64-monotonic-raw:vdso: 1032 nsec/call
clock-getres-monotonic-raw: syscall: 2494 nsec/call
clock-getres-monotonic-raw:libc: 745 nsec/call
clock-getres-monotonic-raw:vdso: 545 nsec/call
clock-gettime-monotonic-raw: syscall: 3276 nsec/call
clock-gettime-monotonic-raw:libc: 1140 nsec/call
clock-gettime-monotonic-raw:vdso: 1002 nsec/call
clock-gettime64-monotonic-coarse: syscall: 4099 nsec/call
clock-gettime64-monotonic-coarse:libc: not tested
clock-gettime64-monotonic-coarse:vdso: 653 nsec/call
clock-getres-monotonic-coarse: syscall: 2962 nsec/call
clock-getres-monotonic-coarse:libc: 745 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse: syscall: 4297 nsec/call
clock-gettime-monotonic-coarse:libc: 730 nsec/call
clock-gettime-monotonic-coarse:vdso: 592 nsec/call
clock-gettime64-monotonic: syscall: 3863 nsec/call
clock-gettime64-monotonic:libc: not tested
clock-gettime64-monotonic:vdso: 975 nsec/call
clock-getres-monotonic: syscall: 2494 nsec/call
clock-getres-monotonic:libc: 745 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic: syscall: 3465 nsec/call
clock-gettime-monotonic:libc: 1079 nsec/call
clock-gettime-monotonic:vdso: 927 nsec/call

Christophe


Re: [PATCH] staging: vc04_services: interface: vchiq_arm: vchiq_connected.c: Block comments should align the * on each line

2020-05-09 Thread John Oldman
Hi Greg

Yea, thanks for the feedback.

Will fix and resend...

John


On Sat, 9 May 2020 at 18:11, Greg KH  wrote:
>
> On Sat, May 09, 2020 at 02:07:14PM +0100, John Oldman wrote:
> > Coding style issue
>
> Your subject line needs to be much shorter, don't you think?
>
> Please fix up and resend.
>
> greg k-h


Re: [PATCH 00/15] net: taint when the device driver firmware crashes

2020-05-09 Thread Jakub Kicinski
On Sat,  9 May 2020 04:35:37 + Luis Chamberlain wrote:
> Device driver firmware can crash, and sometimes, this can leave your
> system in a state which makes the device or subsystem completely
> useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> of scraping some magical words from the kernel log, which is driver
> specific, is much easier. So instead this series provides a helper which
> lets drivers annotate this and shows how to use this on networking
> drivers.
> 
> My methodology for finding when firmware crashes is to git grep for
> "crash" and then doing some study of the code to see if this indeed
> a place where the firmware crashes. In some places this is quite
> obvious.
> 
> I'm starting off with networking first, if this gets merged later on I
> can focus on the other drivers, but I already have some work done on
> other subsytems.
> 
> Review, flames, etc are greatly appreciated.

Tainting itself may be useful, but that's just the first step. I'd much
rather see folks start using the devlink health infrastructure. Devlink
is netlink based, but it's _not_ networking specific (many of its
optional features obviously are, but don't let that mislead you).

With devlink health we get (a) a standard notification on the failure; 
(b) information/state dump in a (somewhat) structured form, which can be
collected & shared with vendors; (c) automatic remediation (usually
device reset of some scope).

Now regarding the tainting - as I said it may be useful, but don't we
have to define what constitutes a "firmware crash"? There are many
failure modes, some perfectly recoverable (e.g. processing queue hang), 
some mere bugs (e.g. device fails to initialize some functions). All of
them may impact the functioning of the system. How do we choose those
that taint? 



Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Amit Tomer
> Oh, and... u-b...@linux.nxdi.nxp.com bounces because that domain is
> not resolvable - I guess that is internal to NXP, and this patch
> should have remained within NXP and not been posted publically.

 Yeah, realized it just after sending my reply. It is for internal NXP list.
 Had it been for open source U-boot, Author would have chosen
 "u-b...@lists.denx.de" ?

But, I still believe Author did it unknowingly and we all should be
easy on him/her.

Thanks
-Amit


Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Russell King - ARM Linux admin
On Sat, May 09, 2020 at 07:18:45PM +0100, Russell King - ARM Linux admin wrote:
> On Sat, May 09, 2020 at 11:34:59PM +0530, Amit Tomer wrote:
> > > From what I can tell, these patches are not for the kernel.  The
> > > filenames don't match th kernel layout.
> > 
> > These files looks to be from U-boot, and must be intended for U-boot
> > as I see U-boot mailing address in recipient's address?
> 
> So why is it copied to:
> 
> devicet...@vger.kernel.org - a kernel mailing list
> linux-kernel@vger.kernel.org - the main kernel mailing list
> linux-g...@vger.kernel.org - the gpio driver kernel mailing list
> linux-arm-ker...@lists.infradead.org - the ARM kernel mailing list
> 
> Given that it includes four kernel mailing lists (ok, devicetree
> may be argued to have a wider application), then I don't think the
> conclusion that "it's for u-boot, because there's _one_ u-boot
> mailing list in the recipients" is particularly obvious.
> 
> The author really needs to state that up front if they're sending
> it to a wide audeience, rather than leaving people to guess, thereby
> potentially wasting their time.
> 
> Not only did Andrew review the patch as if it were for the kernel,
> but I also wasted time on this as well when I double-took the
> ifdefs, and wanted to check the current driver in the kernel.

Oh, and... u-b...@linux.nxdi.nxp.com bounces because that domain is
not resolvable - I guess that is internal to NXP, and this patch
should have remained within NXP and not been posted publically.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Russell King - ARM Linux admin
On Sat, May 09, 2020 at 11:34:59PM +0530, Amit Tomer wrote:
> > From what I can tell, these patches are not for the kernel.  The
> > filenames don't match th kernel layout.
> 
> These files looks to be from U-boot, and must be intended for U-boot
> as I see U-boot mailing address in recipient's address?

So why is it copied to:

devicet...@vger.kernel.org - a kernel mailing list
linux-kernel@vger.kernel.org - the main kernel mailing list
linux-g...@vger.kernel.org - the gpio driver kernel mailing list
linux-arm-ker...@lists.infradead.org - the ARM kernel mailing list

Given that it includes four kernel mailing lists (ok, devicetree
may be argued to have a wider application), then I don't think the
conclusion that "it's for u-boot, because there's _one_ u-boot
mailing list in the recipients" is particularly obvious.

The author really needs to state that up front if they're sending
it to a wide audeience, rather than leaving people to guess, thereby
potentially wasting their time.

Not only did Andrew review the patch as if it were for the kernel,
but I also wasted time on this as well when I double-took the
ifdefs, and wanted to check the current driver in the kernel.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: [PATCH] net/sonic: Fix some resource leaks in error handling paths

2020-05-09 Thread Jakub Kicinski
On Sat, 9 May 2020 18:47:08 +0200 Christophe JAILLET wrote:
> Le 09/05/2020 à 03:54, Jakub Kicinski a écrit :
> > On Fri,  8 May 2020 19:25:57 +0200 Christophe JAILLET wrote:  
> >> @@ -527,8 +531,9 @@ static int mac_sonic_platform_remove(struct 
> >> platform_device *pdev)
> >>struct sonic_local* lp = netdev_priv(dev);
> >>   
> >>unregister_netdev(dev);
> >> -  dma_free_coherent(lp->device, SIZEOF_SONIC_DESC * 
> >> SONIC_BUS_SCALE(lp->dma_bitmode),
> >> -lp->descriptors, lp->descriptors_laddr);
> >> +  dma_free_coherent(lp->device,
> >> +SIZEOF_SONIC_DESC * SONIC_BUS_SCALE(lp->dma_bitmode),
> >> +lp->descriptors, lp->descriptors_laddr);
> >>free_netdev(dev);
> >>   
> >>return 0;  
> > This is a white-space only change, right? Since this is a fix we should
> > avoid making cleanups which are not strictly necessary.
>
> Right.
> 
> The reason of this clean-up is that I wanted to avoid a checkpatch 
> warning with the proposed patch and I felt that having the same layout 
> in the error handling path of the probe function and in the remove 
> function was clearer.
> So I updated also the remove function.

I understand the motivation is good.

> Fell free to ignore this hunk if not desired. I will not sent a V2 only 
> for that.

That's not how it works. Busy maintainers don't have time to hand edit
patches. I'm not applying this to the networking tree and I'm tossing it
from patchwork. Please address the basic feedback.

Thank you.


Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Amit Tomer
> From what I can tell, these patches are not for the kernel.  The
> filenames don't match th kernel layout.

These files looks to be from U-boot, and must be intended for U-boot
as I see U-boot mailing
address in recipient's address?


Re: [PATCH v2] net: Protect INET_ADDR_COOKIE on 32-bit architectures

2020-05-09 Thread Jakub Kicinski
On Sat, 9 May 2020 10:13:22 +0200 Stephen Kitt wrote:
> Hi,
> 
> Thanks for taking the time to review my patch.
> 
> On Fri, 8 May 2020 20:50:25 -0700, Jakub Kicinski  wrote:
> > On Fri,  8 May 2020 14:04:57 +0200 Stephen Kitt wrote:  
> > > Commit c7228317441f ("net: Use a more standard macro for
> > > INET_ADDR_COOKIE") added a __deprecated marker to the cookie name on
> > > 32-bit architectures, with the intent that the compiler would flag
> > > uses of the name. However since commit 771c035372a0 ("deprecate the
> > > '__deprecated' attribute warnings entirely and for good"),
> > > __deprecated doesn't do anything and should be avoided.
> > > 
> > > This patch changes INET_ADDR_COOKIE to declare a dummy struct so that
> > > any subsequent use of the cookie's name will in all likelihood break
> > > the build. It also removes the __deprecated marker.  
> > 
> > I think the macro is supposed to cause a warning when the variable
> > itself is accessed. And I don't think that happens with your patch
> > applied.  
> 
> Yes, the warning is what was lost when __deprecated lost its meaning. I was
> trying to preserve that, or rather extend it so that the build would break if
> the cookie was used on 32-bit architectures, and my patch ensures it does if
> the cookie is used in a comparison or assignment, but ...
> 
> > +   kfree();  
> 
> I hadn’t thought of taking a pointer to it.
> 
> If we want to preserve the use of the macro with a semi-colon, which is what
> Joe’s patch introduced (along with the deprecation warning), we still need
> some sort of declaration which can’t be used. Perhaps
> 
> #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
>   struct __name {} __attribute__((unused))
> 
> would be better — it declares the cookie as a struct, not a variable, so then
> the build fails if the cookie is used as anything other than a struct. If
> anyone does try to use it as a struct, the build will fail on 64-bit
> architectures...
> 
>   CC  net/ipv4/inet_hashtables.o
> net/ipv4/inet_hashtables.c: In function ‘__inet_lookup_established’:
> net/ipv4/inet_hashtables.c:362:9: error: ‘acookie’ undeclared (first use in 
> this function)
>   kfree();
>  ^~~
> net/ipv4/inet_hashtables.c:362:9: note: each undeclared identifier is 
> reported only once for each function it appears in
> make[2]: *** [scripts/Makefile.build:267: net/ipv4/inet_hashtables.o] Error 1
> make[1]: *** [scripts/Makefile.build:488: net/ipv4] Error 2
> make: *** Makefile:1722: net] Error 2

Hm. That does seem better. Although thinking about it - we will not get
a warning when someone declares a variable with the same name..

What if we went back to your original proposal of an empty struct but
added in an extern in front? That way we should get linker error on
pointer references.


Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()

2020-05-09 Thread Joerg Roedel
On Sat, May 09, 2020 at 11:25:16AM +0200, Peter Zijlstra wrote:
> Right; it's just that the moment you do trigger it, it'll iterate that
> pgd_list and that is potentially huge. Then again, that's not a new
> problem.
> 
> I suppose we can deal with it if/when it becomes an actual problem.
> 
> I had a quick look and I think it might be possible to make it an RCU
> managed list. We'd have to remove the pgd_list entry in
> put_task_struct_rcu_user(). Then we can play games in sync_global_*()
> and use RCU iteration. New tasks (which can be missed in the RCU
> iteration) will already have a full clone of the PGD anyway.

Right, it should not be that difficult to rcu-protect the pgd-list, but
we can try that when it actually becomes a problem.


Joerg


Fw:

2020-05-09 Thread Chan Leap Phala

Gorące pozdrowienia,

Wysłałem ci ten list miesiąc temu, ale nie jestem pewien, czy go 
otrzymałeś, ponieważ nie otrzymałem od ciebie żadnej odpowiedzi, dlatego 
ponownie go wysyłam.


Jestem adwokatem Chan Leap Phala, osobistym prawnikiem mojego zmarłego 
klienta przed jego przedwczesną śmiercią z rodziną. Otrzymałem od jego 
banku mandat na dostarczenie / przedstawienie bliskich krewnych kwoty 
jego funduszu o wartości Siedemdziesiąt Osiem Milionów Pięćset Sto 
Osiemdziesięciu Tysięcy Dolarów Amerykańskich 78 580 000 USD, więc 
skontaktowałem się z Tobą. Po nieudanej próbie znalezienia krewnego 
mojego zmarłego klienta postanowiłem się z tobą skontaktować, ponieważ 
ma on to samo nazwisko i narodowość.


Proszę o kontakt w celu uzyskania dalszych informacji: 
(chan.l.ph...@gmail.com)


Z poważaniem
Barrister Chan Leap PHALA
Prawnik / Adwokaci
E-mail: chan.l.ph...@gmail.com


Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()

2020-05-09 Thread Joerg Roedel
On Fri, May 08, 2020 at 04:49:17PM -0700, Andy Lutomirski wrote:
> On Fri, May 8, 2020 at 2:36 PM Joerg Roedel  wrote:
> >
> > On Fri, May 08, 2020 at 02:33:19PM -0700, Andy Lutomirski wrote:
> > > On Fri, May 8, 2020 at 7:40 AM Joerg Roedel  wrote:
> >
> > > What's the maximum on other system types?  It might make more sense to
> > > take the memory hit and pre-populate all the tables at boot so we
> > > never have to sync them.
> >
> > Need to look it up for 5-level paging, with 4-level paging its 64 pages
> > to pre-populate the vmalloc area.
> >
> > But that would not solve the problem on x86-32, which needs to
> > synchronize unmappings on the PMD level.
> 
> What changes in this series with x86-32?

This series sets ARCH_PAGE_TABLE_SYNC_MASK to PGTBL_PMD_MODIFIED, so
that the synchronization happens every time PMD(s) in the vmalloc areas
are changed. Before this series this synchronization only happened at
arbitrary places calling vmalloc_sync_(un)mappings().

> We already do that synchronization, right?  IOW, in the cases where
> the vmalloc *fault* code does anything at all, we should have a small
> bound for how much memory to preallocate and, if we preallocate it,
> then there is nothing to sync and nothing to fault.  And we have the
> benefit that we never need to sync anything on 64-bit, which is kind
> of nice.

Don't really get you here, what is pre-allocated and why is there no
need to sync and fault then?

> Do we actually need PMD-level things for 32-bit?  What if we just
> outlawed huge pages in the vmalloc space on 32-bit non-PAE?

Disallowing huge-pages would at least remove the need to sync
unmappings, but we still need to sync new PMD entries. Remember that the
size of the vmalloc area on 32 bit is dynamic and depends on the VM-split
and the actual amount of RAM on the system.

A machine wit 512MB of RAM and a 1G/3G split will have around 2.5G of
VMALLOC address space. And if we want to avoid vmalloc-faults there, we
need to pre-allocate all PTE pages for that area (and the amount of PTE
pages needed increases when RAM decreases).

On a machine with 512M of RAM we would need ca. 1270+ PTE pages, which
is around 5M (or 1% of total system memory).

Regards,

Joerg


[PATCH] drm/exynos: fimc: Add support for S5PV210 variant

2020-05-09 Thread Jonathan Bakker
S5PV210 can be trivially supported by this driver.  Only one of
its FIMC devices (#2) supports the same scaling values as Exynos4, and
it is marked by mainscaler-ext in the DTS (as it is for all of the
Exynos4 devices).  It's limits are the same as that of id's 0-2 of
Exynos4 so we don't even need to change the device id check.

It has been tested with a modified libdrm's test from
https://github.com/tobiasjakobi/libdrm/tree/ippv2

Signed-off-by: Jonathan Bakker 
---
 drivers/gpu/drm/exynos/exynos_drm_fimc.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimc.c 
b/drivers/gpu/drm/exynos/exynos_drm_fimc.c
index 29ab8be8604c..63e1b8ccb8e9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimc.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimc.c
@@ -89,6 +89,7 @@ struct fimc_scaler {
  * @regs: memory mapped io registers.
  * @lock: locking of operations.
  * @clocks: fimc clocks.
+ * @num_clocks: number of fimc clocks
  * @sc: scaler infomations.
  * @pol: porarity of writeback.
  * @id: fimc id.
@@ -107,6 +108,7 @@ struct fimc_context {
void __iomem*regs;
spinlock_t  lock;
struct clk  *clocks[FIMC_CLKS_MAX];
+   int num_clocks;
struct fimc_scaler  sc;
int id;
int irq;
@@ -1183,7 +1185,7 @@ static int fimc_setup_clocks(struct fimc_context *ctx)
for (i = 0; i < FIMC_CLKS_MAX; i++)
ctx->clocks[i] = ERR_PTR(-EINVAL);
 
-   for (i = 0; i < FIMC_CLKS_MAX; i++) {
+   for (i = 0; i < ctx->num_clocks; i++) {
if (i == FIMC_CLK_WB_A || i == FIMC_CLK_WB_B)
dev = fimc_dev->parent;
else
@@ -1210,6 +1212,9 @@ int exynos_drm_check_fimc_device(struct device *dev)
 {
int id = of_alias_get_id(dev->of_node, "fimc");
 
+   if (!of_property_read_bool(dev->of_node, "samsung,mainscaler-ext"))
+   return -ENODEV;
+
if (id >= 0 && (BIT(id) & fimc_mask))
return 0;
return -ENODEV;
@@ -1277,6 +1282,11 @@ static int fimc_probe(struct platform_device *pdev)
ctx->dev = dev;
ctx->id = of_alias_get_id(dev->of_node, "fimc");
 
+   if (of_device_is_compatible(dev->of_node, "samsung,s5pv210-fimc"))
+   ctx->num_clocks = 2;
+   else
+   ctx->num_clocks = FIMC_CLKS_MAX;
+
/* construct formats/limits array */
num_formats = ARRAY_SIZE(fimc_formats) + ARRAY_SIZE(fimc_tiled_formats);
formats = devm_kcalloc(dev, num_formats, sizeof(*formats),
@@ -1409,6 +1419,7 @@ static const struct dev_pm_ops fimc_pm_ops = {
 static const struct of_device_id fimc_of_match[] = {
{ .compatible = "samsung,exynos4210-fimc" },
{ .compatible = "samsung,exynos4212-fimc" },
+   { .compatible = "samsung,s5pv210-fimc" },
{ },
 };
 MODULE_DEVICE_TABLE(of, fimc_of_match);
-- 
2.20.1



Mutual business proposal

2020-05-09 Thread reginadan1


Greeting,
My Name is Regina Daniel am a Business Consultant and I represent a group of 
company based in Gulf Region that wish to invest between US$10,000,000.00 TO  
US$550,000,000. 00 in foreign investment depending on your investment capacity 
based on the amount you can invest and manage. We are currently seeking means 
of expanding and relocating our business interest abroad in the following 
sectors: oil/Gas, real estate, construction stock, mining, transportation, 
health sector, tobacco or Communication Services.

Also in Agriculture or any other viable sector.  If you think you have a solid 
background and idea of making good profit in any of the mentioned business 
sectors or any other business  in your country, please write me for possible 
business co-operation and the investment amount you can handle.

Let me hear from you.
Yours sincerely,
Mr Regina Daniel




Re: WARNING in memtype_reserve

2020-05-09 Thread Jeremy Linton

Hi,

On 5/9/20 2:20 AM, syzbot wrote:

Hello,

syzbot found the following crash on:

HEAD commit:d5eeab8d Merge tag 'scsi-fixes' of git://git.kernel.org/pu..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1509363210
kernel config:  https://syzkaller.appspot.com/x/.config?x=b0212dbee046bc1f
dashboard link: https://syzkaller.appspot.com/bug?extid=353be47c9ce21b68b7ed
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
userspace arch: i386
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=168ee02c10
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=119f378810

The bug was bisected to:

commit 2bef9aed6f0e22391c8d4570749b1acc9bc3981e
Author: Jeremy Linton 
Date:   Mon May 4 20:13:48 2020 +

 usb: usbfs: correct kernel->user page attribute mismatch

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1701f16810
final crash:https://syzkaller.appspot.com/x/report.txt?x=1481f16810
console output: https://syzkaller.appspot.com/x/log.txt?x=1081f16810

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+353be47c9ce21b68b...@syzkaller.appspotmail.com
Fixes: 2bef9aed6f0e ("usb: usbfs: correct kernel->user page attribute mismatch")

[ cut here ]
memtype_reserve failed: [mem 0xff000-0x8fff], req write-back
WARNING: CPU: 1 PID: 7025 at arch/x86/mm/pat/memtype.c:589 
memtype_reserve+0x69f/0x820 arch/x86/mm/pat/memtype.c:589
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 7025 Comm: syz-executor254 Not tainted 5.7.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x188/0x20d lib/dump_stack.c:118
  panic+0x2e3/0x75c kernel/panic.c:221
  __warn.cold+0x2f/0x35 kernel/panic.c:582
  report_bug+0x27b/0x2f0 lib/bug.c:195
  fixup_bug arch/x86/kernel/traps.c:175 [inline]
  fixup_bug arch/x86/kernel/traps.c:170 [inline]
  do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
  do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
  invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:memtype_reserve+0x69f/0x820 arch/x86/mm/pat/memtype.c:589
Code: 48 8b 2c ed c0 00 29 88 e8 ae ad 3e 00 48 8d 4b ff 49 89 e8 4c 89 e2 48 c7 c6 
20 01 29 88 48 c7 c7 80 f9 28 88 e8 79 e8 0f 00 <0f> 0b 41 bf ea ff ff ff e9 03 
fc ff ff 41 bf ea ff ff ff e9 f8 fb
RSP: 0018:c900015e7790 EFLAGS: 00010282
RAX:  RBX: 9000 RCX: 
RDX:  RSI: 815ce181 RDI: f520002bcee4
RBP: 8828ff40 R08: 888097ce85c0 R09: ed1015ce45f1
R10: 8880ae722f83 R11: ed1015ce45f0 R12: 000ff000
R13: 1920002bcef8 R14: dc00 R15: 
  reserve_pfn_range+0x173/0x470 arch/x86/mm/pat/memtype.c:941
  track_pfn_remap+0x18b/0x280 arch/x86/mm/pat/memtype.c:1033
  remap_pfn_range+0x202/0xbf0 mm/memory.c:2130
  dma_direct_mmap+0x197/0x260 kernel/dma/direct.c:453
  dma_mmap_attrs+0xfe/0x150 kernel/dma/mapping.c:237
  usbdev_mmap+0x3ae/0x730 drivers/usb/core/devio.c:254
  call_mmap include/linux/fs.h:1912 [inline]
  mmap_region+0xafb/0x1540 mm/mmap.c:1772
  do_mmap+0x849/0x1160 mm/mmap.c:1545
  do_mmap_pgoff include/linux/mm.h:2553 [inline]
  vm_mmap_pgoff+0x197/0x200 mm/util.c:506
  ksys_mmap_pgoff+0x457/0x5b0 mm/mmap.c:1595
  do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline]
  do_fast_syscall_32+0x270/0xe90 arch/x86/entry/common.c:396
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches



Adding a range check in dma_direct_mmap() looks like a fine fix. I'm 
curious how the mapping attribute changed though. I expected that would 
be rare on x86 (ISA devices/etc).


What is the GCE USB controller here? Its not the same as the virtual 
qemu/pci/xhci right?




Re: [PATCH] parisc: suppress error messages for 'make clean'

2020-05-09 Thread Helge Deller
* Masahiro Yamada :
> Hi Helge,
>
> On Sat, May 9, 2020 at 6:46 AM Helge Deller  wrote:
> >
> > * Masahiro Yamada :
> > > On Sat, Apr 25, 2020 at 2:47 PM Masahiro Yamada  
> > > wrote:
> > > >
> > > > 'make ARCH=parisc clean' emits a tons of error messages as follows:
> > > >
> > > >   $ make ARCH=parisc clean
> > > >   gcc: error: unrecognized command line option '-mno-space-regs'
> > > >   gcc: error: unrecognized command line option '-mfast-indirect-calls'; 
> > > > did you mean '-mforce-indirect-call'?
> > > >   gcc: error: unrecognized command line option '-mdisable-fpregs'
> > > >   gcc: error: missing argument to '-Wframe-larger-than='
> > > >   gcc: error: unrecognized command line option '-mno-space-regs'
> > > >   gcc: error: unrecognized command line option '-mfast-indirect-calls'; 
> > > > did you mean '-mforce-indirect-call'?
> > > >   gcc: error: unrecognized command line option '-mdisable-fpregs'
> > > >   gcc: error: missing argument to '-Wframe-larger-than='
> > > > ...
> > > >
> > > > You can supporess them except '-Wframe-larger-than' by setting correct
> > > > CROSS_COMPILE=, but we should not require any compiler for cleaning.
> > > >
> > > > This $(shell ...) is evaluated so many times because LIBGCC is exported.
> > > > Use the ':=' operator to evaluate it just once, and sink the stderr.
> > > >
> > >
> > > Applied to linux-kbuild.
> >
> > That patch breaks then building the boot loader/compressor:
> > ...
> >   hppa-linux-gnu-ld-X -e startup --as-needed -T 
> > arch/parisc/boot/compressed/vmlinux.lds arch/parisc/boot/compressed/head.o 
> > arch/parisc/boot/compressed/real2.o arch/parisc/boot/compressed/firmware.o 
> > arch/parisc/boot/compressed/misc.o arch/parisc/boot/compressed/piggy.o -o 
> > arch/parisc/boot/compressed/vmlinux
> > hppa-linux-gnu-ld: arch/parisc/boot/compressed/misc.o: in function 
> > `dec_vli':
> > (.text+0x104): undefined reference to `__ashldi3'
> > hppa-linux-gnu-ld: arch/parisc/boot/compressed/misc.o: in function 
> > `lzma_len':
> > (.text+0x2b0): undefined reference to `$$mulI'
> > hppa-linux-gnu-ld: (.text+0x344): undefined reference to `$$mulI'
> > hppa-linux-gnu-ld: (.text+0x3f8): undefined reference to `$$mulI'
> >
> >
> > The patch below works, but I wonder if it's possible to avoid
> > to examine LIBGCC twice?
> >
> > Helge
>
>
> Sorry for the breakage.
>
> How about moving LIBGCC below ?

Good idea.
The patch below does work for me.
We do not need $KBUILD_CFLAGS to get the libgcc.a filename,
so we do not need to pipe the output to /dev/null either.
Can you try if that works, and if yes, can you apply it?

Helge


diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 628cd8bb7ad8..fadbbd010337 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -21,8 +21,6 @@ KBUILD_IMAGE := vmlinuz

 NM = sh $(srctree)/arch/parisc/nm
 CHECKFLAGS += -D__hppa__=1
-LIBGCC = $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
-export LIBGCC

 ifdef CONFIG_64BIT
 UTS_MACHINE:= parisc64
@@ -110,6 +108,8 @@ cflags-$(CONFIG_PA8X00) += -march=2.0 
-mschedule=8000
 head-y := arch/parisc/kernel/head.o

 KBUILD_CFLAGS  += $(cflags-y)
+LIBGCC := $(shell $(CC) -print-libgcc-file-name)
+export LIBGCC

 kernel-y   := mm/ kernel/ math-emu/



[PATCH v11 04/18] x86/entry/64: Clean up paranoid exit

2020-05-09 Thread Sasha Levin
From: Andy Lutomirski 

All that paranoid exit needs to do is to disable IRQs, handle IRQ tracing,
then restore CR3, and restore GS base. Simply do those actions in that
order. Cleaning up the spaghetti code.

Signed-off-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Vegard Nossum 
---
 arch/x86/entry/entry_64.S | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe526..3adb3c8e2409b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1266,19 +1266,25 @@ SYM_CODE_END(paranoid_entry)
 SYM_CODE_START_LOCAL(paranoid_exit)
UNWIND_HINT_REGS
DISABLE_INTERRUPTS(CLBR_ANY)
+
+   /*
+* The order of operations is important. IRQ tracing requires
+* kernel GS base and CR3. RESTORE_CR3 requires kernel GS base.
+*
+* NB to anyone to try to optimize this code: this code does
+* not execute at all for exceptions from user mode. Those
+* exceptions go through error_exit instead.
+*/
TRACE_IRQS_OFF_DEBUG
-   testl   %ebx, %ebx  /* swapgs needed? */
-   jnz .Lparanoid_exit_no_swapgs
-   TRACE_IRQS_IRETQ
-   /* Always restore stashed CR3 value (see paranoid_entry) */
-   RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
+   RESTORE_CR3 scratch_reg=%rax save_reg=%r14
+
+   /* If EBX is 0, SWAPGS is required */
+   testl   %ebx, %ebx
+   jnz restore_regs_and_return_to_kernel
+
+   /* We are returning to a context with user GS base */
SWAPGS_UNSAFE_STACK
jmp restore_regs_and_return_to_kernel
-.Lparanoid_exit_no_swapgs:
-   TRACE_IRQS_IRETQ_DEBUG
-   /* Always restore stashed CR3 value (see paranoid_entry) */
-   RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
-   jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
 /*
-- 
2.20.1



[PATCH v11 10/18] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

Add CPU feature conditional FS/GS base access to the relevant helper
functions. That allows accelerating certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GS base operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

Suggested-by: Tony Luck 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Andrew Cooper 
---
 arch/x86/include/asm/fsgsbase.h | 27 +++
 arch/x86/kernel/process_64.c| 58 +
 2 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177499b40..aefd53767a5d4 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include 
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
unsigned long fsbase;
 
-   rdmsrl(MSR_FS_BASE, fsbase);
+   if (static_cpu_has(X86_FEATURE_FSGSBASE))
+   fsbase = rdfsbase();
+   else
+   rdmsrl(MSR_FS_BASE, fsbase);
 
return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-   unsigned long gsbase;
-
-   rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-   return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-   wrmsrl(MSR_FS_BASE, fsbase);
+   if (static_cpu_has(X86_FEATURE_FSGSBASE))
+   wrfsbase(fsbase);
+   else
+   wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-   wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5ef9d8f25b0e8..aaa65f284b9b9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -328,6 +328,64 @@ static unsigned long x86_fsgsbase_read_task(struct 
task_struct *task,
return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+   unsigned long gsbase;
+
+   if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+   bool need_restore = false;
+   unsigned long flags;
+
+   /*
+* We read the inactive GS base value by swapping
+* to make it the active one. But we cannot allow
+* an interrupt while we switch to and from.
+*/
+   if (!irqs_disabled()) {
+   local_irq_save(flags);
+   need_restore = true;
+   }
+
+   native_swapgs();
+   gsbase = rdgsbase();
+   native_swapgs();
+
+   if (need_restore)
+   local_irq_restore(flags);
+   } else {
+   rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+   }
+
+   return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+   if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+   bool need_restore = false;
+   unsigned long flags;
+
+   /*
+* We write the inactive GS base value by swapping
+* to make it the active one. But we cannot allow
+* an interrupt while we switch to and from.
+*/
+   if (!irqs_disabled()) {
+   local_irq_save(flags);
+   need_restore = true;
+   }
+
+   native_swapgs();
+   wrgsbase(gsbase);
+   native_swapgs();
+
+   if (need_restore)
+   local_irq_restore(flags);
+   } else {
+   wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+   }
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
unsigned long fsbase;
-- 
2.20.1



[PATCH v11 16/18] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit

2020-05-09 Thread Sasha Levin
From: Andy Lutomirski 

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

While this changes userspace visible ABI, we could not find a project
that would be affected by this. Few projects were contacted for input
and ack:

- 5-level EPT: 
http://lkml.kernel.org/r/9ddf602b-6c8b-8c1e-ab46-07ed12366...@redhat.com
- rr: https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
- CRIU: https://lists.openvz.org/pipermail/criu/2018-March/040654.html

Signed-off-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 .../admin-guide/kernel-parameters.txt |  3 +-
 arch/x86/kernel/cpu/common.c  | 32 ---
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index af3aaade195b8..1924845c879c2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3033,8 +3033,7 @@
no5lvl  [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.
 
-   unsafe_fsgsbase [X86] Allow FSGSBASE instructions.  This will be
-   replaced with a nofsgsbase flag.
+   nofsgsbase  [X86] Disables FSGSBASE instructions.
 
no_console_suspend
[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4224760c74e27..0d480cbadc7dc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -418,21 +418,21 @@ static void __init setup_cr_pinning(void)
static_key_enable(_pinning.key);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-   unsafe_fsgsbase = true;
+   /* Require an exact match without trailing characters. */
+   if (strlen(arg))
+   return 0;
+
+   /* Do not emit a message if the feature is not present. */
+   if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+   return 1;
+
+   setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+   pr_info("FSGSBASE disabled via kernel command line\n");
return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1495,12 +1495,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);
 
/* Enable FSGSBASE instructions if available. */
-   if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-   if (unsafe_fsgsbase)
-   cr4_set_bits(X86_CR4_FSGSBASE);
-   else
-   clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-   }
+   if (cpu_has(c, X86_FEATURE_FSGSBASE))
+   cr4_set_bits(X86_CR4_FSGSBASE);
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.20.1



[PATCH v11 12/18] x86/fsgsbase/64: move save_fsgs to header file

2020-05-09 Thread Sasha Levin
Given copy_thread_tls() is now shared between 32 and 64 bit and we need
to use save_fsgs() there, move it to a header file.

Signed-off-by: Sasha Levin 
---
 arch/x86/kernel/process.h| 68 
 arch/x86/kernel/process_64.c | 68 
 2 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kernel/process.h b/arch/x86/kernel/process.h
index 1d0797b2338a2..e21b6669a3851 100644
--- a/arch/x86/kernel/process.h
+++ b/arch/x86/kernel/process.h
@@ -37,3 +37,71 @@ static inline void switch_to_extra(struct task_struct *prev,
 prev_tif & _TIF_WORK_CTXSW_PREV))
__switch_to_xtra(prev, next);
 }
+
+enum which_selector {
+   FS,
+   GS
+};
+
+/*
+ * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
+ * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
+ * It's forcibly inlined because it'll generate better code and this function
+ * is hot.
+ */
+static __always_inline void save_base_legacy(struct task_struct *prev_p,
+ unsigned short selector,
+ enum which_selector which)
+{
+   if (likely(selector == 0)) {
+   /*
+* On Intel (without X86_BUG_NULL_SEG), the segment base could
+* be the pre-existing saved base or it could be zero.  On AMD
+* (with X86_BUG_NULL_SEG), the segment base could be almost
+* anything.
+*
+* This branch is very hot (it's hit twice on almost every
+* context switch between 64-bit programs), and avoiding
+* the RDMSR helps a lot, so we just assume that whatever
+* value is already saved is correct.  This matches historical
+* Linux behavior, so it won't break existing applications.
+*
+* To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
+* report that the base is zero, it needs to actually be zero:
+* see the corresponding logic in load_seg_legacy.
+*/
+   } else {
+   /*
+* If the selector is 1, 2, or 3, then the base is zero on
+* !X86_BUG_NULL_SEG CPUs and could be anything on
+* X86_BUG_NULL_SEG CPUs.  In the latter case, Linux
+* has never attempted to preserve the base across context
+* switches.
+*
+* If selector > 3, then it refers to a real segment, and
+* saving the base isn't necessary.
+*/
+   if (which == FS)
+   prev_p->thread.fsbase = 0;
+   else
+   prev_p->thread.gsbase = 0;
+   }
+}
+
+static __always_inline void save_fsgs(struct task_struct *task)
+{
+   savesegment(fs, task->thread.fsindex);
+   savesegment(gs, task->thread.gsindex);
+   if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+   /*
+* If FSGSBASE is enabled, we can't make any useful guesses
+* about the base, and user code expects us to save the current
+* value.  Fortunately, reading the base directly is efficient.
+*/
+   task->thread.fsbase = rdfsbase();
+   task->thread.gsbase = x86_gsbase_read_cpu_inactive();
+   } else {
+   save_base_legacy(task, task->thread.fsindex, FS);
+   save_base_legacy(task, task->thread.gsindex, GS);
+   }
+}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e066750be89a0..4be88124d81ea 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -145,74 +145,6 @@ void release_thread(struct task_struct *dead_task)
WARN_ON(dead_task->mm);
 }
 
-enum which_selector {
-   FS,
-   GS
-};
-
-/*
- * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
- * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
- * It's forcibly inlined because it'll generate better code and this function
- * is hot.
- */
-static __always_inline void save_base_legacy(struct task_struct *prev_p,
-unsigned short selector,
-enum which_selector which)
-{
-   if (likely(selector == 0)) {
-   /*
-* On Intel (without X86_BUG_NULL_SEG), the segment base could
-* be the pre-existing saved base or it could be zero.  On AMD
-* (with X86_BUG_NULL_SEG), the segment base could be almost
-* anything.
-*
-* This branch is very hot (it's hit twice on almost every
-* context switch between 64-bit programs), and avoiding
-

[PATCH v11 13/18] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

When FSGSBASE is enabled, copying threads and reading FS/GS base using
ptrace must read the actual values.

When copying a thread, use fsgs_save() and copy the saved values. For
ptrace, the bases must be read from memory regardless of the selector
if FSGSBASE is enabled.

Suggested-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 arch/x86/kernel/process.c| 10 ++
 arch/x86/kernel/process_64.c |  6 --
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9da70b279dad8..3ebb56cc2cfee 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -127,6 +127,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long sp,
struct inactive_task_frame *frame;
struct fork_frame *fork_frame;
struct pt_regs *childregs;
+   struct task_struct *me = current;
int ret = 0;
 
childregs = task_pt_regs(p);
@@ -140,10 +141,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long sp,
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 #ifdef CONFIG_X86_64
-   savesegment(gs, p->thread.gsindex);
-   p->thread.gsbase = p->thread.gsindex ? 0 : current->thread.gsbase;
-   savesegment(fs, p->thread.fsindex);
-   p->thread.fsbase = p->thread.fsindex ? 0 : current->thread.fsbase;
+   save_fsgs(me);
+   p->thread.fsindex = me->thread.fsindex;
+   p->thread.fsbase = me->thread.fsbase;
+   p->thread.gsindex = me->thread.gsindex;
+   p->thread.gsbase = me->thread.gsbase;
savesegment(es, p->thread.es);
savesegment(ds, p->thread.ds);
 #else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4be88124d81ea..57cdbbb0381ac 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -346,7 +346,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
if (task == current)
fsbase = x86_fsbase_read_cpu();
-   else if (task->thread.fsindex == 0)
+   else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+(task->thread.fsindex == 0))
fsbase = task->thread.fsbase;
else
fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -360,7 +361,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
if (task == current)
gsbase = x86_gsbase_read_cpu_inactive();
-   else if (task->thread.gsindex == 0)
+   else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+(task->thread.gsindex == 0))
gsbase = task->thread.gsbase;
else
gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
-- 
2.20.1



[PATCH v11 14/18] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

2020-05-09 Thread Sasha Levin
From: Tony Luck 

Before enabling FSGSBASE the kernel could safely assume that the content
of GS base was a user address. Thus any speculative access as the result
of a mispredicted branch controlling the execution of SWAPGS would be to
a user address. So systems with speculation-proof SMAP did not need to
add additional LFENCE instructions to mitigate.

With FSGSBASE enabled a hostile user can set GS base to a kernel address.
So they can make the kernel speculatively access data they wish to leak
via a side channel. This means that SMAP provides no protection.

Add FSGSBASE as an additional condition to enable the fence-based SWAPGS
mitigation.

Signed-off-by: Tony Luck 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 arch/x86/kernel/cpu/bugs.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ed54b3b21c396..487603ea51cd1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -450,14 +450,12 @@ static void __init spectre_v1_select_mitigation(void)
 * If FSGSBASE is enabled, the user can put a kernel address in
 * GS, in which case SMAP provides no protection.
 *
-* [ NOTE: Don't check for X86_FEATURE_FSGSBASE until the
-* FSGSBASE enablement patches have been merged. ]
-*
 * If FSGSBASE is disabled, the user can only put a user space
 * address in GS.  That makes an attack harder, but still
 * possible if there's no SMAP protection.
 */
-   if (!smap_works_speculatively()) {
+   if (boot_cpu_has(X86_FEATURE_FSGSBASE) ||
+   !smap_works_speculatively()) {
/*
 * Mitigation can be provided from SWAPGS itself or
 * PTI as the CR3 write in the Meltdown mitigation
-- 
2.20.1



[PATCH v11 05/18] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

When FSGSBASE is enabled, the GS base handling in paranoid entry will need
to retrieve the kernel GS base which requires that the kernel page table is
active.

As the CR3 switch to the kernel page tables (PTI is active) does not depend
on kernel GS base, move the CR3 switch in front of the GS base handling.

Comment the EBX content while at it.

No functional change.

Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Vegard Nossum 
---
 arch/x86/entry/entry_64.S | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3adb3c8e2409b..7f27626f8426f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1220,15 +1220,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
cld
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
-   movl$1, %ebx
-   movl$MSR_GS_BASE, %ecx
-   rdmsr
-   testl   %edx, %edx
-   js  1f  /* negative -> in kernel */
-   SWAPGS
-   xorl%ebx, %ebx
 
-1:
/*
 * Always stash CR3 in %r14.  This value will be restored,
 * verbatim, at exit.  Needed if paranoid_entry interrupted
@@ -1238,16 +1230,31 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 * This is also why CS (stashed in the "iret frame" by the
 * hardware at entry) can not be used: this may be a return
 * to kernel code, but with a user CR3 value.
+*
+* Switching CR3 does not depend on kernel GS base so it can
+* be done before switching to the kernel GS base. This is
+* required for FSGSBASE because the kernel GS base has to
+* be retrieved from a kernel internal table.
 */
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+   /* EBX = 1 -> kernel GSBASE active, no restore required */
+   movl$1, %ebx
/*
-* The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
-* unconditional CR3 write, even in the PTI case.  So do an lfence
-* to prevent GS speculation, regardless of whether PTI is enabled.
+* The kernel-enforced convention is a negative GS base indicates
+* a kernel value. No SWAPGS needed on entry and exit.
 */
-   FENCE_SWAPGS_KERNEL_ENTRY
+   movl$MSR_GS_BASE, %ecx
+   rdmsr
+   testl   %edx, %edx
+   jns .Lparanoid_entry_swapgs
+   ret
 
+.Lparanoid_entry_swapgs:
+   SWAPGS
+   FENCE_SWAPGS_KERNEL_ENTRY
+   /* EBX = 0 -> SWAPGS required on exit */
+   xorl%ebx, %ebx
ret
 SYM_CODE_END(paranoid_entry)
 
-- 
2.20.1



[PATCH v11 09/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

2020-05-09 Thread Sasha Levin
From: Andi Kleen 

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make  safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen 
Signed-off-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sasha Levin 
Reviewed-by: Andy Lutomirski 
Reviewed-by: Andi Kleen 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: H. Peter Anvin 
Cc: Andi Kleen 
---
 arch/x86/include/asm/fsgsbase.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c743de77c..fdd1177499b40 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct 
*task);
 extern void x86_fsbase_write_task(struct task_struct *task, unsigned long 
fsbase);
 extern void x86_gsbase_write_task(struct task_struct *task, unsigned long 
gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+   unsigned long fsbase;
+
+   asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+   return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+   unsigned long gsbase;
+
+   asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+   return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+   asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+   asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
-- 
2.20.1



[PATCH v11 08/18] x86/entry/64: Document GSBASE handling in the paranoid path

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

On FSGSBASE systems, the way to handle GS base in the paranoid path is
different from the existing SWAPGS-based entry/exit path handling. Document
the reason and what has to be done for FSGSBASE enabled systems.

Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 Documentation/x86/entry_64.rst | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index a48b3f6ebbe87..0499a40723af3 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,12 @@ We try to only use IST entries and the paranoid entry code 
for vectors
 that absolutely need the more expensive check for the GS base - and we
 generate all 'normal' entry points with the regular (faster) paranoid=0
 variant.
+
+On FSGSBASE systems, however, user space can set GS without kernel
+interaction. It means the value of GS base itself does not imply anything,
+whether a kernel value or a user space value. So, there is no longer a safe
+way to check whether the exception is entering from user mode or kernel
+mode in the paranoid entry code path. So the GS base value needs to be read
+out, saved and the kernel GS base value written. On exit, the saved GS base
+value needs to be restored unconditionally. The non-paranoid entry/exit
+code still uses SWAPGS unconditionally as the state is known.
-- 
2.20.1



[PATCH v11 07/18] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

Without FSGSBASE, user space cannot change GS base other than through a
PRCTL. The kernel enforces that the user space GS base value is positive
as negative values are used for detecting the kernel space GS base value
in the paranoid entry code.

If FSGSBASE is enabled, user space can set arbitrary GS base values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.

To avoid this, paranoid entry needs to unconditionally save the current
GS base value independent of the interrupted context, retrieve and write
the kernel GS base and unconditionally restore the saved value on exit.
The restore happens either in paranoid exit or in the special exit path of
the NMI low level code.

All other entry code paths which use unconditional SWAPGS are not affected
as they do not depend on the actual content.

The new logic for paranoid entry, when FSGSBASE is enabled, removes SWAPGS
and replaces with unconditional WRGSBASE. Hence no fences are needed.

Suggested-by: H. Peter Anvin 
Suggested-by: Andy Lutomirski 
Suggested-by: Thomas Gleixner 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Acked-by: Tom Lendacky 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Tom Lendacky 
Cc: Vegard Nossum 
---
 arch/x86/entry/calling.h  |  6 +++
 arch/x86/entry/entry_64.S | 78 ++-
 2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0eb134e18b7a9..5f3a8ecaddc2d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -340,6 +340,12 @@ For 32-bit we have the following conventions - kernel is 
built with
 #endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+   rdgsbase \save_reg
+   GET_PERCPU_BASE \scratch_reg
+   wrgsbase \scratch_reg
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 .macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 7f27626f8426f..a4fd01c8f2970 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "calling.h"
@@ -1211,9 +1212,14 @@ idtentry machine_check   do_mce  
has_error_code=0paranoid=1
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs. Return GS base related information
+ * in EBX depending on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASER/EBX
+ * N0 -> SWAPGS on exit
+ *  1 -> no SWAPGS on exit
+ *
+ * YGS base value at entry, must be restored in paranoid_exit
  */
 SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
@@ -1238,7 +1244,29 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 */
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
-   /* EBX = 1 -> kernel GSBASE active, no restore required */
+   /*
+* Handling GS base depends on the availability of FSGSBASE.
+*
+* Without FSGSBASE the kernel enforces that negative GS base
+* values indicate kernel GS base. With FSGSBASE no assumptions
+* can be made about the GS base value when entering from user
+* space.
+   */
+   ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
+
+   /*
+* Read the current GS base and store it in %rbx unconditionally,
+* retrieve and set the current CPUs kernel GS base. The stored value
+* has to be restored in paranoid_exit unconditionally.
+*
+* This unconditional write of GS base ensures no subsequent load
+* based on a mispredicted GS base.
+*/
+   SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+   ret
+
+.Lparanoid_entry_checkgs:
+   /* EBX = 1 -> kernel GS base active, no restore required */
movl$1, %ebx
/*
 * The kernel-enforced convention is a negative GS base indicates
@@ -1265,10 +1293,17 @@ SYM_CODE_END(paranoid_entry)
  *
  * We may be returning to very strange contexts (e.g. very early
  * in syscall entry), so checking for preemption here would
- * be complicated.  Fortunately, we there's no good reason
- * to try to handle preemption here.
+ * be complicated.  Fortunately, there's no good reason to try
+ * to handle preemption here.
+ *
+ * R/EBX contains the GS base related information depending on the
+ * availability of the FSGSBASE instructions:
+ *
+ * FSGSBASER/EBX
+ * N0 -> SWAPGS on exit
+ *  1 -> no SWAPGS on exit
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * YUser space GS base, must be 

[PATCH v11 15/18] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

This validates that GS selector and base are independently preserved in
ptrace commands.

Suggested-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 tools/testing/selftests/x86/fsgsbase.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c 
b/tools/testing/selftests/x86/fsgsbase.c
index 950a48b2e3662..9a4349813a30a 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
wait();
 
if (WSTOPSIG(status) == SIGTRAP) {
-   unsigned long gs;
+   unsigned long gs, base;
unsigned long gs_offset = USER_REGS_OFFSET(gs);
unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -481,6 +481,7 @@ static void test_ptrace_write_gsbase(void)
err(1, "PTRACE_POKEUSER");
 
gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+   base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
/*
 * In a non-FSGSBASE system, the nonzero selector will load
@@ -501,8 +502,14 @@ static void test_ptrace_write_gsbase(void)
 */
if (gs == 0)
printf("\tNote: this is expected behavior on 
older kernels.\n");
+   } else if (have_fsgsbase && (base != 0xFF)) {
+   nerrs++;
+   printf("[FAIL]\tGSBASE changed to %lx\n", base);
} else {
-   printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
+   printf("[OK]\tGS remained 0x%hx", *shared_scratch);
+   if (have_fsgsbase)
+   printf(" and GSBASE changed to 0xFF");
+   printf("\n");
}
}
 
-- 
2.20.1



[PATCH v11 11/18] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available

2020-05-09 Thread Sasha Levin
From: Andy Lutomirski 

With the new FSGSBASE instructions, FS/GS base can be efficiently read
and written in __switch_to(). Use that capability to preserve the full
state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GS base
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup. It seems to save about 100 cycles per
context switch compared to the baseline 4.6-rc1 behavior on a Skylake
laptop.

[ chang: 5~10% performance improvements were seen by a context switch
  benchmark that ran threads with different FS/GS base values (to the
  baseline 4.16). ]

Signed-off-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 arch/x86/kernel/process_64.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index aaa65f284b9b9..e066750be89a0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -199,8 +199,18 @@ static __always_inline void save_fsgs(struct task_struct 
*task)
 {
savesegment(fs, task->thread.fsindex);
savesegment(gs, task->thread.gsindex);
-   save_base_legacy(task, task->thread.fsindex, FS);
-   save_base_legacy(task, task->thread.gsindex, GS);
+   if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+   /*
+* If FSGSBASE is enabled, we can't make any useful guesses
+* about the base, and user code expects us to save the current
+* value.  Fortunately, reading the base directly is efficient.
+*/
+   task->thread.fsbase = rdfsbase();
+   task->thread.gsbase = x86_gsbase_read_cpu_inactive();
+   } else {
+   save_base_legacy(task, task->thread.fsindex, FS);
+   save_base_legacy(task, task->thread.gsindex, GS);
+   }
 }
 
 #if IS_ENABLED(CONFIG_KVM)
@@ -279,10 +289,22 @@ static __always_inline void load_seg_legacy(unsigned 
short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
  struct thread_struct *next)
 {
-   load_seg_legacy(prev->fsindex, prev->fsbase,
-   next->fsindex, next->fsbase, FS);
-   load_seg_legacy(prev->gsindex, prev->gsbase,
-   next->gsindex, next->gsbase, GS);
+   if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+   /* Update the FS and GS selectors if they could have changed. */
+   if (unlikely(prev->fsindex || next->fsindex))
+   loadseg(FS, next->fsindex);
+   if (unlikely(prev->gsindex || next->gsindex))
+   loadseg(GS, next->gsindex);
+
+   /* Update the bases. */
+   wrfsbase(next->fsbase);
+   x86_gsbase_write_cpu_inactive(next->gsbase);
+   } else {
+   load_seg_legacy(prev->fsindex, prev->fsbase,
+   next->fsindex, next->fsbase, FS);
+   load_seg_legacy(prev->gsindex, prev->gsbase,
+   next->gsindex, next->gsbase, GS);
+   }
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-- 
2.20.1



[PATCH v11 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

2020-05-09 Thread Sasha Levin
From: Andi Kleen 

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

Signed-off-by: Andi Kleen 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c   | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h 
b/arch/x86/include/uapi/asm/hwcap2.h
index 8b2effe6efb82..5fdfcb47000f9 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT  (1 << 0)
 
+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASEBIT(1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0d480cbadc7dc..b5a086ea34258 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1495,8 +1495,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);
 
/* Enable FSGSBASE instructions if available. */
-   if (cpu_has(c, X86_FEATURE_FSGSBASE))
+   if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
cr4_set_bits(X86_CR4_FSGSBASE);
+   elf_hwcap2 |= HWCAP2_FSGSBASE;
+   }
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.20.1



[PATCH v11 18/18] Documentation/x86/64: Add documentation for GS/FS addressing mode

2020-05-09 Thread Sasha Levin
From: Thomas Gleixner 

Explain how the GS/FS based addressing can be utilized in user space
applications along with the differences between the generic prctl() based
GS/FS base control and the FSGSBASE version available on newer CPUs.

Originally-by: Andi Kleen 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Reviewed-by: Randy Dunlap 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Randy Dunlap 
Cc: Jonathan Corbet 
---
 Documentation/x86/x86_64/fsgs.rst  | 199 +
 Documentation/x86/x86_64/index.rst |   1 +
 2 files changed, 200 insertions(+)
 create mode 100644 Documentation/x86/x86_64/fsgs.rst

diff --git a/Documentation/x86/x86_64/fsgs.rst 
b/Documentation/x86/x86_64/fsgs.rst
new file mode 100644
index 0..50960e09e1f66
--- /dev/null
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Using FS and GS segments in user space applications
+===
+
+The x86 architecture supports segmentation. Instructions which access
+memory can use segment register based addressing mode. The following
+notation is used to address a byte within a segment:
+
+  Segment-register:Byte-address
+
+The segment base address is added to the Byte-address to compute the
+resulting virtual address which is accessed. This allows to access multiple
+instances of data with the identical Byte-address, i.e. the same code. The
+selection of a particular instance is purely based on the base-address in
+the segment register.
+
+In 32-bit mode the CPU provides 6 segments, which also support segment
+limits. The limits can be used to enforce address space protections.
+
+In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
+always 0 to provide a full 64bit address space. The FS and GS segments are
+still functional in 64-bit mode.
+
+Common FS and GS usage
+--
+
+The FS segment is commonly used to address Thread Local Storage (TLS). FS
+is usually managed by runtime code or a threading library. Variables
+declared with the '__thread' storage class specifier are instantiated per
+thread and the compiler emits the FS: address prefix for accesses to these
+variables. Each thread has its own FS base address so common code can be
+used without complex address offset calculations to access the per thread
+instances. Applications should not use FS for other purposes when they use
+runtimes or threading libraries which manage the per thread FS.
+
+The GS segment has no common use and can be used freely by
+applications. GCC and Clang support GS based addressing via address space
+identifiers.
+
+Reading and writing the FS/GS base address
+--
+
+There exist two mechanisms to read and write the FS/GS base address:
+
+ - the arch_prctl() system call
+
+ - the FSGSBASE instruction family
+
+Accessing FS/GS base with arch_prctl()
+--
+
+ The arch_prctl(2) based mechanism is available on all 64-bit CPUs and all
+ kernel versions.
+
+ Reading the base:
+
+   arch_prctl(ARCH_GET_FS, );
+   arch_prctl(ARCH_GET_GS, );
+
+ Writing the base:
+
+   arch_prctl(ARCH_SET_FS, fsbase);
+   arch_prctl(ARCH_SET_GS, gsbase);
+
+ The ARCH_SET_GS prctl may be disabled depending on kernel configuration
+ and security settings.
+
+Accessing FS/GS base with the FSGSBASE instructions
+---
+
+ With the Ivy Bridge CPU generation Intel introduced a new set of
+ instructions to access the FS and GS base registers directly from user
+ space. These instructions are also supported on AMD Family 17H CPUs. The
+ following instructions are available:
+
+  === ===
+  RDFSBASE %reg   Read the FS base register
+  RDGSBASE %reg   Read the GS base register
+  WRFSBASE %reg   Write the FS base register
+  WRGSBASE %reg   Write the GS base register
+  === ===
+
+ The instructions avoid the overhead of the arch_prctl() syscall and allow
+ more flexible usage of the FS/GS addressing modes in user space
+ applications. This does not prevent conflicts between threading libraries
+ and runtimes which utilize FS and applications which want to use it for
+ their own purpose.
+
+FSGSBASE instructions enablement
+
+ The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
+ available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
+
+ The availability of the instructions does not enable them
+ automatically. The kernel has to enable them explicitly in CR4. The
+ reason for this is that older kernels make assumptions about the values in
+ the GS register and enforce them when GS base is set via
+ arch_prctl(). Allowing user 

[PATCH v11 06/18] x86/entry/64: Introduce the FIND_PERCPU_BASE macro

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

GS base is used to find per-CPU data in the kernel. But when GS base is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR.  The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.21 does not support
it. Support is added in version 2.27.

Suggested-by: H. Peter Anvin 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
Cc: Vegard Nossum 
---
 arch/x86/entry/calling.h| 34 ++
 arch/x86/include/asm/inst.h | 15 +++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0789e13ece905..0eb134e18b7a9 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
 
@@ -347,6 +348,39 @@ For 32-bit we have the following conventions - kernel is 
built with
 #endif
 .endm
 
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+   movq$__CPUNODE_SEG, \reg
+   lsl \reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GS base value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+   ALTERNATIVE \
+   "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+   "RDPID  \reg", \
+   X86_FEATURE_RDPID
+   andq$VDSO_CPUNODE_MASK, \reg
+   movq__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+   movqpcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
 /*
  * This does 'call enter_from_user_mode' unless we can avoid it based on
  * kernel config or using the static jump infrastructure.
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796da07f88..d063841a17e39 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
.endif
MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
.endm
+
+.macro RDPID opd
+   REG_TYPE rdpid_opd_type \opd
+   .if rdpid_opd_type == REG_TYPE_R64
+   R64_NUM rdpid_opd \opd
+   .else
+   R32_NUM rdpid_opd \opd
+   .endif
+   .byte 0xf3
+   .if rdpid_opd > 7
+   PFX_REX rdpid_opd 0
+   .endif
+   .byte 0x0f, 0xc7
+   MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif
-- 
2.20.1



[PATCH v11 02/18] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

The test validates that the selector is not changed when a ptracer writes
the ptracee's GS base.

Originally-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 tools/testing/selftests/x86/fsgsbase.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c 
b/tools/testing/selftests/x86/fsgsbase.c
index 15a329da59fa3..950a48b2e3662 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
wait();
 
if (WSTOPSIG(status) == SIGTRAP) {
-   unsigned long gs, base;
+   unsigned long gs;
unsigned long gs_offset = USER_REGS_OFFSET(gs);
unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -481,7 +481,6 @@ static void test_ptrace_write_gsbase(void)
err(1, "PTRACE_POKEUSER");
 
gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
-   base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
/*
 * In a non-FSGSBASE system, the nonzero selector will load
@@ -489,11 +488,21 @@ static void test_ptrace_write_gsbase(void)
 * selector value is changed or not by the GSBASE write in
 * a ptracer.
 */
-   if (gs == 0 && base == 0xFF) {
-   printf("[OK]\tGS was reset as expected\n");
-   } else {
+   if (gs != *shared_scratch) {
nerrs++;
-   printf("[FAIL]\tGS=0x%lx, GSBASE=0x%lx (should be 0, 
0xFF)\n", gs, base);
+   printf("[FAIL]\tGS changed to %lx\n", gs);
+
+   /*
+* On older kernels, poking a nonzero value into the
+* base would zero the selector.  On newer kernels,
+* this behavior has changed -- poking the base
+* changes only the base and, if FSGSBASE is not
+* available, this may not effect.
+*/
+   if (gs == 0)
+   printf("\tNote: this is expected behavior on 
older kernels.\n");
+   } else {
+   printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
}
}
 
-- 
2.20.1



[PATCH v11 01/18] x86/ptrace: Prevent ptrace from clearing the FS/GS selector

2020-05-09 Thread Sasha Levin
From: "Chang S. Bae" 

When a ptracer writes a ptracee's FS/GS base with a different value, the
selector is also cleared. While this behavior is incorrect as the selector
should be preserved, most userspace applications did not notice that as
they do not use non-zero segments to begin with.

Instead, with this patch, when a tracee sets the base we will let it do
so without clearing the selector.

The change above means that a tracee that already has a selector set
will fail in an attempt to set the base - the change won't stick and the
value will be instead based on the value of the selector. As with the
above, we haven't found userspace that would be affected by this change.

Suggested-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
[sasha: rewrite commit message]
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 arch/x86/kernel/ptrace.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index f0e1ddbc2fd78..cc56efb75d275 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -380,25 +380,12 @@ static int putreg(struct task_struct *child,
case offsetof(struct user_regs_struct,fs_base):
if (value >= TASK_SIZE_MAX)
return -EIO;
-   /*
-* When changing the FS base, use do_arch_prctl_64()
-* to set the index to zero and to set the base
-* as requested.
-*
-* NB: This behavior is nonsensical and likely needs to
-* change when FSGSBASE support is added.
-*/
-   if (child->thread.fsbase != value)
-   return do_arch_prctl_64(child, ARCH_SET_FS, value);
+   x86_fsbase_write_task(child, value);
return 0;
case offsetof(struct user_regs_struct,gs_base):
-   /*
-* Exactly the same here as the %fs handling above.
-*/
if (value >= TASK_SIZE_MAX)
return -EIO;
-   if (child->thread.gsbase != value)
-   return do_arch_prctl_64(child, ARCH_SET_GS, value);
+   x86_gsbase_write_task(child, value);
return 0;
 #endif
}
-- 
2.20.1



[PATCH v11 03/18] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

2020-05-09 Thread Sasha Levin
From: Andy Lutomirski 

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

Signed-off-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Signed-off-by: Sasha Levin 
Reviewed-by: Tony Luck 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Dave Hansen 
Cc: Tony Luck 
Cc: Andi Kleen 
---
 .../admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/kernel/cpu/common.c  | 24 +++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdfe..af3aaade195b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3033,6 +3033,9 @@
no5lvl  [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.
 
+   unsafe_fsgsbase [X86] Allow FSGSBASE instructions.  This will be
+   replaced with a nofsgsbase flag.
+
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bed0cb83fe245..4224760c74e27 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -418,6 +418,22 @@ static void __init setup_cr_pinning(void)
static_key_enable(_pinning.key);
 }
 
+/*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+   unsafe_fsgsbase = true;
+   return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1478,6 +1494,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_smap(c);
setup_umip(c);
 
+   /* Enable FSGSBASE instructions if available. */
+   if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+   if (unsafe_fsgsbase)
+   cr4_set_bits(X86_CR4_FSGSBASE);
+   else
+   clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+   }
+
/*
 * The vendor-specific functions might have changed features.
 * Now we do "generic changes."
-- 
2.20.1



[PATCH v11 00/18] Enable FSGSBASE instructions

2020-05-09 Thread Sasha Levin
Benefits:
Currently a user process that wishes to read or write the FS/GS base must
make a system call. But recent X86 processors have added new instructions
for use in 64-bit mode that allow direct access to the FS and GS segment
base addresses.  The operating system controls whether applications can
use these instructions with a %cr4 control bit.

In addition to benefits to applications, performance improvements to the
OS context switch code are possible by making use of these instructions. A
third party reported out promising performance numbers out of their
initial benchmarking of the previous version of this patch series [9].

Enablement check:
The kernel provides information about the enabled state of FSGSBASE to
applications using the ELF_AUX vector. If the HWCAP2_FSGSBASE bit is set in
the AUX vector, the kernel has FSGSBASE instructions enabled and
applications can use them.

Kernel changes:
Major changes made in the kernel are in context switch, paranoid path, and
ptrace. In a context switch, a task's FS/GS base will be secured regardless
of its selector. In the paranoid path, GS base is unconditionally
overwritten to the kernel GS base on entry and the original GS base is
restored on exit. Ptrace includes divergence of FS/GS index and base
values.

Security:
For mitigating the Spectre v1 SWAPGS issue, LFENCE instructions were added
on most kernel entries. Those patches are dependent on previous behaviors
that users couldn't load a kernel address into the GS base. These patches
change that assumption since the user can load any address into GS base.
The changes to the kernel entry path in this patch series take account of
the SWAPGS issue.

Changes from v10:

 - Rewrite the commit message for patch #1.
 - Document communication/acks from userspace projects that are
   potentially affected by this.

Andi Kleen (2):
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

Andy Lutomirski (4):
  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  x86/entry/64: Clean up paranoid exit
  x86/fsgsbase/64: Use FSGSBASE in switch_to() if available
  x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken
bit

Chang S. Bae (9):
  x86/ptrace: Prevent ptrace from clearing the FS/GS selector
  selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base
write
  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  x86/entry/64: Document GSBASE handling in the paranoid path
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace
  selftests/x86/fsgsbase: Test ptracer-induced GS base write with
FSGSBASE

Sasha Levin (1):
  x86/fsgsbase/64: move save_fsgs to header file

Thomas Gleixner (1):
  Documentation/x86/64: Add documentation for GS/FS addressing mode

Tony Luck (1):
  x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

 .../admin-guide/kernel-parameters.txt |   2 +
 Documentation/x86/entry_64.rst|   9 +
 Documentation/x86/x86_64/fsgs.rst | 199 ++
 Documentation/x86/x86_64/index.rst|   1 +
 arch/x86/entry/calling.h  |  40 
 arch/x86/entry/entry_64.S | 131 +---
 arch/x86/include/asm/fsgsbase.h   |  45 +++-
 arch/x86/include/asm/inst.h   |  15 ++
 arch/x86/include/uapi/asm/hwcap2.h|   3 +
 arch/x86/kernel/cpu/bugs.c|   6 +-
 arch/x86/kernel/cpu/common.c  |  22 ++
 arch/x86/kernel/process.c |  10 +-
 arch/x86/kernel/process.h |  68 ++
 arch/x86/kernel/process_64.c  | 142 +++--
 arch/x86/kernel/ptrace.c  |  17 +-
 tools/testing/selftests/x86/fsgsbase.c|  24 ++-
 16 files changed, 605 insertions(+), 129 deletions(-)
 create mode 100644 Documentation/x86/x86_64/fsgs.rst

-- 
2.20.1



Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Russell King - ARM Linux admin
On Sat, May 09, 2020 at 05:33:15PM +0200, Andrew Lunn wrote:
> On Sat, May 09, 2020 at 06:39:55PM +0800, Hui Song wrote:
> > From: "hui.song" 
> > 
> > add one struct mpc8xxx_gpio_plat to enable gpio feature.
> > 
> > Signed-off-by: hui.song 
> > ---
> >  .../include/asm/arch-fsl-layerscape/gpio.h| 22 +++
> >  1 file changed, 22 insertions(+)
> >  create mode 100644 arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> > 
> > diff --git a/arch/arm/include/asm/arch-fsl-layerscape/gpio.h 
> > b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> > new file mode 100644
> > index 00..d8dd750a72
> > --- /dev/null
> > +++ b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> > @@ -0,0 +1,22 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Copyright 2014 Freescale Semiconductor, Inc.
> > + */
> > +
> > +/*
> > + * Dummy header file to enable CONFIG_OF_CONTROL.
> > + * If CONFIG_OF_CONTROL is enabled, lib/fdtdec.c is compiled.
> > + * It includes  via , so those SoCs that 
> > enable
> > + * OF_CONTROL must have arch/gpio.h.
> > + */
> 
> This does not seem right. You would expect each sub arch to have a
> subdirectory in arch/arm/include/asm/ when in fact none do.

>From what I can tell, these patches are not for the kernel.  The
filenames don't match th kernel layout.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: [PATCH net] net: broadcom: Imply BROADCOM_PHY for BCMGENET

2020-05-09 Thread Florian Fainelli




On 5/9/2020 12:38 AM, Geert Uytterhoeven wrote:

Hi Florian,

Thanks for your patch!

On Sat, May 9, 2020 at 12:32 AM Florian Fainelli  wrote:

The GENET controller on the Raspberry Pi 4 (2711) is typically
interfaced with an external Broadcom PHY via a RGMII electrical
interface. To make sure that delays are properly configured at the PHY
side, ensure that we get a chance to have the dedicated Broadcom PHY
driver (CONFIG_BROADCOM_PHY) enabled for this to happen.


I guess it can be interfaced to a different external PHY, too?


Yes, although this has not happened yet to the best of my knowledge.




Fixes: 402482a6a78e ("net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when 
not needed")
Reported-by: Marek Szyprowski 
Signed-off-by: Florian Fainelli 



--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -69,6 +69,7 @@ config BCMGENET
 select BCM7XXX_PHY
 select MDIO_BCM_UNIMAC
 select DIMLIB
+   imply BROADCOM_PHY if ARCH_BCM2835


Which means support for the BROADCOM_PHY is always included
on ARCH_BCM2835, even if a different PHY is used?


It is included by default on  and can be deselected if needed, which is 
exactly what we want here, a sane default, but without the inflexibility 
of "select".





 help
   This driver supports the built-in Ethernet MACs found in the
   Broadcom BCM7xxx Set Top Box family chipset.


Gr{oetje,eeting}s,

 Geert



--
Florian


Re: [PATCH] staging: vc04_services: interface: vchiq_arm: vchiq_connected.c: Block comments should align the * on each line

2020-05-09 Thread Greg KH
On Sat, May 09, 2020 at 02:07:14PM +0100, John Oldman wrote:
> Coding style issue

Your subject line needs to be much shorter, don't you think?

Please fix up and resend.

greg k-h


Re: [PATCH v2] usb: raw-gadget: fix gadget endpoint selection

2020-05-09 Thread Greg Kroah-Hartman
On Sat, May 09, 2020 at 11:02:13AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Andrey Konovalov  writes:
> >> here you're changing userspace ABI. Aren't we going to possibly break
> >> some existing applications?
> >
> > Hi Felipe,
> >
> >  I've been working on tests for Raw Gadget for the last few weeks [1],
> > which revealed a few problems with the interface. This isn't yet
> > included into any released kernel, so my hope that changing the ABI is
> > OK during the rc stage.
> 
> Fair enough. If that's okay with Greg, it's okay with me, but then how
> do we include it into the -rc seen as it's not really a fix?
> 
> Greg, are you okay with me including such large patches during the -rc?
> They essentially add new IOCTLs to the raw-gadget interface.

Yes, as the driver only went in for -rc1, it's fine to add fixes like
this so late as we don't want to ship a -final with it in broken form.

thanks,

greg k-h


Re: [PATCH] staging: vt6656: vt6655: clean Makefiles

2020-05-09 Thread Greg Kroah-Hartman
On Sat, May 09, 2020 at 11:07:27AM +0200, Matej Dujava wrote:
> This patch is removing CFLAGS that are defining flags that are not used.

You are also modifying the indentation and moving lines around for no
reason :(

Please only do one thing for a patch, and always describe everything you
do in the changelog text.

Can you fix this up and send a v2?

thanks,

greg k-h


Re: [PATCH] Driver: Adding helper macro for platform_driver boilerplate.

2020-05-09 Thread Greg KH
On Sat, May 09, 2020 at 03:37:16PM +0530, Harshal Chaudhari wrote:
> From: Harshal 
> 
> For simple module that contain a single platform_driver without any 
> additional setup code then ends up being a block
> of duplicated boilerplate.
> 
> This patch add a new micro, module_platform_driver(), which replace the 
> module_init()/module_exit() registrations
> with template functions.

Please properly wrap your changelog text.

And fixup the Subject line to match other patches for this driver.

And the text needs to be revised a bit to say what this really is doing,
as what you describe is not correct.

> 
> Signed-off-by: harshal chaudhari 

Finally, this does not match your "From:" line for the patch, which
means I couldn't take it even if all of the above was correct :(

Please fix up and resend.

thanks,

greg k-h


[PATCH fixes] powerpc/40x: Make more space for system call exception

2020-05-09 Thread Christophe Leroy
When CONFIG_VIRT_CPU_ACCOUNTING is selected, system call exception
handler doesn't fit below 0xd00 and build fails.

As exception 0xd00 doesn't exist and is never generated by 40x,
comment it out in order to get more space for system call exception.

Reported-by: kbuild test robot 
Fixes: 9e27086292aa ("powerpc/32: Warn and return ENOSYS on syscalls from 
kernel")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 9bb663977e84..2cec543c38f0 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -344,8 +344,9 @@ _ENTRY(saved_ksp_limit)
 /* 0x0C00 - System Call Exception */
START_EXCEPTION(0x0C00, SystemCall)
SYSCALL_ENTRY   0xc00
+/* Trap_0D is commented out to get more space for system call exception */
 
-   EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD)
+/* EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD) */
EXCEPTION(0x0E00, Trap_0E, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x0F00, Trap_0F, unknown_exception, EXC_XFER_STD)
 
-- 
2.25.0



Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq_lock

2020-05-09 Thread Qian Cai



> On May 9, 2020, at 12:12 PM, Paul E. McKenney  wrote:
> 
> Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
> of "if (data_race(prev->next) == node"?

I think the one you suggested is slightly better to point out the exact race. 
Do you want me to resend or you could squash it instead?

Re: [PATCH] net/sonic: Fix some resource leaks in error handling paths

2020-05-09 Thread Christophe JAILLET

Le 09/05/2020 à 03:54, Jakub Kicinski a écrit :

On Fri,  8 May 2020 19:25:57 +0200 Christophe JAILLET wrote:

@@ -527,8 +531,9 @@ static int mac_sonic_platform_remove(struct platform_device 
*pdev)
struct sonic_local* lp = netdev_priv(dev);
  
  	unregister_netdev(dev);

-   dma_free_coherent(lp->device, SIZEOF_SONIC_DESC * 
SONIC_BUS_SCALE(lp->dma_bitmode),
- lp->descriptors, lp->descriptors_laddr);
+   dma_free_coherent(lp->device,
+ SIZEOF_SONIC_DESC * SONIC_BUS_SCALE(lp->dma_bitmode),
+ lp->descriptors, lp->descriptors_laddr);
free_netdev(dev);
  
  	return 0;

This is a white-space only change, right? Since this is a fix we should
avoid making cleanups which are not strictly necessary.


Right.

The reason of this clean-up is that I wanted to avoid a checkpatch 
warning with the proposed patch and I felt that having the same layout 
in the error handling path of the probe function and in the remove 
function was clearer.

So I updated also the remove function.

Fell free to ignore this hunk if not desired. I will not sent a V2 only 
for that.


CJ



Re: [PATCH 01/15] taint: add module firmware crash taint support

2020-05-09 Thread Luis Chamberlain
On Sat, May 09, 2020 at 11:18:29AM -0400, Rafael Aquini wrote:
> We are still missing the documentation bits for this
> new flag, though.

Ah yeah sorry about that.

> How about having a blurb similar to:
> 
> diff --git a/Documentation/admin-guide/tainted-kernels.rst 
> b/Documentation/admin-guide/tainted-kernels.rst
> index 71e9184a9079..5c6a9e2478b0 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -100,6 +100,7 @@ Bit  Log  Number  Reason that got the kernel tainted
>   15  _/K   32768  kernel has been live patched
>   16  _/X   65536  auxiliary taint, defined for and used by distros
>   17  _/T  131072  kernel was built with the struct randomization plugin
> + 18  _/Q  262144  driver firmware crash annotation
>  ===  ===  ==  
> 
>  Note: The character ``_`` is representing a blank in this table to make 
> reading
> @@ -162,3 +163,7 @@ More detailed explanation for tainting
>   produce extremely unusual kernel structure layouts (even performance
>   pathological ones), which is important to know when debugging. Set at
>   build time.
> +
> + 18) ``Q`` Device drivers might annotate the kernel with this taint, in cases
> + their firmware might have crashed leaving the driver in a crippled and
> + potentially useless state.

Sure, I'll modify it a bit to add the use case to help with support
issues, ie, to help rule out firmware issues.

I'm starting to think that to make this even more usesul later we may
want to add a uevent to add_taint() so that userspace can decide to look
into this, ignore it, or report something to the user, say on their
desktop.

  Luis


Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()

2020-05-09 Thread Luis Chamberlain
On Sat, May 09, 2020 at 09:32:51AM +0300, Igor Russkikh wrote:
> 
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> > 
> > Using a taint flag allows us to annotate when this happens clearly.
> > 
> > Cc: Ariel Elior 
> > Cc: gr-everest-linux...@marvell.com
> > Signed-off-by: Luis Chamberlain 
> > ---
> >  drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > index f4eebaabb6d0..9cc6287b889b 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> > *buffer)
> >  REGDUMP_HEADER_SIZE,
> >  _size);
> > if (!rc) {
> > +   module_firmware_crashed();
> > *(u32 *)((u8 *)buffer + offset) =
> > qed_calc_regdump_header(cdev,
> > PROTECTION_OVERRIDE,
> > cur_engine,
> > @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> > *buffer)
> > rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset +
> > REGDUMP_HEADER_SIZE,
> > _size);
> > if (!rc) {
> > +   module_firmware_crashed();
> > *(u32 *)((u8 *)buffer + offset) =
> > qed_calc_regdump_header(cdev, FW_ASSERTS,
> > cur_engine,
> > feature_size,
> > @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void
> > *buffer)
> > rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
> >  REGDUMP_HEADER_SIZE, _size);
> > if (!rc) {
> > +   module_firmware_crashed();
> > *(u32 *)((u8 *)buffer + offset) =
> > qed_calc_regdump_header(cdev, GRC_DUMP,
> > cur_engine,
> 
> 
> Hi Luis,
> 
> qed_dbg_all_data is being used to gather debug dump from device. Failures
> inside it may happen due to various reasons, but they normally do not indicate
> FW failure.
> 
> So I think its not a good place to insert this call.
> Its hard to find exact good place to insert it in qed.

Is there a way to check if what happened was indeed a fw crash?

> One more thing is that AFAIU taint flag gets permanent on kernel, but for
> example our device can recover itself from some FW crashes, thus it'd be
> transparent for user.

Similar things are *supposed* to recoverable with other device, however
this can also sometimes lead to a situation where devices are not usable
anymore, and require a full driver unload / load.

> Whats the logical purpose of module_firmware_crashed? Does it mean fatal
> unrecoverable error on device?

Its just to annotate on the module and kernel that this has happened.

I take it you may agree that, firmware crashing *often* is not good design,
and these issues should be reported to / fixed by vendors. In cases
where driver bugs are reported it is good to see if a firmware crash has
happened before, so that during analysis this is ruled out.

  Luis


Re: [PATCH] usb: roles: Switch on role-switch uevent reporting

2020-05-09 Thread Bryan O'Donoghue

On 09/05/2020 04:24, Wen Yang wrote:



在 2020/5/9 上午12:29, Bryan O'Donoghue 写道:

Right now we don't report to user-space a role switch when doing a
usb_role_switch_set_role() despite having registered the uevent 
callbacks.


This patch switches on the notifications allowing user-space to see
role-switch change notifications and subsequently determine the current
controller data-role.

example:
PFX=/devices/platform/soc/78d9000.usb/ci_hdrc.0

root@somebox# udevadm monitor -p

KERNEL[49.894994] change $PFX/usb_role/ci_hdrc.0-role-switch (usb_role)
ACTION=change
DEVPATH=$PFX/usb_role/ci_hdrc.0-role-switch
SUBSYSTEM=usb_role
DEVTYPE=usb_role_switch
USB_ROLE_SWITCH=ci_hdrc.0-role-switch
SEQNUM=2432

Cc: Heikki Krogerus 
Cc: Greg Kroah-Hartman 
Cc: Chunfeng Yun 
Cc: Suzuki K Poulose 
Cc: Alexandre Belloni 
Cc: Wen Yang 
Cc: chenqiwu 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bryan O'Donoghue 
---
  drivers/usb/roles/class.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index 5b17709821df..27d92af29635 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -49,8 +49,10 @@ int usb_role_switch_set_role(struct usb_role_switch 
*sw, enum usb_role role)

  mutex_lock(>lock);
  ret = sw->set(sw, role);
-    if (!ret)
+    if (!ret) {
  sw->role = role;
+    kobject_uevent(>dev.kobj, KOBJ_CHANGE);
+    }
  mutex_unlock(>lock);



Hi, we may also need to deal with the return value of kobject_uevent(). 


For an KOBJ_ADD you'd return an error.

grep -r "= kobject_uevent(" *
drivers/misc/cxl/sysfs.c:   rc = kobject_uevent(>kobj, KOBJ_ADD);
drivers/uio/uio.c:  ret = kobject_uevent(>kobj, KOBJ_ADD);
drivers/uio/uio.c:  ret = kobject_uevent(>kobj, KOBJ_ADD);
drivers/visorbus/visorchipset.c:	res = 
kobject_uevent(_dev->acpi_device->dev.kobj, KOBJ_ONLINE);
drivers/visorbus/visorchipset.c:	int res = 
kobject_uevent(_dev->acpi_device->dev.kobj,


For a KOBJ_CHANGE I guess we could print an error

if (kobject_uevent(>dev.kobj, KOBJ_CHANGE)
dev_err(>dev, "failed to signal USB role-switch uevent\n");

Nobody else seems that bothered about it.

grep -r "if (kobject_uevent(" *


Should we move it under the line mutex_unlock(>lock)?


I think probably not. the mutex serializes the notification. In theory 
outside the mutex you could get an out-of-order notification.


The main reason I put it where it is, is we already test ret and should 
only notify the change, when the role-switch has suceeded.


As I say, in theory anyway, the mutex enforces the signalling, whatever 
about the reception, of the role-switch change, so IMO inside the bounds 
of the mutex is the right place to put it.


---
bod


Re: [PATCH v5 0/4] Introduce the for_each_set_clump macro

2020-05-09 Thread Syed Nayyar Waris
On Tue, May 5, 2020 at 8:24 PM William Breathitt Gray
 wrote:
>
> On Tue, May 05, 2020 at 04:51:56PM +0300, Andy Shevchenko wrote:
> > On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray
> >  wrote:
> > > On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote:
> > > > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote:
> >
> > ...
> >
> > > > Looking into the last patches where we have examples I still do not see 
> > > > a
> > > > benefit of variadic clump sizes. power of 2 sizes would make sense (and 
> > > > be
> > > > optimized accordingly (64-bit, 32-bit).
> > > >
> > > > --
> > > > With Best Regards,
> > > > Andy Shevchenko
> > >
> > > There is of course benefit in defining for_each_set_clump with clump
> > > sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid
> > > boundary checks that we know will not occur), but at the very least the
> > > variable size bitmap_set_value and bitmap_get_value provide significant
> > > benefit for the readability of the gpio-xilinx code:
> > >
> > > bitmap_set_value(old, state[0], 0, width[0]);
> > > bitmap_set_value(old, state[1], width[0], width[1]);
> > > ...
> > > state[0] = bitmap_get_value(new, 0, width[0]);
> > > state[1] = bitmap_get_value(new, width[0], width[1]);
> > >
> > > These lines are simple and clear to read: we know immediately what they
> > > do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have
> > > to use several bitwise operations for each line; the obfuscation of the
> > > code would be an obvious hinderance here.
> >
> > Do I understand correctly that width[0] and width[1] may not be power
> > of two and it's actually the case?
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
>
> I'm under the impression that width[0] and width[1] are arbitrarily
> chosen by the user and could be any integer. I have never used this
> hardware so I'm hoping one of the gpio-xilinx or GPIO subsystem
> maintainers in this thread will respond with some guidance.
>
> If the values of width[0] and width[1] are restricted to powers of 2,
> then I agree that there is no need for generic bitmap_set_value and
> bitmap_get_value functions and we can instead use more optimized power
> of 2 versions.
>
> William Breathitt Gray


Regarding the question that whether width[0] and width[1] can have any
value or they are restricted to power-of-2.

Referring to the document (This xilinx GPIO IP was mentioned in the
gpio-xilinx.c file):
https://www.xilinx.com/support/documentation/ip_documentation/axi_gpio/v2_0/pg144-axi-gpio.pdf

On page 8, we can see that the GPIO widths for the 2 channels can have
values different from power-of-2.For example: 5, 15 etc.

So, I think we should keep the 'for_each_set_clump',
'bitmap_get_value' and 'bitmap_set_value' as completely generic.

I am proceeding further for my next patchset submission keeping above
findings in mind. If you guys think something else or would like to
add something, let me know.

Regards
Syed Nayyar Waris


Re: [PATCH v2 3/3] platform/x86: Intel PMT Telemetry capability driver

2020-05-09 Thread David E. Box
On Fri, 2020-05-08 at 12:57 +0300, Andy Shevchenko wrote:
> On Fri, May 8, 2020 at 5:18 AM David E. Box <
> david.e@linux.intel.com> wrote:
> > PMT Telemetry is a capability of the Intel Platform Monitoring
> > Technology.
> > The Telemetry capability provides access to device telemetry
> > metrics that
> > provide hardware performance data to users from continuous, memory
> > mapped,
> > read-only register spaces.
> > 
> > Register mappings are not provided by the driver. Instead, a GUID
> > is read
> > from a header for each endpoint. The GUID identifies the device and
> > is to
> > be used with an XML, provided by the vendor, to discover the
> > available set
> > of metrics and their register mapping.  This allows firmware
> > updates to
> > modify the register space without needing to update the driver
> > every time
> > with new mappings. Firmware writes a new GUID in this case to
> > specify the
> > new mapping.  Software tools with access to the associated XML file
> > can
> > then interpret the changes.
> > 
> > This module manages access to all PMT Telemetry endpoints on a
> > system,
> > regardless of the device exporting them. It creates a pmt_telemetry
> > class
> > to manage the list. For each endpoint, sysfs files provide GUID and
> > size
> > information as well as a pointer to the parent device the telemetry
> > comes
> > from. Software may discover the association between endpoints and
> > devices
> > by iterating through the list in sysfs, or by looking for the
> > existence of
> 
> ABI needs documentation.

We will be releasing a Linux software spec for PMT. We are waiting on
public release of the PMT spec. For this patch we did document the
sysfs class ABI.

> 
> > the class folder under the device of interest.  A device node of
> > the same
> > name allows software to then map the telemetry space for direct
> > access.
> 
> ...
> 
> > +config INTEL_PMT_TELEM
> 
> TELEMETRY
> 
> ...
> 
> > +obj-$(CONFIG_INTEL_PMT_TELEM)  += intel_pmt_telem.o
> 
> telemetry
> 
> (Inside the file it's fine to have telem)
> 
> ...
> 
> > +   priv->dvsec = dev_get_platdata(>dev);
> > +   if (!priv->dvsec) {
> > +   dev_err(>dev, "Platform data not found\n");
> > +   return -ENODEV;
> > +   }
> 
> I don't see how is it being used?

Good catch :). This was initially used to pass the DVSEC info from the
pci device to the telemetry driver. But with changes all of the needed
info is now read from the driver's memory resource. It was unnoticed
that dvsec  fields are no longer used. Will remove in next version.

Okay on other comments.

David



Re: [PATCH 2/2] drm/panfrost: add devfreq regulator support

2020-05-09 Thread Clément Péron
Hi Steven,

On Thu, 7 May 2020 at 16:30, Steven Price  wrote:
>
> On 02/05/2020 23:07, Clément Péron wrote:
> > Hi Steven,
> >
> > On Tue, 14 Apr 2020 at 15:10, Steven Price  wrote:
> >>
> >> Hi Clément,
> >>
> >> On 13/04/2020 18:28, Clément Péron wrote:
> >>> Hi Steven,
> >>>
> >

>
> Since you've got a reproduction - can you get a backtrace where the
> regulator is getting disabled?

Regulator is disabled from regulator_late_cleanup()

[   33.757650] vdd-gpu: disabling
[   33.760718] CPU: 2 PID: 31 Comm: kworker/2:1 Not tainted
5.7.0-rc2-next-20200424 #8
[   33.768362] Hardware name: Beelink GS1 (DT)
[   33.772553] Workqueue: events regulator_init_complete_work_function
[   33.778813] Call trace:
[   33.781261]  dump_backtrace+0x0/0x1a0
[   33.784922]  show_stack+0x18/0x30
[   33.788238]  dump_stack+0xc0/0x114
[   33.791638]  regulator_late_cleanup+0x164/0x1f0
[   33.796165]  class_for_each_device+0x64/0xe0
[   33.800431]  regulator_init_complete_work_function+0x4c/0x60
[   33.806084]  process_one_work+0x19c/0x330
[   33.810090]  worker_thread+0x4c/0x430
[   33.813748]  kthread+0x138/0x160
[   33.816973]  ret_from_fork+0x10/0x24

the use_count is at 0...

I have check and the regulator_get is called and regulator_put is
never called for vdd-gpu.
Not sure what is happening here...


>
> >   - The Cooling map is not probe correctly :
> > [2.545756] panfrost 180.gpu: [drm:panfrost_devfreq_init
> > [panfrost]] Failed to register cooling device
> > Introduce in this commit :
> > https://github.com/clementperon/linux/commit/0252c38fd55ad78366ac4b1714e285c88db34557
> >
> > Do you have an hint about what I'm missing ?
>
> Sorry, my knowledge of the cooling framework is very limited. What
> you've got looks plausible, but I'm afraid I can't really help beyond
> that! As before - can you try adding some printk()s in e.g.
> of_devfreq_cooling_register_power() and find out where it is bailing out?

Dumb issue, I was missing the CONFIG_DEVFREQ_THERMAL -_-, I will make
a patch to enable it in arm64 defconfig.

Regards,
Clement

>
> Steve


Re: [PATCH for-5.7] io_uring: fix zero len do_splice()

2020-05-09 Thread Jens Axboe
On 5/9/20 10:07 AM, Pavel Begunkov wrote:
> On 04/05/2020 23:00, Pavel Begunkov wrote:
>> do_splice() doesn't expect len to be 0. Just always return 0 in this
>> case as splice(2) do.
>>
> 
> If it was missed, may you take a look? I reattached the patch btw killing
> reported warnings.

Thanks for re-sending, I'll queue it up for 5.7.

-- 
Jens Axboe



Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq_lock

2020-05-09 Thread Paul E. McKenney
On Sat, May 09, 2020 at 09:01:53AM -0400, Qian Cai wrote:
> 
> 
> > On May 9, 2020, at 12:33 AM, Paul E. McKenney  wrote:
> > 
> > On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
> >> 
> >> 
> >>> On Feb 11, 2020, at 8:54 AM, Qian Cai  wrote:
> >>> 
> >>> prev->next could be accessed concurrently as noticed by KCSAN,
> >>> 
> >>> write (marked) to 0x9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >>> osq_lock+0x25f/0x350
> >>> osq_wait_next at kernel/locking/osq_lock.c:79
> >>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >>> rwsem_optimistic_spin
> >>> 
> >>> 
> >>> read to 0x9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >>> osq_lock+0x196/0x350
> >>> osq_lock at kernel/locking/osq_lock.c:157
> >>> rwsem_optimistic_spin
> >>> 
> >>> 
> >>> Since the write only stores NULL to prev->next and the read tests if
> >>> prev->next equals to this_cpu_ptr(_node). Even if the value is
> >>> shattered, the code is still working correctly. Thus, mark it as an
> >>> intentional data race using the data_race() macro.
> >>> 
> >>> Signed-off-by: Qian Cai 
> >> 
> >> Hmm, this patch has been dropped from linux-next from some reasons.
> >> 
> >> Paul, can you pick this up along with KCSAN fixes?
> >> 
> >> https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-...@lca.pw/
> > 
> > I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
> > so this is v5.9 at the earliest.  Plus I would need an ack from one of
> > the locking folks.
> 
> Peter, Will, can you give an ACK? This v2 should incorporate all the feedback 
> from Peter,
> 
> https://lore.kernel.org/lkml/20200211124753.gp14...@hirez.programming.kicks-ass.net/
> 
> V5.9 is fine. All I care about is it is always in linux-next (so the testing 
> bots won’t trigger this over and over again) and to be in mainline at some 
> point in the future.

Ah, and I forgot to ask.  Why "if (data_race(prev->next == node)" instead
of "if (data_race(prev->next) == node"?

Thanx, Paul

> >>> ---
> >>> 
> >>> v2: insert some code comments.
> >>> 
> >>> kernel/locking/osq_lock.c | 6 +-
> >>> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >>> index 1f7734949ac8..f733bcd99e8a 100644
> >>> --- a/kernel/locking/osq_lock.c
> >>> +++ b/kernel/locking/osq_lock.c
> >>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >>>*/
> >>> 
> >>>   for (;;) {
> >>> - if (prev->next == node &&
> >>> + /*
> >>> +  * cpu_relax() below implies a compiler barrier which would
> >>> +  * prevent this comparison being optimized away.
> >>> +  */
> >>> + if (data_race(prev->next == node) &&
> >>>   cmpxchg(>next, node, NULL) == node)
> >>>   break;
> >>> 
> >>> -- 
> >>> 1.8.3.1
> 


Re: [PATCH for-5.7] io_uring: fix zero len do_splice()

2020-05-09 Thread Pavel Begunkov
On 04/05/2020 23:00, Pavel Begunkov wrote:
> do_splice() doesn't expect len to be 0. Just always return 0 in this
> case as splice(2) do.
> 

If it was missed, may you take a look? I reattached the patch btw killing
reported warnings.

---
From: Pavel Begunkov 
Subject: [PATCH] io_uring: fix zero len do_splice()

do_splice() doesn't expect len to be 0. Do the check in io_splice()
exactly as in sys_splice().

Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Reported-by: Jann Horn 
Signed-off-by: Pavel Begunkov 
---
 fs/io_uring.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 65458eda2127..70d8943c337b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2772,16 +2772,16 @@ static int io_splice(struct io_kiocb *req, bool
force_nonblock)
struct file *out = sp->file_out;
unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED;
loff_t *poff_in, *poff_out;
-   long ret;
+   long ret = 0;

if (force_nonblock)
return -EAGAIN;

poff_in = (sp->off_in == -1) ? NULL : >off_in;
poff_out = (sp->off_out == -1) ? NULL : >off_out;
-   ret = do_splice(in, poff_in, out, poff_out, sp->len, flags);
-   if (force_nonblock && ret == -EAGAIN)
-   return -EAGAIN;
+
+   if (sp->len)
+   ret = do_splice(in, poff_in, out, poff_out, sp->len, flags);

io_put_file(req, in, (sp->flags & SPLICE_F_FD_IN_FIXED));
req->flags &= ~REQ_F_NEED_CLEANUP;
-- 
2.24.0


Re: [PATCH RFC tip/core/rcu] Add shrinker to shift to fast/inefficient GP mode

2020-05-09 Thread Paul E. McKenney
On Sat, May 09, 2020 at 11:54:40AM +0300, Konstantin Khlebnikov wrote:
> On 08/05/2020 17.46, Paul E. McKenney wrote:
> > On Fri, May 08, 2020 at 12:00:28PM +0300, Konstantin Khlebnikov wrote:
> > > On 07/05/2020 22.09, Paul E. McKenney wrote:
> > > > On Thu, May 07, 2020 at 02:31:02PM -0400, Johannes Weiner wrote:
> > > > > On Thu, May 07, 2020 at 10:09:03AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, May 07, 2020 at 01:00:06PM -0400, Johannes Weiner wrote:
> > > > > > > On Wed, May 06, 2020 at 05:55:35PM -0700, Andrew Morton wrote:
> > > > > > > > On Wed, 6 May 2020 17:42:40 -0700 "Paul E. McKenney" 
> > > > > > > >  wrote:
> > > > > > > > 
> > > > > > > > > This commit adds a shrinker so as to inform RCU when memory 
> > > > > > > > > is scarce.
> > > > > > > > > RCU responds by shifting into the same fast and inefficient 
> > > > > > > > > mode that is
> > > > > > > > > used in the presence of excessive numbers of RCU callbacks.  
> > > > > > > > > RCU remains
> > > > > > > > > in this state for one-tenth of a second, though this time 
> > > > > > > > > window can be
> > > > > > > > > extended by another call to the shrinker.
> > > > > > > 
> > > > > > > We may be able to use shrinkers here, but merely being invoked 
> > > > > > > does
> > > > > > > not carry a reliable distress signal.
> > > > > > > 
> > > > > > > Shrinkers get invoked whenever vmscan runs. It's a useful 
> > > > > > > indicator
> > > > > > > for when to age an auxiliary LRU list - test references, clear and
> > > > > > > rotate or reclaim stale entries. The urgency, and what can and 
> > > > > > > cannot
> > > > > > > be considered "stale", is encoded in the callback frequency and 
> > > > > > > scan
> > > > > > > counts, and meant to be relative to the VM's own rate of aging: 
> > > > > > > "I've
> > > > > > > tested X percent of mine for recent use, now you go and test the 
> > > > > > > same
> > > > > > > share of your pool." It doesn't translate well to other
> > > > > > > interpretations of the callbacks, although people have tried.
> > > > > > 
> > > > > > Would it make sense for RCU to interpret two invocations within 
> > > > > > (say)
> > > > > > 100ms of each other as indicating urgency?  (Hey, I had to ask!)
> > > > > 
> > > > > It's the perfect number for one combination of CPU, storage device,
> > > > > and shrinker implementation :-)
> > > > 
> > > > Woo-hoo!!!
> > > > 
> > > > But is that one combination actually in use anywhere?  ;-)
> > > > 
> > > > > > > > > If it proves feasible, a later commit might add a function 
> > > > > > > > > call directly
> > > > > > > > > indicating the end of the period of scarce memory.
> > > > > > > > 
> > > > > > > > (Cc David Chinner, who often has opinions on shrinkers ;))
> > > > > > > > 
> > > > > > > > It's a bit abusive of the intent of the slab shrinkers, but I 
> > > > > > > > don't
> > > > > > > > immediately see a problem with it.  Always returning 0 from
> > > > > > > > ->scan_objects might cause a problem in some situations(?).
> > > > > > > > 
> > > > > > > > Perhaps we should have a formal "system getting low on memory, 
> > > > > > > > please
> > > > > > > > do something" notification API.
> > > > > > > 
> > > > > > > It's tricky to find a useful definition of what low on memory
> > > > > > > means. In the past we've used sc->priority cutoffs, the vmpressure
> > > > > > > interface (reclaimed/scanned - reclaim efficiency cutoffs), oom
> > > > > > > notifiers (another reclaim efficiency cutoff). But none of these
> > > > > > > reliably capture "distress", and they vary highly between 
> > > > > > > different
> > > > > > > hardware setups. It can be hard to trigger OOM itself on fast IO
> > > > > > > devices, even when the machine is way past useful (where useful is
> > > > > > > somewhat subjective to the user). Userspace OOM implementations 
> > > > > > > that
> > > > > > > consider userspace health (also subjective) are getting more 
> > > > > > > common.
> > > > > > > 
> > > > > > > > How significant is this?  How much memory can RCU consume?
> > > > > > > 
> > > > > > > I think if rcu can end up consuming a significant share of 
> > > > > > > memory, one
> > > > > > > way that may work would be to do proper shrinker integration and 
> > > > > > > track
> > > > > > > the age of its objects relative to the age of other allocations 
> > > > > > > in the
> > > > > > > system. I.e. toss them all on a clock list with "new" bits and 
> > > > > > > shrink
> > > > > > > them at VM velocity. If the shrinker sees objects with new bit 
> > > > > > > set,
> > > > > > > clear and rotate. If it sees objects without them, we know 
> > > > > > > rcu_heads
> > > > > > > outlive cache pages etc. and should probably cycle faster too.
> > > > > > 
> > > > > > It would be easy for RCU to pass back (or otherwise use) the age of 
> > > > > > the
> > > > > > current grace period, if that would help.
> > > > > > 
> > > > > > Tracking the age of individual callbacks is out of the 

Re: [RFC] splice/tee: len=0 fast path after validity check

2020-05-09 Thread Pavel Begunkov
On 09/05/2020 17:42, Jens Axboe wrote:
> On 5/9/20 2:46 AM, Pavel Begunkov wrote:
>> When len=0, splice() and tee() return 0 even if specified fds are
>> invalid, hiding errors from users. Move len=0 optimisation later after
>> basic validity checks.
>>
>> before:
>> splice(len=0, fd_in=-1, ...) == 0;
>>
>> after:
>> splice(len=0, fd_in=-1, ...) == -EBADF;
> 
> I'm not sure what the purpose of this would be. It probably should have
> been done that way from the beginning, but it wasn't.  While there's
> very little risk of breaking any applications due to this change, it
> also seems like a pointless exercise at this point.
> 
> So my suggestion would be to just leave it alone.

Ok

-- 
Pavel Begunkov


Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-05-09 Thread Christophe Leroy




Le 28/04/2020 à 18:05, Arnd Bergmann a écrit :

On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy
 wrote:


Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy 


Looks good to me

Reviewed-by: Arnd Bergmann 

There was a bug on ARM for the corresponding function, so far it is unclear
if this was a problem related to particular hardware, the 32-bit kernel code,
or the common implementation of clock_gettime64 in the vdso library,
see https://github.com/richfelker/musl-cross-make/issues/96

Just to be sure that powerpc is not affected by the same issue, can you
confirm that repeatedly calling clock_gettime64 on powerpc32, alternating
between vdso and syscall, results in monotically increasing times?



I think that's one of the things vdsotest checks, so yes that's ok I think.

Christophe


VERY URGENT

2020-05-09 Thread Mr . Sébastien Toni
FROM MR.SÉBASTIEN TONI
AUDIT& ACCOUNT MANAGER
BANK OF AFRICA (B.O.A)
OUAGADOUGOU BURKINA FASO
WEST AFRICA.

Dear Friend,

With due respect, I have decided to contact you on
abusinesstransaction  that will be beneficial to both of us. At the
bank last account and  auditing evaluation, my staffs came across an
old account which was being maintained by a foreign client who we
learn was among the deceased passengers of motor accident on
November.2003, the deceased was unable to run this account since his
death. Theaccount has  remained dormant without the knowledge of his
family since it was put in a  safe deposit account in the bank for
future investment by the client.

Since his demise, even the members of his family haven't applied for
claims  over this fund and it has been in the safe deposit account
until I  discovered that it cannot be claimed since our client
isaforeign national and we are sure that he has no next of kin here to
file claims over the money. As the director of the department, this
discovery was brought to my office so as to decide what is to bedone.I
decided to seek ways through which to transfer this money out of the
bank  and out of the country too.

The total amount in the account is 18.6 million with my positions as
staffs  of the bank, I am handicapped because I cannot operate foreign
accounts and  cannot lay bonafide claim over this money. The client
was a foreign  national and you will only be asked to act as his next
of kin and I will  supply you with all the necessary information and
bank data to assist you in being able to transfer this money to any
bank of your  choice where this money could be transferred into.The
total sum will be shared as follows: 50% for me, 50% for you and
expenses incidental occur  during the transfer will be incur by both
of us. The transfer is risk free on both sides hence you are going to
follow my instruction till the fund  transfer to your account. Since I
work in this bank that is why you should  be confident in the success
of this transaction because you will be updated with information as at
when desired.

I will wish you to keep this transaction secret and confidential as I
am  hoping to retire with my share of this money at the end of
transaction  which will be when this money is safety in your account.
I will then come over to your country for sharing according to the
previously agreed percentages. You might even have to advise me on
possibilities of investment in your country or elsewhere of our
choice. May  God help you to help me to a restive retirement,Amen,And
You have to  contact me through my private e-mail
at(sebastient...@gmail.com)Please for further information and inquires
feel free to contact me back immediately for more explanation and
better  understanding I want you to assure me your capability of
handling this  project with trust by providing me your following
information details such as:

(1)NAME..
(2)AGE:
(3)SEX:.
(4)PHONE NUMBER:.
(5)OCCUPATION:.
(6)YOUR COUNTRY:.

Yours sincerely,
Mr.Sébastien Toni


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Christophe Leroy




Le 08/05/2020 à 19:41, Qian Cai a écrit :




On May 8, 2020, at 10:39 AM, Qian Cai  wrote:

Booting POWER9 PowerNV has this message,

"ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
early_ioremap() instead”

but use the patch below will result in leaks because it will never call 
early_iounmap() anywhere. However, it looks me it was by design that phb->regs 
mapping would be there forever where it would be used in pnv_ioda_get_inval_reg(), 
so is just that check_early_ioremap_leak() initcall too strong?

--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
#include 
#include 
#include 
+#include 

#include 

@@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Get registers */
if (!of_address_to_resource(np, 0, )) {
phb->regs_phys = r.start;
-   phb->regs = ioremap(r.start, resource_size());
+   phb->regs = early_ioremap(r.start, resource_size());
if (phb->regs == NULL)
pr_err("  Failed to map registers !\n”);


This will also trigger a panic with debugfs reads, so isn’t that this commit 
bogus at least for powerpc64?

d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")


No d538aadc2718 is not bogus. That's the point, we want to remove all 
early usages of ioremap() in order to remove the hack with the 
ioremap_bot stuff and all, and stick to the generic ioremap logic.


In order to do so, all early use of ioremap() has to be converted to 
early_ioremap() or to fixmap or anything else that allows to do ioremaps 
before the slab is ready.


early_ioremap() is for temporary mappings necessary at boottime. For 
long lasting mappings, another method is to be used.


Now, the point is that other architectures like for instance x86 don't 
seem to have to use early_ioremap() much. Powerpc is for instance doing 
early mappings for PCI. Seems like x86 initialises PCI once slab is 
ready. Can't powerpc do the same ?


Christophe


Re: [PATCH net-next 1/2] ath10k: fix gcc-10 zero-length-bounds warnings

2020-05-09 Thread Gustavo A. R. Silva
Arnd,

On Sat, May 09, 2020 at 02:06:32PM +0200, Arnd Bergmann wrote:
> gcc-10 started warning about out-of-bounds access for zero-length
> arrays:
> 
> In file included from drivers/net/wireless/ath/ath10k/core.h:18,
>  from drivers/net/wireless/ath/ath10k/htt_rx.c:8:
> drivers/net/wireless/ath/ath10k/htt_rx.c: In function 
> 'ath10k_htt_rx_tx_fetch_ind':
> drivers/net/wireless/ath/ath10k/htt.h:1683:17: warning: array subscript 65535 
> is outside the bounds of an interior zero-length array 'struct 
> htt_tx_fetch_record[0]' [-Wzero-length-bounds]
>  1683 |  return (void *)>records[le16_to_cpu(ind->num_records)];
>   | ^~~~
> drivers/net/wireless/ath/ath10k/htt.h:1676:29: note: while referencing 
> 'records'
>  1676 |  struct htt_tx_fetch_record records[0];
>   | ^~~
> 
> Make records[] a flexible array member to allow this, moving it behind
> the other zero-length member that is not accessed in a way that gcc
> warns about.
> 
> Fixes: 3ba225b506a2 ("treewide: Replace zero-length array with flexible-array 
> member")

This treewide patch no longer contains changes for ath10k. I removed them
since Monday (05/04/2020). So, this "Fixes" tag does not apply.

Thanks
--
Gustavo

> Fixes: 22e6b3bc5d96 ("ath10k: add new htt definitions")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/wireless/ath/ath10k/htt.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/htt.h 
> b/drivers/net/wireless/ath/ath10k/htt.h
> index 8f3710cf28f4..aa056a186402 100644
> --- a/drivers/net/wireless/ath/ath10k/htt.h
> +++ b/drivers/net/wireless/ath/ath10k/htt.h
> @@ -1673,8 +1673,8 @@ struct htt_tx_fetch_ind {
>   __le32 token;
>   __le16 num_resp_ids;
>   __le16 num_records;
> - struct htt_tx_fetch_record records[0];
>   __le32 resp_ids[0]; /* ath10k_htt_get_tx_fetch_ind_resp_ids() */
> + struct htt_tx_fetch_record records[];
>  } __packed;
>  
>  static inline void *
> -- 
> 2.26.0
> 


Re: [PATCH v5] x86: bitops: fix build regression

2020-05-09 Thread Brian Gerst
On Sat, May 9, 2020 at 8:20 AM Andy Shevchenko
 wrote:
>
> On Fri, May 8, 2020 at 9:35 PM Nick Desaulniers  
> wrote:
> >
> > This is easily reproducible via CC=clang+CONFIG_STAGING=y+CONFIG_VT6656=m.
> >
> > It turns out that if your config tickles __builtin_constant_p via
> > differences in choices to inline or not, these statements produce
> > invalid assembly:
> >
> > $ cat foo.c
> > long a(long b, long c) {
> >   asm("orb\t%1, %0" : "+q"(c): "r"(b));
> >   return c;
> > }
> > $ gcc foo.c
> > foo.c: Assembler messages:
> > foo.c:2: Error: `%rax' not allowed with `orb'
> >
> > Use the `%b` "x86 Operand Modifier" to instead force register allocation
> > to select a lower-8-bit GPR operand.
> >
> > The "q" constraint only has meaning on -m32 otherwise is treated as
> > "r". Not all GPRs have low-8-bit aliases for -m32.
> >
>
> Looks very good!
> One question though, does it work with minimum supported version of gcc?

Yes.  The operand width modifiers have been around a long time but not
well documented until more recently.

--
Brian Gerst


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Qian Cai



> On May 9, 2020, at 4:38 AM, Nicholas Piggin  wrote:
> 
> Your patch to use early_ioremap is faulting? I wonder why?

Yes, I don’t know the reasons either. I suppose not many places in other parts 
of the kernel which keep using those addresses from early_ioremap() after 
system booted. Otherwise, we would see those leak warnings elsewhere.

Thus, we probably have to audit the code, and if still necessary, call 
early_ioremap() and then early_iounmap() followed by a ioremap() once slab 
allocator is available?

Re: [PATCH v1 2/3] armv8: gpio: add gpio feature

2020-05-09 Thread Andrew Lunn
On Sat, May 09, 2020 at 06:39:55PM +0800, Hui Song wrote:
> From: "hui.song" 
> 
> add one struct mpc8xxx_gpio_plat to enable gpio feature.
> 
> Signed-off-by: hui.song 
> ---
>  .../include/asm/arch-fsl-layerscape/gpio.h| 22 +++
>  1 file changed, 22 insertions(+)
>  create mode 100644 arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> 
> diff --git a/arch/arm/include/asm/arch-fsl-layerscape/gpio.h 
> b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> new file mode 100644
> index 00..d8dd750a72
> --- /dev/null
> +++ b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Copyright 2014 Freescale Semiconductor, Inc.
> + */
> +
> +/*
> + * Dummy header file to enable CONFIG_OF_CONTROL.
> + * If CONFIG_OF_CONTROL is enabled, lib/fdtdec.c is compiled.
> + * It includes  via , so those SoCs that enable
> + * OF_CONTROL must have arch/gpio.h.
> + */

This does not seem right. You would expect each sub arch to have a
subdirectory in arch/arm/include/asm/ when in fact none do.

 Andrew


Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables

2020-05-09 Thread Matteo Croce
On Sat, May 9, 2020 at 4:49 PM Russell King - ARM Linux admin
 wrote:
>
> On Sat, May 09, 2020 at 02:51:05PM +0100, Russell King - ARM Linux admin 
> wrote:
> > On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote:
> > > On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin
> > >  wrote:
> > > >
> > > > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote:
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Matteo Croce 
> > > > > > Sent: Saturday, May 9, 2020 3:13 AM
> > > > > > To: David S . Miller 
> > > > > > Cc: Maxime Chevallier ; netdev
> > > > > > ; LKML ; 
> > > > > > Antoine
> > > > > > Tenart ; Thomas Petazzoni
> > > > > > ; gregory.clem...@bootlin.com;
> > > > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan
> > > > > > Chulski ; Marcin Wojtas ; 
> > > > > > Linux
> > > > > > ARM ; Russell King - ARM 
> > > > > > Linux admin
> > > > > > 
> > > > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS 
> > > > > > contexts to
> > > > > > handle RSS tables
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > What do you think about temporarily disabling it like this?
> > > > > >
> > > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct 
> > > > > > platform_device
> > > > > > *pdev,
> > > > > > NETIF_F_HW_VLAN_CTAG_FILTER;
> > > > > >
> > > > > > if (mvpp22_rss_is_supported()) {
> > > > > > -   dev->hw_features |= NETIF_F_RXHASH;
> > > > > > +   if (port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > > > > +   dev->hw_features |= NETIF_F_RXHASH;
> > > > > > dev->features |= NETIF_F_NTUPLE;
> > > > > > }
> > > > > >
> > > > > >
> > > > > > David, is this "workaround" too bad to get accepted?
> > > > >
> > > > > Not sure that RSS related to physical interface(SGMII), better just 
> > > > > remove NETIF_F_RXHASH as "workaround".
> > > >
> > > > Hmm, I'm not sure this is the right way forward.  This patch has the
> > > > effect of disabling:
> > > >
> > > > d33ec4525007 ("net: mvpp2: add an RSS classification step for each 
> > > > flow")
> > > >
> > > > but the commit you're pointing at which caused the regression is:
> > > >
> > > > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
> > > >
> > > >
> > >
> > > Hi,
> > >
> > > When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS
> > > contexts to handle RSS tables"), which was merged
> > > almost an year after d33ec4525007 ("net: mvpp2: add an RSS
> > > classification step for each flow"), so I assume that between these
> > > two commits either the feature was working or it was disable and we
> > > didn't notice
> > >
> > > Without knowing what was happening, which commit should my Fixes tag 
> > > point to?
> >
> > Let me make sure that I get this clear:
> >
> > - Prior to 895586d5dc32, you can turn on and off rxhash without issue
> >   on any port.
> > - After 895586d5dc32, turning rxhash on eth2 prevents reception.
> >
> > Prior to 895586d5dc32, with rxhash on, it looks like hashing using
> > CRC32 is supported but only one context.  So, if it's possible to
> > enable rxhash on any port on the mcbin without 895586d5dc32, and the
> > port continues to work, I'd say the bug was introduced by
> > 895586d5dc32.
> >
> > Of course, that would be reinforced if there was a measurable
> > difference in performance due to rxhash on each port.
>
> I've just run this test, but I can detect no difference in performance
> with or without 895586d5dc32 on eth0 or eth2 on the mcbin (apart from
> eth2 stopping working with 895586d5dc32 applied.)  I tested this by
> reverting almost all changes to the mvpp2 driver between 5.6 and that
> commit.
>
> That's not too surprising; I'm using my cex7 platform with the Mellanox
> card in for one end of the 10G link, and that platform doesn't seem to
> be able to saturdate a 10G link - it only seems to manage around 4Gbps.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
>

Well it depends on the traffic type. I used to generate 5k flows with
T-Rex and an Intel X710 card.
This way t-rex changes the UDP port of every packet:

root@macchiatobin:~# tcpdump -tnni eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
IP 16.0.0.18.9874 > 48.0.0.81.2001: UDP, length 18
IP 16.0.0.248.56289 > 48.0.192.56.2001: UDP, length 18
IP 16.0.0.154.44965 > 48.0.128.26.2001: UDP, length 18
IP 16.0.0.23.31363 > 48.0.0.86.2001: UDP, length 18
IP 16.0.0.192.1674 > 48.0.192.63.2001: UDP, length 18
IP 16.0.0.155.62370 > 48.0.128.27.2001: UDP, length 18
IP 16.0.0.30.22126 > 48.0.0.93.2001: UDP, length 

Re: [PATCH v1 1/3] gpio: mpc8xxx: support fsl-layerscape platform.

2020-05-09 Thread Andrew Lunn
On Sat, May 09, 2020 at 06:39:54PM +0800, Hui Song wrote:
> From: "hui.song" 
> 
> Make the MPC8XXX gpio driver to support the fsl-layerscape.
> 
> Signed-off-by: hui.song 
> ---
>  drivers/gpio/mpc8xxx_gpio.c | 59 +
>  1 file changed, 59 insertions(+)
> 
> diff --git a/drivers/gpio/mpc8xxx_gpio.c b/drivers/gpio/mpc8xxx_gpio.c
> index 1dfd22522c..466f5f50cf 100644
> --- a/drivers/gpio/mpc8xxx_gpio.c
> +++ b/drivers/gpio/mpc8xxx_gpio.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  struct ccsr_gpio {
>   u32 gpdir;
> @@ -20,6 +22,7 @@ struct ccsr_gpio {
>   u32 gpier;
>   u32 gpimr;
>   u32 gpicr;
> + u32 gpibe;
>  };
>  
>  struct mpc8xxx_gpio_data {
> @@ -49,31 +52,51 @@ inline u32 gpio_mask(uint gpio)
>  
>  static inline u32 mpc8xxx_gpio_get_val(struct ccsr_gpio *base, u32 mask)
>  {
> +#if CONFIG_ARM
> + return in_le32(>gpdat) & mask;
> +#else
>   return in_be32(>gpdat) & mask;
> +#endif
>  }

Hi Hui

Did the hardware engineers really change the endinness of the
register? Forget about the CPU here, did the register change
endinness? In general, you should not need to use #if like this, so
long as the register itself is still the same. There are functions
which will do the correct thing depending on if the CPU is big or
little endian.

> @@ -147,13 +183,29 @@ static int mpc8xxx_gpio_ofdata_to_platdata(struct 
> udevice *dev)
>  {
>   struct mpc8xxx_gpio_plat *plat = dev_get_platdata(dev);
>   fdt_addr_t addr;
> + u32 i;
> +#if CONFIG_ARM
> + u32 reg[4];
> +
> + dev_read_u32_array(dev, "reg", reg, 4);
> +#else
>   u32 reg[2];
>  
>   dev_read_u32_array(dev, "reg", reg, 2);
> +#endif
> +
> +#if CONFIG_ARM
> + for (i = 0; i < 2; i++)
> + reg[i] = be32_to_cpu(reg[i]);
> +#endif
>   addr = dev_translate_address(dev, reg);
>  
>   plat->addr = addr;
> +#if CONFIG_ARM
> + plat->size = reg[3];
> +#else
>   plat->size = reg[1];
> +#endif
>   plat->ngpios = dev_read_u32_default(dev, "ngpios", 32);

So you are extending the DT binding. You need to document this. And it
should really have a different compatible string, since the binding is
not compatible between the two variants.
>  
>   return 0;
> @@ -187,6 +239,7 @@ static int mpc8xxx_gpio_platdata_to_priv(struct udevice 
> *dev)
>  static int mpc8xxx_gpio_probe(struct udevice *dev)
>  {
>   struct gpio_dev_priv *uc_priv = dev_get_uclass_priv(dev);
> + struct device_node const  *np = dev->node.np;
>   struct mpc8xxx_gpio_data *data = dev_get_priv(dev);
>   char name[32], *str;
>  
> @@ -198,6 +251,12 @@ static int mpc8xxx_gpio_probe(struct udevice *dev)
>   if (!str)
>   return -ENOMEM;
>  
> + if (of_device_is_compatible(np, "fsl,qoriq-gpio", NULL, NULL)) {
> + unsigned long gpibe = data->addr + sizeof(struct ccsr_gpio);
> +
> + out_be32(gpibe, 0x);

That is an odd way to determine the address of a register.

 Andrew


Re: [PATCH 1/1] firmware: arm_scmi/mailbox: ignore notification for tx done using irq

2020-05-09 Thread Sudeep Holla
On Sat, May 09, 2020 at 04:54:57PM +0800, joe_zhu...@126.com wrote:
> From: Joe Zhu 
> 
> If mailbox uses IRQ method, it already notified framework with
> mbox_chan_txdone() in ISR.
> 
> Signed-off-by: Joe Zhu 
> ---
>  drivers/firmware/arm_scmi/mailbox.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/arm_scmi/mailbox.c 
> b/drivers/firmware/arm_scmi/mailbox.c
> index 73077bbc4ad9..303a5dc42429 100644
> --- a/drivers/firmware/arm_scmi/mailbox.c
> +++ b/drivers/firmware/arm_scmi/mailbox.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

This is an indication that something is wrong. The mailbox controller
and client interfaces are very clear. You need to use mailbox controller
interface when implementing a mailbox controller and use only client
interface when implementing a mailbox client.

>  #include 
>  #include 
>  #include 
> @@ -147,7 +148,8 @@ static void mailbox_mark_txdone(struct scmi_chan_info 
> *cinfo, int ret)
>* Unfortunately, we have to kick the mailbox framework after we have
>* received our message.
>*/
> - mbox_client_txdone(smbox->chan, ret);
> + if (!smbox->chan->mbox->txdone_irq)
> + mbox_client_txdone(smbox->chan, ret);

If this patch is to avoid getting "Client can't run the TX ticker" error
messages, then you make need to fix that with something like below:

Regards,
Sudeep

-->8
diff --git i/drivers/mailbox/mailbox.c w/drivers/mailbox/mailbox.c
index 0b821a5b2db8..5a78a0adcce4 100644
--- i/drivers/mailbox/mailbox.c
+++ w/drivers/mailbox/mailbox.c
@@ -189,7 +189,9 @@ EXPORT_SYMBOL_GPL(mbox_chan_txdone);
 void mbox_client_txdone(struct mbox_chan *chan, int r)
 {
if (unlikely(!(chan->txdone_method & TXDONE_BY_ACK))) {
-   dev_err(chan->mbox->dev, "Client can't run the TX ticker\n");
+   if (unlikely(!(chan->txdone_method & TXDONE_BY_IRQ)))
+   dev_err(chan->mbox->dev,
+   "Client can't run the TX ticker\n");
return;
}



Re: [PATCH 01/15] taint: add module firmware crash taint support

2020-05-09 Thread Rafael Aquini
On Sat, May 09, 2020 at 04:35:38AM +, Luis Chamberlain wrote:
> Device driver firmware can crash, and sometimes, this can leave your
> system in a state which makes the device or subsystem completely
> useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> of scraping some magical words from the kernel log, which is driver
> specific, is much easier. So instead provide a helper which lets drivers
> annotate this.
> 
> Once this happens, scrapers can easily look for modules taint flags
> for a firmware crash. This will taint both the kernel and respective
> calling module.
> 
> The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
> fact should in no way shape or form affect lockdep. This taint is device
> driver specific.
> 
> Signed-off-by: Luis Chamberlain 
> ---
>  include/linux/kernel.h|  3 ++-
>  include/linux/module.h| 13 +
>  include/trace/events/module.h |  3 ++-
>  kernel/module.c   |  5 +++--
>  kernel/panic.c|  1 +
>  5 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 04a5885cec1b..19e1541c82c7 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -601,7 +601,8 @@ extern enum system_states {
>  #define TAINT_LIVEPATCH  15
>  #define TAINT_AUX16
>  #define TAINT_RANDSTRUCT 17
> -#define TAINT_FLAGS_COUNT18
> +#define TAINT_FIRMWARE_CRASH 18
> +#define TAINT_FLAGS_COUNT19
>

We are still missing the documentation bits for this
new flag, though.

How about having a blurb similar to:

diff --git a/Documentation/admin-guide/tainted-kernels.rst 
b/Documentation/admin-guide/tainted-kernels.rst
index 71e9184a9079..5c6a9e2478b0 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -100,6 +100,7 @@ Bit  Log  Number  Reason that got the kernel tainted
  15  _/K   32768  kernel has been live patched
  16  _/X   65536  auxiliary taint, defined for and used by distros
  17  _/T  131072  kernel was built with the struct randomization plugin
+ 18  _/Q  262144  driver firmware crash annotation
 ===  ===  ==  

 Note: The character ``_`` is representing a blank in this table to make reading
@@ -162,3 +163,7 @@ More detailed explanation for tainting
  produce extremely unusual kernel structure layouts (even performance
  pathological ones), which is important to know when debugging. Set at
  build time.
+
+ 18) ``Q`` Device drivers might annotate the kernel with this taint, in cases
+ their firmware might have crashed leaving the driver in a crippled and
+ potentially useless state.




>  struct taint_flag {
>   char c_true;/* character printed when tainted */
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 2c2e988bcf10..221200078180 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module 
> *mod)
>  bool is_module_sig_enforced(void);
>  void set_module_sig_enforced(void);
>  
> +void add_taint_module(struct module *mod, unsigned flag,
> +   enum lockdep_ok lockdep_ok);
> +
> +static inline void module_firmware_crashed(void)
> +{
> + add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
> +}
> +
>  #else /* !CONFIG_MODULES... */
>  
>  static inline struct module *__module_address(unsigned long addr)
> @@ -844,6 +852,11 @@ void *dereference_module_function_descriptor(struct 
> module *mod, void *ptr)
>   return ptr;
>  }
>  
> +static inline void module_firmware_crashed(void)
> +{
> + add_taint(TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
> +}
> +
>  #endif /* CONFIG_MODULES */
>  
>  #ifdef CONFIG_SYSFS
> diff --git a/include/trace/events/module.h b/include/trace/events/module.h
> index 097485c73c01..b749ea25affd 100644
> --- a/include/trace/events/module.h
> +++ b/include/trace/events/module.h
> @@ -26,7 +26,8 @@ struct module;
>   { (1UL << TAINT_OOT_MODULE),"O" },  \
>   { (1UL << TAINT_FORCED_MODULE), "F" },  \
>   { (1UL << TAINT_CRAP),  "C" },  \
> - { (1UL << TAINT_UNSIGNED_MODULE),   "E" })
> + { (1UL << TAINT_UNSIGNED_MODULE),   "E" },  \
> + { (1UL << TAINT_FIRMWARE_CRASH),"Q" })
>  
>  TRACE_EVENT(module_load,
>  
> diff --git a/kernel/module.c b/kernel/module.c
> index 80faaf2116dd..f98e8c25c6b4 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -325,12 +325,13 @@ static inline int strong_try_module_get(struct module 
> *mod)
>   return -ENOENT;
>  }
>  
> -static inline void add_taint_module(struct module *mod, unsigned flag,
> - enum lockdep_ok lockdep_ok)
> +void 

Re: [PATCH v12 02/11] lib/test_linear_ranges: add a test for the 'linear_ranges'

2020-05-09 Thread Vaittinen, Matti

On Fri, 2020-05-08 at 18:17 +0100, Mark Brown wrote:
> On Fri, May 08, 2020 at 06:40:43PM +0300, Matti Vaittinen wrote:
> > Add a KUnit test for the linear_ranges helper.
> > 
> > Signed-off-by: Matti Vaittinen 
> > Reviewed-by: Brendan Higgins 
> 
> This now generates:
> 
> WARNING: modpost: missing MODULE_LICENSE() in lib/linear_ranges.o
> see include/linux/module.h for more information
> 
> when the tests are built as a module and select the library.

I sent a fix as an incremental patch. Please let me know if it is not
the way to go.

And Sorry for the trouble!

Best Regards
Matti Vaittinen


[PATCH] lib: linear_ranges: Add missing MODULE_LICENSE()

2020-05-09 Thread Matti Vaittinen
When linear_ranges is compiled as module we get warning
about missing MODULE_LICENSE(). Fix it by adding
MODULE_LICENSE("GPL") as is suggested by SPDX and EXPORTs.

Signed-off-by: Matti Vaittinen 
---

I saw Mark applied the linear-ranges patch. So I sent this fix as
incremental patch - but I still use the same Linus tree as a base of
this fix - the linear-ranges file should be unchanged in regulator tree.
If this does not apply I can clone regulator tree and create this fix on
it.

I don't know if this is the correct way to fix this as the linear-ranges
should be merged to power-supply tree.

I guess we can either:
- Use this patch to fix regulator tree and create fixed tag for
  power-supply(?)
- Add this fix in the original series and resend whole series(?)
- re-create the series and drop the already applied patches. Add this
  fix as a fixup patch in new series and apply it to power-supply tree
  after the linear-ranges from regulator is merged to power-supply.

 Please adviece me if this patch is not the way to go.

Oh, and I am really sorry for the trouble. I saw I had regulators=y
in all of my compilations due to some pincontrol dependencies. So
linear-ranges was not built as module in any of my test compilations :(

Thanks for testing Mark!

 lib/linear_ranges.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/linear_ranges.c b/lib/linear_ranges.c
index d1336c75ccd7..9495ef3572b7 100644
--- a/lib/linear_ranges.c
+++ b/lib/linear_ranges.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * linear_range_values_in_range - return the amount of values in a range
@@ -239,3 +240,6 @@ int linear_range_get_selector_high(const struct 
linear_range *r,
return 0;
 }
 EXPORT_SYMBOL_GPL(linear_range_get_selector_high);
+
+MODULE_DESCRIPTION("linear-ranges helper");
+MODULE_LICENSE("GPL");
-- 
2.21.0


-- 
Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~
Simon says - in Latin please.
~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~
Thanks to Simon Glass for the translation =] 


Re: remove a few uses of ->queuedata

2020-05-09 Thread Dan Williams
On Sat, May 9, 2020 at 1:24 AM Christoph Hellwig  wrote:
>
> On Fri, May 08, 2020 at 11:04:45AM -0700, Dan Williams wrote:
> > On Fri, May 8, 2020 at 9:16 AM Christoph Hellwig  wrote:
> > >
> > > Hi all,
> > >
> > > various bio based drivers use queue->queuedata despite already having
> > > set up disk->private_data, which can be used just as easily.  This
> > > series cleans them up to only use a single private data pointer.
> >
> > ...but isn't the queue pretty much guaranteed to be cache hot and the
> > gendisk cache cold? I'm not immediately seeing what else needs the
> > gendisk in the I/O path. Is there another motivation I'm missing?
>
> ->private_data is right next to the ->queue pointer, pat0 and part_tbl
> which are all used in the I/O submission path (generic_make_request /
> generic_make_request_checks).  This is mostly a prep cleanup patch to
> also remove the pointless queue argument from ->make_request - then
> ->queue is an extra dereference and extra churn.

Ah ok. If the changelogs had been filled in with something like "In
preparation for removing @q from make_request_fn, stop using
->queuedata", I probably wouldn't have looked twice.

For the nvdimm/ driver updates you can add:

Reviewed-by: Dan Williams 

...or just let me know if you want me to pick those up through the nvdimm tree.


Re: [PATCH] drm/panel: visionox-rm69299: Add module license

2020-05-09 Thread Jason Yan



在 2020/5/9 14:36, Randy Dunlap 写道:

On 5/8/20 11:30 PM, Jason Yan wrote:

Fix the following build warning:

WARNING: modpost: missing MODULE_LICENSE() in 
drivers/gpu/drm/panel/panel-visionox-rm69299.o
see include/linux/module.h for more information

Signed-off-by: Jason Yan 
---
  drivers/gpu/drm/panel/panel-visionox-rm69299.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panel/panel-visionox-rm69299.c 
b/drivers/gpu/drm/panel/panel-visionox-rm69299.c
index 25fe8b0bb040..ec279ffdbd94 100644
--- a/drivers/gpu/drm/panel/panel-visionox-rm69299.c
+++ b/drivers/gpu/drm/panel/panel-visionox-rm69299.c
@@ -300,3 +300,4 @@ static struct mipi_dsi_driver visionox_rm69299_driver = {
  module_mipi_dsi_driver(visionox_rm69299_driver);
  
  MODULE_DESCRIPTION("Visionox RM69299 DSI Panel Driver");

+MODULE_LICENSE("GPL");



I sent a patch for this yesterday.



I'm sorry. Please ignore this one.

Thanks,







Re: [PATCH v2] kernel: add panic_on_taint

2020-05-09 Thread Rafael Aquini
On Sat, May 09, 2020 at 03:48:54AM +, Luis Chamberlain wrote:
> On Fri, May 08, 2020 at 08:47:19AM -0400, Rafael Aquini wrote:
> > On Thu, May 07, 2020 at 10:25:58PM +, Luis Chamberlain wrote:
> > > On Thu, May 07, 2020 at 06:06:06PM -0400, Rafael Aquini wrote:
> > > > On Thu, May 07, 2020 at 08:33:40PM +, Luis Chamberlain wrote:
> > > > > I *think* that a cmdline route to enable this would likely remove the
> > > > > need for the kernel config for this. But even with Vlastimil's work
> > > > > merged, I think we'd want yet-another value to enable / disable this
> > > > > feature. Do we need yet-another-taint flag to tell us that this 
> > > > > feature
> > > > > was enabled?
> > > > >
> > > > 
> > > > I guess it makes sense to get rid of the sysctl interface for
> > > > proc_on_taint, and only keep it as a cmdline option. 
> > > 
> > > That would be easier to support and k3eps this simple.
> > > 
> > > > But the real issue seems to be, regardless we go with a cmdline-only 
> > > > option
> > > > or not, the ability of proc_taint() to set any arbitrary taint flag 
> > > > other than just marking the kernel with TAINT_USER. 
> > > 
> > > I think we would have no other option but to add a new TAINT flag so
> > > that we know that the taint flag was modified by a user. Perhaps just
> > > re-using TAINT_USER when proc_taint() would suffice.
> > >
> > 
> > We might not need an extra taint flag if, perhaps, we could make these
> > two features mutually exclusive. The idea here is that bitmasks added 
> > via panic_on_taint get filtered out in proc_taint(), so a malicious 
> > user couldn't exploit the latter interface to easily panic the system,
> > when the first one is also in use. 
> 
> I get it, however I I can still see the person who enables enabling
> panic-on-tain wanting to know if proc_taint() was used. So even if
> it was not on their mask, if it was modified that seems like important
> information for a bug report analysis.
>

For that purpose (tracking user taints) I think sth between these lines
would work:

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..651a82c13621 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2602,6 +2602,9 @@ int proc_douintvec(struct ctl_table *table, int write,
 do_proc_douintvec_conv, NULL);
 }

+/* track which taint bits were set by the user */
+static unsigned long user_tainted;
+
 /*
  * Taint values can only be increased
  * This means we can safely use a temporary.
@@ -2629,11 +2632,20 @@ static int proc_taint(struct ctl_table *table, int 
write,
 */
int i;
for (i = 0; i < BITS_PER_LONG && tmptaint >> i; i++) {
-   if ((tmptaint >> i) & 1)
+   if ((tmptaint >> i) & 1) {
+   set_bit(i, _tainted);
add_taint(i, LOCKDEP_STILL_OK);
+   }
}
}

+   /*
+* Users with SYS_ADMIN capability can fiddle with any arbitrary
+* taint flag through this interface.
+* If that's the case, we also need to mark the kernel "tainted by user"
+*/
+   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
return err;
 }


I don't think, though, it's panic_on_taint work to track that. I posted a v3 for
this feature with a way to select if one wants to avoid user forced taints
triggering panic() for flags also set for panic_on_taint.

Cheers,

-- Rafael



Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables

2020-05-09 Thread Russell King - ARM Linux admin
On Sat, May 09, 2020 at 02:51:05PM +0100, Russell King - ARM Linux admin wrote:
> On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote:
> > On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin
> >  wrote:
> > >
> > > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Matteo Croce 
> > > > > Sent: Saturday, May 9, 2020 3:13 AM
> > > > > To: David S . Miller 
> > > > > Cc: Maxime Chevallier ; netdev
> > > > > ; LKML ; Antoine
> > > > > Tenart ; Thomas Petazzoni
> > > > > ; gregory.clem...@bootlin.com;
> > > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan
> > > > > Chulski ; Marcin Wojtas ; 
> > > > > Linux
> > > > > ARM ; Russell King - ARM Linux 
> > > > > admin
> > > > > 
> > > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS 
> > > > > contexts to
> > > > > handle RSS tables
> > > > >
> > > > > Hi,
> > > > >
> > > > > What do you think about temporarily disabling it like this?
> > > > >
> > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct 
> > > > > platform_device
> > > > > *pdev,
> > > > > NETIF_F_HW_VLAN_CTAG_FILTER;
> > > > >
> > > > > if (mvpp22_rss_is_supported()) {
> > > > > -   dev->hw_features |= NETIF_F_RXHASH;
> > > > > +   if (port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > > > +   dev->hw_features |= NETIF_F_RXHASH;
> > > > > dev->features |= NETIF_F_NTUPLE;
> > > > > }
> > > > >
> > > > >
> > > > > David, is this "workaround" too bad to get accepted?
> > > >
> > > > Not sure that RSS related to physical interface(SGMII), better just 
> > > > remove NETIF_F_RXHASH as "workaround".
> > >
> > > Hmm, I'm not sure this is the right way forward.  This patch has the
> > > effect of disabling:
> > >
> > > d33ec4525007 ("net: mvpp2: add an RSS classification step for each flow")
> > >
> > > but the commit you're pointing at which caused the regression is:
> > >
> > > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
> > >
> > >
> > 
> > Hi,
> > 
> > When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS
> > contexts to handle RSS tables"), which was merged
> > almost an year after d33ec4525007 ("net: mvpp2: add an RSS
> > classification step for each flow"), so I assume that between these
> > two commits either the feature was working or it was disable and we
> > didn't notice
> > 
> > Without knowing what was happening, which commit should my Fixes tag point 
> > to?
> 
> Let me make sure that I get this clear:
> 
> - Prior to 895586d5dc32, you can turn on and off rxhash without issue
>   on any port.
> - After 895586d5dc32, turning rxhash on eth2 prevents reception.
> 
> Prior to 895586d5dc32, with rxhash on, it looks like hashing using
> CRC32 is supported but only one context.  So, if it's possible to
> enable rxhash on any port on the mcbin without 895586d5dc32, and the
> port continues to work, I'd say the bug was introduced by
> 895586d5dc32.
> 
> Of course, that would be reinforced if there was a measurable
> difference in performance due to rxhash on each port.

I've just run this test, but I can detect no difference in performance
with or without 895586d5dc32 on eth0 or eth2 on the mcbin (apart from
eth2 stopping working with 895586d5dc32 applied.)  I tested this by
reverting almost all changes to the mvpp2 driver between 5.6 and that
commit.

That's not too surprising; I'm using my cex7 platform with the Mellanox
card in for one end of the 10G link, and that platform doesn't seem to
be able to saturdate a 10G link - it only seems to manage around 4Gbps.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: [PATCH] memcg: expose root cgroup's memory.stat

2020-05-09 Thread Yafang Shao
On Sat, May 9, 2020 at 10:06 PM Shakeel Butt  wrote:
>
> On Fri, May 8, 2020 at 2:44 PM Johannes Weiner  wrote:
> >
> > On Fri, May 08, 2020 at 10:06:30AM -0700, Shakeel Butt wrote:
> > > One way to measure the efficiency of memory reclaim is to look at the
> > > ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are
> > > not updated consistently at the system level and the ratio of these are
> > > not very meaningful. The pgsteal and pgscan are updated for only global
> > > reclaim while pgrefill gets updated for global as well as cgroup
> > > reclaim.
> > >
> > > Please note that this difference is only for system level vmstats. The
> > > cgroup stats returned by memory.stat are actually consistent. The
> > > cgroup's pgsteal contains number of reclaimed pages for global as well
> > > as cgroup reclaim. So, one way to get the system level stats is to get
> > > these stats from root's memory.stat, so, expose memory.stat for the root
> > > cgroup.
> > >
> > >   from Johannes Weiner:
> > >   There are subtle differences between /proc/vmstat and
> > >   memory.stat, and cgroup-aware code that wants to watch the full
> > >   hierarchy currently has to know about these intricacies and
> > >   translate semantics back and forth.
> > >
> > >   Generally having the fully recursive memory.stat at the root
> > >   level could help a broader range of usecases.
> >
> > The changelog begs the question why we don't just "fix" the
> > system-level stats. It may be useful to include the conclusions from
> > that discussion, and why there is value in keeping the stats this way.
> >
>
> Right. Andrew, can you please add the following para to the changelog?
>
> Why not fix the stats by including both the global and cgroup reclaim
> activity instead of exposing root cgroup's memory.stat? The reason is
> the benefit of having metrics exposing the activity that happens
> purely due to machine capacity rather than localized activity that
> happens due to the limits throughout the cgroup tree. Additionally
> there are userspace tools like sysstat(sar) which reads these stats to
> inform about the system level reclaim activity. So, we should not
> break such use-cases.
>

Acked-by: Yafang Shao 

> > > Signed-off-by: Shakeel Butt 
> > > Suggested-by: Johannes Weiner 
> >
> > Acked-by: Johannes Weiner 
>
> Thanks a lot.



-- 
Thanks
Yafang


Re: [RFC] splice/tee: len=0 fast path after validity check

2020-05-09 Thread Jens Axboe
On 5/9/20 2:46 AM, Pavel Begunkov wrote:
> When len=0, splice() and tee() return 0 even if specified fds are
> invalid, hiding errors from users. Move len=0 optimisation later after
> basic validity checks.
> 
> before:
> splice(len=0, fd_in=-1, ...) == 0;
> 
> after:
> splice(len=0, fd_in=-1, ...) == -EBADF;

I'm not sure what the purpose of this would be. It probably should have
been done that way from the beginning, but it wasn't.  While there's
very little risk of breaking any applications due to this change, it
also seems like a pointless exercise at this point.

So my suggestion would be to just leave it alone.

-- 
Jens Axboe



Re: [RFC PATCH 00/13] Core scheduling v5

2020-05-09 Thread Dario Faggioli
On Tue, 2020-04-14 at 16:21 +0200, Peter Zijlstra wrote:
> On Wed, Mar 04, 2020 at 04:59:50PM +, vpillai wrote:
> > 
> > - Investigate the source of the overhead even when no tasks are
> > tagged:
> >   https://lkml.org/lkml/2019/10/29/242
> 
>  - explain why we're all still doing this 
> 
> Seriously, what actual problems does it solve? The patch-set still
> isn't
> L1TF complete and afaict it does exactly nothing for MDS.
> 
Hey Peter! Late to the party, I know...

But I'm replying anyway. At least, you'll have the chance to yell at me
for this during OSPM. ;-P

> Like I've written many times now, back when the world was simpler and
> all we had to worry about was L1TF, core-scheduling made some sense,
> but
> how does it make sense today?
> 
Indeed core-scheduling alone doesn't even completely solve L1TF. There
are the interrupts and the VMEXITs issues. Both are being discussed in
this thread and, FWIW, my personal opinion is that the way to go is
what Alex says here:

<79529592-5d60-2a41-fbb6-4a5f8279f...@amazon.com>

(E.g., when he mentions solution 4 "Create a "safe" page table which
runs with HT enabled", etc).

But let's stick to your point: if it were only for L1TF, then fine, but
it's all pointless because of MDS. My answer to this is very much
focused on my usecase, which is virtualization. I know you hate us, and
you surely have your good reasons, but you know... :-)

Correct me if I'm wrong, but I think that the "nice" thing of L1TF is
that it allows a VM to spy on another VM or on the host, but it does
not allow a regular task to spy on another task or on the kernel (well,
it would, but it's easily mitigated).

The bad thing about MDS is that it instead allow *all* of that.

Now, one thing that we absolutely want to avoid in virt is that a VM is
able to spy on other VMs or on the host. Sure, we also care about tasks
running in our VMs to be safe, but, really, inter-VM and VM-to-host
isolation is the primary concern of an hypervisor.

And how a VM (or stuff running inside a VM) can spy on another VM or on
the host, via L1TF or MDS? Well, if the attacker VM and the victim VM
--or if the attacker VM and the host-- are running on the same core. If
they're not, it can't... which is basically an L1TF-only looking
scenario.

So, in virt, core-scheduling:
1) is the *only* way (aside from no-EPT) to prevent attacker VM to spy 
   on victim VM, if they're running concurrently, both in guest mode,
   on the same core (and that's, of course, because with
   core-scheduling they just won't be doing that :-) )
2) interrupts and VMEXITs needs being taken care of --which was the 
   case already when, as you said "we had only L1TF". Once that is done
   we will effectively prevent all VM to VM and VM to host attack
   scenarios.

Sure, it will still be possible, for instance, for task_A in VM1 to spy
on task_B, also in VM1. This seems to be, AFAIUI, Joel's usecase, so
I'm happy to leave it to him to defend that, as he's doing already (but
indeed I'm very happy to see that it is also getting attention).

Now, of course saying anything like "works for my own usecase so let's
go for it" does not fly. But since you were asking whether and how this
feature could make sense today, suppose that:
1) we get core-scheduling,
2) we find a solution for irqs and VMEXITs, as we would have to if 
   there was only L1TF,
3) we manage to make the overhead of core-scheduling close to zero 
   when it's there (I mean, enabled at compile time) but not used (I
   mean, no tagging of tasks, or whatever).

That would mean that virt people can enable core-scheduling, and
achieve good inter-VM and VM-to-host isolation, without imposing
overhead to other use cases, that would leave core-scheduling disabled.

And this is something that I would think it makes sense.

Of course, we're not there... because even when this series will give
us point 1, we will also need 2 and we need to make sure we also
satisfy 3 (and we weren't, last time I checked ;-P).

But I think it's worth keeping trying.

I'd also add a couple of more ideas, still about core-scheduling in
virt, but from a different standpoint than security:
- if I tag vcpu0 and vcpu1 together[*], then vcpu2 and vcpu3 together,
  then vcpu4 and vcpu5 together, then I'm sure that each pair will
  always be scheduled on the same core. At which point I can define
  an SMT virtual topology, for the VM, that will make sense, even
  without pinning the vcpus;
- if I run VMs from different customers, when vcpu2 of VM1 and vcpu1
  of VM2 run on the same core, they influence each others' performance.
  If, e.g., I bill basing on time spent on CPUs, it means customer
  A's workload, running in VM1, may influence the billing of customer
  B, who owns VM2. With core scheduling, if I tag all the vcpus of each
  VM together, I won't have this any longer.

[*] with "tag together" I mean let them have the same tag which, ATM
would be "put them in the same cgroup and enable cpu.tag".


Re: ofono for d4: less hcked and more working version was Re: USB networking news, ofono for d4: less hacked version

2020-05-09 Thread Tony Lindgren
* Pavel Machek  [200508 10:03]:
> I pushed new version of ofono: I'm still not sure about those incoming
> sms (but _some_ sms are received). Rest should be better.

Please check that you have applied commit 738b150ecefb ("ARM: dts:
omap4-droid4: Fix occasional lost wakeirq for uart1"), otherwise incoming
SMS may not always show up, and GPS can stop producing data.

Hmm for ofono motchat, why not handle the U part directly in motchat
and use just a timestamp based ID there?

Regards,

Tony


Re: [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex

2020-05-09 Thread Eric W. Biederman
Kees Cook  writes:

> $ git grep exec_mm_release
> fs/exec.c:  exec_mm_release(tsk, old_mm);
> include/linux/sched/mm.h:extern void exec_mm_release(struct task_struct *, 
> struct mm_struct *);
> kernel/fork.c:void exec_mm_release(struct task_struct *tsk, struct mm_struct 
> *mm)
>
> kernel/fork.c:
>
> void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm)
> {
> futex_exit_release(tsk);
> mm_release(tsk, mm);
> }
>
> void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm)
> {
> futex_exec_release(tsk);
> mm_release(tsk, mm);
> }
>
> $ git grep exit_mm_release
> include/linux/sched/mm.h:extern void exit_mm_release(struct task_struct *, 
> struct mm_struct *);
> kernel/exit.c:  exit_mm_release(current, mm);
> kernel/fork.c:void exit_mm_release(struct task_struct *tsk, struct mm_struct 
> *mm)
>
> kernel/exit.c:
>
> exit_mm_release(current, mm);
> if (!mm)
> return;
> sync_mm_rss(mm);
>
> It looks to me like both exec_mm_release() and exit_mm_release() could
> easily have the sync_mm_rss(...) folded into their function bodies and
> removed from the callers. *shrug*

Well it would have to be all of:
if (mm) 
sync_mm_rss(mm);

I remember reading through exit_mm_release and seeing that nothing
actually depended upon a non-NULL mm. Unless you have clear_child_tid
set.

I am not up to speed on that part of the mm layer right now to know if
it is a good idea to put sync_mm_rss in exit_mm_release but at a quick
look it feels like it.

Eric



[PATCH] mm: fix LRU balancing effect of new transparent huge pages

2020-05-09 Thread Shakeel Butt
From: Johannes Weiner 

Currently, THP are counted as single pages until they are split right
before being swapped out. However, at that point the VM is already in
the middle of reclaim, and adjusting the LRU balance then is useless.

Always account THP by the number of basepages, and remove the fixup
from the splitting path.

Signed-off-by: Johannes Weiner 
Signed-off-by: Shakeel Butt 
---
Revived the patch from https://lore.kernel.org/patchwork/patch/685703/

 mm/swap.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 4eb179ee0b72..b75c0ce90418 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -262,14 +262,14 @@ void rotate_reclaimable_page(struct page *page)
}
 }
 
-static void update_page_reclaim_stat(struct lruvec *lruvec,
-int file, int rotated)
+static void update_page_reclaim_stat(struct lruvec *lruvec, int file,
+int rotated, int nr_pages)
 {
struct zone_reclaim_stat *reclaim_stat = >reclaim_stat;
 
-   reclaim_stat->recent_scanned[file]++;
+   reclaim_stat->recent_scanned[file] += nr_pages;
if (rotated)
-   reclaim_stat->recent_rotated[file]++;
+   reclaim_stat->recent_rotated[file] += nr_pages;
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
@@ -288,7 +288,7 @@ static void __activate_page(struct page *page, struct 
lruvec *lruvec,
 
__count_vm_events(PGACTIVATE, nr_pages);
__count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, 
nr_pages);
-   update_page_reclaim_stat(lruvec, file, 1);
+   update_page_reclaim_stat(lruvec, file, 1, nr_pages);
}
 }
 
@@ -546,7 +546,7 @@ static void lru_deactivate_file_fn(struct page *page, 
struct lruvec *lruvec,
__count_vm_events(PGDEACTIVATE, nr_pages);
__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, 
nr_pages);
}
-   update_page_reclaim_stat(lruvec, file, 0);
+   update_page_reclaim_stat(lruvec, file, 0, nr_pages);
 }
 
 static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
@@ -564,7 +564,7 @@ static void lru_deactivate_fn(struct page *page, struct 
lruvec *lruvec,
 
__count_vm_events(PGDEACTIVATE, nr_pages);
__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, 
nr_pages);
-   update_page_reclaim_stat(lruvec, file, 0);
+   update_page_reclaim_stat(lruvec, file, 0, nr_pages);
}
 }
 
@@ -590,7 +590,7 @@ static void lru_lazyfree_fn(struct page *page, struct 
lruvec *lruvec,
 
__count_vm_events(PGLAZYFREE, nr_pages);
__count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, 
nr_pages);
-   update_page_reclaim_stat(lruvec, 1, 0);
+   update_page_reclaim_stat(lruvec, 1, 0, nr_pages);
}
 }
 
@@ -899,8 +899,6 @@ EXPORT_SYMBOL(__pagevec_release);
 void lru_add_page_tail(struct page *page, struct page *page_tail,
   struct lruvec *lruvec, struct list_head *list)
 {
-   const int file = 0;
-
VM_BUG_ON_PAGE(!PageHead(page), page);
VM_BUG_ON_PAGE(PageCompound(page_tail), page);
VM_BUG_ON_PAGE(PageLRU(page_tail), page);
@@ -926,9 +924,6 @@ void lru_add_page_tail(struct page *page, struct page 
*page_tail,
add_page_to_lru_list_tail(page_tail, lruvec,
  page_lru(page_tail));
}
-
-   if (!PageUnevictable(page))
-   update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -973,7 +968,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct 
lruvec *lruvec,
if (page_evictable(page)) {
lru = page_lru(page);
update_page_reclaim_stat(lruvec, page_is_file_lru(page),
-PageActive(page));
+PageActive(page), nr_pages);
if (was_unevictable)
__count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages);
} else {
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH net] mvpp2: enable rxhash only on the first port

2020-05-09 Thread Matteo Croce
Currently rxhash only works on the first port of the CP (Communication
Processor). Enabling it on other ports completely blocks packet reception.
This patch only adds rxhash as supported feature to the first port,
so rxhash can't be enabled on other ports:

# ethtool -K eth0 rxhash on
# ethtool -K eth1 rxhash on
# ethtool -K eth2 rxhash on
Cannot change receive-hashing
Could not change any device features
# ethtool -K eth3 rxhash on
Cannot change receive-hashing
Could not change any device features

Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
Signed-off-by: Matteo Croce 
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 2b5dad2ec650..ba71583c7ae3 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -5423,7 +5423,8 @@ static int mvpp2_port_probe(struct platform_device *pdev,
NETIF_F_HW_VLAN_CTAG_FILTER;
 
if (mvpp22_rss_is_supported()) {
-   dev->hw_features |= NETIF_F_RXHASH;
+   if (port->id == 0)
+   dev->hw_features |= NETIF_F_RXHASH;
dev->features |= NETIF_F_NTUPLE;
}
 
-- 
2.26.2



[PATCH] gpiolib: notify user-space about line status changes after flags are set

2020-05-09 Thread Bartosz Golaszewski
From: Bartosz Golaszewski 

Currently we emit the REQUESTED line state event after the line is
requested but before the flags are configured. This is obviously wrong
as we want to pass the updated lineinfo to user-space together with the
event.

Since the flags can be configured in different ways depending on how the
line is being requested - we need to call the notifier chain in different
places separately.

Fixes: 51c1064e82e7 ("gpiolib: add new ioctl() for monitoring changes in line 
info")
Signed-off-by: Bartosz Golaszewski 
---
This comes late in the release cycle but I only recently got down to
writing libgpiod support for this new ioctl(). When writing test cases
I noticed this doesn't really work as expected. This patch fixes the
issue I identified this week. There may be more coming the following
week though...

 drivers/gpio/gpiolib.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 40f2d7f69be2..550751a6e51c 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -729,6 +729,10 @@ static int linehandle_create(struct gpio_device *gdev, 
void __user *ip)
if (ret)
goto out_free_descs;
}
+
+   atomic_notifier_call_chain(>gdev->notifier,
+  GPIOLINE_CHANGED_REQUESTED, desc);
+
dev_dbg(>dev, "registered chardev handle for line %d\n",
offset);
}
@@ -1083,6 +1087,9 @@ static int lineevent_create(struct gpio_device *gdev, 
void __user *ip)
if (ret)
goto out_free_desc;
 
+   atomic_notifier_call_chain(>gdev->notifier,
+  GPIOLINE_CHANGED_REQUESTED, desc);
+
le->irq = gpiod_to_irq(desc);
if (le->irq <= 0) {
ret = -ENODEV;
@@ -2975,8 +2982,6 @@ static int gpiod_request_commit(struct gpio_desc *desc, 
const char *label)
}
 done:
spin_unlock_irqrestore(_lock, flags);
-   atomic_notifier_call_chain(>gdev->notifier,
-  GPIOLINE_CHANGED_REQUESTED, desc);
return ret;
 }
 
@@ -4938,6 +4943,9 @@ struct gpio_desc *__must_check gpiod_get_index(struct 
device *dev,
return ERR_PTR(ret);
}
 
+   atomic_notifier_call_chain(>gdev->notifier,
+  GPIOLINE_CHANGED_REQUESTED, desc);
+
return desc;
 }
 EXPORT_SYMBOL_GPL(gpiod_get_index);
@@ -5003,6 +5011,9 @@ struct gpio_desc *fwnode_get_named_gpiod(struct 
fwnode_handle *fwnode,
return ERR_PTR(ret);
}
 
+   atomic_notifier_call_chain(>gdev->notifier,
+  GPIOLINE_CHANGED_REQUESTED, desc);
+
return desc;
 }
 EXPORT_SYMBOL_GPL(fwnode_get_named_gpiod);
-- 
2.25.0



Re: linux-next 20200508 - build failure in kernel/resource.c w/ SPARSEMEM=n

2020-05-09 Thread Christoph Hellwig
I think the problem is the select of CONFIG_DEVICE_PRIVATE.  Jason is
looking into it already.


Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-05-09 Thread Alexey Kardashevskiy



On 09/05/2020 18:19, Christoph Hellwig wrote:
> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 17/04/2020 17:58, Christoph Hellwig wrote:
>>> On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
 And the fact they were exported leaves possibility that there is a
 driver somewhere relying on these symbols or distro kernel won't build
 because the symbol disappeared from exports (I do not know what KABI
 guarantees or if mainline kernel cares).
>>>
>>> We absolutely do not care.  In fact for abuses of APIs that drivers
>>> should not use we almost care to make them private and break people
>>> abusing them.
>>
>> ok :)
>>
 I do not care in particular but
 some might, a line separated with empty lines in the commit log would do.
>>>
>>> I'll add a blurb for the next version.
>>
>>
>> Has it gone anywhere? Thanks,
> 
> I've been hoping for the sg_buf helpers to land first, as they need
> backporting and would conflict.  Do you urgently need the series?

Nah, not that urgent. Thanks,


-- 
Alexey


Re: [PATCH] memcg: expose root cgroup's memory.stat

2020-05-09 Thread Shakeel Butt
On Fri, May 8, 2020 at 2:44 PM Johannes Weiner  wrote:
>
> On Fri, May 08, 2020 at 10:06:30AM -0700, Shakeel Butt wrote:
> > One way to measure the efficiency of memory reclaim is to look at the
> > ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are
> > not updated consistently at the system level and the ratio of these are
> > not very meaningful. The pgsteal and pgscan are updated for only global
> > reclaim while pgrefill gets updated for global as well as cgroup
> > reclaim.
> >
> > Please note that this difference is only for system level vmstats. The
> > cgroup stats returned by memory.stat are actually consistent. The
> > cgroup's pgsteal contains number of reclaimed pages for global as well
> > as cgroup reclaim. So, one way to get the system level stats is to get
> > these stats from root's memory.stat, so, expose memory.stat for the root
> > cgroup.
> >
> >   from Johannes Weiner:
> >   There are subtle differences between /proc/vmstat and
> >   memory.stat, and cgroup-aware code that wants to watch the full
> >   hierarchy currently has to know about these intricacies and
> >   translate semantics back and forth.
> >
> >   Generally having the fully recursive memory.stat at the root
> >   level could help a broader range of usecases.
>
> The changelog begs the question why we don't just "fix" the
> system-level stats. It may be useful to include the conclusions from
> that discussion, and why there is value in keeping the stats this way.
>

Right. Andrew, can you please add the following para to the changelog?

Why not fix the stats by including both the global and cgroup reclaim
activity instead of exposing root cgroup's memory.stat? The reason is
the benefit of having metrics exposing the activity that happens
purely due to machine capacity rather than localized activity that
happens due to the limits throughout the cgroup tree. Additionally
there are userspace tools like sysstat(sar) which reads these stats to
inform about the system level reclaim activity. So, we should not
break such use-cases.

> > Signed-off-by: Shakeel Butt 
> > Suggested-by: Johannes Weiner 
>
> Acked-by: Johannes Weiner 

Thanks a lot.


Re: [PATCH 3/3] mm: swap: fix update_page_reclaim_stat for huge pages

2020-05-09 Thread Shakeel Butt
On Fri, May 8, 2020 at 2:51 PM Johannes Weiner  wrote:
>
> On Fri, May 08, 2020 at 02:22:15PM -0700, Shakeel Butt wrote:
> > Currently update_page_reclaim_stat() updates the lruvec.reclaim_stats
> > just once for a page irrespective if a page is huge or not. Fix that by
> > passing the hpage_nr_pages(page) to it.
> >
> > Signed-off-by: Shakeel Butt 
>
> https://lore.kernel.org/patchwork/patch/685703/
>
> Laughs, then cries.
>

What happened to that patch? Fell through the cracks?

> > @@ -928,7 +928,7 @@ void lru_add_page_tail(struct page *page, struct page 
> > *page_tail,
> >   }
> >
> >   if (!PageUnevictable(page))
> > - update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
> > + update_page_reclaim_stat(lruvec, file, PageActive(page_tail), 
> > 1);
>
> The change to __pagevec_lru_add_fn() below makes sure the tail pages
> are already accounted. This would make them count twice.
>

Yes, you are right. I will just re-send your patch after rebase.



> >  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > @@ -973,7 +973,7 @@ static void __pagevec_lru_add_fn(struct page *page, 
> > struct lruvec *lruvec,
> >   if (page_evictable(page)) {
> >   lru = page_lru(page);
> >   update_page_reclaim_stat(lruvec, page_is_file_lru(page),
> > -  PageActive(page));
> > +  PageActive(page), nr_pages);
> >   if (was_unevictable)
> >   __count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages);
> >   } else {


Re: [PATCHv3] w1: omap-hdq: Simplify driver with PM runtime autosuspend

2020-05-09 Thread Tony Lindgren
* H. Nikolaus Schaller  [200509 11:48]:
> I have found another small bug and a dev_dbg format weakness
> and now it seems to work well even if I remove or reinsert the
> battery while read operations are ongoing.

OK great.

> Still I need more time to fix up the patch(es).

Well it's ready when it's ready :)

Tony


[PATCH v3] kernel: add panic_on_taint

2020-05-09 Thread Rafael Aquini
Analogously to the introduction of panic_on_warn, this patch
introduces a kernel option named panic_on_taint in order to
provide a simple and generic way to stop execution and catch
a coredump when the kernel gets tainted by any given taint flag.

This is useful for debugging sessions as it avoids rebuilding
the kernel to explicitly add calls to panic() or BUG() into
code sites that introduce the taint flags of interest.
Another, perhaps less frequent, use for this option would be
as a mean for assuring a security policy (in paranoid mode)
case where no single taint is allowed for the running system.

Suggested-by: Qian Cai 
Signed-off-by: Rafael Aquini 
---
Changelog:
* v2: get rid of unnecessary/misguided compiler hints   (Luis)
* v2: enhance documentation text for the new kernel parameter   (Randy)
* v3: drop sysctl interface, keep it only as a kernel parameter (Luis)

 Documentation/admin-guide/kdump/kdump.rst | 10 +
 .../admin-guide/kernel-parameters.txt | 15 +++
 include/linux/kernel.h|  2 +
 kernel/panic.c| 40 +++
 kernel/sysctl.c   |  9 -
 5 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131d2935..de3cf6d377cc 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -521,6 +521,16 @@ will cause a kdump to occur at the panic() call.  In cases 
where a user wants
 to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
 to achieve the same behaviour.
 
+Trigger Kdump on add_taint()
+
+
+The kernel parameter, panic_on_taint, calls panic() from within add_taint(),
+whenever the value set in this bitmask matches with the bit flag being set
+by add_taint(). This will cause a kdump to occur at the panic() call.
+In cases where a user wants to specify this during runtime,
+/proc/sys/kernel/panic_on_taint can be set to a respective bitmask value
+to achieve the same behaviour.
+
 Contact
 ===
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..4a69fe49a70d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3404,6 +3404,21 @@
panic_on_warn   panic() instead of WARN().  Useful to cause kdump
on a WARN().
 
+   panic_on_taint= [KNL] conditionally panic() in add_taint()
+   Format: 
+   Specifies, as a string, the TAINT flag set that will
+   compose a bitmask for calling panic() when the kernel
+   gets tainted.
+   See Documentation/admin-guide/tainted-kernels.rst for
+   details on the taint flags that users can pick to
+   compose the bitmask to assign to panic_on_taint.
+   When the string is prefixed with a '-' the bitmask
+   set in panic_on_taint will be mutually exclusive
+   with the sysctl knob kernel.tainted, and any attempt
+   to write to that sysctl will fail with -EINVAL for
+   any taint value that masks with the flags set for
+   this option.
+
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 9b7a8d74a9d6..66bc102cb59a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -528,6 +528,8 @@ extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
+extern unsigned long panic_on_taint;
+extern bool panic_on_taint_exclusive;
 extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_panic_on_stackoverflow;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..65c62f8a1de8 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -44,6 +45,8 @@ static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
 bool crash_kexec_post_notifiers;
 int panic_on_warn __read_mostly;
+unsigned long panic_on_taint;
+bool panic_on_taint_exclusive = false;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -434,6 +437,11 @@ void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
pr_warn("Disabling lock debugging due to kernel taint\n");
 
set_bit(flag, _mask);
+
+   if (tainted_mask & panic_on_taint) {
+   panic_on_taint = 0;
+   panic("panic_on_taint set 

Re: ofono for d4: less hcked and more working version was Re: USB networking news, ofono for d4: less hacked version

2020-05-09 Thread Tony Lindgren
* Pavel Machek  [200508 10:03]:
> Hi!
> 
> > > But I might be confused. I recall some audio patches were needed for
> > > basic phone calls (setting up mixers to connect gsm<->audio), but
> > > those worked before gsmux support was enabled. (Maybe some hardcoded
> > > commands were needed to be sent to gsmmux somewhere).
> > 
> > We're currently reconfiguring the TDM transport that based on the
> > unsolicited messages on dlci1. I still need to figure out how to add
> > that back while keeping the serdev-ngsm driver generic.
> 
> Is it really neccessary? I believe I was simply configuring codecs for
> voice call and left them like that.

Yes something needs to call set_tdm_slot() for voice calls. AFAIK we have
only the dlci1 messages indicating a voice call is happening.

So I got that added directly into the snd-soc-motmdm driver where it now
listens to notifications on dlci1 while we still have /dev/gsmtty1 for
userspace. Looks like I still need to fix the mixer controls before I can
test voice calls again though.. Will post the patches as soon as I get
that done.

I'm not yet sure what's the preferred way to do notifications with ALSA,
but based on the earlier comments the piece of code calling set_tdm_slot()
should not be the codec driver. Instead, it should be called by either
snd-soc-audio-graph-card or a custom card driver for the device.

> I pushed new version of ofono: I'm still not sure about those incoming
> sms (but _some_ sms are received). Rest should be better.

OK

Regards,

Tony


Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables

2020-05-09 Thread Russell King - ARM Linux admin
On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote:
> On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin
>  wrote:
> >
> > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Matteo Croce 
> > > > Sent: Saturday, May 9, 2020 3:13 AM
> > > > To: David S . Miller 
> > > > Cc: Maxime Chevallier ; netdev
> > > > ; LKML ; Antoine
> > > > Tenart ; Thomas Petazzoni
> > > > ; gregory.clem...@bootlin.com;
> > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan
> > > > Chulski ; Marcin Wojtas ; Linux
> > > > ARM ; Russell King - ARM Linux 
> > > > admin
> > > > 
> > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS 
> > > > contexts to
> > > > handle RSS tables
> > > >
> > > > Hi,
> > > >
> > > > What do you think about temporarily disabling it like this?
> > > >
> > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct platform_device
> > > > *pdev,
> > > > NETIF_F_HW_VLAN_CTAG_FILTER;
> > > >
> > > > if (mvpp22_rss_is_supported()) {
> > > > -   dev->hw_features |= NETIF_F_RXHASH;
> > > > +   if (port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > > +   dev->hw_features |= NETIF_F_RXHASH;
> > > > dev->features |= NETIF_F_NTUPLE;
> > > > }
> > > >
> > > >
> > > > David, is this "workaround" too bad to get accepted?
> > >
> > > Not sure that RSS related to physical interface(SGMII), better just 
> > > remove NETIF_F_RXHASH as "workaround".
> >
> > Hmm, I'm not sure this is the right way forward.  This patch has the
> > effect of disabling:
> >
> > d33ec4525007 ("net: mvpp2: add an RSS classification step for each flow")
> >
> > but the commit you're pointing at which caused the regression is:
> >
> > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
> >
> >
> 
> Hi,
> 
> When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS
> contexts to handle RSS tables"), which was merged
> almost an year after d33ec4525007 ("net: mvpp2: add an RSS
> classification step for each flow"), so I assume that between these
> two commits either the feature was working or it was disable and we
> didn't notice
> 
> Without knowing what was happening, which commit should my Fixes tag point to?

Let me make sure that I get this clear:

- Prior to 895586d5dc32, you can turn on and off rxhash without issue
  on any port.
- After 895586d5dc32, turning rxhash on eth2 prevents reception.

Prior to 895586d5dc32, with rxhash on, it looks like hashing using
CRC32 is supported but only one context.  So, if it's possible to
enable rxhash on any port on the mcbin without 895586d5dc32, and the
port continues to work, I'd say the bug was introduced by
895586d5dc32.

Of course, that would be reinforced if there was a measurable
difference in performance due to rxhash on each port.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


[PATCH 0/4] arm64: kgdb/kdb: Fix single-step debugging issues

2020-05-09 Thread Wei Li
This patch set is to fix several issues of single-step debugging
in kgdb/kdb on arm64.

It seems that these issues have been shelved a very long time,
but i still hope to solve them, as the single-step debugging
is an useful feature.

Note:
Based on patch "arm64: cacheflush: Fix KGDB trap detection",
https://www.spinics.net/lists/arm-kernel/msg803741.html

Wei Li (4):
  arm64: kgdb: Fix single-step exception handling oops
  arm64: Extract kprobes_save_local_irqflag() and
kprobes_restore_local_irqflag()
  arm64: kgdb: Fix single-stepping into the irq handler wrongly
  arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step

 arch/arm64/include/asm/debug-monitors.h |  6 ++
 arch/arm64/kernel/debug-monitors.c  | 28 -
 arch/arm64/kernel/kgdb.c| 16 +++---
 arch/arm64/kernel/probes/kprobes.c  | 28 ++---
 4 files changed, 48 insertions(+), 30 deletions(-)

-- 
2.17.1



[PATCH 2/4] arm64: Extract kprobes_save_local_irqflag() and kprobes_restore_local_irqflag()

2020-05-09 Thread Wei Li
PSTATE.I and PSTATE.D are very important for single-step working.

Without disabling interrupt on local CPU, there is a chance of
interrupt occurrence in the period of exception return and start of
out-of-line single-step, that result in wrongly single stepping
into the interrupt handler. And if D bit is set then, it results into
undefined exception and when it's handler enables dbg then single-step
exception is generated, not as expected.

As they are maintained well in kprobes_save_local_irqflag() and
kprobes_restore_local_irqflag() for kprobe module, extract them as
kernel_prepare_single_step() and kernel_cleanup_single_step() for
general use.

Signed-off-by: Wei Li 
---
 arch/arm64/include/asm/debug-monitors.h |  4 
 arch/arm64/kernel/debug-monitors.c  | 26 +++
 arch/arm64/kernel/probes/kprobes.c  | 28 ++---
 3 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h 
b/arch/arm64/include/asm/debug-monitors.h
index 7619f473155f..b62469f3475b 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -113,6 +113,10 @@ void user_fastforward_single_step(struct task_struct 
*task);
 void kernel_enable_single_step(struct pt_regs *regs);
 void kernel_disable_single_step(void);
 int kernel_active_single_step(void);
+void kernel_prepare_single_step(unsigned long *flags,
+   struct pt_regs *regs);
+void kernel_cleanup_single_step(unsigned long flags,
+   struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 int reinstall_suspended_bps(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index 48222a4760c2..25ce6b5a52d2 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -429,6 +429,32 @@ int kernel_active_single_step(void)
 }
 NOKPROBE_SYMBOL(kernel_active_single_step);
 
+/*
+ * Interrupts need to be disabled before single-step mode is set, and not
+ * reenabled until after single-step mode ends.
+ * Without disabling interrupt on local CPU, there is a chance of
+ * interrupt occurrence in the period of exception return and  start of
+ * out-of-line single-step, that result in wrongly single stepping
+ * into the interrupt handler.
+ */
+void kernel_prepare_single_step(unsigned long *flags,
+   struct pt_regs *regs)
+{
+   *flags = regs->pstate & DAIF_MASK;
+   regs->pstate |= PSR_I_BIT;
+   /* Unmask PSTATE.D for enabling software step exceptions. */
+   regs->pstate &= ~PSR_D_BIT;
+}
+NOKPROBE_SYMBOL(kernel_prepare_single_step);
+
+void kernel_cleanup_single_step(unsigned long flags,
+   struct pt_regs *regs)
+{
+   regs->pstate &= ~DAIF_MASK;
+   regs->pstate |= flags;
+}
+NOKPROBE_SYMBOL(kernel_cleanup_single_step);
+
 /* ptrace API */
 void user_enable_single_step(struct task_struct *task)
 {
diff --git a/arch/arm64/kernel/probes/kprobes.c 
b/arch/arm64/kernel/probes/kprobes.c
index d1c95dcf1d78..c655b6b543e3 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
__this_cpu_write(current_kprobe, p);
 }
 
-/*
- * Interrupts need to be disabled before single-step mode is set, and not
- * reenabled until after single-step mode ends.
- * Without disabling interrupt on local CPU, there is a chance of
- * interrupt occurrence in the period of exception return and  start of
- * out-of-line single-step, that result in wrongly single stepping
- * into the interrupt handler.
- */
-static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
-   struct pt_regs *regs)
-{
-   kcb->saved_irqflag = regs->pstate & DAIF_MASK;
-   regs->pstate |= PSR_I_BIT;
-   /* Unmask PSTATE.D for enabling software step exceptions. */
-   regs->pstate &= ~PSR_D_BIT;
-}
-
-static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
-   struct pt_regs *regs)
-{
-   regs->pstate &= ~DAIF_MASK;
-   regs->pstate |= kcb->saved_irqflag;
-}
-
 static void __kprobes
 set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
 {
@@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
set_ss_context(kcb, slot);  /* mark pending ss */
 
/* IRQs and single stepping do not mix well. */
-   kprobes_save_local_irqflag(kcb, regs);
+   kernel_prepare_single_step(>saved_irqflag, regs);
kernel_enable_single_step(regs);
instruction_pointer_set(regs, slot);
} else {
@@ -414,7 +390,7 @@ kprobe_single_step_handler(struct pt_regs *regs, 

[PATCH 3/4] arm64: kgdb: Fix single-stepping into the irq handler wrongly

2020-05-09 Thread Wei Li
After the single-step exception handling oops is fixed, when we execute
single-step in kdb/kgdb, we may see it jumps to the irq_handler (where
PSTATE.D is cleared) instead of the next instruction.

Add the prepare and cleanup work for single-step when enabling and
disabling to maintain the PSTATE.I and PSTATE.D carefully.

Before this patch:
* kdb:
Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x8000101486cc (printk)
is enabled   addr at 8000101486cc, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfada65c0, pid 267) on processor 0 due to 
Breakpoint @ 0x8000101486cc
[0]kdb> ss

Entering kdb (current=0xfada65c0, pid 267) on processor 0 due to SS 
trap @ 0x800010082ab8
[0]kdb> 0x800010082ab8
0x800010082ab8 = 0x800010082ab8 (el1_irq+0x78)
[0]kdb>

   0x800010082ab0 <+112>:   nop
   0x800010082ab4 <+116>:   msr daifclr, #0xd
   0x800010082ab8 <+120>:   adrpx1, 0x8000113a7000 
   0x800010082abc <+124>:   ldr x1, [x1, #1408]

* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at 
/home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21  asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0x8000101486cc: file 
/home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 287]
[Switching to Thread 283]

Thread 177 hit Breakpoint 1, printk (fmt=0x80001130c9d8 "\001\066sysrq: 
HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076{
(gdb) stepi
el1_irq () at /home/liwei/main_code/linux/arch/arm64/kernel/entry.S:608
608 irq_handler
(gdb)

After this patch:
* kdb:
Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x80001014874c (printk)
is enabled   addr at 80001014874c, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfa6948c0, pid 265) on processor 0 due to 
Breakpoint @ 0x80001014874c
[0]kdb> ss

Entering kdb (current=0xfa6948c0, pid 265) on processor 0 due to SS 
trap @ 0x800010148750
[0]kdb>

* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at 
/home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21  asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0x80001014874c: file 
/home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 277]
[Switching to Thread 276]

Thread 171 hit Breakpoint 1, printk (fmt=0x8000112fc130 "\001\066sysrq: 
HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076{
(gdb) stepi
0x800010148750  2076{
(gdb)

Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li 
---
 arch/arm64/kernel/kgdb.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 1a157ca33262..3910ac06c261 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -100,6 +100,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "fpcr", 4, -1 },
 };
 
+static DEFINE_PER_CPU(unsigned long, kgdb_ss_flags);
+
 char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
 {
if (regno >= DBG_MAX_REG_NUM || regno < 0)
@@ -200,8 +202,11 @@ int kgdb_arch_handle_exception(int exception_vector, int 
signo,
/*
 * Received continue command, disable single step
 */
-   if (kernel_active_single_step())
+   if (kernel_active_single_step()) {
+   kernel_cleanup_single_step(per_cpu(kgdb_ss_flags,
+   raw_smp_processor_id()), linux_regs);
kernel_disable_single_step();
+   }
 
err = 0;
break;
@@ -221,8 +226,12 @@ int kgdb_arch_handle_exception(int exception_vector, int 
signo,
/*
 * Enable single step handling
 */
-   if (!kernel_active_single_step())
+   if (!kernel_active_single_step()) {
+   kernel_prepare_single_step(_cpu(kgdb_ss_flags,
+   raw_smp_processor_id()), linux_regs);
kernel_enable_single_step(linux_regs);
+   }
+
err = 0;
break;
default:
-- 
2.17.1



[PATCH 4/4] arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step

2020-05-09 Thread Wei Li
After fixing wrongly single-stepping into the irq handler, when we execute
single-step in kdb/kgdb, we can see only the first step can work.

Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12,
i think PSTATE.SS=1 should be set each step for transferring the PE to the
'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set
since the second single-step.

After the first single-step, the PE transferes to the 'Inactive' state,
with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to
kernel_active_single_step()=true. Then the PE transferes to the
'Active-pending' state when ERET and returns to the debugger by step
exception.

Before this patch:
* kdb:
Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x80001014874c (printk)
is enabled   addr at 80001014874c, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to 
Breakpoint @ 0x80001014874c
[3]kdb> ss

Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS 
trap @ 0x800010148750
[3]kdb> ss

Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS 
trap @ 0x800010148750
[3]kdb> ss

Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS 
trap @ 0x800010148750
[3]kdb>

* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at 
/home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21  asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0x80001014874c: file 
/home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 277]
[Switching to Thread 276]

Thread 171 hit Breakpoint 1, printk (fmt=0x8000112fc130 "\001\066sysrq: 
HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076{
(gdb) stepi
0x800010148750  2076{
(gdb) stepi
0x800010148750  2076{
(gdb) stepi
0x800010148750  2076{
(gdb)

After this patch:
* kdb:
Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x80001014874c (printk)
is enabled   addr at 80001014874c, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfa800040, pid 264) on processor 2 due to 
Breakpoint @ 0x80001014874c
[2]kdb> ss

Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS 
trap @ 0x800010148750
[2]kdb> ss

Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS 
trap @ 0x800010148754
[2]kdb> ss

Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS 
trap @ 0x800010148758
[2]kdb>

* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at 
/home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21  asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0x80001014874c: file 
/home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 281]
[New Thread 280]
[Switching to Thread 281]

Thread 174 hit Breakpoint 1, printk (fmt=0x8000112fc138 "\001\066sysrq: 
HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076{
(gdb) stepi
0x800010148750  2076{
(gdb) stepi
2080va_start(args, fmt);
(gdb) stepi
0x800010148758  2080va_start(args, fmt);
(gdb)

Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li 
---
 arch/arm64/include/asm/debug-monitors.h | 2 ++
 arch/arm64/kernel/debug-monitors.c  | 2 +-
 arch/arm64/kernel/kgdb.c| 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h 
b/arch/arm64/include/asm/debug-monitors.h
index b62469f3475b..a48b507c89ee 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -78,6 +78,8 @@ struct step_hook {
int (*fn)(struct pt_regs *regs, unsigned int esr);
 };
 
+void set_regs_spsr_ss(struct pt_regs *regs);
+
 void register_user_step_hook(struct step_hook *hook);
 void unregister_user_step_hook(struct step_hook *hook);
 
diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index 25ce6b5a52d2..7a58233677de 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -141,7 +141,7 @@ postcore_initcall(debug_monitors_init);
 /*
  * Single step API and exception handling.
  */
-static void set_regs_spsr_ss(struct pt_regs *regs)
+void set_regs_spsr_ss(struct pt_regs *regs)
 {
regs->pstate |= DBG_SPSR_SS;
 }
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 

[PATCH 1/4] arm64: kgdb: Fix single-step exception handling oops

2020-05-09 Thread Wei Li
After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will
delay installing breakpoints, do single-step first), it won't work
correctly, and it will enter kdb due to oops.

It's because the reason gotten in kdb_stub() is not as expected, and it
seems that the ex_vector for single-step should be 0, like what arch
powerpc/sh/parisc has implemented.

Before the patch:
Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x8000101486cc (printk)
is enabled   addr at 8000101486cc, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfa878040, pid 266) on processor 3 due to 
Breakpoint @ 0x8000101486cc
[3]kdb> ss

Entering kdb (current=0xfa878040, pid 266) on processor 3 Oops: (null)
due to oops @ 0x800010082ab8
CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6
Hardware name: linux,dummy-virt (DT)
pstate: 0085 (nzcv daIf -PAN -UAO)
pc : el1_irq+0x78/0x180
lr : __handle_sysrq+0x80/0x190
sp : 800015003bf0
x29: 800015003d20 x28: fa878040
x27:  x26: 80001126b1f0
x25: 800011b6a0d8 x24: 
x23: 8025 x22: 8000101486cc
x21: 800015003d30 x20: 
x19: 8000119f2000 x18: 
x17:  x16: 
x15:  x14: 
x13:  x12: 
x11:  x10: 
x9 :  x8 : 800015003e50
x7 : 0002 x6 : 380b9990
x5 : 8000106e99e8 x4 : fadd83c0
x3 :  x2 : 800011b6a0d8
x1 : 800011b6a000 x0 : 80001130c9d8
Call trace:
 el1_irq+0x78/0x180
 printk+0x0/0x84
 write_sysrq_trigger+0xb0/0x118
 proc_reg_write+0xb4/0xe0
 __vfs_write+0x18/0x40
 vfs_write+0xb0/0x1b8
 ksys_write+0x64/0xf0
 __arm64_sys_write+0x14/0x20
 el0_svc_common.constprop.2+0xb0/0x168
 do_el0_svc+0x20/0x98
 el0_sync_handler+0xec/0x1a8
 el0_sync+0x140/0x180

[3]kdb>

After the patch:
Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard 
Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0x8000101486cc (printk)
is enabled   addr at 8000101486cc, hardtype=0 installed=0

[0]kdb> g

/ # echo h > /proc/sysrq-trigger

Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to 
Breakpoint @ 0x8000101486cc
[0]kdb> g

Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to 
Breakpoint @ 0x8000101486cc
[0]kdb> ss

Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to SS 
trap @ 0x800010082ab8
[0]kdb>

Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li 
---
 arch/arm64/kernel/kgdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..1a157ca33262 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -252,7 +252,7 @@ static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned 
int esr)
if (!kgdb_single_step)
return DBG_HOOK_ERROR;
 
-   kgdb_handle_exception(1, SIGTRAP, 0, regs);
+   kgdb_handle_exception(0, SIGTRAP, 0, regs);
return DBG_HOOK_HANDLED;
 }
 NOKPROBE_SYMBOL(kgdb_step_brk_fn);
-- 
2.17.1



Re: [PATCH 2/2] drm/bridge: Add ITE IT6251 bridge driver

2020-05-09 Thread Laurent Pinchart
Hi Richard,

Thank you for the patch.

On Sat, May 09, 2020 at 01:17:32PM +0200, s...@48.io wrote:
> From: Marek Vasut 
> 
> Add driver for the ITE IT6251 LVDS-to-eDP bridge.
> 
> Signed-off-by: Marek Vasut 
> Signed-off-by: Richard Marko 
> Cc: Daniel Vetter 
> Cc: Sean Cross 
> To: dri-de...@lists.freedesktop.org
> ---
>  drivers/gpu/drm/bridge/Kconfig  |  12 +
>  drivers/gpu/drm/bridge/Makefile |   1 +
>  drivers/gpu/drm/bridge/ite-it6251.c | 582 
>  3 files changed, 595 insertions(+)
>  create mode 100644 drivers/gpu/drm/bridge/ite-it6251.c
> 
> diff --git a/drivers/gpu/drm/bridge/Kconfig b/drivers/gpu/drm/bridge/Kconfig
> index aaed2347ace9..934896a4ab2d 100644
> --- a/drivers/gpu/drm/bridge/Kconfig
> +++ b/drivers/gpu/drm/bridge/Kconfig
> @@ -38,6 +38,18 @@ config DRM_DISPLAY_CONNECTOR
> on ARM-based platforms. Saying Y here when this driver is not needed
> will not cause any issue.
>  
> +config DRM_ITE_IT6251
> + tristate "ITE IT6251 LVDS/eDP bridge"
> + depends on OF
> + select DRM_KMS_HELPER
> + select DRM_PANEL
> + select REGMAP_I2C
> + help
> +   Driver for ITE IT6251 LVDS-eDP bridge chip driver. This is used
> +   in Novena open-hardware laptop with eDP based panel.
> +   IT6251 supports LVDS input and DisplayPort 1.1a output,
> +   resolution up to 1080P and 10-bit color depth.
> +
>  config DRM_LVDS_CODEC
>   tristate "Transparent LVDS encoders and decoders support"
>   depends on OF
> diff --git a/drivers/gpu/drm/bridge/Makefile b/drivers/gpu/drm/bridge/Makefile
> index 6fb062b5b0f0..4c195dc42fce 100644
> --- a/drivers/gpu/drm/bridge/Makefile
> +++ b/drivers/gpu/drm/bridge/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_DRM_CDNS_DSI) += cdns-dsi.o
>  obj-$(CONFIG_DRM_DISPLAY_CONNECTOR) += display-connector.o
> +obj-$(CONFIG_DRM_ITE_IT6251) += ite-it6251.o
>  obj-$(CONFIG_DRM_LVDS_CODEC) += lvds-codec.o
>  obj-$(CONFIG_DRM_MEGACHIPS_STDP_GE_B850V3_FW) += 
> megachips-stdp-ge-b850v3-fw.o
>  obj-$(CONFIG_DRM_NXP_PTN3460) += nxp-ptn3460.o
> diff --git a/drivers/gpu/drm/bridge/ite-it6251.c 
> b/drivers/gpu/drm/bridge/ite-it6251.c
> new file mode 100644
> index ..b8534fb62c9d
> --- /dev/null
> +++ b/drivers/gpu/drm/bridge/ite-it6251.c
> @@ -0,0 +1,582 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2014 Sean Cross 
> + *
> + * Rework for mainline: Marek Vasut 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct it6251_bridge {
> + struct i2c_client   *client;
> + struct i2c_client   *lvds_client;
> + struct regmap   *regmap;
> + struct regmap   *lvds_regmap;
> + struct regulator*regulator;
> +
> + struct drm_connectorconnector;
> + struct drm_bridge   bridge;
> + struct drm_panel*panel;
> +};
> +
> +/* Register definitions */
> +#define IT6251_VENDOR_ID_LOW 0x00
> +#define IT6251_VENDOR_ID_HIGH0x01
> +#define IT6251_DEVICE_ID_LOW 0x02
> +#define IT6251_DEVICE_ID_HIGH0x03
> +#define IT6251_SYSTEM_STATUS 0x0d
> +#define IT6251_SYSTEM_STATUS_RINTSTATUS  BIT(0)
> +#define IT6251_SYSTEM_STATUS_RHPDSTATUS  BIT(1)
> +#define IT6251_SYSTEM_STATUS_RVIDEOSTABLEBIT(2)
> +#define IT6251_SYSTEM_STATUS_RPLL_IOLOCK BIT(3)
> +#define IT6251_SYSTEM_STATUS_RPLL_XPLOCK BIT(4)
> +#define IT6251_SYSTEM_STATUS_RPLL_SPLOCK BIT(5)
> +#define IT6251_SYSTEM_STATUS_RAUXFREQ_LOCK   BIT(6)
> +#define IT6251_REF_STATE 0x0e
> +#define IT6251_REF_STATE_MAIN_LINK_DISABLED  BIT(0)
> +#define IT6251_REF_STATE_AUX_CHANNEL_READBIT(1)
> +#define IT6251_REF_STATE_CR_PATTERN  BIT(2)
> +#define IT6251_REF_STATE_EQ_PATTERN  BIT(3)
> +#define IT6251_REF_STATE_NORMAL_OPERATIONBIT(4)
> +#define IT6251_REF_STATE_MUTED   BIT(5)
> +#define IT6251_RPCLK_CNT_LOW 0x13
> +#define IT6251_RPCLK_CNT_HIGH0x14
> +#define IT6251_RPC_REQ   0x2b
> +#define IT6251_RPC_REQ_RPC_FIFOFULL  BIT(6)
> +#define IT6251_RPC_REQ_RPC_FIFOEMPTY BIT(7)
> +#define IT6251_PCLK_CNT_LOW  0x57
> +#define IT6251_PCLK_CNT_HIGH 0x58
> +#define IT6251_DPHDEW_LOW0xa5
> +#define IT6251_DPHDEW_HIGH   0xa6
> +#define IT6251_DPVDEW_LOW0xaf
> +#define IT6251_DPVDEW_HIGH   0xb0
> +#define IT6251_LVDS_PORT_ADDR0xfd
> +#define IT6251_LVDS_PORT_CTRL0xfe
> +#define IT6251_LVDS_PORT_CTRL_EN BIT(0)
> +
> +/*
> + * 

<    1   2   3   4   5   6   >