RE: [PATCH 3/5] cifsd: add file operations

2021-03-22 Thread Namjae Jeon
> On Mon, Mar 22, 2021 at 02:13:42PM +0900, Namjae Jeon wrote:
> > This adds file operations and buffer pool for cifsd.
> 
> Some random notes:
> 
> > +static void rollback_path_modification(char *filename) {
> > +   if (filename) {
> > +   filename--;
> > +   *filename = '/';
> What an odd way to spell filename[-1] = '/';...
Okay. Will fix.
> 
> > +int ksmbd_vfs_inode_permission(struct dentry *dentry, int acc_mode,
> > +bool delete) {
> 
> > +   if (delete) {
> > +   struct dentry *parent;
> > +
> > +   parent = dget_parent(dentry);
> > +   if (!parent)
> > +   return -EINVAL;
> > +
> > +   if (inode_permission(&init_user_ns, d_inode(parent), MAY_EXEC | 
> > MAY_WRITE)) {
> > +   dput(parent);
> > +   return -EACCES;
> > +   }
> > +   dput(parent);
> 
> Who's to guarantee that parent is stable?  IOW, by the time of that
> inode_permission() call dentry might very well not be a child of that thing...
Okay, Will fix.
> 
> > +   parent = dget_parent(dentry);
> > +   if (!parent)
> > +   return 0;
> > +
> > +   if (!inode_permission(&init_user_ns, d_inode(parent), MAY_EXEC | 
> > MAY_WRITE))
> > +   *daccess |= FILE_DELETE_LE;
> 
> Ditto.
Okay.
> 
> > +int ksmbd_vfs_mkdir(struct ksmbd_work *work,
> > +   const char *name,
> > +   umode_t mode)
> 
> 
> > +   err = vfs_mkdir(&init_user_ns, d_inode(path.dentry), dentry, mode);
> > +   if (!err) {
> > +   ksmbd_vfs_inherit_owner(work, d_inode(path.dentry),
> > +   d_inode(dentry));
> 
> ->mkdir() might very well return success, with dentry left unhashed negative.
> Look at the callers of vfs_mkdir() to see how it should be handled.
Okay, Will fix.
> 
> > +static int check_lock_range(struct file *filp,
> > +   loff_t start,
> > +   loff_t end,
> > +   unsigned char type)
> > +{
> > +   struct file_lock *flock;
> > +   struct file_lock_context *ctx = file_inode(filp)->i_flctx;
> > +   int error = 0;
> > +
> > +   if (!ctx || list_empty_careful(&ctx->flc_posix))
> > +   return 0;
> > +
> > +   spin_lock(&ctx->flc_lock);
> > +   list_for_each_entry(flock, &ctx->flc_posix, fl_list) {
> > +   /* check conflict locks */
> > +   if (flock->fl_end >= start && end >= flock->fl_start) {
> > +   if (flock->fl_type == F_RDLCK) {
> > +   if (type == WRITE) {
> > +   ksmbd_err("not allow write by shared 
> > lock\n");
> > +   error = 1;
> > +   goto out;
> > +   }
> > +   } else if (flock->fl_type == F_WRLCK) {
> > +   /* check owner in lock */
> > +   if (flock->fl_file != filp) {
> > +   error = 1;
> > +   ksmbd_err("not allow rw access by 
> > exclusive lock from other
> opens\n");
> > +   goto out;
> > +   }
> > +   }
> > +   }
> > +   }
> > +out:
> > +   spin_unlock(&ctx->flc_lock);
> > +   return error;
> > +}
> 
> WTF is that doing in smbd?
This code was added to pass the smb2 lock test of samba's smbtorture.
Will fix it.
> 
> > +   filp = fp->filp;
> > +   inode = d_inode(filp->f_path.dentry);
> 
> That should be file_inode().  Try it on overlayfs, watch it do interesting 
> things...
Okay.
> 
> > +   nbytes = kernel_read(filp, rbuf, count, pos);
> > +   if (nbytes < 0) {
> > +   name = d_path(&filp->f_path, namebuf, sizeof(namebuf));
> > +   if (IS_ERR(name))
> > +   name = "(error)";
> > +   ksmbd_err("smb read failed for (%s), err = %zd\n",
> > +   name, nbytes);
> 
> Do you really want the full pathname here?  For (presumably) spew into syslog?
No, Will fix.
> 
> > +int ksmbd_vfs_remove_file(struct ksmbd_work *work, char *name) {
> > +   struct path parent;
> > +   struct dentry *dir, *dentry;
> > +   char *last;
> > +   int err = -ENOENT;
> > +
> > +   last = extract_last_component(name);
> > +   if (!last)
> > +   return -ENOENT;
> 
> Yeccchhh...
I guess I change it err instead of -ENOENT.

> 
> > +   if (ksmbd_override_fsids(work))
> > +   return -ENOMEM;
> > +
> > +   err = kern_path(name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &parent);
> > +   if (err) {
> > +   ksmbd_debug(VFS, "can't get %s, err %d\n", name, err);
> > +   ksmbd_revert_fsids(work);
> > +   rollback_path_modification(last);
> > +   return err;
> > +   }
> > +
> > +   dir = parent.dentry;
> > +   if (!d_inode(dir))
> > +   goto out;
> 
> Really?  When does that happen?
Will fix.
> 
> > +static int __ksmbd_vfs_rename(struct ksmbd_work *work,
> > +

[PATCH] clkdev: remove unnecessary __ref annotations

2021-03-22 Thread Rasmus Villemoes
clkdev_alloc and vclkdev_alloc do not call any functions living in
.init.text (nor are there comments as to when and why that would be
safe). The original annotation dates back to when the code was
imported from ARM in 6d803ba736ab, and AFAICT the last reason to have
the annotation vanished with ea17c9d868c1 (sh: Use generic clkdev.h
header).

Signed-off-by: Rasmus Villemoes 
---
 drivers/clk/clkdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 0f2e3fcf0f19..d446c2c697d5 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -153,7 +153,7 @@ struct clk_lookup_alloc {
charcon_id[MAX_CON_ID];
 };
 
-static struct clk_lookup * __ref
+static struct clk_lookup *
 vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
va_list ap)
 {
@@ -190,7 +190,7 @@ vclkdev_create(struct clk_hw *hw, const char *con_id, const 
char *dev_fmt,
return cl;
 }
 
-struct clk_lookup * __ref
+struct clk_lookup *
 clkdev_alloc(struct clk *clk, const char *con_id, const char *dev_fmt, ...)
 {
struct clk_lookup *cl;
-- 
2.29.2



Re: [PATCH 3/5] cifsd: add file operations

2021-03-22 Thread Sergey Senozhatsky
On (21/03/22 17:09), Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 06:03:21PM +0900, Sergey Senozhatsky wrote:
> > On (21/03/22 08:15), Matthew Wilcox wrote:
> > > 
> > > What's the scenario for which your allocator performs better than slub
> > > 
> > 
> > IIRC request and reply buffers can be up to 4M in size. So this stuff
> > just allocates a number of fat buffers and keeps them around so that
> > it doesn't have to vmalloc(4M) for every request and every response.
> 
> Hang on a minute, this speaks to a deeper design problem.  If we're doing
> a 'request' or 'reply' that is that large, the I/O should be coming from
> or going to the page cache.  If it goes via a 4MB virtually contiguous
> buffer first, that's a memcpy that could/should be avoided.

But huge vmalloc buffers are still needed. For instance, `ls -la` in
a directory with a huge number of entries.

> But now i'm looking for how ksmbd_find_buffer() is used, and it isn't.
> So it looks like someone came to the same conclusion I did, but forgot
> to delete the wm code.

Yes, I think it's disabled by default and requires some smb.conf
configuration. So I guess that wm code can be removed. Especially given
that

> That said, there are plenty of opportunities to optimise the vmalloc code,
> and that's worth pursuing.

That would be really interesting to see!

> And here's the receive path which contains
> the memcpy that should be avoided (ok, it's actually the call to ->read;
> we shouldn't be reading in the entire 4MB but rather the header):

> + conn->request_buf = ksmbd_alloc_request(size);
> + if (!conn->request_buf)
> + continue;
> +
> + memcpy(conn->request_buf, hdr_buf, sizeof(hdr_buf));
> + if (!ksmbd_smb_request(conn))
> + break;
> +
> + /*
> +  * We already read 4 bytes to find out PDU size, now
> +  * read in PDU
> +  */
> + size = t->ops->read(t, conn->request_buf + 4, pdu_size);


// A side note, it seems that the maximum read/write/trans buffer size that
// windows supports is 8MB, not 4MB.


Re: [PATCH] xsysace: Remove SYSACE driver

2021-03-22 Thread Davidlohr Bueso

Hi,

On Mon, 09 Nov 2020, Michal Simek wrote:


Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
systems. The driver is not regularly tested and very likely not working for
quite a long time that's why remove it.


Is there a reason this patch was never merged? can the driver be removed? I ran
into this as a potential tasklet user that can be replaced/removed.

Thanks,
Davidlohr



Signed-off-by: Michal Simek 
---

Based on discussion
https://lore.kernel.org/linux-arm-kernel/5ab9a2a1-20e3-c7b2-f666-2034df436...@kernel.dk/

I have grepped the kernel and found any old ppc platform. I have included
it in this patch to have a discussion about it.

---
MAINTAINERS |1 -
arch/microblaze/boot/dts/system.dts |8 -
arch/powerpc/boot/dts/icon.dts  |7 -
arch/powerpc/configs/44x/icon_defconfig |1 -
drivers/block/Kconfig   |6 -
drivers/block/Makefile  |1 -
drivers/block/xsysace.c | 1273 ---
7 files changed, 1297 deletions(-)
delete mode 100644 drivers/block/xsysace.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cba8ddf87a08..38556c009758 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2741,7 +2741,6 @@ T:git https://github.com/Xilinx/linux-xlnx.git
F:  Documentation/devicetree/bindings/i2c/cdns,i2c-r1p10.yaml
F:  Documentation/devicetree/bindings/i2c/xlnx,xps-iic-2.00.a.yaml
F:  arch/arm/mach-zynq/
-F: drivers/block/xsysace.c
F:  drivers/clocksource/timer-cadence-ttc.c
F:  drivers/cpuidle/cpuidle-zynq.c
F:  drivers/edac/synopsys_edac.c
diff --git a/arch/microblaze/boot/dts/system.dts 
b/arch/microblaze/boot/dts/system.dts
index 5b236527176e..b7ee1056779e 100644
--- a/arch/microblaze/boot/dts/system.dts
+++ b/arch/microblaze/boot/dts/system.dts
@@ -310,14 +310,6 @@ RS232_Uart_1: serial@8400 {
xlnx,odd-parity = <0x0>;
xlnx,use-parity = <0x0>;
} ;
-   SysACE_CompactFlash: sysace@8360 {
-   compatible = "xlnx,xps-sysace-1.00.a";
-   interrupt-parent = <&xps_intc_0>;
-   interrupts = < 4 2 >;
-   reg = < 0x8360 0x1 >;
-   xlnx,family = "virtex5";
-   xlnx,mem-width = <0x10>;
-   } ;
debug_module: debug@8440 {
compatible = "xlnx,mdm-1.00.d";
reg = < 0x8440 0x1 >;
diff --git a/arch/powerpc/boot/dts/icon.dts b/arch/powerpc/boot/dts/icon.dts
index fbaa60b8f87a..4fd7a4fbb4fb 100644
--- a/arch/powerpc/boot/dts/icon.dts
+++ b/arch/powerpc/boot/dts/icon.dts
@@ -197,13 +197,6 @@ partition@fa {
reg = <0x00fa 0x0006>;
};
};
-
-   SysACE_CompactFlash: sysace@1,0 {
-   compatible = "xlnx,sysace";
-   interrupt-parent = <&UIC2>;
-   interrupts = <24 0x4>;
-   reg = <0x0001 0x 0x1>;
-   };
};

UART0: serial@f200 {
diff --git a/arch/powerpc/configs/44x/icon_defconfig 
b/arch/powerpc/configs/44x/icon_defconfig
index 930948a1da76..fb9a15573546 100644
--- a/arch/powerpc/configs/44x/icon_defconfig
+++ b/arch/powerpc/configs/44x/icon_defconfig
@@ -28,7 +28,6 @@ CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP_OF=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=35000
-CONFIG_XILINX_SYSACE=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ecceaaa1a66f..9cb02861298d 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -388,12 +388,6 @@ config SUNVDC

source "drivers/s390/block/Kconfig"

-config XILINX_SYSACE
-   tristate "Xilinx SystemACE support"
-   depends on 4xx || MICROBLAZE
-   help
- Include support for the Xilinx SystemACE CompactFlash interface
-
config XEN_BLKDEV_FRONTEND
tristate "Xen virtual block device support"
depends on XEN
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index e1f63117ee94..5ddd9370972a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -19,7 +19,6 @@ obj-$(CONFIG_ATARI_FLOPPY)+= ataflop.o
obj-$(CONFIG_AMIGA_Z2RAM)   += z2ram.o
obj-$(CONFIG_BLK_DEV_RAM)   += brd.o
obj-$(CONFIG_BLK_DEV_LOOP)  += loop.o
-obj-$(CONFIG_XILINX_SYSACE)+= xsysace.o
obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
obj-$(CONFIG_SUNVDC)+= sunvdc.o
obj-$(CONFIG_BLK_DEV_SKD)   += skd.o
diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c
deleted file mode 100644
index eb8ef65778c3..000

Re: [PATCH -tip v4 08/12] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline

2021-03-22 Thread Masami Hiramatsu
On Mon, 22 Mar 2021 15:41:18 +0900
Masami Hiramatsu  wrote:

> Change kretprobe_trampoline to make a space for regs->ARM_pc so that
> kretprobe_trampoline_handler can call instruction_pointer_set()
> safely.

BTW, if kretprobe_trampoline is replaced with the assembly code,
I think it should fill all the regs as much as possible, because
originally it is written by a software break.
Thus the regs->sp should point the stack address at the entry of 
kretprobe_trampoline, and also regs->lr and regs->pc will be
kretprobe_trampoline, so that user handler can access caller stack.


Thanks, 

> 
> Signed-off-by: Masami Hiramatsu 
> ---
>  arch/arm/probes/kprobes/core.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
> index 1782b41df095..5f3c2b42787f 100644
> --- a/arch/arm/probes/kprobes/core.c
> +++ b/arch/arm/probes/kprobes/core.c
> @@ -397,11 +397,13 @@ int __kprobes kprobe_exceptions_notify(struct 
> notifier_block *self,
>  void __naked __kprobes kretprobe_trampoline(void)
>  {
>   __asm__ __volatile__ (
> + "subsp, sp, #16 \n\t"
>   "stmdb  sp!, {r0 - r11} \n\t"
>   "movr0, sp  \n\t"
>   "bl trampoline_handler  \n\t"
>   "movlr, r0  \n\t"
>   "ldmia  sp!, {r0 - r11} \n\t"
> + "addsp, sp, #16 \n\t"
>  #ifdef CONFIG_THUMB2_KERNEL
>   "bx lr  \n\t"
>  #else
> 


-- 
Masami Hiramatsu 


Re: [PATCH net-next v2 2/2] net: ipa: fix IPA validation

2021-03-22 Thread Alex Elder

On 3/22/21 5:56 PM, Andrew Lunn wrote:

The solution is to create a user space tool inside the
drivers/net/ipa directory that will link with the kernel
source files and will perform all the basic one-time checks
I want to make.


Hi Alex

Have you found any other driver doing this?  Where do they keep there
code?

Could this be a selftest, put somewhere in tools/testing/selftests.

Or can this be a test kernel module. Eg. we have crypt/testmsg.c which
runs a number of tests on the crypto subsystem,
./kernel/time/test_udelay.c which runs times on udelay.

Rather than inventing something new, please follow other examples
already in the kernel.


I will.  I did see the tools/testing directory and I'll
look at how people have done things there.

I need to try to get it working first, then I'll figure
out where it belongs.  I think I'll be able to do a user
space test, but it's a little tricky to be sure it's
actually testing what I want.  If that ends up being
too hard, I'll look into kernel test module.

Thanks for the suggestions.

-Alex


Andrew





Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-22 Thread Ben Greear

On 7/22/20 6:00 AM, Felix Fietkau wrote:

On 2020-07-22 14:55, Johannes Berg wrote:

On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:


I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.


Would this be *after* all the mac80211 processing, i.e. in place of the
rx-up-to-stack?

Yes, it would run all the rx handlers normally and then put the
resulting skbs into a list instead of calling netif_receive_skb or
napi_gro_frags.


Whatever came of this?  I realized I'm running Felix's patch since his mt76
driver needs it.  Any chance it will go upstream?

Thanks,
Ben



- Felix




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [selftests] e48d82b67a: BUG_TestSlub_RZ_alloc(Not_tainted):Redzone_overwritten

2021-03-22 Thread Vlastimil Babka
On 3/17/21 7:53 PM, David Rientjes wrote:
> On Wed, 17 Mar 2021, Vlastimil Babka wrote:
>> > 
>> > [   22.154049] random: get_random_u32 called from 
>> > __kmem_cache_create+0x23/0x3e0 with crng_init=0 
>> > [   22.154070] random: get_random_u32 called from 
>> > cache_random_seq_create+0x7c/0x140 with crng_init=0 
>> > [   22.154167] random: get_random_u32 called from 
>> > allocate_slab+0x155/0x5e0 with crng_init=0 
>> > [   22.154690] test_slub: 1. kmem_cache: Clobber Redzone 0x12->0x(ptrval)
>> > [   22.164499] 
>> > =
>> > [   22.166629] BUG TestSlub_RZ_alloc (Not tainted): Redzone overwritten
>> > [   22.168179] 
>> > -
>> > [   22.168179]
>> > [   22.168372] Disabling lock debugging due to kernel taint
>> > [   22.168372] INFO: 0x(ptrval)-0x(ptrval) @offset=1064. First byte 0x12 
>> > instead of 0xcc
>> > [   22.168372] INFO: Allocated in resiliency_test+0x47/0x1be age=3 cpu=0 
>> > pid=1 
>> > [   22.168372] __slab_alloc+0x57/0x80 
>> > [   22.168372] kmem_cache_alloc (kbuild/src/consumer/mm/slub.c:2871 
>> > kbuild/src/consumer/mm/slub.c:2915 kbuild/src/consumer/mm/slub.c:2920) 
>> > [   22.168372] resiliency_test (kbuild/src/consumer/lib/test_slub.c:34 
>> > kbuild/src/consumer/lib/test_slub.c:107) 
>> > [   22.168372] test_slub_init (kbuild/src/consumer/lib/test_slub.c:124) 
>> > [   22.168372] do_one_initcall (kbuild/src/consumer/init/main.c:1226) 
>> > [   22.168372] kernel_init_freeable (kbuild/src/consumer/init/main.c:1298 
>> > kbuild/src/consumer/init/main.c:1315 kbuild/src/consumer/init/main.c:1335 
>> > kbuild/src/consumer/init/main.c:1537) 
>> > [   22.168372] kernel_init (kbuild/src/consumer/init/main.c:1426) 
>> > [   22.168372] ret_from_fork 
>> > (kbuild/src/consumer/arch/x86/entry/entry_32.S:856) 
>> > [   22.168372] INFO: Slab 0x(ptrval) objects=16 used=1 fp=0x(ptrval) 
>> > flags=0x4201
>> > [   22.168372] INFO: Object 0x(ptrval) @offset=1000 fp=0x(ptrval)
>> > [   22.168372]
>> > [   22.168372] Redzone (ptrval): cc cc cc cc cc cc cc cc   
>> >
>> > [   22.168372] Object (ptrval): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
>> > 6b 6b  
>> > [   22.168372] Object (ptrval): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
>> > 6b a5  kkk.
>> > [   22.168372] Redzone (ptrval): 12 cc cc cc   
>> >
>> > [   22.168372] Padding (ptrval): 5a 5a 5a 5a 5a 5a 5a 5a   
>> >
>> > [   22.168372] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 
>> > 5.12.0-rc2-1-ge48d82b67a2b #1
>> > [   22.168372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> > 1.12.0-1 04/01/2014
>> > [   22.168372] Call Trace:
>> > [   22.168372] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122) 
>> > [   22.168372] print_trailer (kbuild/src/consumer/mm/slub.c:737) 
>> > [   22.168372] check_bytes_and_report.cold 
>> > (kbuild/src/consumer/mm/slub.c:807) 
>> > [   22.168372] check_object (kbuild/src/consumer/mm/slub.c:914) 
>> > [   22.168372] validate_slab (kbuild/src/consumer/mm/slub.c:4635) 
>> 
>> Hm but in this case the output means the tested functionality (slub 
>> debugging)
>> is working as intended. So what can we do? Indicate/teach somehow to the bot
>> that this is OK? Does kselftest have some support for this? Or silence the
>> validation output for testing purposes? (I would prefer not to)
>> 
> 
> Unless you're familiar with everything that CONFIG_TEST_SLUB does, maybe 
> this could be inferred as an actual issue that the test has uncovered that 
> is unexpected?
> 
> I don't have a good way of silencing the check_bytes_and_report() output 
> other than a big hammer: implement {disable,enable}_slub_warnings() that 
> the resiliency test could call into before triggering these checks.

So Oliver has implemented this, but now I got a different idea that should be
much cleaner IMHO. We could add a per-cache flag SLAB_SILENT_ERRORS (similar to
SLAB_RED_ZONE etc) instead of a global bool. The test would just create the
caches with this flag.
The flag should be added to the SLAB_NEVER_MERGE, SLAB_DEBUG_FLAGS,
SLAB_FLAGS_PERMITTED macros as well.

A similar suggestion is that adding the errors counter parameter to all
validate_slab_cache() and relevant functions is tedious - there are more that
had to be modified like this than initially expected.
Instead the error counter can be added to SLUB's struct kmem_cache definition,
incremented by the various checks and the tests can look at that after 
validation.

Thanks,
Vlastimil


Re: [PATCH v2 0/4] drm/bridge: ti-sn65dsi86: Support EDID reading

2021-03-22 Thread Laurent Pinchart
Hi Stephen,

On Mon, Nov 02, 2020 at 05:15:24PM -0800, Stephen Boyd wrote:
> Quoting Sam Ravnborg (2020-11-01 09:37:41)
> > Hi Stephen.
> > 
> > On Thu, Oct 29, 2020 at 06:17:34PM -0700, Stephen Boyd wrote:
> > > This patch series cleans up the DDC code a little bit so that
> > > it is more efficient time wise and supports grabbing the EDID
> > > of the eDP panel over the aux channel. I timed this on a board
> > > I have on my desk and it takes about 20ms to grab the EDID out
> > > of the panel and make sure it is valid.
> > > 
> > > The first two patches seem less controversial so I stuck them at
> > > the beginning. The third patch does the EDID reading and caches
> > > it so we don't have to keep grabbing it over and over again. And
> > > finally the last patch updates the reply field so that short
> > > reads and nacks over the channel are reflected properly instead of
> > > treating them as some sort of error that can't be discerned.
> > > 
> > > Stephen Boyd (4):
> > >   drm/bridge: ti-sn65dsi86: Combine register accesses in
> > > ti_sn_aux_transfer()
> > >   drm/bridge: ti-sn65dsi86: Make polling a busy loop
> > >   drm/bridge: ti-sn65dsi86: Read EDID blob over DDC
> > >   drm/bridge: ti-sn65dsi86: Update reply on aux failures
> > 
> > Series looks good. You can add my a-b on the full series.
> > Acked-by: Sam Ravnborg 
> > 
> > I can apply after Douglas have had a look at the patches he did not r-b
> > yet.
> > 
> > Any chance we can convince you to prepare this bridge driver for use in
> > a chained bridge setup where the connector is created by the display
> > driver and uses drm_bridge_funcs?
> > 
> > First step wuld be to introduce the use of a panel_bridge.
> > Then add get_edid to drm_bridge_funcs and maybe more helpers.
> > 
> > Then natural final step would be to move connector creation to the
> > display driver - see how other uses drm_bridge_connector_init() to do so
> > - it is relatively simple.
> > 
> > Should be doable - and reach out if you need some help.
> 
> I started to look at this and got stuck at ti_sn_bridge_get_bpp(). Where
> can I get the details of the bpc for the downstream bridge or panel? Is
> there some function that can tell this bridge what the bpc is for the
> attached connector?

I've posted a patch series to convert to DRM_BRIDGE_ATTACH_NO_CONNECTOR
yesterday (and have CC'ed you), but I've overlooked this particular
problem :-S

You can't get the connector in the .enable() operation, but you can get
it in .atomic_enable(), with
drm_atomic_get_new_connector_for_encoder(). This being said, it's
probably not the right option.

What matters here isn't the bpc for the connector, but the format
expected by the next bridge in the chain. drm_bridge_funcs has two
operations, .atomic_get_output_bus_fmts() and
.atomic_get_input_bus_fmts(), to negotiate that format along a chain of
bridges. The panel bridge driver (drivers/gpu/drm/bridge/panel.c)
doesn't implement those operations, and neither does
display-connector.c, so that may be what we should start with.

> I see that td_mode_valid() in
> drivers/gpu/drm/bridge/tc358775.c stores away the bpc from the incoming
> drm_display_info pointer but I'm not sure that is correct because can't
> that be called for various and not necessarily the one we're using?

You're right, .mode_valid() shouldn't do that.

-- 
Regards,

Laurent Pinchart


Re: [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug

2021-03-22 Thread David Gibson
On Fri, Mar 12, 2021 at 04:29:41AM -0300, Leonardo Bras wrote:
> During memory hotunplug, after each LMB is removed, the HPT may be
> resized-down if it would map a max of 4 times the current amount of memory.
> (2 shifts, due to introduced histeresis)
> 
> It usually is not an issue, but it can take a lot of time if HPT
> resizing-down fails. This happens  because resize-down failures
> usually repeat at each LMB removal, until there are no more bolted entries
> conflict, which can take a while to happen.
> 
> This can be solved by doing a single HPT resize at the end of memory
> hotunplug, after all requested entries are removed.
> 
> To make this happen, it's necessary to temporarily disable all HPT
> resize-downs before hotunplug, re-enable them after hotunplug ends,
> and then resize-down HPT to the current memory size.
> 
> As an example, hotunplugging 256GB from a 385GB guest took 621s without
> this patch, and 100s after applied.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/include/asm/book3s/64/hash.h |  2 ++
>  arch/powerpc/include/asm/sparsemem.h  |  2 ++
>  arch/powerpc/mm/book3s64/hash_utils.c | 28 +++
>  arch/powerpc/mm/book3s64/pgtable.c| 12 
>  .../platforms/pseries/hotplug-memory.c| 16 +++
>  5 files changed, 60 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
> b/arch/powerpc/include/asm/book3s/64/hash.h
> index 843b0a178590..f92697c107f7 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -256,6 +256,8 @@ int hash__create_section_mapping(unsigned long start, 
> unsigned long end,
>  int hash__remove_section_mapping(unsigned long start, unsigned long end);
>  
>  void hash_memory_batch_expand_prepare(unsigned long newsize);
> +void hash_memory_batch_shrink_begin(void);
> +void hash_memory_batch_shrink_end(void);
>  
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __KERNEL__ */
> diff --git a/arch/powerpc/include/asm/sparsemem.h 
> b/arch/powerpc/include/asm/sparsemem.h
> index 16b5f5300c84..a7a8a0d070fc 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -18,6 +18,8 @@ extern int memory_add_physaddr_to_nid(u64 start);
>  #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
>  
>  void memory_batch_expand_prepare(unsigned long newsize);
> +void memory_batch_shrink_begin(void);
> +void memory_batch_shrink_end(void);
>  
>  #ifdef CONFIG_NUMA
>  extern int hot_add_scn_to_nid(unsigned long scn_addr);
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 1f6aa0bf27e7..e16f207de8e4 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -794,6 +794,9 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +
> +atomic_t hpt_resize_disable = ATOMIC_INIT(0);
> +
>  static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  {
>   unsigned target_hpt_shift;
> @@ -805,6 +808,10 @@ static int resize_hpt_for_hotplug(unsigned long 
> new_mem_size, bool shrinking)
>  
>   if (shrinking) {
>  
> + /* When batch removing entries, only resizes HPT at the end. */
> + if (atomic_read_acquire(&hpt_resize_disable))
> + return 0;
> +

I'm not quite convinced by this locking.  Couldn't hpt_resize_disable
be set after this point, but while you're still inside
resize_hpt_for_hotplug()?  Probably better to use an explicit mutex
(and mutex_trylock()) to make the critical sections clearer.

Except... do we even need the fancy mechanics to suppress the resizes
in one place to do them elswhere.  Couldn't we just replace the
existing resize calls with the batched ones?

>   /*
>* To avoid lots of HPT resizes if memory size is fluctuating
>* across a boundary, we deliberately have some hysterisis
> @@ -872,6 +879,27 @@ void hash_memory_batch_expand_prepare(unsigned long 
> newsize)
>   pr_warn("Hash collision while resizing HPT\n");
>   }
>  }
> +
> +void hash_memory_batch_shrink_begin(void)
> +{
> + /* Disable HPT resize-down during hot-unplug */
> + atomic_set_release(&hpt_resize_disable, 1);
> +}
> +
> +void hash_memory_batch_shrink_end(void)
> +{
> + unsigned long newsize;
> +
> + /* Re-enables HPT resize-down after hot-unplug */
> + atomic_set_release(&hpt_resize_disable, 0);
> +
> + newsize = memblock_phys_mem_size();
> + /* Resize to smallest SHIFT possible */
> + while (resize_hpt_for_hotplug(newsize, true) == -ENOSPC) {
> + newsize *= 2;

As noted earlier, doing this without an explicit cap on the new hpt
size (of the existing size) this makes me nervous.  Less so, but doing
the calculations on memory size, rather than explictly on HPT size /
HPT order also seems kinda clunky.

>

Re: [PATCH -tip v4 12/12] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline

2021-03-22 Thread Masami Hiramatsu
Hi Steve,

On Mon, 22 Mar 2021 11:11:42 -0400
Steven Rostedt  wrote:

> On Mon, 22 Mar 2021 15:42:02 +0900
> Masami Hiramatsu  wrote:
> 
> > ftrace shows "[unknown/kretprobe'd]" indicator all addresses in the
> > kretprobe_trampoline, but the modified address by kretprobe should
> > be only kretprobe_trampoline+0.
> > 
> > Signed-off-by: Masami Hiramatsu 
> 
> Acked-by: Steven Rostedt (VMware) 

Thank you for the Ack!

> 
> -- Steve
> 


-- 
Masami Hiramatsu 


Re: [PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs

2021-03-22 Thread Song Liu
On Mon, Mar 22, 2021 at 3:00 PM Collin Fijalkovich
 wrote:
>
> Transparent huge pages are supported for read-only non-shmem filesystems,
> but are only used for vmas with VM_DENYWRITE. This condition ensures that
> file THPs are protected from writes while an application is running
> (ETXTBSY).  Any existing file THPs are then dropped from the page cache
> when a file is opened for write in do_dentry_open(). Since sys_mmap
> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
> produced by execve().
>
> Systems that make heavy use of shared libraries (e.g. Android) are unable
> to apply VM_DENYWRITE through the dynamic linker, preventing them from
> benefiting from the resultant reduced contention on the TLB.
>
> This patch reduces the constraint on file THPs allowing use with any
> executable mapping from a file not opened for write (see
> inode_is_open_for_write()). It also introduces additional conditions to
> ensure that files opened for write will never be backed by file THPs.

Thanks for working on this. We could also use this in many data center
workloads.

Question: when we use this on shared library, the library is still
writable. When the
shared library is opened for write, these pages will refault in as 4kB
pages, right?

Thanks,
Song


linux-next: manual merge of the v4l-dvb tree with the hwmon-staging tree

2021-03-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the v4l-dvb tree got a conflict in:

  MAINTAINERS

between commit:

  38f15506d965 ("hwmon: add driver for NZXT Kraken X42/X52/X62/X72")

from the hwmon-staging tree and commit:

  be157db0a3d8 ("media: Add maintainer for IMX jpeg v4l2 driver")

from the v4l-dvb tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc MAINTAINERS
index dda782e9e578,04e6df934eb0..
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@@ -12941,13 -12910,14 +12941,21 @@@ L:linux-...@lists.01.org 
(moderated fo
  S:Supported
  F:drivers/nfc/nxp-nci
  
+ NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER
+ M:Mirela Rabulea 
+ R:NXP Linux Team 
+ L:linux-me...@vger.kernel.org
+ S:Maintained
+ F:Documentation/devicetree/bindings/media/imx8-jpeg.yaml
+ F:drivers/media/platform/imx-jpeg
+ 
 +NZXT-KRAKEN2 HARDWARE MONITORING DRIVER
 +M:Jonas Malaco 
 +L:linux-hw...@vger.kernel.org
 +S:Maintained
 +F:Documentation/hwmon/nzxt-kraken2.rst
 +F:drivers/hwmon/nzxt-kraken2.c
 +
  OBJAGG
  M:Jiri Pirko 
  L:net...@vger.kernel.org


pgpHSyPNfPDwt.pgp
Description: OpenPGP digital signature


[PATCH v4 net-next 11/11] net: ocelot: replay switchdev events when joining bridge

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

The premise of this change is that the switchdev port attributes and
objects offloaded by ocelot might have been missed when we are joining
an already existing bridge port, such as a bonding interface.

The patch pulls these switchdev attributes and objects from the bridge,
on behalf of the 'bridge port' net device which might be either the
ocelot switch interface, or the bonding upper interface.

The ocelot_net.c belongs strictly to the switchdev ocelot driver, while
ocelot.c is part of a library shared with the DSA felix driver.
The ocelot_port_bridge_leave function (part of the common library) used
to call ocelot_port_vlan_filtering(false), something which is not
necessary for DSA, since the framework deals with that already there.
So we move this function to ocelot_switchdev_unsync, which is specific
to the switchdev driver.

The code movement described above makes ocelot_port_bridge_leave no
longer return an error code, so we change its type from int to void.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/ocelot/felix.c |   4 +-
 drivers/net/ethernet/mscc/Kconfig  |   3 +-
 drivers/net/ethernet/mscc/ocelot.c |  18 ++--
 drivers/net/ethernet/mscc/ocelot_net.c | 117 +
 include/soc/mscc/ocelot.h  |   6 +-
 5 files changed, 113 insertions(+), 35 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 628afb47b579..6b5442be0230 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -719,7 +719,9 @@ static int felix_bridge_join(struct dsa_switch *ds, int 
port,
 {
struct ocelot *ocelot = ds->priv;
 
-   return ocelot_port_bridge_join(ocelot, port, br);
+   ocelot_port_bridge_join(ocelot, port, br);
+
+   return 0;
 }
 
 static void felix_bridge_leave(struct dsa_switch *ds, int port,
diff --git a/drivers/net/ethernet/mscc/Kconfig 
b/drivers/net/ethernet/mscc/Kconfig
index 05cb040c2677..2d3157e4d081 100644
--- a/drivers/net/ethernet/mscc/Kconfig
+++ b/drivers/net/ethernet/mscc/Kconfig
@@ -11,7 +11,7 @@ config NET_VENDOR_MICROSEMI
 
 if NET_VENDOR_MICROSEMI
 
-# Users should depend on NET_SWITCHDEV, HAS_IOMEM
+# Users should depend on NET_SWITCHDEV, HAS_IOMEM, BRIDGE
 config MSCC_OCELOT_SWITCH_LIB
select NET_DEVLINK
select REGMAP_MMIO
@@ -24,6 +24,7 @@ config MSCC_OCELOT_SWITCH_LIB
 
 config MSCC_OCELOT_SWITCH
tristate "Ocelot switch driver"
+   depends on BRIDGE || BRIDGE=n
depends on NET_SWITCHDEV
depends on HAS_IOMEM
depends on OF_NET
diff --git a/drivers/net/ethernet/mscc/ocelot.c 
b/drivers/net/ethernet/mscc/ocelot.c
index ce57929ba3d1..1a36b416fd9b 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1514,34 +1514,28 @@ int ocelot_port_mdb_del(struct ocelot *ocelot, int port,
 }
 EXPORT_SYMBOL(ocelot_port_mdb_del);
 
-int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
-   struct net_device *bridge)
+void ocelot_port_bridge_join(struct ocelot *ocelot, int port,
+struct net_device *bridge)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
 
ocelot_port->bridge = bridge;
 
-   return 0;
+   ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_join);
 
-int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
-struct net_device *bridge)
+void ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
+ struct net_device *bridge)
 {
struct ocelot_port *ocelot_port = ocelot->ports[port];
struct ocelot_vlan pvid = {0}, native_vlan = {0};
-   int ret;
 
ocelot_port->bridge = NULL;
 
-   ret = ocelot_port_vlan_filtering(ocelot, port, false);
-   if (ret)
-   return ret;
-
ocelot_port_set_pvid(ocelot, port, pvid);
ocelot_port_set_native_vlan(ocelot, port, native_vlan);
-
-   return 0;
+   ocelot_apply_bridge_fwd_mask(ocelot);
 }
 EXPORT_SYMBOL(ocelot_port_bridge_leave);
 
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index d1376f7b34fd..36f32a4d9b0f 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,47 +1117,126 @@ static int ocelot_port_obj_del(struct net_device *dev,
return ret;
 }
 
+static void ocelot_inherit_brport_flags(struct ocelot *ocelot, int port,
+   struct net_device *brport_dev)
+{
+   struct switchdev_brport_flags flags = {0};
+   int flag;
+
+   flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+
+   for_each_set_bit(flag, &flags.mask, 32)
+   if (br_port_flag_is_set(brport_dev, BIT(flag)))
+   flags.val |= BIT(flag);
+
+   ocelot_port_bridge_flags(ocelot, port, flags);
+}
+
+sta

[PATCH v4 net-next 07/11] net: dsa: pass extack to dsa_port_{bridge,lag}_join

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

This is a pretty noisy change that was broken out of the larger change
for replaying switchdev attributes and objects at bridge join time,
which is when these extack objects are actually used.

Signed-off-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
Reviewed-by: Tobias Waldekranz 
---
 net/dsa/dsa_priv.h | 6 --
 net/dsa/port.c | 8 +---
 net/dsa/slave.c| 7 +--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 4c43c5406834..b8778c5d8529 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -181,12 +181,14 @@ int dsa_port_enable_rt(struct dsa_port *dp, struct 
phy_device *phy);
 int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
 void dsa_port_disable_rt(struct dsa_port *dp);
 void dsa_port_disable(struct dsa_port *dp);
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+struct netlink_ext_ack *extack);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
 int dsa_port_lag_change(struct dsa_port *dp,
struct netdev_lag_lower_state_info *linfo);
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev,
- struct netdev_lag_upper_info *uinfo);
+ struct netdev_lag_upper_info *uinfo,
+ struct netlink_ext_ack *extack);
 void dsa_port_lag_leave(struct dsa_port *dp, struct net_device *lag_dev);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
struct netlink_ext_ack *extack);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index d39262a9fe0e..fcbe5b1545b8 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -144,7 +144,8 @@ static void dsa_port_change_brport_flags(struct dsa_port 
*dp,
}
 }
 
-int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
+int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
+struct netlink_ext_ack *extack)
 {
struct dsa_notifier_bridge_info info = {
.tree_index = dp->ds->dst->index,
@@ -241,7 +242,8 @@ int dsa_port_lag_change(struct dsa_port *dp,
 }
 
 int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag,
- struct netdev_lag_upper_info *uinfo)
+ struct netdev_lag_upper_info *uinfo,
+ struct netlink_ext_ack *extack)
 {
struct dsa_notifier_lag_info info = {
.sw_index = dp->ds->index,
@@ -263,7 +265,7 @@ int dsa_port_lag_join(struct dsa_port *dp, struct 
net_device *lag,
if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
return 0;
 
-   err = dsa_port_bridge_join(dp, bridge_dev);
+   err = dsa_port_bridge_join(dp, bridge_dev, extack);
if (err)
goto err_bridge_join;
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 992fcab4b552..1ff48be476bb 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1976,11 +1976,14 @@ static int dsa_slave_changeupper(struct net_device *dev,
 struct netdev_notifier_changeupper_info *info)
 {
struct dsa_port *dp = dsa_slave_to_port(dev);
+   struct netlink_ext_ack *extack;
int err = NOTIFY_DONE;
 
+   extack = netdev_notifier_info_to_extack(&info->info);
+
if (netif_is_bridge_master(info->upper_dev)) {
if (info->linking) {
-   err = dsa_port_bridge_join(dp, info->upper_dev);
+   err = dsa_port_bridge_join(dp, info->upper_dev, extack);
if (!err)
dsa_bridge_mtu_normalization(dp);
err = notifier_from_errno(err);
@@ -1991,7 +1994,7 @@ static int dsa_slave_changeupper(struct net_device *dev,
} else if (netif_is_lag_master(info->upper_dev)) {
if (info->linking) {
err = dsa_port_lag_join(dp, info->upper_dev,
-   info->upper_info);
+   info->upper_info, extack);
if (err == -EOPNOTSUPP) {
NL_SET_ERR_MSG_MOD(info->info.extack,
   "Offloading not supported");
-- 
2.25.1



[PATCH v4 net-next 10/11] net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged LAG

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

Similar to the DSA situation, ocelot supports LAG offload but treats
this scenario improperly:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

We do the same thing as we do there, which is to simulate a 'bridge join'
on 'lag join', if we detect that the bonding upper has a bridge upper.

Again, same as DSA, ocelot supports software fallback for LAG, and in
that case, we should avoid calling ocelot_netdevice_changeupper.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/ethernet/mscc/ocelot_net.c | 111 +++--
 1 file changed, 86 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot_net.c 
b/drivers/net/ethernet/mscc/ocelot_net.c
index c08164cd88f4..d1376f7b34fd 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1117,10 +1117,15 @@ static int ocelot_port_obj_del(struct net_device *dev,
return ret;
 }
 
-static int ocelot_netdevice_bridge_join(struct ocelot *ocelot, int port,
-   struct net_device *bridge)
+static int ocelot_netdevice_bridge_join(struct net_device *dev,
+   struct net_device *bridge,
+   struct netlink_ext_ack *extack)
 {
+   struct ocelot_port_private *priv = netdev_priv(dev);
+   struct ocelot_port *ocelot_port = &priv->port;
+   struct ocelot *ocelot = ocelot_port->ocelot;
struct switchdev_brport_flags flags;
+   int port = priv->chip_port;
int err;
 
flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1135,10 +1140,14 @@ static int ocelot_netdevice_bridge_join(struct ocelot 
*ocelot, int port,
return 0;
 }
 
-static int ocelot_netdevice_bridge_leave(struct ocelot *ocelot, int port,
+static int ocelot_netdevice_bridge_leave(struct net_device *dev,
 struct net_device *bridge)
 {
+   struct ocelot_port_private *priv = netdev_priv(dev);
+   struct ocelot_port *ocelot_port = &priv->port;
+   struct ocelot *ocelot = ocelot_port->ocelot;
struct switchdev_brport_flags flags;
+   int port = priv->chip_port;
int err;
 
flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
@@ -1151,43 +1160,89 @@ static int ocelot_netdevice_bridge_leave(struct ocelot 
*ocelot, int port,
return err;
 }
 
-static int ocelot_netdevice_changeupper(struct net_device *dev,
-   struct netdev_notifier_changeupper_info 
*info)
+static int ocelot_netdevice_lag_join(struct net_device *dev,
+struct net_device *bond,
+struct netdev_lag_upper_info *info,
+struct netlink_ext_ack *extack)
 {
struct ocelot_port_private *priv = netdev_priv(dev);
struct ocelot_port *ocelot_port = &priv->port;
struct ocelot *ocelot = ocelot_port->ocelot;
+   struct net_device *bridge_dev;
int port = priv->chip_port;
+   int err;
+
+   err = ocelot_port_lag_join(ocelot, port, bond, info);
+   if (err == -EOPNOTSUPP) {
+   NL_SET_ERR_MSG_MOD(extack, "Offloading not supported");
+   return 0;
+   }
+
+   bridge_dev = netdev_master_upper_dev_get(bond);
+   if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+   return 0;
+
+   err = ocelot_netdevice_bridge_join(dev, bridge_dev, extack);
+   if (err)
+   goto err_bridge_join;
+
+   return 0;
+
+err_bridge_join:
+   ocelot_port_lag_leave(ocelot, port, bond);
+   return err;
+}
+
+static int ocelot_netdevice_lag_leave(struct net_device *dev,
+ struct net_device *bond)
+{
+   struct ocelot_port_private *priv = netdev_priv(dev);
+   struct ocelot_port *ocelot_port = &priv->port;
+   struct ocelot *ocelot = ocelot_port->ocelot;
+   struct net_device *bridge_dev;
+   int port = priv->chip_port;
+
+   ocelot_port_lag_leave(ocelot, port, bond);
+
+   bridge_dev = netdev_master_upper_dev_get(bond);
+   if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+   return 0;
+
+   return ocelot_netdevice_bridge_leave(dev, bridge_dev);
+}
+
+static int ocelot_netdevice_changeupper(struct net_device *dev,
+   struct netdev_notifier_changeupper_info 
*info)
+{
+   struct netlink_ext_ack *extack;
int err = 0;
 
+   extack = netdev_notifier_info_to_extack(&info->info);
+
if (netif_is_bridge_master(info->upper_dev)) {
-   if (info->linking) {
-   err = ocelot_netdevice_bridge_join(ocelot, port,
-  info->upper_dev);
-   } else {
-  

[PATCH v4 net-next 09/11] net: dsa: sync up switchdev objects and port attributes when joining the bridge

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

If we join an already-created bridge port, such as a bond master
interface, then we can miss the initial switchdev notifications emitted
by the bridge for this port, while it wasn't offloaded by anybody.

Signed-off-by: Vladimir Oltean 
---
 net/dsa/dsa_priv.h |  3 +++
 net/dsa/port.c | 46 ++
 net/dsa/slave.c|  4 ++--
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b8778c5d8529..92282de54230 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -262,6 +262,9 @@ static inline bool dsa_tree_offloads_bridge_port(struct 
dsa_switch_tree *dst,
 
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
+extern struct notifier_block dsa_slave_switchdev_notifier;
+extern struct notifier_block dsa_slave_switchdev_blocking_notifier;
+
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
 int dsa_slave_create(struct dsa_port *dp);
 void dsa_slave_destroy(struct net_device *slave_dev);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index c712bf3da0a0..01e30264b25b 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -170,12 +170,46 @@ static void dsa_port_clear_brport_flags(struct dsa_port 
*dp)
 static int dsa_port_switchdev_sync(struct dsa_port *dp,
   struct netlink_ext_ack *extack)
 {
+   struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+   struct net_device *br = dp->bridge_dev;
int err;
 
err = dsa_port_inherit_brport_flags(dp, extack);
if (err)
return err;
 
+   err = dsa_port_set_state(dp, br_port_get_stp_state(brport_dev));
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = dsa_port_vlan_filtering(dp, br_vlan_enabled(br), extack);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = dsa_port_mrouter(dp->cpu_dp, br_multicast_router(br), extack);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = dsa_port_ageing_time(dp, br_get_ageing_time(br));
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = br_mdb_replay(br, brport_dev,
+   &dsa_slave_switchdev_blocking_notifier,
+   extack);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = br_fdb_replay(br, brport_dev, &dsa_slave_switchdev_notifier);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
+   err = br_vlan_replay(br, brport_dev,
+&dsa_slave_switchdev_blocking_notifier,
+extack);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+
return 0;
 }
 
@@ -198,6 +232,18 @@ static void dsa_port_switchdev_unsync(struct dsa_port *dp)
 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 */
dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
+
+   /* VLAN filtering is handled by dsa_switch_bridge_leave */
+
+   /* Some drivers treat the notification for having a local multicast
+* router by allowing multicast to be flooded to the CPU, so we should
+* allow this in standalone mode too.
+*/
+   dsa_port_mrouter(dp->cpu_dp, true, NULL);
+
+   /* Ageing time may be global to the switch chip, so don't change it
+* here because we have no good reason (or value) to change it to.
+*/
 }
 
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1ff48be476bb..c51e52418a62 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2392,11 +2392,11 @@ static struct notifier_block dsa_slave_nb __read_mostly 
= {
.notifier_call  = dsa_slave_netdevice_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_notifier = {
+struct notifier_block dsa_slave_switchdev_notifier = {
.notifier_call = dsa_slave_switchdev_event,
 };
 
-static struct notifier_block dsa_slave_switchdev_blocking_notifier = {
+struct notifier_block dsa_slave_switchdev_blocking_notifier = {
.notifier_call = dsa_slave_switchdev_blocking_event,
 };
 
-- 
2.25.1



[PATCH v4 net-next 08/11] net: dsa: inherit the actual bridge port flags at join time

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

DSA currently assumes that the bridge port starts off with this
constellation of bridge port flags:

- learning on
- unicast flooding on
- multicast flooding on
- broadcast flooding on

just by virtue of code copy-pasta from the bridge layer (new_nbp).
This was a simple enough strategy thus far, because the 'bridge join'
moment always coincided with the 'bridge port creation' moment.

But with sandwiched interfaces, such as:

 br0
  |
bond0
  |
 swp0

it may happen that the user has had time to change the bridge port flags
of bond0 before enslaving swp0 to it. In that case, swp0 will falsely
assume that the bridge port flags are those determined by new_nbp, when
in fact this can happen:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set bond0 type bridge_slave learning off
ip link set swp0 master br0

Now swp0 has learning enabled, bond0 has learning disabled. Not nice.

Fix this by "dumpster diving" through the actual bridge port flags with
br_port_flag_is_set, at bridge join time.

We use this opportunity to split dsa_port_change_brport_flags into two
distinct functions called dsa_port_inherit_brport_flags and
dsa_port_clear_brport_flags, now that the implementation for the two
cases is no longer similar. This patch also creates two functions called
dsa_port_switchdev_sync and dsa_port_switchdev_unsync which collect what
we have so far, even if that's asymmetrical. More is going to be added
in the next patch.

Signed-off-by: Vladimir Oltean 
---
 net/dsa/port.c | 123 -
 1 file changed, 82 insertions(+), 41 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index fcbe5b1545b8..c712bf3da0a0 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -122,28 +122,84 @@ void dsa_port_disable(struct dsa_port *dp)
rtnl_unlock();
 }
 
-static void dsa_port_change_brport_flags(struct dsa_port *dp,
-bool bridge_offload)
+static int dsa_port_inherit_brport_flags(struct dsa_port *dp,
+struct netlink_ext_ack *extack)
 {
-   struct switchdev_brport_flags flags;
-   int flag;
+   const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+  BR_BCAST_FLOOD;
+   struct net_device *brport_dev = dsa_port_to_bridge_port(dp);
+   int flag, err;
 
-   flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
-   if (bridge_offload)
-   flags.val = flags.mask;
-   else
-   flags.val = flags.mask & ~BR_LEARNING;
+   for_each_set_bit(flag, &mask, 32) {
+   struct switchdev_brport_flags flags = {0};
+
+   flags.mask = BIT(flag);
 
-   for_each_set_bit(flag, &flags.mask, 32) {
-   struct switchdev_brport_flags tmp;
+   if (br_port_flag_is_set(brport_dev, BIT(flag)))
+   flags.val = BIT(flag);
+
+   err = dsa_port_bridge_flags(dp, flags, extack);
+   if (err && err != -EOPNOTSUPP)
+   return err;
+   }
 
-   tmp.val = flags.val & BIT(flag);
-   tmp.mask = BIT(flag);
+   return 0;
+}
 
-   dsa_port_bridge_flags(dp, tmp, NULL);
+static void dsa_port_clear_brport_flags(struct dsa_port *dp)
+{
+   const unsigned long val = BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+   const unsigned long mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+  BR_BCAST_FLOOD;
+   int flag, err;
+
+   for_each_set_bit(flag, &mask, 32) {
+   struct switchdev_brport_flags flags = {0};
+
+   flags.mask = BIT(flag);
+   flags.val = val & BIT(flag);
+
+   err = dsa_port_bridge_flags(dp, flags, NULL);
+   if (err && err != -EOPNOTSUPP)
+   dev_err(dp->ds->dev,
+   "failed to clear bridge port flag %lu: %pe\n",
+   flags.val, ERR_PTR(err));
}
 }
 
+static int dsa_port_switchdev_sync(struct dsa_port *dp,
+  struct netlink_ext_ack *extack)
+{
+   int err;
+
+   err = dsa_port_inherit_brport_flags(dp, extack);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static void dsa_port_switchdev_unsync(struct dsa_port *dp)
+{
+   /* Configure the port for standalone mode (no address learning,
+* flood everything).
+* The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events
+* when the user requests it through netlink or sysfs, but not
+* automatically at port join or leave, so we need to handle resetting
+* the brport flags ourselves. But we even prefer it that way, because
+* otherwise, some setups might never get the notification they need,
+* for example, when a port leaves a LAG 

[PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

Currently this simple setup with DSA:

ip link add br0 type bridge vlan_filtering 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

will not work because the bridge has created the PVID in br_add_if ->
nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
but that was too early, since swp0 was not yet a lower of bond0, so it
had no reason to act upon that notification.

We need a helper in the bridge to replay the switchdev VLAN objects that
were notified since the bridge port creation, because some of them may
have been missed.

As opposed to the br_mdb_replay function, the vg->vlan_list write side
protection is offered by the rtnl_mutex which is sleepable, so we don't
need to queue up the objects in atomic context, we can replay them right
away.

Signed-off-by: Vladimir Oltean 
---
 include/linux/if_bridge.h | 10 ++
 net/bridge/br_vlan.c  | 73 +++
 2 files changed, 83 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b564c4486a45..2cc35038a8ca 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 
*p_pvid);
 int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
 int br_vlan_get_info(const struct net_device *dev, u16 vid,
 struct bridge_vlan_info *p_vinfo);
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+  struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline bool br_vlan_enabled(const struct net_device *dev)
 {
@@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct net_device 
*dev, u16 vid,
 {
return -EINVAL;
 }
+
+static inline int br_vlan_replay(struct net_device *br_dev,
+struct net_device *dev,
+struct notifier_block *nb,
+struct netlink_ext_ack *extack)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 8829f621b8ec..ca8daccff217 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1751,6 +1751,79 @@ void br_vlan_notify(const struct net_bridge *br,
kfree_skb(skb);
 }
 
+static int br_vlan_replay_one(struct notifier_block *nb,
+ struct net_device *dev,
+ struct switchdev_obj_port_vlan *vlan,
+ struct netlink_ext_ack *extack)
+{
+   struct switchdev_notifier_port_obj_info obj_info = {
+   .info = {
+   .dev = dev,
+   .extack = extack,
+   },
+   .obj = &vlan->obj,
+   };
+   int err;
+
+   err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+   return notifier_to_errno(err);
+}
+
+int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
+  struct notifier_block *nb, struct netlink_ext_ack *extack)
+{
+   struct net_bridge_vlan_group *vg;
+   struct net_bridge_vlan *v;
+   struct net_bridge_port *p;
+   struct net_bridge *br;
+   int err = 0;
+   u16 pvid;
+
+   ASSERT_RTNL();
+
+   if (!netif_is_bridge_master(br_dev))
+   return -EINVAL;
+
+   if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
+   return -EINVAL;
+
+   if (netif_is_bridge_master(dev)) {
+   br = netdev_priv(dev);
+   vg = br_vlan_group(br);
+   p = NULL;
+   } else {
+   p = br_port_get_rtnl(dev);
+   if (WARN_ON(!p))
+   return -EINVAL;
+   vg = nbp_vlan_group(p);
+   br = p->br;
+   }
+
+   if (!vg)
+   return 0;
+
+   pvid = br_get_pvid(vg);
+
+   list_for_each_entry(v, &vg->vlan_list, vlist) {
+   struct switchdev_obj_port_vlan vlan = {
+   .obj.orig_dev = dev,
+   .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
+   .flags = br_vlan_flags(v, pvid),
+   .vid = v->vid,
+   };
+
+   if (!br_vlan_should_use(v))
+   continue;
+
+   br_vlan_replay_one(nb, dev, &vlan, extack);
+   if (err)
+   return err;
+   }
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(br_vlan_replay);
+
 /* check if v_curr can enter a range ending in range_end */
 bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
 const struct net_bridge_vlan *range_end)
-- 
2.25.1



[PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

I have a system with DSA ports, and udhcpcd is configured to bring
interfaces up as soon as they are created.

I create a bridge as follows:

ip link add br0 type bridge

As soon as I create the bridge and udhcpcd brings it up, I also have
avahi which automatically starts sending IPv6 packets to advertise some
local services, and because of that, the br0 bridge joins the following
IPv6 groups due to the code path detailed below:

33:33:ff:6d:c1:9c vid 0
33:33:00:00:00:6a vid 0
33:33:00:00:00:fb vid 0

br_dev_xmit
-> br_multicast_rcv
   -> br_ip6_multicast_add_group
  -> __br_multicast_add_group
 -> br_multicast_host_join
-> br_mdb_notify

This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.

Then when we add a port to br0:

ip link set swp0 master br0

the bridge doesn't replay the host-joined MDB entries from br_add_if,
and eventually the host joined addresses expire, and a switchdev
notification for deleting it is emitted, but surprise, the original
addition was already completely missed.

The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However there are 2 possibilities: the addresses can be 'pushed' by the
bridge into the port, or the port can 'pull' them from the bridge.

Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.

To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only call them once we're out
of the RCU read-side critical section.

There was some opportunity for reuse between br_mdb_switchdev_host_port,
br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
mdb object is created, so a helper was created.

Suggested-by: Ido Schimmel 
Signed-off-by: Vladimir Oltean 
---
 include/linux/if_bridge.h |   9 +++
 include/net/switchdev.h   |   1 +
 net/bridge/br_mdb.c   | 148 +-
 3 files changed, 141 insertions(+), 17 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ebd16495459c..f6472969bb44 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device 
*dev, int proto);
 bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
 bool br_multicast_enabled(const struct net_device *dev);
 bool br_multicast_router(const struct net_device *dev);
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+ struct notifier_block *nb, struct netlink_ext_ack *extack);
 #else
 static inline int br_multicast_list_adjacent(struct net_device *dev,
 struct list_head *br_ip_list)
@@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct 
net_device *dev)
 {
return false;
 }
+static inline int br_mdb_replay(struct net_device *br_dev,
+   struct net_device *dev,
+   struct notifier_block *nb,
+   struct netlink_ext_ack *extack)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b7fc7d0f54e2..8c3218177136 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,6 +68,7 @@ enum switchdev_obj_id {
 };
 
 struct switchdev_obj {
+   struct list_head list;
struct net_device *orig_dev;
enum switchdev_obj_id id;
u32 flags;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 8846c5bcd075..95fa4af0e8dd 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -506,6 +506,134 @@ static void br_mdb_complete(struct net_device *dev, int 
err, void *priv)
kfree(priv);
 }
 
+static void br_switchdev_mdb_populate(struct switchdev_obj_port_mdb *mdb,
+ const struct net_bridge_mdb_entry *mp)

[PATCH v4 net-next 06/11] net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a bridge

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

DSA can properly detect and offload this sequence of operations:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set swp0 master bond0
ip link set bond0 master br0

But not this one:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0

Actually the second one is more complicated, due to the elapsed time
between the enslavement of bond0 and the offloading of it via swp0, a
lot of things could have happened to the bond0 bridge port in terms of
switchdev objects (host MDBs, VLANs, altered STP state etc). So this is
a bit of a can of worms, and making sure that the DSA port's state is in
sync with this already existing bridge port is handled in the next
patches.

Signed-off-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
Reviewed-by: Tobias Waldekranz 
---
 net/dsa/port.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index c9c6d7ab3f47..d39262a9fe0e 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -249,17 +249,31 @@ int dsa_port_lag_join(struct dsa_port *dp, struct 
net_device *lag,
.lag = lag,
.info = uinfo,
};
+   struct net_device *bridge_dev;
int err;
 
dsa_lag_map(dp->ds->dst, lag);
dp->lag_dev = lag;
 
err = dsa_port_notify(dp, DSA_NOTIFIER_LAG_JOIN, &info);
-   if (err) {
-   dp->lag_dev = NULL;
-   dsa_lag_unmap(dp->ds->dst, lag);
-   }
+   if (err)
+   goto err_lag_join;
 
+   bridge_dev = netdev_master_upper_dev_get(lag);
+   if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
+   return 0;
+
+   err = dsa_port_bridge_join(dp, bridge_dev);
+   if (err)
+   goto err_bridge_join;
+
+   return 0;
+
+err_bridge_join:
+   dsa_port_notify(dp, DSA_NOTIFIER_LAG_LEAVE, &info);
+err_lag_join:
+   dp->lag_dev = NULL;
+   dsa_lag_unmap(dp->ds->dst, lag);
return err;
 }
 
-- 
2.25.1



[PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

When a switchdev port starts offloading a LAG that is already in a
bridge and has an FDB entry pointing to it:

ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp0 master bond0

the switchdev driver will have no idea that this FDB entry is there,
because it missed the switchdev event emitted at its creation.

Ido Schimmel pointed this out during a discussion about challenges with
switchdev offloading of stacked interfaces between the physical port and
the bridge, and recommended to just catch that condition and deny the
CHANGEUPPER event:
https://lore.kernel.org/netdev/20210210105949.gb287...@shredder.lan/

But in fact, we might need to deal with the hard thing anyway, which is
to replay all FDB addresses relevant to this port, because it isn't just
static FDB entries, but also local addresses (ones that are not
forwarded but terminated by the bridge). There, we can't just say 'oh
yeah, there was an upper already so I'm not joining that'.

So, similar to the logic for replaying MDB entries, add a function that
must be called by individual switchdev drivers and replays local FDB
entries as well as ones pointing towards a bridge port. This time, we
use the atomic switchdev notifier block, since that's what FDB entries
expect for some reason.

Reported-by: Ido Schimmel 
Signed-off-by: Vladimir Oltean 
---
 include/linux/if_bridge.h |  9 +++
 net/bridge/br_fdb.c   | 50 +++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index f6472969bb44..b564c4486a45 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 
vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
 clock_t br_get_ageing_time(struct net_device *br_dev);
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+ struct notifier_block *nb);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct net_device 
*br_dev)
 {
return 0;
 }
+
+static inline int br_fdb_replay(struct net_device *br_dev,
+   struct net_device *dev,
+   struct notifier_block *nb)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b7490237f3fc..698b79747d32 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -726,6 +726,56 @@ static inline size_t fdb_nlmsg_size(void)
+ nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
 }
 
+static int br_fdb_replay_one(struct notifier_block *nb,
+struct net_bridge_fdb_entry *fdb,
+struct net_device *dev)
+{
+   struct switchdev_notifier_fdb_info item;
+   int err;
+
+   item.addr = fdb->key.addr.addr;
+   item.vid = fdb->key.vlan_id;
+   item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
+   item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
+   item.info.dev = dev;
+
+   err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
+   return notifier_to_errno(err);
+}
+
+int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
+ struct notifier_block *nb)
+{
+   struct net_bridge_fdb_entry *fdb;
+   struct net_bridge *br;
+   int err = 0;
+
+   if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
+   return -EINVAL;
+
+   br = netdev_priv(br_dev);
+
+   rcu_read_lock();
+
+   hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
+   struct net_bridge_port *dst = READ_ONCE(fdb->dst);
+   struct net_device *dst_dev;
+
+   dst_dev = dst ? dst->dev : br->dev;
+   if (dst_dev != br_dev && dst_dev != dev)
+   continue;
+
+   err = br_fdb_replay_one(nb, fdb, dst_dev);
+   if (err)
+   break;
+   }
+
+   rcu_read_unlock();
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(br_fdb_replay);
+
 static void fdb_notify(struct net_bridge *br,
   const struct net_bridge_fdb_entry *fdb, int type,
   bool swdev_notify)
-- 
2.25.1



[PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:

sysfs/ioctl/netlink
-> br_set_ageing_time
   -> __set_ageing_time

therefore not at bridge port creation time, so:
(a) switchdev drivers have to hardcode the initial value for the address
ageing time, because they didn't get any notification
(b) that hardcoded value can be out of sync, if the user changes the
ageing time before enslaving the port to the bridge

We need a helper in the bridge, such that switchdev drivers can query
the current value of the bridge ageing time when they start offloading
it.

Signed-off-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
Reviewed-by: Tobias Waldekranz 
---
 include/linux/if_bridge.h |  6 ++
 net/bridge/br_stp.c   | 13 +
 2 files changed, 19 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 920d3a02cc68..ebd16495459c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct net_device 
*br_dev,
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
 u8 br_port_get_stp_state(const struct net_device *dev);
+clock_t br_get_ageing_time(struct net_device *br_dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct 
net_device *dev)
 {
return BR_STATE_DISABLED;
 }
+
+static inline clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+   return 0;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 86b5e05d3f21..3dafb6143cff 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t 
ageing_time)
return 0;
 }
 
+clock_t br_get_ageing_time(struct net_device *br_dev)
+{
+   struct net_bridge *br;
+
+   if (!netif_is_bridge_master(br_dev))
+   return 0;
+
+   br = netdev_priv(br_dev);
+
+   return jiffies_to_clock_t(br->ageing_time);
+}
+EXPORT_SYMBOL_GPL(br_get_ageing_time);
+
 /* called under bridge lock */
 void __br_set_topology_change(struct net_bridge *br, unsigned char val)
 {
-- 
2.25.1



[PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

It may happen that we have the following topology with DSA or any other
switchdev driver with LAG offload:

ip link add br0 type bridge stp_state 1
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0
ip link set swp1 master bond0

STP decides that it should put bond0 into the BLOCKING state, and
that's that. The ports that are actively listening for the switchdev
port attributes emitted for the bond0 bridge port (because they are
offloading it) and have the honor of seeing that switchdev port
attribute can react to it, so we can program swp0 and swp1 into the
BLOCKING state.

But if then we do:

ip link set swp2 master bond0

then as far as the bridge is concerned, nothing has changed: it still
has one bridge port. But this new bridge port will not see any STP state
change notification and will remain FORWARDING, which is how the
standalone code leaves it in.

We need a function in the bridge driver which retrieves the current STP
state, such that drivers can synchronize to it when they may have missed
switchdev events.

Signed-off-by: Vladimir Oltean 
Reviewed-by: Florian Fainelli 
Reviewed-by: Tobias Waldekranz 
---
 include/linux/if_bridge.h |  6 ++
 net/bridge/br_stp.c   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..920d3a02cc68 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -136,6 +136,7 @@ struct net_device *br_fdb_find_port(const struct net_device 
*br_dev,
__u16 vid);
 void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
 bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
+u8 br_port_get_stp_state(const struct net_device *dev);
 #else
 static inline struct net_device *
 br_fdb_find_port(const struct net_device *br_dev,
@@ -154,6 +155,11 @@ br_port_flag_is_set(const struct net_device *dev, unsigned 
long flag)
 {
return false;
 }
+
+static inline u8 br_port_get_stp_state(const struct net_device *dev)
+{
+   return BR_STATE_DISABLED;
+}
 #endif
 
 #endif
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 21c6781906aa..86b5e05d3f21 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -64,6 +64,20 @@ void br_set_state(struct net_bridge_port *p, unsigned int 
state)
}
 }
 
+u8 br_port_get_stp_state(const struct net_device *dev)
+{
+   struct net_bridge_port *p;
+
+   ASSERT_RTNL();
+
+   p = br_port_get_rtnl(dev);
+   if (!p)
+   return BR_STATE_DISABLED;
+
+   return p->state;
+}
+EXPORT_SYMBOL_GPL(br_port_get_stp_state);
+
 /* called under bridge lock */
 struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no)
 {
-- 
2.25.1



[PATCH v4 net-next 00/11] Better support for sandwiched LAGs with bridge and DSA

2021-03-22 Thread Vladimir Oltean
From: Vladimir Oltean 

Changes in v4:
- Added missing EXPORT_SYMBOL_GPL
- Using READ_ONCE(fdb->dst)
- Split patches into (a) adding the bridge helpers (b) making DSA use them
- br_mdb_replay went back to the v1 approach where it allocated memory
  in atomic context
- Created a br_switchdev_mdb_populate which reduces some of the code
  duplication
- Fixed the error message in dsa_port_clear_brport_flags
- Replaced "dsa_port_vlan_filtering(dp, br, extack)" with
  "dsa_port_vlan_filtering(dp, br_vlan_enabled(br), extack)" (duh)
- Added review tags (sorry if I missed any)

The objective of this series is to make LAG uppers on top of switchdev
ports work regardless of which order we link interfaces to their masters
(first make the port join the LAG, then the LAG join the bridge, or the
other way around).

There was a design decision to be made in patches 2-4 on whether we
should adopt the "push" model (which attempts to solve the problem
centrally, in the bridge layer) where the driver just calls:

  switchdev_bridge_port_offloaded(brport_dev,
  &atomic_notifier_block,
  &blocking_notifier_block,
  extack);

and the bridge just replays the entire collection of switchdev port
attributes and objects that it has, in some predefined order and with
some predefined error handling logic;


or the "pull" model (which attempts to solve the problem by giving the
driver the rope to hang itself), where the driver, apart from calling:

  switchdev_bridge_port_offloaded(brport_dev, extack);

has the task of "dumpster diving" (as Tobias puts it) through the bridge
attributes and objects by itself, by calling:

  - br_vlan_replay
  - br_fdb_replay
  - br_mdb_replay
  - br_vlan_enabled
  - br_port_flag_is_set
  - br_port_get_stp_state
  - br_multicast_router
  - br_get_ageing_time

(not necessarily all of them, and not necessarily in this order, and
with driver-defined error handling).

Even though I'm not in love myself with the "pull" model, I chose it
because there is a fundamental trick with replaying switchdev events
like this:

ip link add br0 type bridge
ip link add bond0 type bond
ip link set bond0 master br0
ip link set swp0 master bond0 <- this will replay the objects once for
 the bond0 bridge port, and the swp0
 switchdev port will process them
ip link set swp1 master bond0 <- this will replay the objects again for
 the bond0 bridge port, and the swp1
 switchdev port will see them, but swp0
 will see them for the second time now

Basically I believe that it is implementation defined whether the driver
wants to error out on switchdev objects seen twice on a port, and the
bridge should not enforce a certain model for that. For example, for FDB
entries added to a bonding interface, the underling switchdev driver
might have an abstraction for just that: an FDB entry pointing towards a
logical (as opposed to physical) port. So when the second port joins the
bridge, it doesn't realy need to replay FDB entries, since there is
already at least one hardware port which has been receiving those
events, and the FDB entries don't need to be added a second time to the
same logical port.
In the other corner, we have the drivers that handle switchdev port
attributes on a LAG as individual switchdev port attributes on physical
ports (example: VLAN filtering). In fact, the switchdev_handle_port_attr_set
helper facilitates this: it is a fan-out from a single orig_dev towards
multiple lowers that pass the check_cb().
But that's the point: switchdev_handle_port_attr_set is just a helper
which the driver _opts_ to use. The bridge can't enforce the "push"
model, because that would assume that all drivers handle port attributes
in the same way, which is probably false.

For this reason, I preferred to go with the "pull" mode for this patch
set. Just to see how bad it is for other switchdev drivers to copy-paste
this logic, I added the pull support to ocelot too, and I think it's
pretty manageable.

Vladimir Oltean (11):
  net: bridge: add helper for retrieving the current bridge port STP
state
  net: bridge: add helper to retrieve the current ageing time
  net: bridge: add helper to replay port and host-joined mdb entries
  net: bridge: add helper to replay port and local fdb entries
  net: bridge: add helper to replay VLANs installed on port
  net: dsa: call dsa_port_bridge_join when joining a LAG that is already
in a bridge
  net: dsa: pass extack to dsa_port_{bridge,lag}_join
  net: dsa: inherit the actual bridge port flags at join time
  net: dsa: sync up switchdev objects and port attributes when joining
the bridge
  net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged
LAG
  net: ocelot: replay switchdev events when joining bridge

 drivers/net/dsa/o

Re: [External] : Re: [LKP] Re: [locking/qspinlock] 0e8d8f4f12: stress-ng.zero.ops_per_sec -9.7% regression

2021-03-22 Thread Alex Kogan



> On Mar 22, 2021, at 7:15 PM, Alex Kogan  wrote:
> 
> Many thanks to Zhengjun Xing for the help in reproducing the issue.
> 
> On our system, the regression is less than 7% (the numbers are below), 
> however,
> at least at the full capacity, the numbers are very stable. This allowed me 
> to track down the
> issue and identify unnecessary stores into the queue node structure, which 
> may cause 
> cache misses during lock transfers. Moving those stores into the 
> initialization code (cna_init_nodes())
> solves the problem.
> 
> Below are the numbers of “bogo ops/s” reported by stress-ng with various 
> numbers of workers. 
> Each number represents an average over 25 runs, with the standard deviation 
> reported in ().
> 
> #workers  stock  CNA / speedup   
> CNA+patch / speedup
>  18  16327.844 (581.744) 15480.061 (582.654) / 0.948  16422.349 (473.729) / 
> 1.006
>  368573.557 (285.058)   8003.888 (196.125) / 0.9348457.436 (258.065) 
> / 0.986
>  724042.535 (28.766) 3960.407 (28.648) / 0.980  4107.143 (23.037) 
> / 1.016
> 108   2735.913 (7.440)   2678.888 (7.102) / 0.9792774.751 (4.375) 
> / 1.014
> 144   2093.477 (3.341)   2042.968 (1.982) / 0.9762109.879 (1.714) 
> / 1.008

Those are "bogo ops/s (usr+sys time)", btw. Just in case, below are "bogo ops/s 
(real time)”
numbers, which I believe is what is reported by the kernel test robot:

#workers  stock CNA / speedup   CNA+patch / 
speedup
 18   262932.282 (12638.248)   249653.081 (11822.940) / 0.949  265189.104 
(9271.447) / 1.009
 36   277315.640 (11100.324)   260177.335 (7186.451) / 0.938   274691.250 
(10329.523) / 0.991
 72   263904.000 (2128.206)259967.180 (1857.393) / 0.985   268971.483 
(1713.639) / 1.019
108   273811.373 (664.517) 268949.947 (690.329) / 0.982278196.867 
(403.978) / 1.016
144   284321.364 (399.281) 278153.208 (210.776) / 0.978287343.806 
(280.963) / 1.011

Regards,
— Alex

> The patch is attached. As always, comments are welcome!
> 
> Unless there any objections, I will reintegrate the patch into the series, 
> and post a new
> revision.
> 
> Regards,
> — Alex



Re: [PATCH v5 00/14] Add BLK_CTL support for i.MX8MP

2021-03-22 Thread Adam Ford
On Wed, Mar 3, 2021 at 4:54 AM Marek Vasut  wrote:
>
> On 3/3/21 11:47 AM, Abel Vesa wrote:
> > On 20-11-03 13:18:12, Abel Vesa wrote:
> >> The BLK_CTL according to HW design is basically the wrapper of the entire
> >> function specific group of IPs and holds GPRs that usually cannot be placed
> >> into one specific IP from that group. Some of these GPRs are used to 
> >> control
> >> some clocks, other some resets, others some very specific function that 
> >> does
> >> not fit into clocks or resets. Since the clocks are registered using the 
> >> i.MX
> >> clock subsystem API, the driver is placed into the clock subsystem, but it
> >> also registers the resets. For the other functionalities that other GPRs 
> >> might
> >> have, the syscon is used.
> >>
> >
> > This approach seems to be introducing a possible ABBA deadlock due to
> > the core clock and genpd locking. Here is a backtrace I got from Pete
> > Zhang (he reported the issue on the internal mailing list):
> >
> > [   11.667711][  T108] -> #1 (&genpd->mlock){+.+.}-{3:3}:
> > [   11.675041][  T108]__lock_acquire+0xae4/0xef8
> > [   11.680093][  T108]lock_acquire+0xfc/0x2f8
> > [   11.684888][  T108]__mutex_lock+0x90/0x870
> > [   11.689685][  T108]mutex_lock_nested+0x44/0x50
> > [   11.694826][  T108]genpd_lock_mtx+0x18/0x24
> > [   11.699706][  T108]genpd_runtime_resume+0x90/0x214 (hold 
> > genpd->mlock)
> > [   11.705194][  T108]__rpm_callback+0x80/0x2c0
> > [   11.710160][  T108]rpm_resume+0x468/0x650
> > [   11.714866][  T108]__pm_runtime_resume+0x60/0x88
> > [   11.720180][  T108]clk_pm_runtime_get+0x28/0x9c
> > [   11.725410][  T108]clk_disable_unused_subtree+0x8c/0x144
> > [   11.731420][  T108]clk_disable_unused_subtree+0x124/0x144
> > [   11.737518][  T108]clk_disable_unused+0xa4/0x11c (hold 
> > prepare_lock)
> > [   11.742833][  T108]do_one_initcall+0x98/0x178
> > [   11.747888][  T108]do_initcall_level+0x9c/0xb8
> > [   11.753028][  T108]do_initcalls+0x54/0x94
> > [   11.757736][  T108]do_basic_setup+0x24/0x30
> > [   11.762614][  T108]kernel_init_freeable+0x70/0xa4
> > [   11.768014][  T108]kernel_init+0x14/0x18c
> > [   11.772722][  T108]ret_from_fork+0x10/0x18
> >
> > [   11.777512][  T108] -> #0 (prepare_lock){+.+.}-{3:3}:
> > [   11.784749][  T108]check_noncircular+0x134/0x13c
> > [   11.790064][  T108]validate_chain+0x590/0x2a04
> > [   11.795204][  T108]__lock_acquire+0xae4/0xef8
> > [   11.800258][  T108]lock_acquire+0xfc/0x2f8
> > [   11.805050][  T108]__mutex_lock+0x90/0x870
> > [   11.809841][  T108]mutex_lock_nested+0x44/0x50
> > [   11.814983][  T108]clk_unprepare+0x5c/0x100 ((hold prepare_lock))
> > [   11.819864][  T108]imx8m_pd_power_off+0xac/0x110
> > [   11.825179][  T108]genpd_power_off+0x1b4/0x2dc
> > [   11.830318][  T108]genpd_power_off_work_fn+0x38/0x58 (hold 
> > genpd->mlock)
> > [   11.835981][  T108]process_one_work+0x270/0x444
> > [   11.841208][  T108]worker_thread+0x280/0x4e4
> > [   11.846176][  T108]kthread+0x13c/0x14
> > [   11.850621][  T108]ret_from_fork+0x10/0x18
> >
> > Now, this has been reproduced only on the NXP internal tree, but I think
> > it is pretty obvious this could happen in upstream too, with this
> > patchset applied.
> >
> > First, my thought was to change the prepare_lock/enable_lock in clock
> > core, from a global approach to a per clock basis. But that doesn't
> > actually fix the issue.
> >
> > The usecase seen above is due to clk_disable_unused, but the same could
> > happen when a clock consumer calls prepare/unprepare on a clock.
> >
> > I guess the conclusion is that the current state of the clock core and
> > genpd implementation does not support a usecase where a clock controller
> > has a PD which in turn uses another clock (from another clock controller).
> >
> > Jacky, Pete, did I miss anything here ?
>
> Just for completeness, I have a feeling I already managed to trigger
> this and discussed this with Lucas before, so this concern is certainly
> valid.

I know it may not be ideal, someone tied a soft-reset and soft-enable
to the driver of the Hantro VPU on the IMXMQ [1], and I wonder if
something similar could be done for the drivers who are consumers of
the clocks.

For example:

lcdif could request the power domain.
The power domain soft-resets and enables bus clock (vis syscon)
After successful enabling of power-domain, the LCDIF requests the soft
reset and respective clock bits (also via syscon) similar to how [1]
and [2] do it for the Hantro VPU.

The syscon node could be a common node similar to what was done in
[2], and multiple consumers could control when each soft-reset and
clock-enable get activated.  I know it's probably more of an abuse of
the syscon system, but having the individual dri

Re: [PATCH] powerpc/asm: Fix a typo

2021-03-22 Thread Michael Ellerman
Randy Dunlap  writes:
> On 3/22/21 4:32 AM, Bhaskar Chowdhury wrote:
>> 
>> s/poiner/pointer/
>> 
>> Signed-off-by: Bhaskar Chowdhury 
>
> Acked-by: Randy Dunlap 
>
> However, it would be a GOOD THING to collect multiple similar
> patches that are in e.g. arch/powerpc/ and send them as one patch
> instead of many little patches.

Yes.

Please send me one patch for all of the spelling issues you can
currently find in arch/powerpc.

cheers


[PATCH] kbuild: Merge module sections if and only if CONFIG_LTO_CLANG is enabled

2021-03-22 Thread Sean Christopherson
Merge module sections only when using Clang LTO.  With gcc-10, merging
sections does not appear to update the symbol tables for the module,
e.g. 'readelf -s' shows the value that a symbol would have had, if
sections were not merged.

The stale symbol table breaks gdb's function disassambler, and presumably
other things, e.g.

  gdb -batch -ex "file arch/x86/kvm/kvm.ko" -ex "disassemble kvm_init"

reads the wrong bytes and dumps garbage.

Fixes: dd2776222abb ("kbuild: lto: merge module sections")
Cc: Nick Desaulniers 
Cc: Sami Tolvanen 
Cc: Kees Cook 
Signed-off-by: Sean Christopherson 
---

This is obviously the quick and dirty approach to fixing the problem,
presumably there is a way to actually update the symbols, but that is far,
far outside my area of expertise.

IIUC how the disassemblers work, the section headers are correctly updated,
e.g. objdump displays the correct info, and even gdb's disassembler shows
the "correct" offset, it's just the symbol tables that are out of whack.

An earlier version of the LTO series did have exactly this #ifdef, but
it was dropped when no one objected to Kees' suggestion to unconditionally
merge sections.  https://lkml.kernel.org/r/202010141548.47CB1BC@keescook

 scripts/module.lds.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 168cd27e6122..3580c6d02957 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -25,6 +25,7 @@ SECTIONS {
 * -ffunction-sections, which increases the size of the final module.
 * Merge the split sections in the final binary.
 */
+#ifdef CONFIG_LTO_CLANG
.bss : {
*(.bss .bss.[0-9a-zA-Z_]*)
*(.bss..L*)
@@ -41,6 +42,7 @@ SECTIONS {
}
 
.text : { *(.text .text.[0-9a-zA-Z_]*) }
+#endif
 }
 
 /* bring in arch-specific sections */
-- 
2.31.0.rc2.261.g7f71774620-goog



Re: [PATCH v4 0/3] net: dsa: lantiq: add support for xRX300 and xRX330

2021-03-22 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Mon, 22 Mar 2021 21:37:14 +0100 you wrote:
> Changed since v3:
>   * fixed last compilation warning
> 
> Changed since v2:
>   * fixed compilation warnings
>   * removed example bindings for xrx330
>   * patches has been refactored due to upstream changes
> 
> [...]

Here is the summary with links:
  - [v4,1/3] net: dsa: lantiq: allow to use all GPHYs on xRX300 and xRX330
https://git.kernel.org/netdev/net-next/c/a09d042b0862
  - [v4,2/3] net: dsa: lantiq: verify compatible strings against hardware
https://git.kernel.org/netdev/net-next/c/204c7614738e
  - [v4,3/3] dt-bindings: net: dsa: lantiq: add xRx300 and xRX330 switch 
bindings
https://git.kernel.org/netdev/net-next/c/ee83d82407e4

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v3 2/3] net: ethernet: actions: Add Actions Semi Owl Ethernet MAC driver

2021-03-22 Thread Andrew Lunn
> +static void owl_emac_set_multicast(struct net_device *netdev, int count)
> +{
> + struct owl_emac_priv *priv = netdev_priv(netdev);
> + struct netdev_hw_addr *ha;
> + int index = 0;
> +
> + if (count <= 0) {
> + priv->mcaddr_list.count = 0;
> + return;
> + }
> +
> + netdev_for_each_mc_addr(ha, netdev) {
> + if (!is_multicast_ether_addr(ha->addr))
> + continue;

Is this possible?

> +
> + WARN_ON(index >= OWL_EMAC_MAX_MULTICAST_ADDRS);
> + ether_addr_copy(priv->mcaddr_list.addrs[index++], ha->addr);
> + }
> +
> + priv->mcaddr_list.count = index;
> +
> + owl_emac_setup_frame_xmit(priv);
> +}
> +
> +static void owl_emac_ndo_set_rx_mode(struct net_device *netdev)
> +{
> + struct owl_emac_priv *priv = netdev_priv(netdev);
> + u32 status, val = 0;
> + int mcast_count = 0;
> +
> + if (netdev->flags & IFF_PROMISC) {
> + val = OWL_EMAC_BIT_MAC_CSR6_PR;
> + } else if (netdev->flags & IFF_ALLMULTI) {
> + val = OWL_EMAC_BIT_MAC_CSR6_PM;
> + } else if (netdev->flags & IFF_MULTICAST) {
> + mcast_count = netdev_mc_count(netdev);
> +
> + if (mcast_count > OWL_EMAC_MAX_MULTICAST_ADDRS) {
> + val = OWL_EMAC_BIT_MAC_CSR6_PM;
> + mcast_count = 0;
> + }
> + }
> +
> + spin_lock_bh(&priv->lock);
> +
> + /* Temporarily stop DMA TX & RX. */
> + status = owl_emac_dma_cmd_stop(priv);
> +
> + /* Update operation modes. */
> + owl_emac_reg_update(priv, OWL_EMAC_REG_MAC_CSR6,
> + OWL_EMAC_BIT_MAC_CSR6_PR | OWL_EMAC_BIT_MAC_CSR6_PM,
> + val);
> +
> + /* Restore DMA TX & RX status. */
> + owl_emac_dma_cmd_set(priv, status);
> +
> + spin_unlock_bh(&priv->lock);
> +
> + /* Set/reset multicast addr list. */
> + owl_emac_set_multicast(netdev, mcast_count);
> +}

I think this can be simplified. At least, you had me going around in
circles a while trying to see if WARN_ON() could be triggered from
user space.

If you have more than OWL_EMAC_MAX_MULTICAST_ADDRS MC addresses, you
go into promisc mode. Can you then skip calling
owl_emac_set_multicast(), which appears not to do too much useful when
passed 0?

   Andrew


RE: [PATCH -next] x86: Fix unused variable 'msr_val' warning

2021-03-22 Thread Michael Kelley
From: Ingo Molnar  Sent: Monday, March 22, 2021 
2:08 PM
> 
> * Xu Yihang  wrote:
> 
> > Fixes the following W=1 kernel build warning(s):
> > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable 'msr_val' set but 
> > not used [-
> Wunused-but-set-variable]
> >   unsigned long msr_val;
> >
> > As Hypervisor Top-Level Functional Specification states in chapter 7.5 
> > Virtual Processor
> Idle Sleep State, "A partition which possesses the AccessGuestIdleMsr 
> privilege (refer to
> section 4.2.2) may trigger entry into the virtual processor idle sleep state 
> through a read to
> the hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is
> necessary, msr_val is not uesed, so __maybe_unused should be added.
> >
> > Reference:
> >
> > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> >
> > Reported-by: Hulk Robot 
> > Signed-off-by: Xu Yihang 
> > ---
> >  arch/x86/hyperv/hv_spinlock.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
> > index f3270c1fc48c..67bc15c7752a 100644
> > --- a/arch/x86/hyperv/hv_spinlock.c
> > +++ b/arch/x86/hyperv/hv_spinlock.c
> > @@ -25,7 +25,7 @@ static void hv_qlock_kick(int cpu)
> >
> >  static void hv_qlock_wait(u8 *byte, u8 val)
> >  {
> > -   unsigned long msr_val;
> > +   unsigned long msr_val __maybe_unused;
> > unsigned long flags;
> 
> Please don't add new __maybe_unused annotations to the x86 tree -
> improve the flow instead to help GCC recognize the initialization
> sequence better.
> 
> Thanks,
> 
>   Ingo

Could you elaborate on the thinking here, or point to some written
discussion?   I'm just curious.   In this particular case, it's not a problem
with the flow or gcc detection.  This code really does read an MSR and
ignore that value that is read, so it's not clear how gcc would ever
figure out that's OK.

Michael




[tip: x86/cleanups] x86/fpu/math-emu: Fix function cast warning

2021-03-22 Thread tip-bot2 for Arnd Bergmann
The following commit has been merged into the x86/cleanups branch of tip:

Commit-ID: 279d56abc67ed7568168cb31bf1c7d735efc89a7
Gitweb:
https://git.kernel.org/tip/279d56abc67ed7568168cb31bf1c7d735efc89a7
Author:Arnd Bergmann 
AuthorDate:Mon, 22 Mar 2021 22:48:19 +01:00
Committer: Ingo Molnar 
CommitterDate: Tue, 23 Mar 2021 00:08:02 +01:00

x86/fpu/math-emu: Fix function cast warning

Building with 'make W=1', gcc points out that casting between
incompatible function types can be dangerous:

  arch/x86/math-emu/fpu_trig.c:1638:60: error: cast between incompatible 
function types from ‘int (*)(FPU_REG *, u_char)’ {aka ‘int (*)(struct fpu__reg 
*, unsigned char)’} to ‘void (*)(FPU_REG *, u_char)’ {aka ‘void (*)(struct 
fpu__reg *, unsigned char)’} [-Werror=cast-function-type]
   1638 | fprem, fyl2xp1, fsqrt_, fsincos, frndint_, fscale, (FUNC_ST0) 
fsin, fcos
|^

This one seems harmless, but it is easy enough to work around it by
adding an intermediate function that adjusts the return type.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210322214824.974323-1-a...@kernel.org
---
 arch/x86/math-emu/fpu_trig.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/math-emu/fpu_trig.c b/arch/x86/math-emu/fpu_trig.c
index 4a98878..990d847 100644
--- a/arch/x86/math-emu/fpu_trig.c
+++ b/arch/x86/math-emu/fpu_trig.c
@@ -547,7 +547,7 @@ static void frndint_(FPU_REG *st0_ptr, u_char st0_tag)
single_arg_error(st0_ptr, st0_tag);
 }
 
-static int fsin(FPU_REG *st0_ptr, u_char tag)
+static int f_sin(FPU_REG *st0_ptr, u_char tag)
 {
u_char arg_sign = getsign(st0_ptr);
 
@@ -608,6 +608,11 @@ static int fsin(FPU_REG *st0_ptr, u_char tag)
}
 }
 
+static void fsin(FPU_REG *st0_ptr, u_char tag)
+{
+   f_sin(st0_ptr, tag);
+}
+
 static int f_cos(FPU_REG *st0_ptr, u_char tag)
 {
u_char st0_sign;
@@ -724,7 +729,7 @@ static void fsincos(FPU_REG *st0_ptr, u_char st0_tag)
}
 
reg_copy(st0_ptr, &arg);
-   if (!fsin(st0_ptr, st0_tag)) {
+   if (!f_sin(st0_ptr, st0_tag)) {
push();
FPU_copy_to_reg0(&arg, st0_tag);
f_cos(&st(0), st0_tag);
@@ -1635,7 +1640,7 @@ void FPU_triga(void)
 }
 
 static FUNC_ST0 const trig_table_b[] = {
-   fprem, fyl2xp1, fsqrt_, fsincos, frndint_, fscale, (FUNC_ST0) fsin, fcos
+   fprem, fyl2xp1, fsqrt_, fsincos, frndint_, fscale, fsin, fcos
 };
 
 void FPU_trigb(void)


[tip: x86/boot] x86/boot/tboot: Avoid Wstringop-overread-warning

2021-03-22 Thread tip-bot2 for Arnd Bergmann
The following commit has been merged into the x86/boot branch of tip:

Commit-ID: cdc34cb8f25d3125d30868376b8eae6fe690119b
Gitweb:
https://git.kernel.org/tip/cdc34cb8f25d3125d30868376b8eae6fe690119b
Author:Arnd Bergmann 
AuthorDate:Mon, 22 Mar 2021 17:02:40 +01:00
Committer: Ingo Molnar 
CommitterDate: Tue, 23 Mar 2021 00:16:13 +01:00

x86/boot/tboot: Avoid Wstringop-overread-warning

gcc-11 warns about using string operations on pointers that are
defined at compile time as offsets from a NULL pointer. Unfortunately
that also happens on the result of fix_to_virt(), which is a
compile-time constant for a constant input:

  arch/x86/kernel/tboot.c: In function 'tboot_probe':
  arch/x86/kernel/tboot.c:70:13: error: '__builtin_memcmp_eq' specified bound 
16 exceeds source size 0 [-Werror=stringop-overread]
 70 | if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
| ^~

I hope this can get addressed in gcc-11 before the release.

As a workaround, split up the tboot_probe() function in two halves
to separate the pointer generation from the usage. This is a bit
ugly, and hopefully gcc understands that the code is actually correct
before it learns to peek into the noinline function.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Ingo Molnar 
Cc: Linus Torvalds 
Cc: Martin Sebor 
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
Link: https://lore.kernel.org/r/20210322160253.4032422-3-a...@kernel.org
---
 arch/x86/kernel/tboot.c | 44 +++-
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 4c09ba1..f9af561 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -49,6 +49,30 @@ bool tboot_enabled(void)
return tboot != NULL;
 }
 
+/* noinline to prevent gcc from warning about dereferencing constant fixaddr */
+static noinline __init bool check_tboot_version(void)
+{
+   if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
+   pr_warn("tboot at 0x%llx is invalid\n", boot_params.tboot_addr);
+   return false;
+   }
+
+   if (tboot->version < 5) {
+   pr_warn("tboot version is invalid: %u\n", tboot->version);
+   return false;
+   }
+
+   pr_info("found shared page at phys addr 0x%llx:\n",
+   boot_params.tboot_addr);
+   pr_debug("version: %d\n", tboot->version);
+   pr_debug("log_addr: 0x%08x\n", tboot->log_addr);
+   pr_debug("shutdown_entry: 0x%x\n", tboot->shutdown_entry);
+   pr_debug("tboot_base: 0x%08x\n", tboot->tboot_base);
+   pr_debug("tboot_size: 0x%x\n", tboot->tboot_size);
+
+   return true;
+}
+
 void __init tboot_probe(void)
 {
/* Look for valid page-aligned address for shared page. */
@@ -66,25 +90,9 @@ void __init tboot_probe(void)
 
/* Map and check for tboot UUID. */
set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
-   tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
-   if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
-   pr_warn("tboot at 0x%llx is invalid\n", boot_params.tboot_addr);
+   tboot = (void *)fix_to_virt(FIX_TBOOT_BASE);
+   if (!check_tboot_version())
tboot = NULL;
-   return;
-   }
-   if (tboot->version < 5) {
-   pr_warn("tboot version is invalid: %u\n", tboot->version);
-   tboot = NULL;
-   return;
-   }
-
-   pr_info("found shared page at phys addr 0x%llx:\n",
-   boot_params.tboot_addr);
-   pr_debug("version: %d\n", tboot->version);
-   pr_debug("log_addr: 0x%08x\n", tboot->log_addr);
-   pr_debug("shutdown_entry: 0x%x\n", tboot->shutdown_entry);
-   pr_debug("tboot_base: 0x%08x\n", tboot->tboot_base);
-   pr_debug("tboot_size: 0x%x\n", tboot->tboot_size);
 }
 
 static pgd_t *tboot_pg_dir;


[tip: x86/boot] x86/boot/compressed: Avoid gcc-11 -Wstringop-overread warning

2021-03-22 Thread tip-bot2 for Arnd Bergmann
The following commit has been merged into the x86/boot branch of tip:

Commit-ID: e14cfb3bdd0f82147d09e9f46bedda6302f28ee1
Gitweb:
https://git.kernel.org/tip/e14cfb3bdd0f82147d09e9f46bedda6302f28ee1
Author:Arnd Bergmann 
AuthorDate:Mon, 22 Mar 2021 17:02:39 +01:00
Committer: Ingo Molnar 
CommitterDate: Tue, 23 Mar 2021 00:16:25 +01:00

x86/boot/compressed: Avoid gcc-11 -Wstringop-overread warning

GCC gets confused by the comparison of a pointer to an integer literal,
with the assumption that this is an offset from a NULL pointer and that
dereferencing it is invalid:

  In file included from arch/x86/boot/compressed/misc.c:18:
  In function ‘parse_elf’,
  inlined from ‘extract_kernel’ at arch/x86/boot/compressed/misc.c:442:2:
  arch/x86/boot/compressed/../string.h:15:23: error: ‘__builtin_memcpy’ reading 
64 bytes from a region of size 0 [-Werror=stringop-overread]
 15 | #define memcpy(d,s,l) __builtin_memcpy(d,s,l)
|   ^~~
  arch/x86/boot/compressed/misc.c:283:9: note: in expansion of macro ‘memcpy’
283 | memcpy(&ehdr, output, sizeof(ehdr));
| ^~

I could not find any good workaround for this, but as this is only
a warning for a failure during early boot, removing the line entirely
works around the warning.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Ingo Molnar 
Cc: Linus Torvalds 
Cc: Martin Sebor 
Link: https://lore.kernel.org/r/20210322160253.4032422-2-a...@kernel.org
---
 arch/x86/boot/compressed/misc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f9..1945b8a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -430,8 +430,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, 
memptr heap,
error("Destination address too large");
 #endif
 #ifndef CONFIG_RELOCATABLE
-   if ((unsigned long)output != LOAD_PHYSICAL_ADDR)
-   error("Destination address does not match LOAD_PHYSICAL_ADDR");
if (virt_addr != LOAD_PHYSICAL_ADDR)
error("Destination virtual address changed when not 
relocatable");
 #endif


[tip: locking/core] static_call: Fix function type mismatch

2021-03-22 Thread tip-bot2 for Arnd Bergmann
The following commit has been merged into the locking/core branch of tip:

Commit-ID: 335c73e7c8f7deb23537afbbbe4f8ab48bd5de52
Gitweb:
https://git.kernel.org/tip/335c73e7c8f7deb23537afbbbe4f8ab48bd5de52
Author:Arnd Bergmann 
AuthorDate:Mon, 22 Mar 2021 22:42:24 +01:00
Committer: Ingo Molnar 
CommitterDate: Tue, 23 Mar 2021 00:08:53 +01:00

static_call: Fix function type mismatch

The __static_call_return0() function is declared to return a 'long',
while it aliases a couple of functions that all return 'int'. When
building with 'make W=1', gcc warns about this:

  kernel/sched/core.c:5420:37: error: cast between incompatible function types 
from 'long int (*)(void)' to 'int (*)(void)' [-Werror=cast-function-type]
   5420 |   static_call_update(might_resched, (typeof(&__cond_resched)) 
__static_call_return0);

Change all these function to return 'long' as well, but remove the cast to
ensure we get a warning if any of the types ever change.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20210322214309.730556-1-a...@kernel.org
---
 include/linux/kernel.h |  4 ++--
 include/linux/sched.h  | 14 +++---
 kernel/sched/core.c|  8 
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5b7ed6d..db24f8c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -82,12 +82,12 @@ struct user;
 
 #ifdef CONFIG_PREEMPT_VOLUNTARY
 
-extern int __cond_resched(void);
+extern long __cond_resched(void);
 # define might_resched() __cond_resched()
 
 #elif defined(CONFIG_PREEMPT_DYNAMIC)
 
-extern int __cond_resched(void);
+extern long __cond_resched(void);
 
 DECLARE_STATIC_CALL(might_resched, __cond_resched);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb2..b08080d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1875,20 +1875,20 @@ static inline int test_tsk_need_resched(struct 
task_struct *tsk)
  * cond_resched_lock() will drop the spinlock before scheduling,
  */
 #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
-extern int __cond_resched(void);
+extern long __cond_resched(void);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
 DECLARE_STATIC_CALL(cond_resched, __cond_resched);
 
-static __always_inline int _cond_resched(void)
+static __always_inline long _cond_resched(void)
 {
return static_call_mod(cond_resched)();
 }
 
 #else
 
-static inline int _cond_resched(void)
+static inline long _cond_resched(void)
 {
return __cond_resched();
 }
@@ -1897,7 +1897,7 @@ static inline int _cond_resched(void)
 
 #else
 
-static inline int _cond_resched(void) { return 0; }
+static inline long _cond_resched(void) { return 0; }
 
 #endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */
 
@@ -1906,9 +1906,9 @@ static inline int _cond_resched(void) { return 0; }
_cond_resched();\
 })
 
-extern int __cond_resched_lock(spinlock_t *lock);
-extern int __cond_resched_rwlock_read(rwlock_t *lock);
-extern int __cond_resched_rwlock_write(rwlock_t *lock);
+extern long __cond_resched_lock(spinlock_t *lock);
+extern long __cond_resched_rwlock_read(rwlock_t *lock);
+extern long __cond_resched_rwlock_write(rwlock_t *lock);
 
 #define cond_resched_lock(lock) ({ \
___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9819121..927fd82 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6976,7 +6976,7 @@ SYSCALL_DEFINE0(sched_yield)
 }
 
 #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
-int __sched __cond_resched(void)
+long __sched __cond_resched(void)
 {
if (should_resched(0)) {
preempt_schedule_common();
@@ -7006,7 +7006,7 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
  * operations here to prevent schedule() from being called twice (once via
  * spin_unlock(), once by hand).
  */
-int __cond_resched_lock(spinlock_t *lock)
+long __cond_resched_lock(spinlock_t *lock)
 {
int resched = should_resched(PREEMPT_LOCK_OFFSET);
int ret = 0;
@@ -7026,7 +7026,7 @@ int __cond_resched_lock(spinlock_t *lock)
 }
 EXPORT_SYMBOL(__cond_resched_lock);
 
-int __cond_resched_rwlock_read(rwlock_t *lock)
+long __cond_resched_rwlock_read(rwlock_t *lock)
 {
int resched = should_resched(PREEMPT_LOCK_OFFSET);
int ret = 0;
@@ -7046,7 +7046,7 @@ int __cond_resched_rwlock_read(rwlock_t *lock)
 }
 EXPORT_SYMBOL(__cond_resched_rwlock_read);
 
-int __cond_resched_rwlock_write(rwlock_t *lock)
+long __cond_resched_rwlock_write(rwlock_t *lock)
 {
int resched = should_resched(PREEMPT_LOCK_OFFSET);
int ret = 0;


Re: [RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page

2021-03-22 Thread Mike Kravetz
On 3/22/21 7:31 AM, Michal Hocko wrote:
> On Fri 19-03-21 15:42:06, Mike Kravetz wrote:
> [...]
>> @@ -2090,9 +2084,15 @@ static void return_unused_surplus_pages(struct hstate 
>> *h,
>>  while (nr_pages--) {
>>  h->resv_huge_pages--;
>>  unused_resv_pages--;
>> -if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
>> +page = remove_pool_huge_page(h, &node_states[N_MEMORY], 1);
>> +if (!page)
>>  goto out;
>> -cond_resched_lock(&hugetlb_lock);
>> +
>> +/* Drop lock and free page to buddy as it could sleep */
>> +spin_unlock(&hugetlb_lock);
>> +update_and_free_page(h, page);
>> +cond_resched();
>> +spin_lock(&hugetlb_lock);
>>  }
>>  
>>  out:
> 
> This is likely a matter of taste but the repeated pattern of unlock,
> update_and_free_page, cond_resched and lock seems rather clumsy.
> Would it be slightly better/nicer to remove_pool_huge_page into a
> list_head under a single lock invocation and then free up the whole lot
> after the lock is dropped?

Yes, we can certainly do that.
One downside I see is that the list can contain a bunch of pages not
accounted for in hugetlb and not free in buddy (or cma).  Ideally, we
would want to keep those in sync if possible.  Also, the commit that
added the cond_resched talked about freeing up 12 TB worth of huge pages
and it holding the lock for 150 seconds.  The new code is not holding
the lock while calling free to buddy, but I wonder how long it would
take to remove 12 TB worth of huge pages and add them to a separate list?

I do not know how realistic the 12 TB number is.  But, I certainly am
aware of pools that are a few TB in size.
-- 
Mike Kravetz


Re: [PATCH net] net: dsa: don't assign an error value to tag_ops

2021-03-22 Thread Florian Fainelli



On 3/22/2021 1:26 PM, George McCollister wrote:
> Use a temporary variable to hold the return value from
> dsa_tag_driver_get() instead of assigning it to dst->tag_ops. Leaving
> an error value in dst->tag_ops can result in deferencing an invalid
> pointer when a deferred switch configuration happens later.
> 
> Fixes: 357f203bb3b5 ("net: dsa: keep a copy of the tagging protocol in the 
> DSA switch tree")
> 
> Signed-off-by: George McCollister 

Reviewed-by: Florian Fainelli 
-- 
Florian


RE: [PATCH 2/5] cifsd: add server-side procedures for SMB3

2021-03-22 Thread Namjae Jeon


> On Mon, Mar 22, 2021 at 09:47:13AM +0300, Dan Carpenter wrote:
> > On Mon, Mar 22, 2021 at 02:13:41PM +0900, Namjae Jeon wrote:
> > > +static unsigned char
> > > +asn1_octet_decode(struct asn1_ctx *ctx, unsigned char *ch) {
> > > + if (ctx->pointer >= ctx->end) {
> > > + ctx->error = ASN1_ERR_DEC_EMPTY;
> > > + return 0;
> > > + }
> > > + *ch = *(ctx->pointer)++;
> > > + return 1;
> > > +}
> >
> >
> > Make this bool.
> >
> 
> More importantly don't add another ANS1 parser, but use the generic one in 
> lib/asn1_decoder.c instead.
> CIFS should also really use it.
Okay, Let me check it, cifs also...
Thanks!



Re: [PATCH v3] usb: dwc3: gadget: Prevent EP queuing while stopping transfers

2021-03-22 Thread Wesley Cheng
Hi Andy,

On 3/22/2021 2:14 PM, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng  wrote:
>>
>> Hi Andy,
>>
>> On 3/22/2021 12:34 PM, Andy Shevchenko wrote:
>>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng  wrote:

 Hi Andy,

 On 3/22/2021 5:48 AM, Andy Shevchenko wrote:
> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng  
> wrote:
>>
>> In the situations where the DWC3 gadget stops active transfers, once
>> calling the dwc3_gadget_giveback(), there is a chance where a function
>> driver can queue a new USB request in between the time where the dwc3
>> lock has been released and re-aquired.  This occurs after we've already
>> issued an ENDXFER command.  When the stop active transfers continues
>> to remove USB requests from all dep lists, the newly added request will
>> also be removed, while controller still has an active TRB for it.
>> This can lead to the controller accessing an unmapped memory address.
>>
>> Fix this by ensuring parameters to prevent EP queuing are set before
>> calling the stop active transfers API.
>
>
> commit f09ddcfcb8c569675066337adac2ac205113471f
> Author: Wesley Cheng 
> Date:   Thu Mar 11 15:59:02 2021 -0800
>
>usb: dwc3: gadget: Prevent EP queuing while stopping transfers
>
> effectively broke my gadget setup.
>
> The output of the kernel (followed by non responsive state of USB 
> controller):
>
> [  195.228586] using random self ethernet address
> [  195.233104] using random host ethernet address
> [  195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2
> [  195.249732] usb0: MAC aa:bb:cc:dd:ee:f1
> # [  195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
> [  195.780585] [ cut here ]
> [  195.785217] dwc3 dwc3.0.auto: No resource for ep2in
> [  195.790162] WARNING: CPU: 0 PID: 217 at
> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670
> [  195.799760] Modules linked in: usb_f_eem u_ether libcomposite
> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps
> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt
> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec
> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci
> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m
> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel
> [  195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ 
> #60
> [  195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
> BIOS 542 2015.01.21:18.19.48
> [  195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670
> [  195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee
> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9
> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24
> 20 e9 80 fc ff ff 41 83 fe 92
> [  195.875381] RSP: :a53c00373ba8 EFLAGS: 00010086
> [  195.880617] RAX:  RBX: 1387 RCX: 
> dfff
> [  195.887755] RDX: dfff RSI: ffea RDI: 
> 
> [  195.894893] RBP: 9ce8c8f2b028 R08: a0732288 R09: 
> 9ffb
> [  195.902034] R10: e000 R11: 3fff R12: 
> 00041006
> [  195.909170] R13: a53c00373c24 R14: 9ce8c11dadb0 R15: 
> 9ce8c2861700
> [  195.916310] FS:  () GS:9ce8fe20()
> knlGS:
> [  195.924409] CS:  0010 DS:  ES:  CR0: 80050033
> [  195.930161] CR2: f7f694a0 CR3: 38e0c000 CR4: 
> 001006f0
> [  195.937300] Call Trace:
> [  195.939755]  __dwc3_gadget_ep_enable+0x2d4/0x4e0
> [  195.944393]  ? dwc3_remove_requests.constprop.0+0x86/0x170

 Odd that this change would affect the USB enablment path, as they were
 focused on the pullup disable path.  Would you happen to have any
 downstream changes on top of v5.12-rc4 we could review to see if they
 are still required? (ie where is the dwc3_remove_requests() coming from
 during ep enable)
>>>
>>> You may check my branch [1] on GH. Basically you may be interested in
>>> the commit:
>>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget:
>>> skip endpoints ep[18]{in,out}
>>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY
>>> suspend fix (which also shouldn't affect this).
>>
>> Can you link your GH reference?
> 
> Oops, sorry.
> Here we are:
> 
> [1]: https://github.com/andy-shev/linux/tree/eds-acpi
> 
Thanks, I took a look and even tried it on my device running 5.12-rc4,
but wasn't able to see the same problem.  Could you help collect the
ftrace after enabling the tracing KCONFIG and running the below sequence?

1.  Mount debugfs
2.  Set up tracing instance

RE: [PATCH 2/5] cifsd: add server-side procedures for SMB3

2021-03-22 Thread Namjae Jeon


> On Mon, Mar 22, 2021 at 02:13:41PM +0900, Namjae Jeon wrote:
> > +static unsigned char
> > +asn1_octet_decode(struct asn1_ctx *ctx, unsigned char *ch) {
> > +   if (ctx->pointer >= ctx->end) {
> > +   ctx->error = ASN1_ERR_DEC_EMPTY;
> > +   return 0;
> > +   }
> > +   *ch = *(ctx->pointer)++;
> > +   return 1;
> > +}
> 
> 
> Make this bool.
Okay.
> 
> > +
> > +static unsigned char
> > +asn1_tag_decode(struct asn1_ctx *ctx, unsigned int *tag) {
> > +   unsigned char ch;
> > +
> > +   *tag = 0;
> > +
> > +   do {
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +   *tag <<= 7;
> > +   *tag |= ch & 0x7F;
> > +   } while ((ch & 0x80) == 0x80);
> > +   return 1;
> > +}
> 
> Bool.
Okay.
> 
> > +
> > +static unsigned char
> > +asn1_id_decode(struct asn1_ctx *ctx,
> > +  unsigned int *cls, unsigned int *con, unsigned int *tag) {
> > +   unsigned char ch;
> > +
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +
> > +   *cls = (ch & 0xC0) >> 6;
> > +   *con = (ch & 0x20) >> 5;
> > +   *tag = (ch & 0x1F);
> > +
> > +   if (*tag == 0x1F) {
> > +   if (!asn1_tag_decode(ctx, tag))
> > +   return 0;
> > +   }
> > +   return 1;
> > +}
> 
> 
> Same.
Okay.
> 
> > +
> > +static unsigned char
> > +asn1_length_decode(struct asn1_ctx *ctx, unsigned int *def, unsigned
> > +int *len) {
> > +   unsigned char ch, cnt;
> > +
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +
> > +   if (ch == 0x80)
> > +   *def = 0;
> > +   else {
> > +   *def = 1;
> > +
> > +   if (ch < 0x80)
> > +   *len = ch;
> > +   else {
> > +   cnt = (unsigned char) (ch & 0x7F);
> > +   *len = 0;
> > +
> > +   while (cnt > 0) {
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +   *len <<= 8;
> > +   *len |= ch;
> > +   cnt--;
> > +   }
> > +   }
> > +   }
> > +
> > +   /* don't trust len bigger than ctx buffer */
> > +   if (*len > ctx->end - ctx->pointer)
> > +   return 0;
> > +
> > +   return 1;
> > +}
> 
> 
> Same etc for all.
Okay.
> 
> > +
> > +static unsigned char
> > +asn1_header_decode(struct asn1_ctx *ctx,
> > +  unsigned char **eoc,
> > +  unsigned int *cls, unsigned int *con, unsigned int *tag) {
> > +   unsigned int def = 0;
> > +   unsigned int len = 0;
> > +
> > +   if (!asn1_id_decode(ctx, cls, con, tag))
> > +   return 0;
> > +
> > +   if (!asn1_length_decode(ctx, &def, &len))
> > +   return 0;
> > +
> > +   /* primitive shall be definite, indefinite shall be constructed */
> > +   if (*con == ASN1_PRI && !def)
> > +   return 0;
> > +
> > +   if (def)
> > +   *eoc = ctx->pointer + len;
> > +   else
> > +   *eoc = NULL;
> > +   return 1;
> > +}
> > +
> > +static unsigned char
> > +asn1_eoc_decode(struct asn1_ctx *ctx, unsigned char *eoc) {
> > +   unsigned char ch;
> > +
> > +   if (!eoc) {
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +
> > +   if (ch != 0x00) {
> > +   ctx->error = ASN1_ERR_DEC_EOC_MISMATCH;
> > +   return 0;
> > +   }
> > +
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +
> > +   if (ch != 0x00) {
> > +   ctx->error = ASN1_ERR_DEC_EOC_MISMATCH;
> > +   return 0;
> > +   }
> > +   } else {
> > +   if (ctx->pointer != eoc) {
> > +   ctx->error = ASN1_ERR_DEC_LENGTH_MISMATCH;
> > +   return 0;
> > +   }
> > +   }
> > +   return 1;
> > +}
> > +
> > +static unsigned char
> > +asn1_subid_decode(struct asn1_ctx *ctx, unsigned long *subid) {
> > +   unsigned char ch;
> > +
> > +   *subid = 0;
> > +
> > +   do {
> > +   if (!asn1_octet_decode(ctx, &ch))
> > +   return 0;
> > +
> > +   *subid <<= 7;
> > +   *subid |= ch & 0x7F;
> > +   } while ((ch & 0x80) == 0x80);
> > +   return 1;
> > +}
> > +
> > +static int
> > +asn1_oid_decode(struct asn1_ctx *ctx,
> > +   unsigned char *eoc, unsigned long **oid, unsigned int *len) {
> > +   unsigned long subid;
> > +   unsigned int size;
> > +   unsigned long *optr;
> > +
> > +   size = eoc - ctx->pointer + 1;
> > +
> > +   /* first subid actually encodes first two subids */
> > +   if (size < 2 || size > UINT_MAX/sizeof(unsigned long))
> > +   return 0;
> > +
> > +   *oid = kmalloc(size * sizeof(unsigned long), GFP_KERNEL);
> > +   if (!*oid)
> > +   return 0;
> > +
> > +   optr = *oid;
> > +
> > +   if (!asn1_subid_decode(ctx, &subid)) {
> > +   kfree(*oid);
> > +   *oid = NULL;
> > +   return 0;
> > +  

Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

2021-03-22 Thread Kai Huang
On Mon, 22 Mar 2021 23:37:26 +0100 Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote:
> > This path is called by host SGX driver only, so yes this leaking is done by
> > host enclaves only.
> 
> Yes, so I was told.
> 
> > This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so 
> > virtual
> > EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
> > This patch doesn't have intention to bring functional change.  I changed the
> > error msg because Dave said it worth to give a message saying EPC page is
> > leaked, and I thought this minor change won't break anything.
> 
> I read that already - you don't have to repeat it.
> 
> > btw, currently virtual EPC code (patch 5) handles in similar way: There's 
> > one
> > EREMOVE error is expected and virtual EPC code can handle, but for other
> > errors, it means kernel bug, and virtual EPC code gives a WARN(), and that 
> > EPC
> > page is leaked too:
> > 
> > +   WARN_ONCE(ret != SGX_CHILD_PRESENT,
> > + "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> > + sgx_get_epc_phys_addr(epc_page), ret);
> > +   return ret;
> > 
> > So to me they are just WARN() to catch kernel bug.
> 
> You don't care about users, do you? Because when that error happens,
> they won't come crying to you to fix it.
> 
> Lemme save you some trouble: we don't take half-baked code into the
> kernel until stuff has been discussed and analyzed properly. So instead
> of trying to downplay this, try answering my questions.
> 
> Here's another one: when does EREMOVE fail?
> 
> /me goes and looks it up
> 
> "The instruction fails if the operand is not properly aligned or does
> not refer to an EPC page or the page is in use by another thread, or
> other threads are running in the enclave to which the page belongs. In
> addition the instruction fails if the operand refers to an SECS with
> associations."
> 
> And I guess those conditions will become more in the future.
> 
> Now, let's play. I'm the cloud admin and you're cloud OS customer
> support. I say:
> 
> "I got this scary error message while running enclaves on my server
> 
> "EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked 
> pages."
> 
> but I cannot reboot that machine because there are guests running on it
> and I'm getting paid for those guests and I might get sued if I do?"
> 
> Your turn, go wild.

I suppose admin can migrate those VMs, and then engineers can analyse the root
cause of such failure, and then fix it.


Re: [PATCH 02/11] x86: tboot: avoid Wstringop-overread-warning

2021-03-22 Thread Ingo Molnar


* Martin Sebor  wrote:

> > I.e. the real workaround might be to turn off the 
> > -Wstringop-overread-warning,
> > until GCC-11 gets fixed?
> 
> In GCC 10 -Wstringop-overread is a subset of -Wstringop-overflow.
> GCC 11 breaks it out as a separate warning to make it easier to
> control.  Both warnings have caught some real bugs but they both
> have a nonzero rate of false positives.  Other than bug reports
> we don't have enough data to say what their S/N ratio might be
> but my sense is that it's fairly high in general.
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overread
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overflow
> 
> In GCC 11, all access warnings expect objects to be either declared
> or allocated.  Pointers with constant values are taken to point to
> nothing valid (as Arnd mentioned above, this is to detect invalid
> accesses to members of structs at address zero).
> 
> One possible solution to the known address problem is to extend GCC
> attributes address and io that pin an object to a hardwired address
> to all targets (at the moment they're supported on just one or two
> targets).  I'm not sure this can still happen before GCC 11 releases
> sometime in April or May.
> 
> Until then, another workaround is to convert the fixed address to
> a volatile pointer before using it for the access, along the lines
> below.  It should have only a negligible effect on efficiency.

Thank you for the detailed answer!

I think I'll go with Arnd's original patch - which makes the code a 
slightly bit cleaner by separating out the check_tboot_version() check 
into a standalone function.

The only ugly aspect is the global nature of the 'tboot' pointer - but 
that's a self-inflicted wound.

I'd also guess that the S/N ratio somewhat unfairly penalizes this 
warning right now, because the kernel had a decade of growing real 
fixes via other efforts such as static and dynamic instrumentation as 
well.

So the probability of false positive remaining is in fact higher, and 
going forward we should see a better S/N ratio of this warning. Most 
of which will never be seen by upstream maintainers, as the mishaps 
will stay at the individual developer level. :-)

Thanks,

Ingo


Re: [PATCH] arm64: dts: ls1028a: fix optee node

2021-03-22 Thread Li Yang
On Thu, Mar 18, 2021 at 3:36 AM Michael Walle  wrote:
>
> Don't enable the optee node in the SoC include. It is an optional
> component and actually, if enabled, breaks boards which doesn't have it.

Hi Shawn,

Shall we make this a general rule?  I see quite a few SoC dtsi files
are having the optee node enabled by default.

Regards,
Leo

>
> This reverts commit 48787485f8de ("arm64: dts: ls1028a: enable optee
> node") and enables the node per board, assuming the intend of the
> original author was to enable OPTEE for the LS1028A-RDB and the
> LS1028A-QDS.
>
> Fixes: 48787485f8de ("arm64: dts: ls1028a: enable optee node")
> Reported-by: Guillaume Tucker 
> Reported-by: "kernelci.org bot" 
> Tested-by: Michael Walle 
> Signed-off-by: Michael Walle 
> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts | 4 
>  arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 4 
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi| 3 ++-
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts
> index fbcba9cb8503..060d3c79244d 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts
> @@ -327,6 +327,10 @@
> status = "okay";
>  };
>
> +&optee {
> +   status = "okay";
> +};
> +
>  &sai1 {
> status = "okay";
>  };
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> index 41ae6e7675ba..1bdf0104d492 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> @@ -274,6 +274,10 @@
> status = "okay";
>  };
>
> +&optee {
> +   status = "okay";
> +};
> +
>  &sai4 {
> status = "okay";
>  };
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index 50d277eb2a54..e2007ebacd69 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -92,9 +92,10 @@
> };
>
> firmware {
> -   optee {
> +   optee: optee {
> compatible = "linaro,optee-tz";
> method = "smc";
> +   status = "disabled";
> };
> };
>
> --
> 2.20.1
>


Re: [RFC PATCH 2/8] hugetlb: recompute min_count when dropping hugetlb_lock

2021-03-22 Thread Mike Kravetz
On 3/22/21 7:07 AM, Michal Hocko wrote:
> On Fri 19-03-21 15:42:03, Mike Kravetz wrote:
>> The routine set_max_huge_pages reduces the number of hugetlb_pages,
>> by calling free_pool_huge_page in a loop.  It does this as long as
>> persistent_huge_pages() is above a calculated min_count value.
>> However, this loop can conditionally drop hugetlb_lock and in some
>> circumstances free_pool_huge_page can drop hugetlb_lock.  If the
>> lock is dropped, counters could change the calculated min_count
>> value may no longer be valid.
> 
> OK, this one looks like a real bug fix introduced by 55f67141a8927.
> Unless I am missing something we could release pages which are reserved
> already.
>  
>> The routine try_to_free_low has the same issue.
>>
>> Recalculate min_count in each loop iteration as hugetlb_lock may have
>> been dropped.
>>
>> Signed-off-by: Mike Kravetz 
>> ---
>>  mm/hugetlb.c | 25 +
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index d5be25f910e8..c537274c2a38 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2521,11 +2521,20 @@ static void __init report_hugepages(void)
>>  }
>>  }
>>  
>> +static inline unsigned long min_hp_count(struct hstate *h, unsigned long 
>> count)
>> +{
>> +unsigned long min_count;
>> +
>> +min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
>> +return max(count, min_count);
> 
> Just out of curiousity, is compiler allowed to inline this piece of code
> and then cache the value? In other words do we need to make these
> READ_ONCE or otherwise enforce the no-caching behavior?

I honestly do not know if the compiler is allowed to do that.  The
assembly code generated by my compiler does not cache the value, but
that does not guarantee anything.  I can add READ_ONCE to make the
function look something like:

static inline unsigned long min_hp_count(struct hstate *h, unsigned long count)
{
unsigned long min_count;

min_count = READ_ONCE(h->resv_huge_pages) + READ_ONCE(h->nr_huge_pages)
- READ_ONCE(h->free_huge_pages);
return max(count, min_count);
}

-- 
Mike Kravetz


[tip: timers/core] timekeeping, clocksource: Fix various typos in comments

2021-03-22 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the timers/core branch of tip:

Commit-ID: 4bf07f6562a01a488877e05267808da7147f44a5
Gitweb:
https://git.kernel.org/tip/4bf07f6562a01a488877e05267808da7147f44a5
Author:Ingo Molnar 
AuthorDate:Mon, 22 Mar 2021 22:39:03 +01:00
Committer: Ingo Molnar 
CommitterDate: Mon, 22 Mar 2021 23:06:48 +01:00

timekeeping, clocksource: Fix various typos in comments

Fix ~56 single-word typos in timekeeping & clocksource code comments.

Signed-off-by: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: John Stultz 
Cc: Stephen Boyd 
Cc: Daniel Lezcano 
Cc: linux-kernel@vger.kernel.org
---
 drivers/clocksource/clksrc-dbx500-prcmu.c   |  8 ++---
 drivers/clocksource/dw_apb_timer_of.c   |  2 +-
 drivers/clocksource/hyperv_timer.c  |  2 +-
 drivers/clocksource/timer-atmel-tcb.c   |  4 +--
 drivers/clocksource/timer-fsl-ftm.c |  2 +-
 drivers/clocksource/timer-microchip-pit64b.c|  2 +-
 drivers/clocksource/timer-of.c  |  4 +--
 drivers/clocksource/timer-ti-dm-systimer.c  |  2 +-
 drivers/clocksource/timer-vf-pit.c  |  2 +-
 include/linux/clocksource.h |  2 +-
 include/linux/timex.h   |  2 +-
 kernel/time/alarmtimer.c|  6 ++--
 kernel/time/clocksource.c   |  4 +--
 kernel/time/hrtimer.c   | 18 ++--
 kernel/time/jiffies.c   |  2 +-
 kernel/time/ntp.c   |  2 +-
 kernel/time/posix-cpu-timers.c  |  6 ++--
 kernel/time/tick-broadcast-hrtimer.c|  2 +-
 kernel/time/tick-broadcast.c|  4 +--
 kernel/time/tick-oneshot.c  |  2 +-
 kernel/time/tick-sched.c|  2 +-
 kernel/time/tick-sched.h|  2 +-
 kernel/time/time.c  |  2 +-
 kernel/time/timekeeping.c   | 10 +++
 kernel/time/timer.c |  4 +--
 kernel/time/vsyscall.c  |  2 +-
 tools/testing/selftests/timers/clocksource-switch.c |  4 +--
 tools/testing/selftests/timers/leap-a-day.c |  2 +-
 tools/testing/selftests/timers/leapcrash.c  |  4 +--
 tools/testing/selftests/timers/threadtest.c |  2 +-
 30 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/drivers/clocksource/clksrc-dbx500-prcmu.c 
b/drivers/clocksource/clksrc-dbx500-prcmu.c
index 996900d..2fc93e4 100644
--- a/drivers/clocksource/clksrc-dbx500-prcmu.c
+++ b/drivers/clocksource/clksrc-dbx500-prcmu.c
@@ -18,7 +18,7 @@
 
 #define RATE_32K   32768
 
-#define TIMER_MODE_CONTINOUS   0x1
+#define TIMER_MODE_CONTINUOUS  0x1
 #define TIMER_DOWNCOUNT_VAL0x
 
 #define PRCMU_TIMER_REF0
@@ -55,13 +55,13 @@ static int __init clksrc_dbx500_prcmu_init(struct 
device_node *node)
 
/*
 * The A9 sub system expects the timer to be configured as
-* a continous looping timer.
+* a continuous looping timer.
 * The PRCMU should configure it but if it for some reason
 * don't we do it here.
 */
if (readl(clksrc_dbx500_timer_base + PRCMU_TIMER_MODE) !=
-   TIMER_MODE_CONTINOUS) {
-   writel(TIMER_MODE_CONTINOUS,
+   TIMER_MODE_CONTINUOUS) {
+   writel(TIMER_MODE_CONTINUOUS,
   clksrc_dbx500_timer_base + PRCMU_TIMER_MODE);
writel(TIMER_DOWNCOUNT_VAL,
   clksrc_dbx500_timer_base + PRCMU_TIMER_REF);
diff --git a/drivers/clocksource/dw_apb_timer_of.c 
b/drivers/clocksource/dw_apb_timer_of.c
index 42e7e43..2b2c3b5 100644
--- a/drivers/clocksource/dw_apb_timer_of.c
+++ b/drivers/clocksource/dw_apb_timer_of.c
@@ -38,7 +38,7 @@ static int __init timer_get_base_and_rate(struct device_node 
*np,
}
 
/*
-* Not all implementations use a periphal clock, so don't panic
+* Not all implementations use a peripheral clock, so don't panic
 * if it's not present
 */
pclk = of_clk_get_by_name(np, "pclk");
diff --git a/drivers/clocksource/hyperv_timer.c 
b/drivers/clocksource/hyperv_timer.c
index 269a691..a02b0a2 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -457,7 +457,7 @@ void __init hv_init_clocksource(void)
 {
/*
 * Try to set up the TSC page clocksource. If it succeeds, we're
-* done. Otherwise, set up the MSR clocksoruce.  At least one of
+* done. Otherwise, set up the MSR clocksource.  At least one of
 * these will always be available except on very old versions of
 * Hyper-V on x86.  In that case we won't have a Hyper-V
 * clocksource, but Linux will still run with a

RE: [PATCH v1 1/1] i2c: drivers: Use generic definitions for bus frequencies (part 2)

2021-03-22 Thread Khalil Blaiech
Thanks.
Acked-by: Khalil Blaiech 

> -Original Message-
> From: Andy Shevchenko 
> Sent: Monday, March 22, 2021 12:54 PM
> To: Wolfram Sang ; Khalil Blaiech ;
> Loic Poulain ; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-arm-...@vger.kernel.org
> Cc: Robert Foss ; Andy Shevchenko
> ; Wolfram Sang  dreams.de>
> Subject: [PATCH v1 1/1] i2c: drivers: Use generic definitions for bus
> frequencies (part 2)
> 
> Since we have generic definitions for bus frequencies, let's use them.
> 
> Cc: Wolfram Sang 
> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/i2c/busses/i2c-mlxbf.c| 14 --
>  drivers/i2c/busses/i2c-qcom-cci.c |  4 ++--
>  2 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-mlxbf.c b/drivers/i2c/busses/i2c-mlxbf.c
> index 2fb0532d8a16..80ab831df349 100644
> --- a/drivers/i2c/busses/i2c-mlxbf.c
> +++ b/drivers/i2c/busses/i2c-mlxbf.c
> @@ -172,12 +172,6 @@
>  #define MLXBF_I2C_SMBUS_THIGH_MAX_TBUF0x14
>  #define MLXBF_I2C_SMBUS_SCL_LOW_TIMEOUT   0x18
> 
> -enum {
> - MLXBF_I2C_TIMING_100KHZ = 10,
> - MLXBF_I2C_TIMING_400KHZ = 40,
> - MLXBF_I2C_TIMING_1000KHZ = 100,
> -};
> -
>  /*
>   * Defines SMBus operating frequency and core clock frequency.
>   * According to ADB files, default values are compliant to 100KHz SMBus
> @@ -1202,7 +1196,7 @@ static int mlxbf_i2c_init_timings(struct
> platform_device *pdev,
> 
>   ret = device_property_read_u32(dev, "clock-frequency",
> &config_khz);
>   if (ret < 0)
> - config_khz = MLXBF_I2C_TIMING_100KHZ;
> + config_khz = I2C_MAX_STANDARD_MODE_FREQ;
> 
>   switch (config_khz) {
>   default:
> @@ -1210,15 +1204,15 @@ static int mlxbf_i2c_init_timings(struct
> platform_device *pdev,
>   pr_warn("Illegal value %d: defaulting to 100 KHz\n",
>   config_khz);
>   fallthrough;
> - case MLXBF_I2C_TIMING_100KHZ:
> + case I2C_MAX_STANDARD_MODE_FREQ:
>   config_idx = MLXBF_I2C_TIMING_CONFIG_100KHZ;
>   break;
> 
> - case MLXBF_I2C_TIMING_400KHZ:
> + case I2C_MAX_FAST_MODE_FREQ:
>   config_idx = MLXBF_I2C_TIMING_CONFIG_400KHZ;
>   break;
> 
> - case MLXBF_I2C_TIMING_1000KHZ:
> + case I2C_MAX_FAST_MODE_PLUS_FREQ:
>   config_idx = MLXBF_I2C_TIMING_CONFIG_1000KHZ;
>   break;
>   }
> diff --git a/drivers/i2c/busses/i2c-qcom-cci.c b/drivers/i2c/busses/i2c-qcom-
> cci.c
> index 1c259b5188de..c63d5545fc2a 100644
> --- a/drivers/i2c/busses/i2c-qcom-cci.c
> +++ b/drivers/i2c/busses/i2c-qcom-cci.c
> @@ -569,9 +569,9 @@ static int cci_probe(struct platform_device *pdev)
>   cci->master[idx].mode = I2C_MODE_STANDARD;
>   ret = of_property_read_u32(child, "clock-frequency", &val);
>   if (!ret) {
> - if (val == 40)
> + if (val == I2C_MAX_FAST_MODE_FREQ)
>   cci->master[idx].mode = I2C_MODE_FAST;
> - else if (val == 100)
> + else if (val == I2C_MAX_FAST_MODE_PLUS_FREQ)
>   cci->master[idx].mode =
> I2C_MODE_FAST_PLUS;
>   }
> 
> --
> 2.30.2



Re: [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages

2021-03-22 Thread Matthew Wilcox
On Mon, Mar 22, 2021 at 11:36:19PM +0100, Uladzislau Rezki wrote:
> On Mon, Mar 22, 2021 at 07:38:20PM +, Matthew Wilcox (Oracle) wrote:
> > If we're trying to allocate 4MB of memory, the table will be 8KiB in size
> > (1024 pointers * 8 bytes per pointer), which can usually be satisfied
> > by a kmalloc (which is significantly faster).  Instead of changing this
> > open-coded implementation, just use kvmalloc().
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) 
> > ---
> >  mm/vmalloc.c | 7 +--
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 96444d64129a..32b640a84250 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2802,13 +2802,8 @@ static void *__vmalloc_area_node(struct vm_struct 
> > *area, gfp_t gfp_mask,
> > gfp_mask |= __GFP_HIGHMEM;
> >  
> > /* Please note that the recursion is strictly bounded. */
> > -   if (array_size > PAGE_SIZE) {
> > -   pages = __vmalloc_node(array_size, 1, nested_gfp, node,
> > +   pages = kvmalloc_node_caller(array_size, nested_gfp, node,
> > area->caller);
> > -   } else {
> > -   pages = kmalloc_node(array_size, nested_gfp, node);
> > -   }
> > -
> > if (!pages) {
> > free_vm_area(area);
> > return NULL;
> > -- 
> > 2.30.2
> Makes sense to me. Though i expected a bigger difference:
> 
> # patch
> single CPU, 4MB allocation, loops: 100 avg: 85293854 usec
> 
> # default
> single CPU, 4MB allocation, loops: 100 avg: 89275857 usec

Well, 4.5% isn't something to leave on the table ... but yeah, I was
expecting more in the 10-20% range.  It may be more significant if
there's contention on the spinlocks (like if this crazy ksmbd is calling
vmalloc(4MB) on multiple nodes simultaneously).

I suspect the vast majority of the time is spent calling alloc_pages_node()
1024 times.  Have you looked at Mel's patch to do ... well, exactly what
vmalloc() wants?

https://lore.kernel.org/linux-mm/20210322091845.16437-1-mgor...@techsingularity.net/

> One question. Should we care much about fragmentation? I mean
> with the patch, allocations > 2MB will do request to SLAB bigger
> then PAGE_SIZE.

We're pretty good about allocating memory in larger chunks these days.
Looking at my laptop's slabinfo,
kmalloc-8k   219232   819248 : tunables000 : sla
bdata 58 58  0

That's using 8 pages per slab, so that's order-3 allocations.  There's a
few more of those:

$ sudo grep '8 :' /proc/slabinfo |wc
 42 6724508

so I have confidence that kvmalloc() will manage to use kmalloc up to 16MB
vmalloc allocations, and after that it'll tend to fall back to vmalloc.



[PATCH] rcu: Fix various typos in comments

2021-03-22 Thread Ingo Molnar


Hi Paul,

Was working on automation to make it a bit more straightforward to fix 
typos within comments (which we tend to reintroduce during 
development), and here are the ones it found in the RCU code.

Thanks,

Ingo

=>
From: Ingo Molnar 
Date: Mon, 22 Mar 2021 23:57:26 +0100
Subject: [PATCH] rcu: Fix various typos in comments

Fix ~12 single-word typos in RCU code comments.

Signed-off-by: Ingo Molnar 
Cc: Paul E. McKenney 
Cc: linux-kernel@vger.kernel.org
---
 kernel/rcu/srcutree.c   | 4 ++--
 kernel/rcu/sync.c   | 2 +-
 kernel/rcu/tasks.h  | 8 
 kernel/rcu/tree.c   | 4 ++--
 kernel/rcu/tree.h   | 2 +-
 kernel/rcu/tree_plugin.h| 2 +-
 tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/locks.h | 2 +-
 7 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index e26547b34ad3..036ff5499ad5 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -777,9 +777,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
spin_unlock_irqrestore_rcu_node(sdp, flags);
 
/*
-* No local callbacks, so probabalistically probe global state.
+* No local callbacks, so probabilistically probe global state.
 * Exact information would require acquiring locks, which would
-* kill scalability, hence the probabalistic nature of the probe.
+* kill scalability, hence the probabilistic nature of the probe.
 */
 
/* First, see if enough time has passed since the last GP. */
diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index d4558ab7a07d..3eeb871cf0de 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -94,7 +94,7 @@ static void rcu_sync_func(struct rcu_head *rhp)
rcu_sync_call(rsp);
} else {
/*
-* We're at least a GP after the last rcu_sync_exit(); eveybody
+* We're at least a GP after the last rcu_sync_exit(); everybody
 * will now have observed the write side critical section.
 * Let 'em rip!.
 */
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index af7c19439f4e..ac3c362e08a3 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -23,7 +23,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * Definition for a Tasks-RCU-like mechanism.
  * @cbs_head: Head of callback list.
  * @cbs_tail: Tail pointer for callback list.
- * @cbs_wq: Wait queue allowning new callback to get kthread's attention.
+ * @cbs_wq: Wait queue allowing new callback to get kthread's attention.
  * @cbs_lock: Lock protecting callback list.
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
@@ -504,7 +504,7 @@ DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, 
call_rcu_tasks, "RCU Tasks");
  * or transition to usermode execution.  As such, there are no read-side
  * primitives analogous to rcu_read_lock() and rcu_read_unlock() because
  * this primitive is intended to determine that all tasks have passed
- * through a safe state, not so much for data-strcuture synchronization.
+ * through a safe state, not so much for data-structure synchronization.
  *
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.
@@ -637,7 +637,7 @@ DEFINE_RCU_TASKS(rcu_tasks_rude, rcu_tasks_rude_wait_gp, 
call_rcu_tasks_rude,
  * there are no read-side primitives analogous to rcu_read_lock() and
  * rcu_read_unlock() because this primitive is intended to determine
  * that all tasks have passed through a safe state, not so much for
- * data-strcuture synchronization.
+ * data-structure synchronization.
  *
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.
@@ -1127,7 +1127,7 @@ static void exit_tasks_rcu_finish_trace(struct 
task_struct *t)
  * there are no read-side primitives analogous to rcu_read_lock() and
  * rcu_read_unlock() because this primitive is intended to determine
  * that all tasks have passed through a safe state, not so much for
- * data-strcuture synchronization.
+ * data-structure synchronization.
  *
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da6f5213fb74..ab5bd5b391e6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2490,7 +2490,7 @@ int rcutree_dead_cpu(unsigned int cpu)
 
 /*
  * Invoke any RCU callbacks that have made it to the end of their grace
- * period.  Thottle as specified by rdp->blimit.
+ * period.  Throttle as specified by rdp->blimit.
  */
 static void rcu_do_batch(struct rcu_data 

Re: [PATCH v4 18/19] coresight: sink: Add TRBE driver

2021-03-22 Thread Suzuki K Poulose

On 22/03/2021 21:24, Mathieu Poirier wrote:

On Fri, Mar 19, 2021 at 11:55:10AM +, Mike Leach wrote:

HI Suzuki,

On Fri, 19 Mar 2021 at 10:30, Suzuki K Poulose  wrote:


Hi Mike


On 8 Mar 2021, at 17:26, Mike Leach  wrote:

Hi Suzuki,

On Thu, 25 Feb 2021 at 19:36, Suzuki K Poulose  wrote:


From: Anshuman Khandual 

Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
accessible via the system registers. The TRBE supports different addressing
modes including CPU virtual address and buffer modes including the circular
buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1),
an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the
access to the trace buffer could be prohibited by a higher exception level
(EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
private interrupt (PPI) on address translation errors and when the buffer
is full. Overall implementation here is inspired from the Arm SPE driver.

Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Suzuki K Poulose 
Signed-off-by: Anshuman Khandual 
Signed-off-by: Suzuki K Poulose 

+
+static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
+   struct perf_output_handle *handle,
+   void *config)
+{
+   struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+   struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
+   struct trbe_buf *buf = config;
+   enum trbe_fault_action act;
+   unsigned long size, offset;
+   unsigned long write, base, status;
+   unsigned long flags;
+
+   WARN_ON(buf->cpudata != cpudata);
+   WARN_ON(cpudata->cpu != smp_processor_id());
+   WARN_ON(cpudata->drvdata != drvdata);
+   if (cpudata->mode != CS_MODE_PERF)
+   return 0;
+
+   perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
+
+   /*
+* We are about to disable the TRBE. And this could in turn
+* fill up the buffer triggering, an IRQ. This could be consumed
+* by the PE asynchronously, causing a race here against
+* the IRQ handler in closing out the handle. So, let us
+* make sure the IRQ can't trigger while we are collecting
+* the buffer. We also make sure that a WRAP event is handled
+* accordingly.
+*/
+   local_irq_save(flags);
+
+   /*
+* If the TRBE was disabled due to lack of space in the AUX buffer or a
+* spurious fault, the driver leaves it disabled, truncating the buffer.
+* Since the etm_perf driver expects to close out the AUX buffer, the
+* driver skips it. Thus, just pass in 0 size here to indicate that the
+* buffer was truncated.
+*/
+   if (!is_trbe_enabled()) {
+   size = 0;
+   goto done;
+   }
+   /*
+* perf handle structure needs to be shared with the TRBE IRQ handler 
for
+* capturing trace data and restarting the handle. There is a 
probability
+* of an undefined reference based crash when etm event is being stopped
+* while a TRBE IRQ also getting processed. This happens due the release
+* of perf handle via perf_aux_output_end() in etm_event_stop(). 
Stopping
+* the TRBE here will ensure that no IRQ could be generated when the 
perf
+* handle gets freed in etm_event_stop().
+*/
+   trbe_drain_and_disable_local();
+   write = get_trbe_write_pointer();
+   base = get_trbe_base_pointer();
+
+   /* Check if there is a pending interrupt and handle it here */
+   status = read_sysreg_s(SYS_TRBSR_EL1);
+   if (is_trbe_irq(status)) {
+
+   /*
+* Now that we are handling the IRQ here, clear the IRQ
+* from the status, to let the irq handler know that it
+* is taken care of.
+*/
+   clr_trbe_irq();
+   isb();
+
+   act = trbe_get_fault_act(status);
+   /*
+* If this was not due to a WRAP event, we have some
+* errors and as such buffer is empty.
+*/
+   if (act != TRBE_FAULT_ACT_WRAP) {
+   size = 0;
+   goto done;
+   }


We are using TRBE FILL mode - which halts capture on a full buffer and
triggers the IRQ, without disabling the source first.
This means that the mode is inherently lossy (unless by some unlikely
co-incidence the last byte that caused the wrap was also the last byte
to be sent from an ETE that was in the process of being disabled.)
Therefore we must have a perf_aux_output_flag(handle,
PERF_AUX_FLAG_TRUNCATED) call in here to signal that some trace was
lost, for consistence of operation with ETR etc, and intelpt.



I agree that the there is a bit of loss here due to the FILL mode. But it is 
not

Re: [PATCH v2] audit: log nftables configuration change events once per table

2021-03-22 Thread Pablo Neira Ayuso
On Mon, Mar 22, 2021 at 04:49:04PM -0400, Richard Guy Briggs wrote:
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index c1eb5cdb3033..42ba44890523 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
[...]
> @@ -8006,12 +7938,47 @@ static void nft_commit_notify(struct net *net, u32 
> portid)
>   WARN_ON_ONCE(!list_empty(&net->nft.notify_list));
>  }
>  
> +void nf_tables_commit_audit_collect(struct list_head *adl,
> + struct nft_trans *trans) {

nitpick: curly brace starts in the line.

> + struct nft_audit_data *adp;
> +
> + list_for_each_entry(adp, adl, list) {
> + if (adp->table == trans->ctx.table)
> + goto found;
> + }
> + adp = kzalloc(sizeof(*adp), GFP_KERNEL);

if (!adp)
...

> + adp->table = trans->ctx.table;
> + INIT_LIST_HEAD(&adp->list);
> + list_add(&adp->list, adl);
> +found:
> + adp->entries++;
> + if (!adp->op || adp->op > trans->msg_type)
> + adp->op = trans->msg_type;
> +}
> +
> +#define AUNFTABLENAMELEN (NFT_TABLE_MAXNAMELEN + 22)
> +
> +void nf_tables_commit_audit_log(struct list_head *adl, u32 generation) {
  ^
same thing here

> + struct nft_audit_data *adp, *adn;
> + char aubuf[AUNFTABLENAMELEN];
> +
> + list_for_each_entry_safe(adp, adn, adl, list) {
> + snprintf(aubuf, AUNFTABLENAMELEN, "%s:%u", adp->table->name,
> +  generation);
> + audit_log_nfcfg(aubuf, adp->table->family, adp->entries,
> + nft2audit_op[adp->op], GFP_KERNEL);
> + list_del(&adp->list);
> + kfree(adp);
> + }
> +}
> +
>  static int nf_tables_commit(struct net *net, struct sk_buff *skb)
>  {
>   struct nft_trans *trans, *next;
>   struct nft_trans_elem *te;
>   struct nft_chain *chain;
>   struct nft_table *table;
> + LIST_HEAD(adl);
>   int err;
>  
>   if (list_empty(&net->nft.commit_list)) {
> @@ -8206,12 +8173,15 @@ static int nf_tables_commit(struct net *net, struct 
> sk_buff *skb)
>   }
>   break;
>   }
> + nf_tables_commit_audit_collect(&adl, trans);
>   }
>  
>   nft_commit_notify(net, NETLINK_CB(skb).portid);
>   nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
>   nf_tables_commit_release(net);
>  
> + nf_tables_commit_audit_log(&adl, net->nft.base_seq);
> +

This looks more self-contained and nicer, thanks.


Re: [PATCH net-next v2 2/2] net: ipa: fix IPA validation

2021-03-22 Thread Andrew Lunn
> The solution is to create a user space tool inside the
> drivers/net/ipa directory that will link with the kernel
> source files and will perform all the basic one-time checks
> I want to make.

Hi Alex

Have you found any other driver doing this?  Where do they keep there
code?

Could this be a selftest, put somewhere in tools/testing/selftests.

Or can this be a test kernel module. Eg. we have crypt/testmsg.c which
runs a number of tests on the crypto subsystem,
./kernel/time/test_udelay.c which runs times on udelay.

Rather than inventing something new, please follow other examples
already in the kernel.

   Andrew 


[PATCH 3/4] selftest/arm64/ptrace: add a test for orig_x0

2021-03-22 Thread Andrei Vagin
The test creates two processes where one traces another one.  The tracee
executes a system call, the tracer traps it, changes orig_x0, triggers a
signal and checks that the syscall is restarted with the setted
argument.

Test output:
 $ ./ptrace_restart_syscall_test
 1..3
 ok 1 orig_x0: 0x3
 ok 2 x0: 0x5
 ok 3 The child exited with code 0.
 # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin 
---
 tools/testing/selftests/arm64/ptrace/Makefile |   6 +
 tools/testing/selftests/arm64/ptrace/lib.h|  36 ++
 .../ptrace/ptrace_restart_syscall_test.c  | 122 ++
 3 files changed, 164 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
 create mode 100644 tools/testing/selftests/arm64/ptrace/lib.h
 create mode 100644 
tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c

diff --git a/tools/testing/selftests/arm64/ptrace/Makefile 
b/tools/testing/selftests/arm64/ptrace/Makefile
new file mode 100644
index ..1bc10e2d2ac8
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -g -I../../../../../usr/include/
+TEST_GEN_PROGS := ptrace_restart_syscall_test
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/ptrace/lib.h 
b/tools/testing/selftests/arm64/ptrace/lib.h
new file mode 100644
index ..14f4737188a3
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/lib.h
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#ifndef __PTRACE_TEST_LOG_H
+#define __PTRACE_TEST_LOG_H
+
+#define pr_p(func, fmt, ...) func("%s:%d: " fmt ": %m", \
+ __func__, __LINE__, ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)   \
+   ({  \
+   ksft_test_result_error(fmt "\n", ##__VA_ARGS__);\
+   -1; \
+   })
+
+#define pr_fail(fmt, ...)  \
+   ({  \
+   ksft_test_result_fail(fmt "\n", ##__VA_ARGS__); \
+   -1; \
+   })
+
+#define pr_perror(fmt, ...)pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+static inline int ptrace_and_wait(pid_t pid, int cmd, int sig)
+{
+   int status;
+
+   /* Stop on syscall-exit. */
+   if (ptrace(cmd, pid, 0, 0))
+   return pr_perror("Can't resume the child %d", pid);
+   if (waitpid(pid, &status, 0) != pid)
+   return pr_perror("Can't wait for the child %d", pid);
+   if (!WIFSTOPPED(status) || WSTOPSIG(status) != sig)
+   return pr_err("Unexpected status: %x", status);
+   return 0;
+}
+
+#endif
diff --git a/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c 
b/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c
new file mode 100644
index ..ce59657f41be
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../kselftest.h"
+#include "lib.h"
+
+static int child(int fd)
+{
+   char c;
+
+   if (read(fd, &c, 1) != 1)
+   return 1;
+
+   return 0;
+}
+
+int main(int argc, void **argv)
+{
+   union {
+   struct user_regs_struct r;
+   struct {
+   char __regs[272];
+   unsigned long long orig_x0;
+   unsigned long long orig_x7;
+   };
+   } regs = {};
+   struct iovec iov = {
+   .iov_base = ®s,
+   .iov_len = sizeof(regs),
+   };
+   int status;
+   pid_t pid;
+   int p[2], fdzero;
+
+   ksft_set_plan(3);
+
+   if (pipe(p))
+   return pr_perror("Can't create a pipe");
+   fdzero = open("/dev/zero", O_RDONLY);
+   if (fdzero < 0)
+   return pr_perror("Can't open /dev/zero");
+
+   pid = fork();
+   if (pid == 0) {
+   kill(getpid(), SIGSTOP);
+   return child(p[0]);
+   }
+   if (pid < 0)
+   return 1;
+
+   if (ptrace(PTRACE_ATTACH, pid, 0, 0))
+   return pr_perror("Can't attach to the child %d", pid);
+   if (waitpid(pid, &status, 0) != pid)
+   return pr_perror("Can't wait for the child %d", pid);
+   /* Skip SIGSTOP */
+   if (ptrace_and_wait(pid, PTRACE_CONT, SIGSTOP))
+   return 1;
+
+   /* Resume the child to the next system call. */
+   if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+   return 1;
+
+   /* Send a

[PATCH 4/4] selftest/arm64/ptrace: add a test for orig_x7

2021-03-22 Thread Andrei Vagin
In system calls, x7 is used to indicate whether a tracee has been
stopped on syscall-enter or syscall-exit and the origin value of x7 is
saved in orig_x7.

Test output:
 $ ./ptrace_syscall_test
 1..4
 ok 1 x7: 0
 ok 2 x7: 1
 ok 3 x7: 686920776f726c64
 ok 4 The child exited with code 0.
 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin 
---
 tools/testing/selftests/arm64/ptrace/Makefile |   2 +-
 .../arm64/ptrace/ptrace_syscall_test.c| 158 ++
 2 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c

diff --git a/tools/testing/selftests/arm64/ptrace/Makefile 
b/tools/testing/selftests/arm64/ptrace/Makefile
index 1bc10e2d2ac8..ea02c1a63806 100644
--- a/tools/testing/selftests/arm64/ptrace/Makefile
+++ b/tools/testing/selftests/arm64/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
 CFLAGS += -g -I../../../../../usr/include/
-TEST_GEN_PROGS := ptrace_restart_syscall_test
+TEST_GEN_PROGS := ptrace_restart_syscall_test ptrace_syscall_test
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c 
b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c
new file mode 100644
index ..ad55b44ae9f5
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../kselftest.h"
+#include "lib.h"
+
+#define X7_TEST_VAL 0x686920776f726c64UL
+
+static long test_syscall(long *x7)
+{
+   register long x0 __asm__("x0") = 0;
+   register long *x1 __asm__("x1") = x7;
+   register long x8 __asm__("x8") = 0x555;
+
+   __asm__ (
+   "ldr x7, [x1, 0]\n"
+   "svc 0\n"
+   "str x7, [x1, 0]\n"
+  : "=r"(x0)
+  : "r"(x0), "r"(x1), "r"(x8)
+  :
+   );
+   return x0;
+}
+
+static int child(void)
+{
+   long  val = X7_TEST_VAL;
+
+   if (test_syscall(&val)) {
+   ksft_print_msg("The test syscall failed\n");
+   return 1;
+   }
+   if (val != X7_TEST_VAL) {
+   ksft_print_msg("Unexpected x7: %lx\n", val);
+   return 1;
+   }
+
+   if (test_syscall(&val)) {
+   ksft_print_msg("The test syscall failed\n");
+   return 1;
+   }
+   if (val != ~X7_TEST_VAL) {
+   ksft_print_msg("Unexpected x7: %lx\n", val);
+   return 1;
+   }
+
+   return 0;
+}
+
+#ifndef PTRACE_SYSEMU
+#define PTRACE_SYSEMU 31
+#endif
+
+int main(int argc, void **argv)
+{
+   union {
+   struct user_regs_struct r;
+   struct {
+   char __regs[272];
+   unsigned long long orig_x0;
+   unsigned long long orig_x7;
+   };
+   } regs = {};
+   struct iovec iov = {
+   .iov_base = ®s,
+   .iov_len = sizeof(regs),
+   };
+   int status;
+   pid_t pid;
+
+   ksft_set_plan(4);
+
+   pid = fork();
+   if (pid == 0) {
+   kill(getpid(), SIGSTOP);
+   _exit(child());
+   }
+   if (pid < 0)
+   return 1;
+
+   if (ptrace_and_wait(pid, PTRACE_ATTACH, SIGSTOP))
+   return 1;
+   /* skip SIGSTOP */
+   if (ptrace_and_wait(pid, PTRACE_CONT, SIGSTOP))
+   return 1;
+
+   /* Resume the child to the next system call. */
+   if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+   return 1;
+
+   /* Check that x7 is 0 on syscall-enter. */
+   if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+   return pr_perror("Can't get child registers");
+   if (regs.orig_x7 != X7_TEST_VAL)
+   return pr_fail("Unexpected orig_x7: %lx", regs.orig_x7);
+   if (regs.r.regs[7] != 0)
+   return pr_fail("Unexpected x7: %lx", regs.r.regs[7]);
+   ksft_test_result_pass("x7: %llx\n", regs.r.regs[7]);
+
+   if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+   return 1;
+
+   /* Check that x7 is 1 on syscall-exit. */
+   if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+   return pr_perror("Can't get child registers");
+   if (regs.r.regs[7] != 1)
+   return pr_fail("Unexpected x7: %lx", regs.r.regs[7]);
+   ksft_test_result_pass("x7: %llx\n", regs.r.regs[7]);
+
+   /* Check that the child will not see a new value of x7. */
+   regs.r.regs[0] = 0;
+   regs.r.regs[7] = ~X7_TEST_VAL;
+   if (ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov))
+   return pr_perror("Can't set child registers");
+
+   /* Resume the child to th

[PATCH 2/4] arm64/ptrace: introduce orig_x7 in the user_pt_regs structure

2021-03-22 Thread Andrei Vagin
We have some ABI weirdness in the way that we handle syscall exit stops
because we indicate whether or not the stop has been signalled from
syscall entry or syscall exit by clobbering a general purpose register
x7 in the tracee and restoring its old value after the stop.

This behavior was inherited from ARM and it isn't common for other
architectures. Now, we have PTRACE_GET_SYSCALL_INFO that gives all
required information about system calls, so the hack with clobbering
registers isn't needed anymore.

This change instroduces orig_x7 in the user_pt_regs structure that will
contains an origin value of the x7 register if the tracee is stopped in
a system call..

Signed-off-by: Andrei Vagin 
---
 arch/arm64/include/asm/ptrace.h  |  1 +
 arch/arm64/include/uapi/asm/ptrace.h |  1 +
 arch/arm64/kernel/ptrace.c   | 18 --
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index d4cdf98ac003..1008f0fbc5ea 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -184,6 +184,7 @@ struct pt_regs {
u64 pc;
u64 pstate;
u64 orig_x0;
+   u64 orig_x7;
};
};
 #ifdef __AARCH64EB__
diff --git a/arch/arm64/include/uapi/asm/ptrace.h 
b/arch/arm64/include/uapi/asm/ptrace.h
index 3c118c5b0893..be7583ff5f4d 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -91,6 +91,7 @@ struct user_pt_regs {
__u64   pc;
__u64   pstate;
__u64   orig_x0;
+   __u64   orig_x7;
 };
 
 struct user_fpsimd_state {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 170f42fd6101..1ed5b4aa986b 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1750,7 +1750,7 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 enum ptrace_syscall_dir dir)
 {
int regno;
-   unsigned long saved_reg;
+   u64 _saved_reg, *saved_reg;
 
/*
 * We have some ABI weirdness here in the way that we handle syscall
@@ -1768,19 +1768,25 @@ static void tracehook_report_syscall(struct pt_regs 
*regs,
 * - Syscall stops behave differently to seccomp and pseudo-step traps
 *   (the latter do not nobble any registers).
 */
-   regno = (is_compat_task() ? 12 : 7);
-   saved_reg = regs->regs[regno];
+   if (is_compat_task()) {
+   regno = 12;
+   saved_reg = &_saved_reg;
+   } else {
+   regno = 7;
+   saved_reg = ®s->orig_x7;
+   }
+   *saved_reg = regs->regs[regno];
regs->regs[regno] = dir;
 
if (dir == PTRACE_SYSCALL_ENTER) {
if (tracehook_report_syscall_entry(regs))
forget_syscall(regs);
-   regs->regs[regno] = saved_reg;
+   regs->regs[regno] = *saved_reg;
} else if (!test_thread_flag(TIF_SINGLESTEP)) {
tracehook_report_syscall_exit(regs, 0);
-   regs->regs[regno] = saved_reg;
+   regs->regs[regno] = *saved_reg;
} else {
-   regs->regs[regno] = saved_reg;
+   regs->regs[regno] = *saved_reg;
 
/*
 * Signal a pseudo-step exception since we are stepping but
-- 
2.29.2



[PATCH 1/4] arm64: expose orig_x0 in the user_pt_regs structure

2021-03-22 Thread Andrei Vagin
orig_x0 is recorded at the start of the syscall entry and then it is
used for resetting the argument back to its original value during
syscall restarts.

If orig_x0 isn't available from user-space, this makes it tricky to
manage arguments of restarted system calls.

Cc: Keno Fischer 
Signed-off-by: Andrei Vagin 
---
 arch/arm64/include/asm/ptrace.h  | 2 +-
 arch/arm64/include/uapi/asm/ptrace.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index e58bca832dff..d4cdf98ac003 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -183,9 +183,9 @@ struct pt_regs {
u64 sp;
u64 pc;
u64 pstate;
+   u64 orig_x0;
};
};
-   u64 orig_x0;
 #ifdef __AARCH64EB__
u32 unused2;
s32 syscallno;
diff --git a/arch/arm64/include/uapi/asm/ptrace.h 
b/arch/arm64/include/uapi/asm/ptrace.h
index 758ae984ff97..3c118c5b0893 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -90,6 +90,7 @@ struct user_pt_regs {
__u64   sp;
__u64   pc;
__u64   pstate;
+   __u64   orig_x0;
 };
 
 struct user_fpsimd_state {
-- 
2.29.2



[PATCH 0/4 v3] arm64/ptrace: allow to get all registers on syscall traps

2021-03-22 Thread Andrei Vagin
Here are two known problems with registers when a tracee is stopped in
syscall traps.

The first problem is about the x7 register that is used to indicate
whether or not the stop has been signalled from syscall entry or syscall
exit. This means that:

- Any writes by the tracer to this register during the stop are
  ignored/discarded.

- The actual value of the register is not available during the stop,
  so the tracer cannot save it and restore it later.

For applications like the user-mode Linux or gVisor, it is critical to
have access to the full set of registers in any moment. For example,
they need to change values of all registers to emulate rt_sigreturn or
execve and they need to have the full set of registers to build a signal
frame.

The second problem is that orig_x0 isn't exposed to user-space. orig_x0
is recorded at the start of the syscall entry and then it is used for
resetting the argument back to its original value during syscall
restarts.

This series extends the user_pt_regs structure with orig_x0 and orig_x7.
This doesn't break the backward compatibility. There are two interfaces
how user-space can receive user_pt_regs from the kernel. The first one
is PTRACE_{GET,SET}REGSET. In this case, the user-space provides a
buffer and its size and the kernel fills only the part that fits the
size. The second interface is a core dump file where registers are
written in a separate note and the user-space can parse only the part
that it knows.

Cc: Catalin Marinas 
Cc: Dave Martin 
Cc: Keno Fischer  
Cc: Oleg Nesterov 
Cc: Will Deacon 

Andrei Vagin (4):
  arm64: expose orig_x0 in the user_pt_regs structure
  arm64/ptrace: introduce orig_x7 in the user_pt_regs structure
  selftest/arm64/ptrace: add a test for orig_x0
  selftest/arm64/ptrace: add a test for orig_x7

v2: use the ptrace option instead of adding a new regset.
v3: append orig_x0 and orig_x7 to the user_pt_regs.

 arch/arm64/include/asm/ptrace.h   |   5 +
 arch/arm64/kernel/ptrace.c| 130 +++-
 include/uapi/linux/elf.h  |   1 +
 tools/testing/selftests/arm64/Makefile|   2 +-
 tools/testing/selftests/arm64/ptrace/Makefile |   6 +
 .../arm64/ptrace/ptrace_syscall_regs_test.c   | 142 ++
 6 files changed, 246 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
 create mode 100644 
tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c

-- 
2.29.2



Re: [PATCH v2 18/18] vfs: remove unused ioctl helpers

2021-03-22 Thread Darrick J. Wong
On Mon, Mar 22, 2021 at 03:49:16PM +0100, Miklos Szeredi wrote:
> Remove vfs_ioc_setflags_prepare(), vfs_ioc_fssetxattr_check() and
> simple_fill_fsxattr(), which are no longer used.
> 
> Signed-off-by: Miklos Szeredi 

Woo hoo, so much boilerplate goes away!

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/inode.c | 87 --
>  include/linux/fs.h | 12 ---
>  2 files changed, 99 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index a047ab306f9a..ae526fd9c0a4 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -12,7 +12,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -2314,89 +2313,3 @@ struct timespec64 current_time(struct inode *inode)
>   return timestamp_truncate(now, inode);
>  }
>  EXPORT_SYMBOL(current_time);
> -
> -/*
> - * Generic function to check FS_IOC_SETFLAGS values and reject any invalid
> - * configurations.
> - *
> - * Note: the caller should be holding i_mutex, or else be sure that they have
> - * exclusive access to the inode structure.
> - */
> -int vfs_ioc_setflags_prepare(struct inode *inode, unsigned int oldflags,
> -  unsigned int flags)
> -{
> - /*
> -  * The IMMUTABLE and APPEND_ONLY flags can only be changed by
> -  * the relevant capability.
> -  *
> -  * This test looks nicer. Thanks to Pauline Middelink
> -  */
> - if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL) &&
> - !capable(CAP_LINUX_IMMUTABLE))
> - return -EPERM;
> -
> - return fscrypt_prepare_setflags(inode, oldflags, flags);
> -}
> -EXPORT_SYMBOL(vfs_ioc_setflags_prepare);
> -
> -/*
> - * Generic function to check FS_IOC_FSSETXATTR values and reject any invalid
> - * configurations.
> - *
> - * Note: the caller should be holding i_mutex, or else be sure that they have
> - * exclusive access to the inode structure.
> - */
> -int vfs_ioc_fssetxattr_check(struct inode *inode, const struct fsxattr 
> *old_fa,
> -  struct fsxattr *fa)
> -{
> - /*
> -  * Can't modify an immutable/append-only file unless we have
> -  * appropriate permission.
> -  */
> - if ((old_fa->fsx_xflags ^ fa->fsx_xflags) &
> - (FS_XFLAG_IMMUTABLE | FS_XFLAG_APPEND) &&
> - !capable(CAP_LINUX_IMMUTABLE))
> - return -EPERM;
> -
> - /*
> -  * Project Quota ID state is only allowed to change from within the init
> -  * namespace. Enforce that restriction only if we are trying to change
> -  * the quota ID state. Everything else is allowed in user namespaces.
> -  */
> - if (current_user_ns() != &init_user_ns) {
> - if (old_fa->fsx_projid != fa->fsx_projid)
> - return -EINVAL;
> - if ((old_fa->fsx_xflags ^ fa->fsx_xflags) &
> - FS_XFLAG_PROJINHERIT)
> - return -EINVAL;
> - }
> -
> - /* Check extent size hints. */
> - if ((fa->fsx_xflags & FS_XFLAG_EXTSIZE) && !S_ISREG(inode->i_mode))
> - return -EINVAL;
> -
> - if ((fa->fsx_xflags & FS_XFLAG_EXTSZINHERIT) &&
> - !S_ISDIR(inode->i_mode))
> - return -EINVAL;
> -
> - if ((fa->fsx_xflags & FS_XFLAG_COWEXTSIZE) &&
> - !S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> - return -EINVAL;
> -
> - /*
> -  * It is only valid to set the DAX flag on regular files and
> -  * directories on filesystems.
> -  */
> - if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
> - !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> - return -EINVAL;
> -
> - /* Extent size hints of zero turn off the flags. */
> - if (fa->fsx_extsize == 0)
> - fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT);
> - if (fa->fsx_cowextsize == 0)
> - fa->fsx_xflags &= ~FS_XFLAG_COWEXTSIZE;
> -
> - return 0;
> -}
> -EXPORT_SYMBOL(vfs_ioc_fssetxattr_check);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9e7f6a592a70..1e88ace15004 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3571,18 +3571,6 @@ extern int vfs_fadvise(struct file *file, loff_t 
> offset, loff_t len,
>  extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
>  int advice);
>  
> -int vfs_ioc_setflags_prepare(struct inode *inode, unsigned int oldflags,
> -  unsigned int flags);
> -
> -int vfs_ioc_fssetxattr_check(struct inode *inode, const struct fsxattr 
> *old_fa,
> -  struct fsxattr *fa);
> -
> -static inline void simple_fill_fsxattr(struct fsxattr *fa, __u32 xflags)
> -{
> - memset(fa, 0, sizeof(*fa));
> - fa->fsx_xflags = xflags;
> -}
> -
>  /*
>   * Flush file data before changing attributes.  Caller must hold any locks
>   * required to prevent further writes to this file until we're done settin

Re: [PATCH v2 10/18] xfs: convert to miscattr

2021-03-22 Thread Darrick J. Wong
On Mon, Mar 22, 2021 at 03:49:08PM +0100, Miklos Szeredi wrote:
> Use the miscattr API to let the VFS handle locking, permission checking and
> conversion.
> 
> Signed-off-by: Miklos Szeredi 
> Cc: Darrick J. Wong 
> ---
>  fs/xfs/libxfs/xfs_fs.h |   4 -
>  fs/xfs/xfs_ioctl.c | 316 -
>  fs/xfs/xfs_ioctl.h |  11 ++
>  fs/xfs/xfs_ioctl32.c   |   2 -
>  fs/xfs/xfs_ioctl32.h   |   2 -
>  fs/xfs/xfs_iops.c  |   7 +
>  6 files changed, 107 insertions(+), 235 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 6fad140d4c8e..6bf7d8b7d743 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -770,8 +770,6 @@ struct xfs_scrub_metadata {
>  /*
>   * ioctl commands that are used by Linux filesystems
>   */
> -#define XFS_IOC_GETXFLAGSFS_IOC_GETFLAGS
> -#define XFS_IOC_SETXFLAGSFS_IOC_SETFLAGS
>  #define XFS_IOC_GETVERSION   FS_IOC_GETVERSION
>  
>  /*
> @@ -782,8 +780,6 @@ struct xfs_scrub_metadata {
>  #define XFS_IOC_ALLOCSP  _IOW ('X', 10, struct xfs_flock64)
>  #define XFS_IOC_FREESP   _IOW ('X', 11, struct xfs_flock64)
>  #define XFS_IOC_DIOINFO  _IOR ('X', 30, struct dioattr)
> -#define XFS_IOC_FSGETXATTR   FS_IOC_FSGETXATTR
> -#define XFS_IOC_FSSETXATTR   FS_IOC_FSSETXATTR
>  #define XFS_IOC_ALLOCSP64_IOW ('X', 36, struct xfs_flock64)
>  #define XFS_IOC_FREESP64 _IOW ('X', 37, struct xfs_flock64)
>  #define XFS_IOC_GETBMAP  _IOWR('X', 38, struct getbmap)
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 99dfe89a8d08..e27e3ff9a651 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -40,6 +40,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
> @@ -1053,97 +1054,51 @@ xfs_ioc_ag_geometry(
>   * Linux extended inode flags interface.
>   */
>  
> -STATIC unsigned int
> -xfs_merge_ioc_xflags(
> - unsigned intflags,
> - unsigned intstart)
> -{
> - unsigned intxflags = start;
> -
> - if (flags & FS_IMMUTABLE_FL)
> - xflags |= FS_XFLAG_IMMUTABLE;
> - else
> - xflags &= ~FS_XFLAG_IMMUTABLE;
> - if (flags & FS_APPEND_FL)
> - xflags |= FS_XFLAG_APPEND;
> - else
> - xflags &= ~FS_XFLAG_APPEND;
> - if (flags & FS_SYNC_FL)
> - xflags |= FS_XFLAG_SYNC;
> - else
> - xflags &= ~FS_XFLAG_SYNC;
> - if (flags & FS_NOATIME_FL)
> - xflags |= FS_XFLAG_NOATIME;
> - else
> - xflags &= ~FS_XFLAG_NOATIME;
> - if (flags & FS_NODUMP_FL)
> - xflags |= FS_XFLAG_NODUMP;
> - else
> - xflags &= ~FS_XFLAG_NODUMP;
> - if (flags & FS_DAX_FL)
> - xflags |= FS_XFLAG_DAX;
> - else
> - xflags &= ~FS_XFLAG_DAX;
> -
> - return xflags;
> -}
> -
> -STATIC unsigned int
> -xfs_di2lxflags(
> - uint16_tdi_flags,
> - uint64_tdi_flags2)
> -{
> - unsigned intflags = 0;
> -
> - if (di_flags & XFS_DIFLAG_IMMUTABLE)
> - flags |= FS_IMMUTABLE_FL;
> - if (di_flags & XFS_DIFLAG_APPEND)
> - flags |= FS_APPEND_FL;
> - if (di_flags & XFS_DIFLAG_SYNC)
> - flags |= FS_SYNC_FL;
> - if (di_flags & XFS_DIFLAG_NOATIME)
> - flags |= FS_NOATIME_FL;
> - if (di_flags & XFS_DIFLAG_NODUMP)
> - flags |= FS_NODUMP_FL;
> - if (di_flags2 & XFS_DIFLAG2_DAX) {
> - flags |= FS_DAX_FL;
> - }
> - return flags;
> -}
> -
>  static void
>  xfs_fill_fsxattr(
>   struct xfs_inode*ip,
>   boolattr,
> - struct fsxattr  *fa)
> + struct miscattr *ma)
>  {
>   struct xfs_ifork*ifp = attr ? ip->i_afp : &ip->i_df;

Hm, could you replace "bool attr" with "int whichfork"?  The new
signature and first line of code becomes:

static void
xfs_fill_fsxattr(
struct xfs_inode*ip,
int whichfork,
struct miscattr *ma)
{
struct xfs_ifork*ifp = XFS_IFORK_PTR(ip, whichfork);

...and then the two wrappers of xfs_fill_fsxattr become:

STATIC int
xfs_ioc_fsgetxattra(
struct xfs_inode*ip,
void__user *arg)
{
struct miscattr ma;

xfs_ilock(ip, XFS_ILOCK_SHARED);
xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &ma);
xfs_iunlock(ip, XFS_ILOCK_SHARED);

return fsxattr_copy_to_user(&ma, arg);
}

int
xfs_miscattr_get(
struct dentry   *dentry,
struct miscattr *ma)
{
struct xfs_inode*ip = XFS_I(d_inode(dentry));

xfs_ilock(ip, XFS_ILOCK_SHARED);
xfs_fill_fsxattr(ip, XFS_DATA_FORK, ma);
xfs_iunlock(ip, XFS_ILOCK_SHARED);

return 0;
}

This makes it clearer that FSGETXATTRA reports on the extended attributes

Re: [PATCH 02/11] x86: tboot: avoid Wstringop-overread-warning

2021-03-22 Thread Arnd Bergmann
On Mon, Mar 22, 2021 at 11:09 PM Martin Sebor  wrote:
> On 3/22/21 2:29 PM, Ingo Molnar wrote:
> > * Arnd Bergmann  wrote:
> >
> > I.e. the real workaround might be to turn off the 
> > -Wstringop-overread-warning,
> > until GCC-11 gets fixed?
>
> In GCC 10 -Wstringop-overread is a subset of -Wstringop-overflow.
> GCC 11 breaks it out as a separate warning to make it easier to
> control.  Both warnings have caught some real bugs but they both
> have a nonzero rate of false positives.  Other than bug reports
> we don't have enough data to say what their S/N ratio might be
> but my sense is that it's fairly high in general.
>
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overread
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overflow

Unfortunately, stringop-overflow is one of the warnings that is
completely disabled in the kernel at the moment, rather than
enabled at one of the user-selectable higher warning levels.

I have a patch series to change that and to pull some of these
into the lower W=1 or W=2 levels or even enable them by default.

To do this though, I first need to ensure that the actual output
is empty for the normal builds. I added -Wstringop-overflow to
the list of warnings I wanted to address because it is a new
warning and only has a dozen or so occurrences throughout the
kernel.

> In GCC 11, all access warnings expect objects to be either declared
> or allocated.  Pointers with constant values are taken to point to
> nothing valid (as Arnd mentioned above, this is to detect invalid
> accesses to members of structs at address zero).
>
> One possible solution to the known address problem is to extend GCC
> attributes address and io that pin an object to a hardwired address
> to all targets (at the moment they're supported on just one or two
> targets).  I'm not sure this can still happen before GCC 11 releases
> sometime in April or May.
>
> Until then, another workaround is to convert the fixed address to
> a volatile pointer before using it for the access, along the lines
> below.  It should have only a negligible effect on efficiency.
>
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
> index 4c09ba110204..76326b906010 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -67,7 +67,9 @@ void __init tboot_probe(void)
>  /* Map and check for tboot UUID. */
>  set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
>  tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
> -   if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
> +   if (memcmp(&tboot_uuid,
> +  (*(struct tboot* volatile *)(&tboot))->uuid,
> +  sizeof(tboot->uuid))) {
>  pr_warn("tboot at 0x%llx is invalid\n",

I think a stray 'volatile' would raise too many red flags here, but
I checked that the RELOC_HIDE() macro has the same effect, e.g.

@@ -66,7 +67,8 @@ void __init tboot_probe(void)

/* Map and check for tboot UUID. */
set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
-   tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
+   /* RELOC_HIDE to prevent gcc warnings about NULL pointer */
+   tboot = RELOC_HIDE(NULL, fix_to_virt(FIX_TBOOT_BASE));
if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
pr_warn("tboot at 0x%llx is invalid\n", boot_params.tboot_addr);
tboot = NULL;

 Arnd


Re: [PATCH v4 15/19] dts: bindings: Document device tree bindings for ETE

2021-03-22 Thread Suzuki K Poulose

On 22/03/2021 17:28, Rob Herring wrote:

On Mon, Mar 22, 2021 at 10:53 AM Suzuki K Poulose
 wrote:


Hi Rob

On 06/03/2021 21:06, Rob Herring wrote:

On Thu, Feb 25, 2021 at 07:35:39PM +, Suzuki K Poulose wrote:

Document the device tree bindings for Embedded Trace Extensions.
ETE can be connected to legacy coresight components and thus
could optionally contain a connection graph as described by
the CoreSight bindings.

Cc: devicet...@vger.kernel.org
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Rob Herring 
Signed-off-by: Suzuki K Poulose 
---



+  out-ports:
+description: |
+  Output connections from the ETE to legacy CoreSight trace bus.
+$ref: /schemas/graph.yaml#/properties/port


s/port/ports/


Ok.



And then you need:

 properties:
   port:
 description: what this port is
 $ref: /schemas/graph.yaml#/properties/port


Isn't this already covered by the definition of ports ? There are no
fixed connections for ETE. It is optional and could be connected to
any legacy CoreSight component. i.e, a "ports" object can have port
objects inside.


'properties/ports' only defines that you have 'port' nodes within it.


Given we have defined out-ports as an object "confirming to the ports"
do we need to describe the individual port nodes ?


Yes, you have to define what the 'port' nodes are. A port is a data
stream and you should know what your hardware has. What the data
stream is connected to is outside the scope of the binding.


Ok, I have included the above changes for the next version.

Thanks
Suzuki


RE: [PATCH v2] mmc: sdhci-of-dwcmshc: add ACPI support for BlueField-3 SoC

2021-03-22 Thread Liming Sun

> -Original Message-
> From: Ulf Hansson 
> Sent: Monday, March 22, 2021 5:51 AM
> To: Liming Sun 
> Cc: Adrian Hunter ; Khalil Blaiech
> ; linux-mmc ; Linux
> Kernel Mailing List 
> Subject: Re: [PATCH v2] mmc: sdhci-of-dwcmshc: add ACPI support for
> BlueField-3 SoC
> 
> On Fri, 19 Mar 2021 at 21:23, Liming Sun  wrote:
> >
> > Uffe,
> >
> > Can I confirm whether you meant the 'master' branch or some other
> branch?
> > I did a rebase of master and didn't see Shawn Lin's changes in the sdhci-of-
> dwcmshc.c
> 
> git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next

Thanks! Rebased and posted v3.

> 
> [...]
> 
> Kind regards
> Uffe


[PATCH v3] mmc: sdhci-of-dwcmshc: add ACPI support for BlueField-3 SoC

2021-03-22 Thread Liming Sun
This commit adds ACPI support in the sdhci-of-dwcmshc driver for
BlueField-3 SoC. It has changes to only use the clock hierarchy
for Deviec Tree since the clk is not supported by ACPI. Instead,
ACPI can define 'clock-frequency' which is parsed by existing
sdhci_get_property(). This clock value will be returned in function
dwcmshc_get_max_clock().

Signed-off-by: Liming Sun 
Reviewed-by: Khalil Blaiech 
---
v2->v3:
   Rebase to mmc next.
v1->v2:
   Changes for comments from Adrian Hunter :
   - Make changes in sdhci-of-dwcmshc instead.
v1: Initial version which was done in sdhci-acpi.c
---
 drivers/mmc/host/sdhci-of-dwcmshc.c | 50 ++---
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c 
b/drivers/mmc/host/sdhci-of-dwcmshc.c
index 0687368..1113a56 100644
--- a/drivers/mmc/host/sdhci-of-dwcmshc.c
+++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
@@ -7,6 +7,7 @@
  * Author: Jisheng Zhang 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -94,6 +95,16 @@ static void dwcmshc_adma_write_desc(struct sdhci_host *host, 
void **desc,
sdhci_adma_write_desc(host, desc, addr, len, cmd);
 }
 
+static unsigned int dwcmshc_get_max_clock(struct sdhci_host *host)
+{
+   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+
+   if (pltfm_host->clk)
+   return sdhci_pltfm_clk_get_max_clock(host);
+   else
+   return pltfm_host->clock;
+}
+
 static void dwcmshc_check_auto_cmd23(struct mmc_host *mmc,
 struct mmc_request *mrq)
 {
@@ -248,7 +259,7 @@ static void dwcmshc_rk3568_set_clock(struct sdhci_host 
*host, unsigned int clock
.set_clock  = sdhci_set_clock,
.set_bus_width  = sdhci_set_bus_width,
.set_uhs_signaling  = dwcmshc_set_uhs_signaling,
-   .get_max_clock  = sdhci_pltfm_clk_get_max_clock,
+   .get_max_clock  = dwcmshc_get_max_clock,
.reset  = sdhci_reset,
.adma_write_desc= dwcmshc_adma_write_desc,
 };
@@ -323,8 +334,16 @@ static int dwcmshc_rk3568_init(struct sdhci_host *host, 
struct dwcmshc_priv *dwc
 };
 MODULE_DEVICE_TABLE(of, sdhci_dwcmshc_dt_ids);
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id sdhci_dwcmshc_acpi_ids[] = {
+   { .id = "MLNXBF30" },
+   {}
+};
+#endif
+
 static int dwcmshc_probe(struct platform_device *pdev)
 {
+   struct device *dev = &pdev->dev;
struct sdhci_pltfm_host *pltfm_host;
struct sdhci_host *host;
struct dwcmshc_priv *priv;
@@ -347,7 +366,7 @@ static int dwcmshc_probe(struct platform_device *pdev)
/*
 * extra adma table cnt for cross 128M boundary handling.
 */
-   extra = DIV_ROUND_UP_ULL(dma_get_required_mask(&pdev->dev), SZ_128M);
+   extra = DIV_ROUND_UP_ULL(dma_get_required_mask(dev), SZ_128M);
if (extra > SDHCI_MAX_SEGS)
extra = SDHCI_MAX_SEGS;
host->adma_table_cnt += extra;
@@ -355,19 +374,21 @@ static int dwcmshc_probe(struct platform_device *pdev)
pltfm_host = sdhci_priv(host);
priv = sdhci_pltfm_priv(pltfm_host);
 
-   pltfm_host->clk = devm_clk_get(&pdev->dev, "core");
-   if (IS_ERR(pltfm_host->clk)) {
-   err = PTR_ERR(pltfm_host->clk);
-   dev_err(&pdev->dev, "failed to get core clk: %d\n", err);
-   goto free_pltfm;
-   }
-   err = clk_prepare_enable(pltfm_host->clk);
-   if (err)
-   goto free_pltfm;
+   if (dev->of_node) {
+   pltfm_host->clk = devm_clk_get(dev, "core");
+   if (IS_ERR(pltfm_host->clk)) {
+   err = PTR_ERR(pltfm_host->clk);
+   dev_err(dev, "failed to get core clk: %d\n", err);
+   goto free_pltfm;
+   }
+   err = clk_prepare_enable(pltfm_host->clk);
+   if (err)
+   goto free_pltfm;
 
-   priv->bus_clk = devm_clk_get(&pdev->dev, "bus");
-   if (!IS_ERR(priv->bus_clk))
-   clk_prepare_enable(priv->bus_clk);
+   priv->bus_clk = devm_clk_get(dev, "bus");
+   if (!IS_ERR(priv->bus_clk))
+   clk_prepare_enable(priv->bus_clk);
+   }
 
err = mmc_of_parse(host->mmc);
if (err)
@@ -489,6 +510,7 @@ static int dwcmshc_resume(struct device *dev)
.name   = "sdhci-dwcmshc",
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
.of_match_table = sdhci_dwcmshc_dt_ids,
+   .acpi_match_table = ACPI_PTR(sdhci_dwcmshc_acpi_ids),
.pm = &dwcmshc_pmops,
},
.probe  = dwcmshc_probe,
-- 
1.8.3.1



[PATCH] tracing: Fix various typos in comments

2021-03-22 Thread Ingo Molnar


Fix ~59 single-word typos in the tracing code comments.

Signed-off-by: Ingo Molnar 
Cc: Steven Rostedt 
Cc: linux-kernel@vger.kernel.org
---
 arch/microblaze/include/asm/ftrace.h |  2 +-
 arch/nds32/kernel/ftrace.c   |  2 +-
 arch/powerpc/include/asm/ftrace.h|  4 ++--
 arch/sh/kernel/ftrace.c  |  2 +-
 arch/sparc/include/asm/ftrace.h  |  2 +-
 fs/tracefs/inode.c   |  2 +-
 include/linux/ftrace.h   |  4 ++--
 include/linux/trace_events.h |  2 +-
 include/linux/tracepoint.h   |  2 +-
 include/trace/events/io_uring.h  |  2 +-
 include/trace/events/rcu.h   |  2 +-
 include/trace/events/sched.h |  2 +-
 include/trace/events/timer.h |  2 +-
 kernel/trace/bpf_trace.c |  4 ++--
 kernel/trace/fgraph.c|  4 ++--
 kernel/trace/ftrace.c|  8 
 kernel/trace/ring_buffer.c   |  2 +-
 kernel/trace/synth_event_gen_test.c  |  2 +-
 kernel/trace/trace.c | 18 +-
 kernel/trace/trace.h |  4 ++--
 kernel/trace/trace_event_perf.c  |  2 +-
 kernel/trace/trace_events.c  |  4 ++--
 kernel/trace/trace_events_filter.c   |  4 ++--
 kernel/trace/trace_events_synth.c|  2 +-
 kernel/trace/trace_functions_graph.c |  2 +-
 kernel/trace/trace_hwlat.c   |  4 ++--
 kernel/trace/trace_kprobe.c  |  2 +-
 kernel/trace/trace_probe.c   |  6 +++---
 kernel/trace/trace_probe.h   |  2 +-
 kernel/trace/trace_probe_tmpl.h  |  2 +-
 kernel/trace/trace_selftest.c|  4 ++--
 kernel/trace/trace_seq.c | 12 ++--
 32 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/arch/microblaze/include/asm/ftrace.h 
b/arch/microblaze/include/asm/ftrace.h
index 5db7f4489f05..6a92bed37794 100644
--- a/arch/microblaze/include/asm/ftrace.h
+++ b/arch/microblaze/include/asm/ftrace.h
@@ -13,7 +13,7 @@ extern void ftrace_call_graph(void);
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-/* reloction of mcount call site is the same as the address */
+/* relocation of mcount call site is the same as the address */
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
return addr;
diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c
index 414f8a780cc3..0e23e3a8df6b 100644
--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -236,7 +236,7 @@ void __naked return_to_handler(void)
"bal ftrace_return_to_handler\n\t"
"move $lp, $r0   \n\t"
 
-   /* restore state nedded by the ABI  */
+   /* restore state needed by the ABI  */
"lmw.bim $r0,[$sp],$r1,#0x0  \n\t");
 }
 
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index bc76970b6ee5..debe8c4f7062 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -12,7 +12,7 @@
 
 #ifdef __ASSEMBLY__
 
-/* Based off of objdump optput from glibc */
+/* Based off of objdump output from glibc */
 
 #define MCOUNT_SAVE_FRAME  \
stwur1,-48(r1); \
@@ -52,7 +52,7 @@ extern void _mcount(void);
 
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
-   /* reloction of mcount call site is the same as the address */
+   /* relocation of mcount call site is the same as the address */
return addr;
 }
 
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 0646c5961846..295c43315bbe 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -67,7 +67,7 @@ static unsigned char *ftrace_call_replace(unsigned long ip, 
unsigned long addr)
  * Modifying code must take extra care. On an SMP machine, if
  * the code being modified is also being executed on another CPU
  * that CPU will have undefined results and possibly take a GPF.
- * We use kstop_machine to stop other CPUS from exectuing code.
+ * We use kstop_machine to stop other CPUS from executing code.
  * But this does not stop NMIs from happening. We still need
  * to protect against that. We separate out the modification of
  * the code to take care of this.
diff --git a/arch/sparc/include/asm/ftrace.h b/arch/sparc/include/asm/ftrace.h
index d3aa1a524431..e284394cb3aa 100644
--- a/arch/sparc/include/asm/ftrace.h
+++ b/arch/sparc/include/asm/ftrace.h
@@ -17,7 +17,7 @@ void _mcount(void);
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-/* reloction of mcount call site is the same as the address */
+/* relocation of mcount call site is the same as the address */
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
return addr;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 4b83cbded559..1261e8b41edb 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -477,7 +477,7 @@ struct dentry *tracefs_create_dir(const char *name, struct 
dentry *parent)
  *
  * The instances directo

Re: [PATCH] drm/amd: Fix a typo in two different sentences

2021-03-22 Thread Randy Dunlap
On 3/22/21 2:06 PM, Bhaskar Chowdhury wrote:
> 
> s/defintion/definition/ .two different places.
> 
> Signed-off-by: Bhaskar Chowdhury 

Acked-by: Randy Dunlap 

> ---
>  drivers/gpu/drm/amd/include/atombios.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/include/atombios.h 
> b/drivers/gpu/drm/amd/include/atombios.h
> index c1d7b1d0b952..47eb84598b96 100644
> --- a/drivers/gpu/drm/amd/include/atombios.h
> +++ b/drivers/gpu/drm/amd/include/atombios.h
> @@ -1987,9 +1987,9 @@ typedef struct _PIXEL_CLOCK_PARAMETERS_V6
>  #define PIXEL_CLOCK_V6_MISC_HDMI_BPP_MASK   0x0c
>  #define PIXEL_CLOCK_V6_MISC_HDMI_24BPP  0x00
>  #define PIXEL_CLOCK_V6_MISC_HDMI_36BPP  0x04
> -#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6   0x08//for V6, the 
> correct defintion for 36bpp should be 2 for 36bpp(2:1)
> +#define PIXEL_CLOCK_V6_MISC_HDMI_36BPP_V6   0x08//for V6, the 
> correct definition for 36bpp should be 2 for 36bpp(2:1)
>  #define PIXEL_CLOCK_V6_MISC_HDMI_30BPP  0x08
> -#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6   0x04//for V6, the 
> correct defintion for 30bpp should be 1 for 36bpp(5:4)
> +#define PIXEL_CLOCK_V6_MISC_HDMI_30BPP_V6   0x04//for V6, the 
> correct definition for 30bpp should be 1 for 36bpp(5:4)
>  #define PIXEL_CLOCK_V6_MISC_HDMI_48BPP  0x0c
>  #define PIXEL_CLOCK_V6_MISC_REF_DIV_SRC 0x10
>  #define PIXEL_CLOCK_V6_MISC_GEN_DPREFCLK0x40
> --


-- 
~Randy



Re: [PATCH V4 06/10] x86/fault: Adjust WARN_ON for PKey fault

2021-03-22 Thread Ira Weiny
On Mon, Mar 22, 2021 at 09:05:43AM -0700, Sean Christopherson wrote:
> On Sun, Mar 21, 2021, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > PKey faults may now happen on kernel mappings if the feature is enabled.
> > Remove the warning in the fault path if PKS is enabled.
> 
> When/why can they happen?  I read through all the changelogs, as well as the
> cover letters for v1 and the RFC, and didn't see any explicit statement about
> why pkey faults on supervisor accesses are now "legal".

Ok, I have to admit I did not think about documenting this detail...  I'll
update the commit message a bit more.

Prior to this series pkeys were only supported on user page mappings.
Therefore seeing a X86_PF_PK error in this path was completely unexpected and
warranted the extra WARN_ON to indicate that something went very wrong.

> Explaining what happens
> later in the page fault handler would also be helpful, e.g. is the flag simply
> ignored?

Ok I'll do this.  But the behavior does not change.  The fault is unhandled and
results in an Ooops.  The only difference is that if PKS is enabled and
configured on a kernel mapping the oops is to be expected.

> Does it lead directly to OOPS?

Yes, the series concludes with it being an ooops unless the test code is
running.  The behavior does not change from before.  I'll more clearly document
that...

> 
> Documenting what happens on a PKS #PF in the API patch would be nice to have, 
> too.

Ok, yes, good idea.

> 
> > Reviewed-by: Dan Williams 
> > Signed-off-by: Ira Weiny 
> > ---
> >  arch/x86/mm/fault.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index a73347e2cdfc..731ec90ed413 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1141,11 +1141,12 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned 
> > long hw_error_code,
> >unsigned long address)
> >  {
> > /*
> > -* Protection keys exceptions only happen on user pages.  We
> > -* have no user pages in the kernel portion of the address
> > -* space, so do not expect them here.
> > +* PF_PK is expected on kernel addresses when supervisor pkeys are
> 
> "is expected" can be misinterpreted as "PF is expected on all kernel 
> addresses...".

Yes the commit message was more clear by using 'may'.

> 
> This ties in with the lack of an explanation in the changelog.
> 
> > +* enabled.
> 
> It'd be helpful to spell out "Protection keys exceptions" so that random 
> readers
> don't need to search for PF_PK to understand what's up.  Maybe even use it as 
> an
> opportunity to introduce "pkeys", e.g.
> 
>   /* Protection keys (pkeys) exceptions are ... */

Fair enough.  Will do.  I've changed this to:

/*
 * X86_PF_PK (Protection key exceptions) may occur on kernel addresses
 * when PKS (PKeys Supervisor) are enabled.
 *
 * If PKS is not enabled an exception should only happen on user pages.
 * Because, we have no user pages in the kernel portion of the address
 * space something must have gone very wrong and we should WARN.
 */

> 
> >  */
> > -   WARN_ON_ONCE(hw_error_code & X86_PF_PK);
> > +   if (!cpu_feature_enabled(X86_FEATURE_PKS))
> > +   WARN_ON_ONCE(hw_error_code & X86_PF_PK);
> 
> Does this generate the same code if the whole thing is thrown in the WARN?  
> E.g.
> 
>   WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
>(hw_error_code & X86_PF_PK));

I don't know in the general case.  But if CONFIG_BUG=n this would be better.

I've changed it.

Thanks!
Ira


Re: [PATCH] scsi: bfa: Fix a typo in two places

2021-03-22 Thread Randy Dunlap
On 3/22/21 1:58 PM, Bhaskar Chowdhury wrote:
> 
> s/defintions/definitions/  two different places.
> 
> Signed-off-by: Bhaskar Chowdhury 

Acked-by: Randy Dunlap 

> ---
>  drivers/scsi/bfa/bfa_fc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/bfa/bfa_fc.h b/drivers/scsi/bfa/bfa_fc.h
> index d536270bbe9f..0314e4b9e1fb 100644
> --- a/drivers/scsi/bfa/bfa_fc.h
> +++ b/drivers/scsi/bfa/bfa_fc.h
> @@ -1193,7 +1193,7 @@ enum {
>  };
> 
>  /*
> - * defintions for CT reason code
> + * definitions for CT reason code
>   */
>  enum {
>   CT_RSN_INV_CMD  = 0x01,
> @@ -1240,7 +1240,7 @@ enum {
>  };
> 
>  /*
> - * defintions for the explanation code for all servers
> + * definitions for the explanation code for all servers
>   */
>  enum {
>   CT_EXP_AUTH_EXCEPTION   = 0xF1,
> --


-- 
~Randy



Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

2021-03-22 Thread Borislav Petkov
On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote:
> This path is called by host SGX driver only, so yes this leaking is done by
> host enclaves only.

Yes, so I was told.

> This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so 
> virtual
> EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
> This patch doesn't have intention to bring functional change.  I changed the
> error msg because Dave said it worth to give a message saying EPC page is
> leaked, and I thought this minor change won't break anything.

I read that already - you don't have to repeat it.

> btw, currently virtual EPC code (patch 5) handles in similar way: There's one
> EREMOVE error is expected and virtual EPC code can handle, but for other
> errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
> page is leaked too:
> 
> + WARN_ONCE(ret != SGX_CHILD_PRESENT,
> +   "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> +   sgx_get_epc_phys_addr(epc_page), ret);
> + return ret;
> 
> So to me they are just WARN() to catch kernel bug.

You don't care about users, do you? Because when that error happens,
they won't come crying to you to fix it.

Lemme save you some trouble: we don't take half-baked code into the
kernel until stuff has been discussed and analyzed properly. So instead
of trying to downplay this, try answering my questions.

Here's another one: when does EREMOVE fail?

/me goes and looks it up

"The instruction fails if the operand is not properly aligned or does
not refer to an EPC page or the page is in use by another thread, or
other threads are running in the enclave to which the page belongs. In
addition the instruction fails if the operand refers to an SECS with
associations."

And I guess those conditions will become more in the future.

Now, let's play. I'm the cloud admin and you're cloud OS customer
support. I say:

"I got this scary error message while running enclaves on my server

"EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked 
pages."

but I cannot reboot that machine because there are guests running on it
and I'm getting paid for those guests and I might get sued if I do?"

Your turn, go wild.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH] x86/mce: Add Skylake quirk for patrol scrub reported errors

2021-03-22 Thread Tony Luck
From: Youquan Song 

Skylake has a mode where the system administrator can use a BIOS setup
option to request that the memory controller report uncorrected errors
found by the patrol scrubber as corrected.  This results in them being
signalled using CMCI, which is less disruptive than a machine check.

Add a quirk to detect that a "corrected" error is actually a downgraded
uncorrected error with model specific checks for the "MSCOD" signature in
MCi_STATUS and that the error was reported from a memory controller bank.

Adjust the severity to MCE_AO_SEVERITY so that Linux will try to take
the affected page offline.

[Tony: Wordsmith commit comment]

Signed-off-by: Youquan Song 
Signed-off-by: Tony Luck 

---
Repost ... looks like this got lost somewhere.

V2:
Boris:  Don't optimize with pointer to quirk function. Just do
the vendor/family/model check in the adjust_mce_log()
function
Tony:   Add check for stepping >= 4
---
 arch/x86/kernel/cpu/mce/core.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e9265e2f28c9..2d5fe23adf29 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -673,6 +673,35 @@ static void mce_read_aux(struct mce *m, int i)
}
 }
 
+/*
+ * Skylake family CPUs have a mode where the user can request that
+ * the memory controller report uncorrected errors found by the patrol
+ * scrubber as corrected (MCI_STATUS_UC == 0). This results in them being
+ * signalled using CMCI, which is less disruptive that a machine check.
+ * The following quirk detects such errors and adjusts the severity.
+ */
+
+#define MSCOD_UCE_SCRUB(0x0010 << 16) /* UnCorrected Patrol Scrub 
Error */
+#define MSCOD_MASK GENMASK_ULL(31, 16)
+
+static void adjust_mce_log(struct mce *m)
+{
+   struct cpuinfo_x86 *c = &boot_cpu_data;
+
+   if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6 &&
+   c->x86_model == INTEL_FAM6_SKYLAKE_X && c->x86_stepping >= 4) {
+   /*
+* Check the error code to see if this is an uncorrected patrol
+* scrub error from one of the memory controller banks. If so,
+* then adjust the severity level to MCE_AO_SEVERITY
+*/
+   if (((m->status & MCACOD_SCRUBMSK) == MCACOD_SCRUB) &&
+   ((m->status & MSCOD_MASK) == MSCOD_UCE_SCRUB) &&
+   m->bank >= 13 && m->bank <= 18)
+   m->severity = MCE_AO_SEVERITY;
+   }
+}
+
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
 /*
@@ -772,6 +801,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t 
*b)
if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
goto clear_it;
 
+   adjust_mce_log(&m);
mce_log(&m);
 
 clear_it:
-- 
2.21.1



Re: [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages

2021-03-22 Thread Uladzislau Rezki
On Mon, Mar 22, 2021 at 07:38:20PM +, Matthew Wilcox (Oracle) wrote:
> If we're trying to allocate 4MB of memory, the table will be 8KiB in size
> (1024 pointers * 8 bytes per pointer), which can usually be satisfied
> by a kmalloc (which is significantly faster).  Instead of changing this
> open-coded implementation, just use kvmalloc().
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  mm/vmalloc.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 96444d64129a..32b640a84250 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2802,13 +2802,8 @@ static void *__vmalloc_area_node(struct vm_struct 
> *area, gfp_t gfp_mask,
>   gfp_mask |= __GFP_HIGHMEM;
>  
>   /* Please note that the recursion is strictly bounded. */
> - if (array_size > PAGE_SIZE) {
> - pages = __vmalloc_node(array_size, 1, nested_gfp, node,
> + pages = kvmalloc_node_caller(array_size, nested_gfp, node,
>   area->caller);
> - } else {
> - pages = kmalloc_node(array_size, nested_gfp, node);
> - }
> -
>   if (!pages) {
>   free_vm_area(area);
>   return NULL;
> -- 
> 2.30.2
Makes sense to me. Though i expected a bigger difference:

# patch
single CPU, 4MB allocation, loops: 100 avg: 85293854 usec

# default
single CPU, 4MB allocation, loops: 100 avg: 89275857 usec

One question. Should we care much about fragmentation? I mean
with the patch, allocations > 2MB will do request to SLAB bigger
then PAGE_SIZE.

Thanks!

--
Vlad Rezki


Re: [PATCH v2 01/18] vfs: add miscattr ops

2021-03-22 Thread Darrick J. Wong
On Mon, Mar 22, 2021 at 03:48:59PM +0100, Miklos Szeredi wrote:
> There's a substantial amount of boilerplate in filesystems handling
> FS_IOC_[GS]ETFLAGS/ FS_IOC_FS[GS]ETXATTR ioctls.
> 
> Also due to userspace buffers being involved in the ioctl API this is
> difficult to stack, as shown by overlayfs issues related to these ioctls.
> 
> Introduce a new internal API named "miscattr" (fsxattr can be confused with
> xattr, xflags is inappropriate, since this is more than just flags).
> 
> There's significant overlap between flags and xflags and this API handles
> the conversions automatically, so filesystems may choose which one to use.
> 
> In ->miscattr_get() a hint is provided to the filesystem whether flags or
> xattr are being requested by userspace, but in this series this hint is
> ignored by all filesystems, since generating all the attributes is cheap.
> 
> If a filesystem doesn't implemement the miscattr API, just fall back to
> f_op->ioctl().  When all filesystems are converted, the fallback can be
> removed.
> 
> 32bit compat ioctls are now handled by the generic code as well.
> 
> Signed-off-by: Miklos Szeredi 
> ---
>  Documentation/filesystems/locking.rst |   5 +
>  Documentation/filesystems/vfs.rst |  15 ++
>  fs/ioctl.c| 329 ++
>  include/linux/fs.h|   4 +
>  include/linux/miscattr.h  |  53 +
>  5 files changed, 406 insertions(+)
>  create mode 100644 include/linux/miscattr.h
> 
> diff --git a/Documentation/filesystems/locking.rst 
> b/Documentation/filesystems/locking.rst
> index b7dcc86c92a4..a5aa2046d48f 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -80,6 +80,9 @@ prototypes::
>   struct file *, unsigned open_flag,
>   umode_t create_mode);
>   int (*tmpfile) (struct inode *, struct dentry *, umode_t);
> + int (*miscattr_set)(struct user_namespace *mnt_userns,
> + struct dentry *dentry, struct miscattr *ma);
> + int (*miscattr_get)(struct dentry *dentry, struct miscattr *ma);
>  
>  locking rules:
>   all may block
> @@ -107,6 +110,8 @@ fiemap:   no
>  update_time: no
>  atomic_open: shared (exclusive if O_CREAT is set in open flags)
>  tmpfile: no
> +miscattr_get:no or exclusive
> +miscattr_set:exclusive
>   =
>  
>  
> diff --git a/Documentation/filesystems/vfs.rst 
> b/Documentation/filesystems/vfs.rst
> index 2049bbf5e388..f125ce6c3b47 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -441,6 +441,9 @@ As of kernel 2.6.22, the following members are defined:
>  unsigned open_flag, umode_t create_mode);
>   int (*tmpfile) (struct user_namespace *, struct inode *, struct 
> dentry *, umode_t);
>   int (*set_acl)(struct user_namespace *, struct inode *, struct 
> posix_acl *, int);
> + int (*miscattr_set)(struct user_namespace *mnt_userns,
> + struct dentry *dentry, struct miscattr *ma);
> + int (*miscattr_get)(struct dentry *dentry, struct miscattr *ma);
>   };
>  
>  Again, all methods are called without any locks being held, unless
> @@ -588,6 +591,18 @@ otherwise noted.
>   atomically creating, opening and unlinking a file in given
>   directory.
>  
> +``miscattr_get``

I wish this wasn't named "misc" because miscellaneous is vague.

fileattr_get, perhaps?

(FWIW I'm not /that/ passionate about starting a naming bikeshed, feel
free to ignore.)

> + called on ioctl(FS_IOC_GETFLAGS) and ioctl(FS_IOC_FSGETXATTR) to
> + retrieve miscellaneous filesystem flags and attributes.  Also

"...miscellaneous *file* flags and attributes."

> + called before the relevant SET operation to check what is being
> + changed (in this case with i_rwsem locked exclusive).  If unset,
> + then fall back to f_op->ioctl().
> +
> +``miscattr_set``
> + called on ioctl(FS_IOC_SETFLAGS) and ioctl(FS_IOC_FSSETXATTR) to
> + change miscellaneous filesystem flags and attributes.  Callers hold

Same here.

> + i_rwsem exclusive.  If unset, then fall back to f_op->ioctl().
> +
>  
>  The Address Space Object
>  
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 4e6cc0a7d69c..e5f3820809a4 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -19,6 +19,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  #include "internal.h"
>  
> @@ -657,6 +660,311 @@ static int ioctl_file_dedupe_range(struct file *file,
>   return ret;
>  }
>  
> +/**
> + * miscattr_fill_xflags - initialize miscattr with xflags
> + * @ma:  miscattr pointer
> + * @xflags:  FS_XFLAG_* flags
> + *
> + * Set ->fsx_xflags, ->xattr_valid and ->flags (translated xflags

[PATCH] hfs/hfsplus: use WARN_ON for sanity check

2021-03-22 Thread Arnd Bergmann
From: Arnd Bergmann 

gcc warns about a couple of instances in which a sanity check
exists but the author wasn't sure how to react to it failing,
which makes it look like a possible bug:

fs/hfsplus/inode.c: In function 'hfsplus_cat_read_inode':
fs/hfsplus/inode.c:503:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  503 | /* panic? */;
  | ^
fs/hfsplus/inode.c:524:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  524 | /* panic? */;
  | ^
fs/hfsplus/inode.c: In function 'hfsplus_cat_write_inode':
fs/hfsplus/inode.c:582:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  582 | /* panic? */;
  | ^
fs/hfsplus/inode.c:608:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  608 | /* panic? */;
  | ^
fs/hfs/inode.c: In function 'hfs_write_inode':
fs/hfs/inode.c:464:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  464 | /* panic? */;
  | ^
fs/hfs/inode.c:485:37: error: suggest braces around empty body in an 'if' 
statement [-Werror=empty-body]
  485 | /* panic? */;
  | ^

panic() is probably not the correct choice here, but a WARN_ON
seems appropriate and avoids the compile-time warning.

Signed-off-by: Arnd Bergmann 
---
 fs/hfs/inode.c |  6 ++
 fs/hfsplus/inode.c | 12 
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 3fc5cb346586..4c5610b5356f 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -460,8 +460,7 @@ int hfs_write_inode(struct inode *inode, struct 
writeback_control *wbc)
goto out;
 
if (S_ISDIR(main_inode->i_mode)) {
-   if (fd.entrylength < sizeof(struct hfs_cat_dir))
-   /* panic? */;
+   WARN_ON(fd.entrylength < sizeof(struct hfs_cat_dir));
hfs_bnode_read(fd.bnode, &rec, fd.entryoffset,
   sizeof(struct hfs_cat_dir));
if (rec.type != HFS_CDR_DIR ||
@@ -481,8 +480,7 @@ int hfs_write_inode(struct inode *inode, struct 
writeback_control *wbc)
hfs_bnode_write(fd.bnode, &rec, fd.entryoffset,
sizeof(struct hfs_cat_file));
} else {
-   if (fd.entrylength < sizeof(struct hfs_cat_file))
-   /* panic? */;
+   WARN_ON(fd.entrylength < sizeof(struct hfs_cat_file));
hfs_bnode_read(fd.bnode, &rec, fd.entryoffset,
   sizeof(struct hfs_cat_file));
if (rec.type != HFS_CDR_FIL ||
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 078c5c8a5156..6bcb0d935472 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -499,8 +499,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct 
hfs_find_data *fd)
if (type == HFSPLUS_FOLDER) {
struct hfsplus_cat_folder *folder = &entry.folder;
 
-   if (fd->entrylength < sizeof(struct hfsplus_cat_folder))
-   /* panic? */;
+   WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_folder));
hfs_bnode_read(fd->bnode, &entry, fd->entryoffset,
sizeof(struct hfsplus_cat_folder));
hfsplus_get_perms(inode, &folder->permissions, 1);
@@ -520,8 +519,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct 
hfs_find_data *fd)
} else if (type == HFSPLUS_FILE) {
struct hfsplus_cat_file *file = &entry.file;
 
-   if (fd->entrylength < sizeof(struct hfsplus_cat_file))
-   /* panic? */;
+   WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_file));
hfs_bnode_read(fd->bnode, &entry, fd->entryoffset,
sizeof(struct hfsplus_cat_file));
 
@@ -578,8 +576,7 @@ int hfsplus_cat_write_inode(struct inode *inode)
if (S_ISDIR(main_inode->i_mode)) {
struct hfsplus_cat_folder *folder = &entry.folder;
 
-   if (fd.entrylength < sizeof(struct hfsplus_cat_folder))
-   /* panic? */;
+   WARN_ON(fd.entrylength < sizeof(struct hfsplus_cat_folder));
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
sizeof(struct hfsplus_cat_folder));
/* simple node checks? */
@@ -604,8 +601,7 @@ int hfsplus_cat_write_inode(struct inode *inode)
} else {
struct hfsplus_cat_file *file = &

Re: [PATCH 1/2] mtd: spi-nor: sfdp: save a copy of the SFDP data

2021-03-22 Thread Michael Walle

Am 2021-03-22 19:42, schrieb Pratyush Yadav:

On 22/03/21 04:32PM, Michael Walle wrote:

Am 2021-03-22 15:21, schrieb Pratyush Yadav:
> On 18/03/21 10:24AM, Michael Walle wrote:
> > +
> > + sfdp->num_dwords = DIV_ROUND_UP(sfdp_size, sizeof(*sfdp->dwords));
>
> The SFDP spec says that Parameter Table Pointer should be DWORD aligned
> and Parameter Table length is specified in number of DWORDs. So,
> sfdp_size should always be a multiple of 4. Any SFDP table where this is
> not true is an invalid one.
>
> Also, the spec says "Device behavior when the Read SFDP command crosses
> the SFDP structure boundary is not defined".
>
> So I think this should be a check for alignment instead of a round-up.

Well, that woundn't help for debugging. I.e. you also want the SFDP 
data

in cases like this. IMHO we should try hard enough to actually get a
reasonable dump.

OTOH we also rely on the header and the pointers in the header. Any
other ideas, but just to chicken out?


Honestly, I don't think reading past the SFDP boundary would be too 
bad.

It probably will just be some garbage data. But if you want to avoid
that, you can always round down instead of up.


Like I said, while the storage will be rounded up to a multiple of
DWORDs, only sfdp_size is transferred. Thus it case a pointer is not
DWORD aligned, we end up with zeros at the end.

I'll add a comment.


This way you will only
miss the last DWORD at most. In either case, a warning should be 
printed

so this problem can be brought to the user's attention.


I was about to add a warning/debug message. But its the wrong place.
It should really be checked in the for loop which iterates over the
headers before parsing them. You could check sfdp_size but then two
unaligned param pointers might cancel each other out.

This can be a seperate patch, besides adding a warning, should there
be any other things to do, e.g. stop parsing and error out?

..


> > + goto exit;
> > + }
> > +
> > + err = spi_nor_read_sfdp_dma_unsafe(nor, 0, sfdp_size, sfdp->dwords);


Btw, this can be spi_nor_read_sfdp(). But I'm not sure, what this
whole dma capable buffer should be. Is kmalloc(GFP_KERNEL)
considered DMA safe?

The buffer ends in spi_nor_read_data(), which is also called from
mtdcore:

spi_nor_read_sfdp()
  spi_nor_read_raw()
spi_nor_read_data()

mtd_read()
  mtd_read_oob()
mtd_read_oob_std()
  spi_nor_read()
spi_nor_read_data()

Is the buffer passed from mtd_read() also DMA-safe? Doesn't the SPI
drivers allocate DMA safe buffers if they need them?

-michael


Re: [PATCH v4 05/19] kvm: arm64: Disable guest access to trace filter controls

2021-03-22 Thread Suzuki K Poulose

Hi Marc,

On 25/02/2021 19:35, Suzuki K Poulose wrote:

Disable guest access to the Trace Filter control registers.
We do not advertise the Trace filter feature to the guest
(ID_AA64DFR0_EL1: TRACE_FILT is cleared) already, but the guest
can still access the TRFCR_EL1 unless we trap it.

This will also make sure that the guest cannot fiddle with
the filtering controls set by a nvhe host.

Cc: Marc Zyngier 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Catalin Marinas 
Signed-off-by: Suzuki K Poulose 


We have already have the v8.4 self hosted tracing support in 5.12-rcX.
Do you think you can pick this up for this 5.12 ?

Cheers
Suzuki


---
New patch
---
  arch/arm64/include/asm/kvm_arm.h | 1 +
  arch/arm64/kvm/debug.c   | 2 ++
  2 files changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 4e90c2debf70..94d4025acc0b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -278,6 +278,7 @@
  #define CPTR_EL2_DEFAULT  CPTR_EL2_RES1
  
  /* Hyp Debug Configuration Register bits */

+#define MDCR_EL2_TTRF  (1 << 19)
  #define MDCR_EL2_TPMS (1 << 14)
  #define MDCR_EL2_E2PB_MASK(UL(0x3))
  #define MDCR_EL2_E2PB_SHIFT   (UL(12))
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7a7e425616b5..dbc890511631 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -89,6 +89,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
   *  - Debug ROM Address (MDCR_EL2_TDRA)
   *  - OS related registers (MDCR_EL2_TDOSA)
   *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ *  - Self-hosted Trace Filter controls (MDCR_EL2_TTRF)
   *
   * Additionally, KVM only traps guest accesses to the debug registers if
   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
@@ -112,6 +113,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
MDCR_EL2_TPMS |
+   MDCR_EL2_TTRF |
MDCR_EL2_TPMCR |
MDCR_EL2_TDRA |
MDCR_EL2_TDOSA);





Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

2021-03-22 Thread Kai Huang


> 
> Btw, I probably have seen this and forgotten again so pls remind me,
> is the amount of pages available for SGX use static and limited by,
> I believe BIOS, or can a leakage in EPC pages cause system memory
> shortage?
> 

Yes EPC size is fixed and configured in BIOS. Leaking EPC pages may cause EPC
shortage, but not system memory.


Re: [RFC PATCH v2 net-next 14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge

2021-03-22 Thread Vladimir Oltean
On Mon, Mar 22, 2021 at 04:04:01PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 6:49 PM Vladimir Oltean  wrote:
> > Why would you even want to look at the source net device for forwarding?
> > I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
> > want to bypass address learning if you can. Maybe also for link-local 
> > traffic.
> 
> Also for trapped traffic (snooping, tc-flower trap action) if the CPU
> sends them back.

This sounds line an interesting use case, please tell me more about what
commands I could run to reinject trapped packets into the hardware data
path.


Re: [PATCH v2 00/10] Rid W=1 warnings from OF

2021-03-22 Thread Rob Herring
On Thu, Mar 18, 2021 at 4:40 AM Lee Jones  wrote:
>
> This set is part of a larger effort attempting to clean-up W=1
> kernel builds, which are currently overwhelmingly riddled with
> niggly little warnings.
>
> v2:
>  - Provided some descriptions to exported functions
>
> Lee Jones (10):
>   of: device: Fix function name in header and provide missing
> descriptions
>   of: dynamic: Fix incorrect parameter name and provide missing
> descriptions
>   of: platform: Demote kernel-doc abuse
>   of: base: Fix some formatting issues and provide missing descriptions
>   of: property: Provide missing member description and remove excess
> param
>   of: address: Provide descriptions for 'of_address_to_resource's params
>   of: fdt: Demote kernel-doc abuses and fix function naming
>   of: of_net: Provide function name and param description
>   of: overlay: Fix function name disparity
>   of: of_reserved_mem: Demote kernel-doc abuses
>
>  drivers/of/address.c |  3 +++
>  drivers/of/base.c| 16 +++-
>  drivers/of/device.c  |  7 ++-
>  drivers/of/dynamic.c |  4 +++-
>  drivers/of/fdt.c | 23 ---
>  drivers/of/of_net.c  |  3 +++
>  drivers/of/of_reserved_mem.c |  6 +++---
>  drivers/of/overlay.c |  2 +-
>  drivers/of/platform.c|  2 +-
>  drivers/of/property.c|  2 +-
>  10 files changed, 44 insertions(+), 24 deletions(-)

I still see some warnings (note this is with DT files added to doc
build). Can you send follow-up patches:

../include/linux/of.h:1193: warning: Function parameter or member
'output' not described in 'of_property_read_string_index'
../include/linux/of.h:1193: warning: Excess function parameter
'out_string' description in 'of_property_read_string_index'
../include/linux/of.h:1461: warning: cannot understand function
prototype: 'enum of_overlay_notify_action '
../drivers/of/base.c:1781: warning: Excess function parameter 'prob'
description in '__of_add_property'
../drivers/of/base.c:1804: warning: Excess function parameter 'prob'
description in 'of_add_property'
../drivers/of/base.c:1855: warning: Function parameter or member
'prop' not described in 'of_remove_property'
../drivers/of/base.c:1855: warning: Excess function parameter 'prob'
description in 'of_remove_property'

BTW, there some more which I guess W=1 doesn't find:

/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:906: WARNING: Block quote ends without a blank
line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1465: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1469: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1473: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1517: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1521: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1526: WARNING: Unexpected indentation.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1528: WARNING: Block quote ends without a blank
line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1529: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1533: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:19:
../drivers/of/base.c:1705: WARNING: Definition list ends without a
blank line; unexpected unindent.
/home/rob/proj/git/linux-dt/Documentation/driver-api/devicetree:49:
../drivers/of/overlay.c:1183: WARNING: Inline emphasis start-string
without end-string.

Rob


Re: [syzbot] KASAN: use-after-free Read in disk_part_iter_next (2)

2021-03-22 Thread Bart Van Assche

On 3/22/21 12:18 AM, Christoph Hellwig wrote:

I've been running the reproducer on a KASAN enable VM for about
15 minutes now, but haven't been able to reproduce it.

Is there a way to inject this proposed fix into the syzbot queue?

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 1a7558917c47d6..f5d5872b89d57e 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -288,15 +288,12 @@ struct device_type part_type = {
  void delete_partition(struct block_device *part)
  {
xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
-   kobject_put(part->bd_holder_dir);
-   device_del(&part->bd_device);
-
-   /*
-* Remove the block device from the inode hash, so that it cannot be
-* looked up any more even when openers still hold references.
-*/
remove_inode_hash(part->bd_inode);
  
+	synchronize_rcu();

+
+   kobject_put(part->bd_holder_dir);
+   device_del(&part->bd_device);
put_device(&part->bd_device);
  }


Hi Christoph,

disk_part_iter_next() calls bdgrab() before returning a pointer to a 
certain partition. 'part' is only freed if its reference count drops to 
zero. The function that frees the partition information, 
bdev_free_inode(), is invoked via call_rcu(). bdgrab() fails if the 
refcount of a partition is zero. Does that mean that it is not necessary 
to call synchronize_rcu() between xa_erase() and put_device()?


Thanks,

Bart.



Re: [PATCH v4 03/19] kvm: arm64: Hide system instruction access to Trace registers

2021-03-22 Thread Suzuki K Poulose

Will, Catalin,

On 25/02/2021 19:35, Suzuki K Poulose wrote:

Currently we advertise the ID_AA6DFR0_EL1.TRACEVER for the guest,
when the trace register accesses are trapped (CPTR_EL2.TTA == 1).
So, the guest will get an undefined instruction, if trusts the
ID registers and access one of the trace registers.
Lets be nice to the guest and hide the feature to avoid
unexpected behavior.

Even though this can be done at KVM sysreg emulation layer,
we do this by removing the TRACEVER from the sanitised feature
register field. This is fine as long as the ETM drivers
can handle the individual trace units separately, even
when there are differences among the CPUs.

Cc: Marc Zyngier 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
---
New patch
---
  arch/arm64/kernel/cpufeature.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 066030717a4c..a4698f09bf32 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -383,7 +383,6 @@ static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
 * of support.
 */
S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
ID_AA64DFR0_PMUVER_SHIFT, 4, 0),
-   ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, 
ID_AA64DFR0_TRACEVER_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, 
ID_AA64DFR0_DEBUGVER_SHIFT, 4, 0x6),
ARM64_FTR_END,
  };



Are you happy to pick this patch for 5.12 as a fix ?

Suzuki


Re: [PATCH 1/5] cifsd: add server handler and tranport layers

2021-03-22 Thread Matthew Wilcox
On Mon, Mar 22, 2021 at 02:13:40PM +0900, Namjae Jeon wrote:
> +#define RESPONSE_BUF(w)  ((void *)(w)->response_buf)
> +#define REQUEST_BUF(w)   ((void *)(w)->request_buf)

Why do you do this obfuscation?

> +#define RESPONSE_BUF_NEXT(w) \
> + ((void *)((w)->response_buf + (w)->next_smb2_rsp_hdr_off))
> +#define REQUEST_BUF_NEXT(w)  \
> + ((void *)((w)->request_buf + (w)->next_smb2_rcv_hdr_off))

These obfuscations aren't even used; delete them

> +#define RESPONSE_SZ(w)   ((w)->response_sz)
> +
> +#define INIT_AUX_PAYLOAD(w)  ((w)->aux_payload_buf = NULL)
> +#define HAS_AUX_PAYLOAD(w)   ((w)->aux_payload_sz != 0)

I mean, do you really find it clearer to write:

if (HAS_AUX_PAYLOAD(work))
than
if (work->aux_payload_sz)

The unobfuscated version is actually shorter!



Re: [PATCH 0/3] Apple M1 DART IOMMU driver

2021-03-22 Thread Sven Peter


Hi Mark,

On Sun, Mar 21, 2021, at 19:35, Mark Kettenis wrote:
>
> Guess we do need to understand a little bit better how the USB DART
> actually works.  My hypothesis (based on our discussion on #asahi) is
> that the XHCI host controller and the peripheral controller of the
> DWC3 block use different DMA "streams" that are handled by the
> different sub-DARTs.

I've done some more experiments and the situation is unfortunately more
complicated: Most DMA transfers are translated with the first DART.
But sometimes (and I have not been able to figure out the exact conditions)
transfers instead have to go through the second DART. 
This happens e.g. with one of my USB keyboards after a stop EP command
is issued: Suddenly the xhci_ep_ctx struct must be translated through the
second DART.

What this likely means is that we'll need to point both DARTs
to the same pagetables and just issue the TLB maintenance operations
as a group.

> 
> The Corellium folks use a DART + sub-DART model in their driver and a
> single node in the device tree that represents both.  That might sense
> since the error registers and interrupts are shared.  Maybe it would
> make sense to select the appropriate sub-DART based on the DMA stream
> ID?

dwc3 specifically seems to require stream id #1 from the DART
at <0x5 0x02f0> and stream id #0 from the DART at <0x5 0x02f8>.
Both of these only share a IRQ line but are otherwise completely independent.
Each has their own error registers, etc. and we need some way to
specify these two DARTs + the appropriate stream ID.

Essentially we have three options to represent this now:

1) Add both DARTs as separate regs, use #iommu-cells = <2> and have the
   first cell select the DART and the second one the stream ID.
   We could allow #iommu-cells = <1> in case only one reg is specified
   for the PCIe DART:

   usb_dart1@502f0 {
 compatible = "apple,t8103-dart";
 reg = <0x5 0x02f0 0x0 0x4000>, <0x5 0x02f8 0x0 0x4000>;
 #iommu-cells = <2>;
 ...
   };

   usb1 {
 iommus = <&usb_dart1 0 1>, <&usb_dart1 1 0>;
 ...
   };

   I prefer this option because we fully describe the DART in a single
   device node here. It also feels natural to group them like this because
   they need to share some properties (like dma-window and the interrupt)
   anyway. 

2) Create two DART nodes which share the same IRQ line and attach them
   both to the master node:

   usb_dart1a@502f0 {
 compatible = "apple,t8103-dart";
 reg = <0x5 0x02f0 0x0 0x4000>;
 #iommu-cells = <1>;
 ...
   };
   usb_dart1b@502f8 {
 compatible = "apple,t8103-dart";
 reg = <0x5 0x02f8 0x0 0x4000>;
 #iommu-cells = <1>;
 ...
   };

   usb1 {
 iommus = <&usb_dart1a 1>, <&usb_dart1b 0>;
 ...
   };

   I dislike this one because attaching those two DARTs to a single device
   seems rather unusual. We'd also have to duplicate the dma-window setting,
   make sure it's the same for both DARTs and there are probably even more
   complications I can't think of right now. It seems like this would also
   make the device tree very verbose and the implementation itself more
   complicated.

3) Introduce another property and let the DART driver take care of
   mirroring the pagetables. I believe this would be similar to
   the sid-remap property:

   usb_dart1@502f0 {
 compatible = "apple,t8103-dart";
 reg = <0x5 0x02f0 0x0 0x4000>, <0x5 0x02f8 0x0 0x4000>;
 #iommu-cells = <1>;
 sid-remap = <0 1>;
   };
   usb1 {
 iommus = <&usb_dart1 0>;
   };

   I slightly dislike this one because we now specify which stream id 
   to use in two places: Once in the device node and another time in the
   new property in the DART node. I also don't think the binding is much
   simpler than the first one.


> > where #dma-address-cells and #dma-size-cells default to
> > #address-cells and #size-cells respectively if I understand
> > the code correctly. That way we could also just always use
> > a 64bit address and size in the DT, e.g.
> > 
> >   pcie_dart {
> >   [ ... ]
> >   dma-window = <0 0x10 0 0x3fe0>;
> >   [ ... ]
> >   };
> 
> That sounds like a serious contender to me!  Hopefully one of the
> Linux kernel developers can give this some sort of blessing.
> 
> I think it would make sense for us to just rely on the #address-cells
> and #size-cells defaults for the M1 device tree.
>

Agreed.


Best,

Sven


Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate

2021-03-22 Thread Minchan Kim
On Mon, Mar 22, 2021 at 08:41:56PM +, Luis Chamberlain wrote:
> On Mon, Mar 22, 2021 at 09:37:17AM -0700, Minchan Kim wrote:
> > On Fri, Mar 19, 2021 at 07:09:24PM +, Luis Chamberlain wrote:
> > > Indeed one issue is a consequence of the other but a bit better
> > > description can be put together for both separately.
> > > 
> > > The warning comes up when cpu hotplug detects that the callback
> > > is being removed, but yet "instances" for multistate are left behind,
> > > ie, not removed. CPU hotplug multistate allows us to dedicate a callback
> > > for zram so that it gets called every time a CPU hot plugs or unplugs.
> > > In the zram driver's case we want to trigger the callback per each
> > > struct zcomp, one is used per supported and used supported compression
> > 
> >  you meant "each one is used per supported compression"?
> 
> Yup.
> 
> > > algorithm. The zram driver allocates a struct zcomp for an compression
> > > algorithm on a need basis. The default compressor is CONFIG_ZRAM_DEF_COMP
> > > which today is CONFIG_ZRAM_DEF_COMP_LZORLE ("lzo-rle"). Although we may
> > > have different zram devices, zram devices which use the same compression
> > > algorithm share the same struct zcomp. The "multi" in CPU hotplug 
> > > multstate
> > 
> > It allocates each own zcomp, not sharing even though zram devices
> > use the same compression algorithm.
> 
> Right.
> 
> > > refers to these different struct zcomp instances, each of these will
> > > have the callback routine registered run for it. The kernel's CPU
> > > hotplug multistate keeps a linked list of these different structures
> > > so that it will iterate over them on CPU transitions. By default at
> > > driver initialization we will create just one zram device (num_devices=1)
> > > and a zcomp structure then set for the lzo-rle comrpession algorithm.
> > > At driver removal we first remove each zram device, and so we destroy
> > > the struct zcomp. But since we expose sysfs attributes to create new
> > > devices or reset / initialize existing ones, we can easily end up
> > > re-initializing a struct zcomp before the exit routine of the module
> > > removes the cpu hotplug callback. When this happens the kernel's
> > > CPU hotplug will detect that at least one instance (struct zcomp for
> > > us) exists.
> > 
> > It's very helpful to understand. Thanks a lot!S
> > 
> > So IIUC, it's fundamentally race between destroy_devices(i.e., module
> > unload) and every sysfs knobs in zram. Isn't it?
> 
> I would not call it *every* syfs knob as not all deal with things which
> are related to CPU hotplug multistate, right? Note that using just
> try_module_get() alone (that is the second patch only, does not fix the
> race I am describing above).

It wouldn't be CPU hotplug multistate issue but I'd like to call it
as more "zram instance race" bug.
What happens in this case?

CPU 1CPU 2

destroy_devices
..
 compact_store()
 zram = dev_to_zram(dev);
idr_for_each(zram_remove_cb
  zram_remove
  ..
  kfree(zram)
 down_read(&zram->init_lock);


CPU 1CPU 2

hot_remove_store
 compact_store()
 zram = dev_to_zram(dev);
  zram_remove
kfree(zram)
 down_read(&zram->init_lock);

So, for me we need to close the zram instance create/removal
with sysfs rather than focusing on CPU hotplug issue.
Maybe, we could reuse zram_index_mutex with modifying it with
rw_semaphore. What do you think?

> 
> There are two issues.
> 
> > Below specific example is one of them and every sysfs code in zram
> > could access freed object(e.g., zram->something).
> > And you are claiming there isn't good way to fix it in kernfs(please
> > add it in the description, too) even though it's general problem.
> 
> Correct, the easier route would have been through the block layer as
> its the one adding our syfs attributes for us but even it canno deal
> with this race on behalf of drivers given the currently exposed
> semantics on kernfs.
> 
> > (If we had, we may detect the race and make zram_remove_cb fails
> > so unloading modulies fails, finally)
> > 
> > So, shouldn't we introcude a global rw_semaphore?
> > 
> > destroy_devices
> >   class_unregister
> >   down_write(&zram_unload);
> >   idr_for_each(zram_remove_cb);
> >   up_write(&zram_unload);
> > 
> > And every sysfs code hold the lock with down_read in the first place
> > and release the lock right before return. So, introduce a zram sysfs
> > wrapper to centralize all of locking logic.
> 
> Actually that does work but only if we use write lock attempts so to
> be able to knock two birds with one stone, so to address the deadlock
> with sysfs attribute removal. We're not asking politely to read some
> value off of a zram devices with th

[PATCH v4 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-22 Thread Peter Xu
Userfaultfd write-protect mode is supported starting from Linux 5.7.

Signed-off-by: Peter Xu 
---
 man2/ioctl_userfaultfd.2 | 84 ++--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index d4a8375b8..5419687a6 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -234,6 +234,11 @@ operation is supported.
 The
 .B UFFDIO_UNREGISTER
 operation is supported.
+.TP
+.B 1 << _UFFDIO_WRITEPROTECT
+The
+.B UFFDIO_WRITEPROTECT
+operation is supported.
 .PP
 This
 .BR ioctl (2)
@@ -322,9 +327,6 @@ Track page faults on missing pages.
 .B UFFDIO_REGISTER_MODE_WP
 Track page faults on write-protected pages.
 .PP
-Currently, the only supported mode is
-.BR UFFDIO_REGISTER_MODE_MISSING .
-.PP
 If the operation is successful, the kernel modifies the
 .I ioctls
 bit-mask field to indicate which
@@ -443,6 +445,16 @@ operation:
 .TP
 .B UFFDIO_COPY_MODE_DONTWAKE
 Do not wake up the thread that waits for page-fault resolution
+.TP
+.B UFFDIO_COPY_MODE_WP
+Copy the page with read-only permission.
+This allows the user to trap the next write to the page,
+which will block and generate another write-protect userfault message.
+This is only used when both
+.B UFFDIO_REGISTER_MODE_MISSING
+and
+.B UFFDIO_REGISTER_MODE_WP
+modes are enabled for the registered range.
 .PP
 The
 .I copy
@@ -654,6 +666,72 @@ field of the
 structure was not a multiple of the system page size; or
 .I len
 was zero; or the specified range was otherwise invalid.
+.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
+Write-protect or write-unprotect an userfaultfd registered memory range
+registered with mode
+.BR UFFDIO_REGISTER_MODE_WP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_range
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_writeprotect {
+struct uffdio_range range;  /* Range to change write permission */
+__u64 mode; /* Mode to change write permission */
+};
+.EE
+.in
+There're two mode bits that are supported in this structure:
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_WP
+When this mode bit is set, the ioctl will be a write-protect operation upon the
+memory range specified by
+.IR range .
+Otherwise it'll be a write-unprotect operation upon the specified range,
+which can be used to resolve an userfaultfd write-protect page fault.
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after the 
operation.
+This could only be specified if
+.B UFFDIO_WRITEPROTECT_MODE_WP
+is not specified.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I ufdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid.
+.TP
+.B EAGAIN
+The process was interrupted and need to retry.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or not registered with userfaultfd write-protect mode.
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
 .SH RETURN VALUE
 See descriptions of the individual operations, above.
 .SH ERRORS
-- 
2.26.2



[PATCH v4 3/4] ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs

2021-03-22 Thread Peter Xu
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.

Signed-off-by: Peter Xu 
---
 man2/ioctl_userfaultfd.2 | 5 +
 1 file changed, 5 insertions(+)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 47ae5f473..d4a8375b8 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -208,6 +208,11 @@ signal will be sent to the faulting process.
 Applications using this
 feature will not require the use of a userfaultfd monitor for processing
 memory accesses to the regions registered with userfaultfd.
+.TP
+.BR UFFD_FEATURE_THREAD_ID " (since Linux 4.14)"
+If this feature bit is set,
+.I uffd_msg.pagefault.feat.ptid
+will be set to the faulted thread ID for each page fault message.
 .PP
 The returned
 .I ioctls
-- 
2.26.2



[PATCH v4 1/4] userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs

2021-03-22 Thread Peter Xu
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.

Signed-off-by: Peter Xu 
---
 man2/userfaultfd.2 | 13 +
 1 file changed, 13 insertions(+)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..555e37409 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -77,6 +77,13 @@ When the last file descriptor referring to a userfaultfd 
object is closed,
 all memory ranges that were registered with the object are unregistered
 and unread events are flushed.
 .\"
+.PP
+Since Linux 4.14, userfaultfd page fault message can selectively embed faulting
+thread ID information into the fault message.
+One needs to enable this feature explicitly using the
+.BR UFFD_FEATURE_THREAD_ID
+feature bit when initializing the userfaultfd context.
+By default, thread ID reporting is diabled.
 .SS Usage
 The userfaultfd mechanism is designed to allow a thread in a multithreaded
 program to perform user-space paging for the other threads in the process.
@@ -229,6 +236,9 @@ struct uffd_msg {
 struct {
 __u64 flags;/* Flags describing fault */
 __u64 address;  /* Faulting address */
+union {
+__u32 ptid; /* Thread ID of the fault */
+} feat;
 } pagefault;
 
 struct {/* Since Linux 4.11 */
@@ -358,6 +368,9 @@ otherwise it is a read fault.
 .\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
 .RE
 .TP
+.I pagefault.feat.pid
+The thread ID that triggered the page fault.
+.TP
 .I fork.ufd
 The file descriptor associated with the userfault object
 created for the child created by
-- 
2.26.2



[PATCH v4 2/4] userfaultfd.2: Add write-protect mode

2021-03-22 Thread Peter Xu
Write-protect mode is supported starting from Linux 5.7.

Signed-off-by: Peter Xu 
---
 man2/userfaultfd.2 | 104 -
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 555e37409..8ad4a71b5 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -78,6 +78,32 @@ all memory ranges that were registered with the object are 
unregistered
 and unread events are flushed.
 .\"
 .PP
+Userfaultfd supports two modes of registration:
+.TP
+.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
+When registered with
+.B UFFDIO_REGISTER_MODE_MISSING
+mode, the userspace will receive a page fault message
+when a missing page is accessed.
+The faulted thread will be stopped from execution until the page fault is
+resolved from the userspace by either an
+.B UFFDIO_COPY
+or an
+.B UFFDIO_ZEROPAGE
+ioctl.
+.TP
+.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
+When registered with
+.B UFFDIO_REGISTER_MODE_WP
+mode, the userspace will receive a page fault message
+when a write-protected page is written.
+The faulted thread will be stopped from execution
+until the userspace un-write-protect the page using an
+.B UFFDIO_WRITEPROTECT
+ioctl.
+.PP
+Multiple modes can be enabled at the same time for the same memory range.
+.PP
 Since Linux 4.14, userfaultfd page fault message can selectively embed faulting
 thread ID information into the fault message.
 One needs to enable this feature explicitly using the
@@ -144,6 +170,17 @@ single threaded non-cooperative userfaultfd manager 
implementations.
 .\" and limitations remaining in 4.11
 .\" Maybe it's worth adding a dedicated sub-section...
 .\"
+.PP
+Starting from Linux 5.7, userfaultfd is able to do
+synchronous page dirty tracking using the new write-protection register mode.
+One should check against the feature bit
+.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
+before using this feature.
+Similar to the original userfaultfd missing mode, the write-protect mode will
+generate an userfaultfd message when the protected page is written.
+The user needs to resolve the page fault by unprotecting the faulted page and
+kick the faulted thread to continue.
+For more information, please refer to "Userfaultfd write-protect mode" section.
 .SS Userfaultfd operation
 After the userfaultfd object is created with
 .BR userfaultfd (),
@@ -219,6 +256,65 @@ userfaultfd can be used only with anonymous private memory 
mappings.
 Since Linux 4.11,
 userfaultfd can be also used with hugetlbfs and shared memory mappings.
 .\"
+.SS Userfaultfd write-protect mode (since 5.7)
+Since Linux 5.7, userfaultfd supports write-protect mode.
+The user needs to first check availability of this feature using
+.B UFFDIO_API
+ioctl against the feature bit
+.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
+before using this feature.
+.PP
+To register with userfaultfd write-protect mode, the user needs to initiate the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_WP
+set.
+Note that it's legal to monitor the same memory range with multiple modes.
+For example, the user can do
+.B UFFDIO_REGISTER
+with the mode set to
+.BR "UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP" .
+When there is only
+.B UFFDIO_REGISTER_MODE_WP
+registered, the userspace will
+.I not
+receive any message when a missing page is written.
+Instead, the userspace will only receive a write-protect page fault message
+when an existing but write-protected page got written.
+.PP
+After the
+.B UFFDIO_REGISTER
+ioctl completed with
+.B UFFDIO_REGISTER_MODE_WP
+mode set,
+the user can write-protect any existing memory within the range using the ioctl
+.B UFFDIO_WRITEPROTECT
+where
+.I uffdio_writeprotect.mode
+should be set to
+.BR UFFDIO_WRITEPROTECT_MODE_WP .
+.PP
+When a write-protect event happens,
+the userspace will receive a page fault message whose
+.I uffd_msg.pagefault.flags
+will be with
+.B UFFD_PAGEFAULT_FLAG_WP
+flag set.
+Note: since only writes can trigger such kind of fault,
+write-protect messages will always be with
+.B UFFD_PAGEFAULT_FLAG_WRITE
+bit set too along with bit
+.BR UFFD_PAGEFAULT_FLAG_WP .
+.PP
+To resolve a write-protection page fault, the user should initiate another
+.B UFFDIO_WRITEPROTECT
+ioctl, whose
+.I uffd_msg.pagefault.flags
+should have the flag
+.B UFFDIO_WRITEPROTECT_MODE_WP
+cleared upon the faulted page or range.
+.PP
+Write-protect mode only supports private anonymous memory.
 .SS Reading from the userfaultfd structure
 Each
 .BR read (2)
@@ -364,8 +460,12 @@ flag (see
 .BR ioctl_userfaultfd (2))
 and this flag is set, this a write fault;
 otherwise it is a read fault.
-.\"
-.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
+.TP
+.B UFFD_PAGEFAULT_FLAG_WP
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_WP
+flag, when this bit is set it means it's a write-protect fault.
+Otherwise it's a page missing fault.
 .RE
 .TP
 .I pagefault.feat.pid
-- 
2.26.2



[PATCH v4 0/4] man2: udpate mm/userfaultfd manpages to latest

2021-03-22 Thread Peter Xu
v4:
- Fixed a few "subordinate clauses" (SC) cases [Alex]
- Reword in ioctl_userfaultfd.2 to use bold font for the two modes referenced,
  so as to be clear on what is "both" referring to [Alex]

v3:
- Don't use "Currently", instead add "(since x.y)" mark where proper [Alex]
- Always use semantic newlines across the whole patchset [Alex]
- Use quote when possible, rather than escapes [Alex]
- Fix one missing replacement of ".BR" -> ".B" [Alex]
- Some other trivial rephrases here and there when fixing up above

v2 changes:
- Fix wordings as suggested [MikeR]
- convert ".BR" to ".B" where proper for the patchset [Alex]
- rearrange a few lines in the last two patches where they got messed up
- document more things, e.g. UFFDIO_COPY_MODE_WP; and also on how to resolve a
  wr-protect page fault.

There're two features missing in current manpage, namely:

  (1) Userfaultfd Thread-ID feature
  (2) Userfaultfd write protect mode

There's also a 3rd one which was just contributed from Axel - Axel, I think it
would be great if you can add that part too, probably after the whole
hugetlbfs/shmem minor mode reaches the linux master branch.

Please review, thanks.

Peter Xu (4):
  userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
  userfaultfd.2: Add write-protect mode
  ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
  ioctl_userfaultfd.2: Add write-protect mode docs

 man2/ioctl_userfaultfd.2 |  89 -
 man2/userfaultfd.2   | 117 ++-
 2 files changed, 201 insertions(+), 5 deletions(-)

-- 
2.26.2




Re: [PATCH 02/11] x86: tboot: avoid Wstringop-overread-warning

2021-03-22 Thread Martin Sebor

On 3/22/21 2:29 PM, Ingo Molnar wrote:


* Arnd Bergmann  wrote:


From: Arnd Bergmann 

gcc-11 warns about using string operations on pointers that are
defined at compile time as offsets from a NULL pointer. Unfortunately
that also happens on the result of fix_to_virt(), which is a
compile-time constant for a constantn input:

arch/x86/kernel/tboot.c: In function 'tboot_probe':
arch/x86/kernel/tboot.c:70:13: error: '__builtin_memcmp_eq' specified bound 16 
exceeds source size 0 [-Werror=stringop-overread]
70 | if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
   | ^~

I hope this can get addressed in gcc-11 before the release.

As a workaround, split up the tboot_probe() function in two halves
to separate the pointer generation from the usage. This is a bit
ugly, and hopefully gcc understands that the code is actually correct
before it learns to peek into the noinline function.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
Signed-off-by: Arnd Bergmann 
---
  arch/x86/kernel/tboot.c | 44 -
  1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 4c09ba110204..f9af561c3cd4 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -49,6 +49,30 @@ bool tboot_enabled(void)
return tboot != NULL;
  }
  
+/* noinline to prevent gcc from warning about dereferencing constant fixaddr */

+static noinline __init bool check_tboot_version(void)
+{
+   if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
+   pr_warn("tboot at 0x%llx is invalid\n", boot_params.tboot_addr);
+   return false;
+   }
+
+   if (tboot->version < 5) {
+   pr_warn("tboot version is invalid: %u\n", tboot->version);
+   return false;
+   }
+
+   pr_info("found shared page at phys addr 0x%llx:\n",
+   boot_params.tboot_addr);
+   pr_debug("version: %d\n", tboot->version);
+   pr_debug("log_addr: 0x%08x\n", tboot->log_addr);
+   pr_debug("shutdown_entry: 0x%x\n", tboot->shutdown_entry);
+   pr_debug("tboot_base: 0x%08x\n", tboot->tboot_base);
+   pr_debug("tboot_size: 0x%x\n", tboot->tboot_size);
+
+   return true;
+}
+
  void __init tboot_probe(void)
  {
/* Look for valid page-aligned address for shared page. */
@@ -66,25 +90,9 @@ void __init tboot_probe(void)
  
  	/* Map and check for tboot UUID. */

set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
-   tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
-   if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
-   pr_warn("tboot at 0x%llx is invalid\n", boot_params.tboot_addr);
+   tboot = (void *)fix_to_virt(FIX_TBOOT_BASE);
+   if (!check_tboot_version())
tboot = NULL;
-   return;
-   }
-   if (tboot->version < 5) {
-   pr_warn("tboot version is invalid: %u\n", tboot->version);
-   tboot = NULL;
-   return;
-   }
-
-   pr_info("found shared page at phys addr 0x%llx:\n",
-   boot_params.tboot_addr);
-   pr_debug("version: %d\n", tboot->version);
-   pr_debug("log_addr: 0x%08x\n", tboot->log_addr);
-   pr_debug("shutdown_entry: 0x%x\n", tboot->shutdown_entry);
-   pr_debug("tboot_base: 0x%08x\n", tboot->tboot_base);
-   pr_debug("tboot_size: 0x%x\n", tboot->tboot_size);


This is indeed rather ugly - and the other patch that removes a debug
check seems counterproductive as well.

Do we know how many genuine bugs -Wstringop-overread-warning has
caught or is about to catch?

I.e. the real workaround might be to turn off the -Wstringop-overread-warning,
until GCC-11 gets fixed?


In GCC 10 -Wstringop-overread is a subset of -Wstringop-overflow.
GCC 11 breaks it out as a separate warning to make it easier to
control.  Both warnings have caught some real bugs but they both
have a nonzero rate of false positives.  Other than bug reports
we don't have enough data to say what their S/N ratio might be
but my sense is that it's fairly high in general.

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overread
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=wstringop-overflow

In GCC 11, all access warnings expect objects to be either declared
or allocated.  Pointers with constant values are taken to point to
nothing valid (as Arnd mentioned above, this is to detect invalid
accesses to members of structs at address zero).

One possible solution to the known address problem is to extend GCC
attributes address and io that pin an object to a hardwired address
to all targets (at the moment they're supported on just one or two
targets).  I'm not sure this can still happen before GCC 11 releases
sometime in April or May.

Until then, another workaround is to convert the fixed address to
a volatile pointer

Re: [PATCH v6] close_range.2: new page documenting close_range(2)

2021-03-22 Thread Stephen Kitt
On Sun, 21 Mar 2021 16:38:59 +0100, "Michael Kerrisk (man-pages)"
 wrote:
> On 3/9/21 8:53 PM, Stephen Kitt wrote:
> > On Thu, 28 Jan 2021 21:50:23 +0100, "Michael Kerrisk (man-pages)"
> >  wrote:  
> >> Thanks for your patch revision. I've merged it, and have
> >> done some light editing, but I still have a question:  
> > 
> > Does this need anything more? I don’t see it in the man-pages repo.  
> 
> Sorry, Stephen. It's just me being slow. I've made a few edits,
> replaced the example program with another that more clearly allows
> the user to see what's going on, and pushed to Git.

Thanks, your example program is indeed much better!

Regards,

Stephen


pgpCskU70zyVN.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

2021-03-22 Thread Kai Huang
On Mon, 22 Mar 2021 22:06:45 +0100 Borislav Petkov wrote:
> On Mon, Mar 22, 2021 at 12:37:02PM -0700, Sean Christopherson wrote:
> > Yes.  Note, it's still true if you strike out the "too", KVM support is 
> > completely
> > orthogonal to this code.  The purpose of this patch is to separate out the 
> > EREMOVE
> > path used for host enclaves (/dev/sgx_enclave), because EPC virtualization 
> > for
> > KVM will have non-buggy scenarios where EREMOVE can fail.  But the virt EPC 
> > code
> > is designed to handle that gracefully.
> 
> "gracefully" as it won't leak EPC pages which would require a host reboot? 
> That
> leaking is done by host enclaves only?

This path is called by host SGX driver only, so yes this leaking is done by
host enclaves only.

This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual
EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
This patch doesn't have intention to bring functional change.  I changed the
error msg because Dave said it worth to give a message saying EPC page is
leaked, and I thought this minor change won't break anything.

Perpahps we can avoid changing error message but stick to existing SGX driver
behavior? 

> 
> > Hmm.  I don't think it warrants BUG.  At worst, leaking EPC pages is fatal 
> > only
> > to SGX.
> 
> Fatal how? If it keeps leaking, at some point it won't have any pages
> for EPC pages anymore?
> 
> Btw, I probably have seen this and forgotten again so pls remind me,
> is the amount of pages available for SGX use static and limited by,
> I believe BIOS, or can a leakage in EPC pages cause system memory
> shortage?
> 
> > If the underlying bug caused other fallout, e.g. didn't release a
> > lock, then obviously that could be fatal to the kernel. But I don't
> > think there's ever a case where SGX being unusuable would prevent the
> > kernel from functioning.
> 
> This kinda replies my question above but still...
> 
> > Probably something in between.  Odds are good SGX will eventually become
> > unusuable, e.g. either kernel SGX support is completely hosted, or it will 
> > soon
> > leak the majority of EPC pages.  Something like this?
> > 
> >   "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may 
> > become unusuable.  Reboot recommended to continue using SGX."
> 
> So all this handwaving I'm doing is to provoke a proper response from
> you guys as to how a EPC page leaking is supposed to be handled by the
> users of the technology:
> 
> 1. Issue a warning message and forget about it, eventual reboot

This is the existing SGX driver behavior IMHO. It just gives a WARN() saying
EREMOVE failed with the error code printed.  The EPC page is leaked w/o any
message to user.  I can live with it, and it is existing code anyway.

btw, currently virtual EPC code (patch 5) handles in similar way: There's one
EREMOVE error is expected and virtual EPC code can handle, but for other
errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
page is leaked too:

+   WARN_ONCE(ret != SGX_CHILD_PRESENT,
+ "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
+ sgx_get_epc_phys_addr(epc_page), ret);
+   return ret;

So to me they are just WARN() to catch kernel bug.

> 
> 2. Really scary message to make users reboot sooner
> 
> 3. Detect when host enclaves are run while guest enclaves are running
> and issue a warning then.

This code path has nothing to do with guest enclaves.

> 
> 4. Fall on knees and pray to not get sued by customers because their
> enclaves are not working anymore.
> 
> 
> 
> Btw, 4. needs to be considered properly so that people can cover asses.

If we are talking about CSPs being unable to provide correct services to
customers due to kernel bug, I think this is a bigger question but not
just related to SGX,  since other kernel bug can also cause similar problem, for
instance, VM or SGX process itself being killed.

> 
> Oh and whatever we end up deciding, we should document that in
> Documentation/... somewhere and point users to it in that warning
> message where a longer treatise is explaining the whole deal properly.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-22 Thread Peter Xu
On Fri, Mar 19, 2021 at 11:37:20PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,

Hi, Alex,

> > +generate another write-protect userfault message.
> > +This is only used in conjunction with write-protect mode when both missing 
> > and
> 
> "when both missing"
> 
> both what?

I modified it to:

This is only used when both
.B UFFDIO_REGISTER_MODE_MISSING
and
.B UFFDIO_REGISTER_MODE_WP
modes are enabled for the registered range.

And I fixed all the rest, including all the comments in patch 3.

Thanks for looking, I'll repost shortly.

-- 
Peter Xu



Re: [PATCH] of: overlay: fix for_each_child.cocci warnings

2021-03-22 Thread Frank Rowand
On 3/22/21 1:21 PM, Julia Lawall wrote:
> From: kernel test robot 
> 
> Function "for_each_child_of_node" should have of_node_put() before goto.
> 
> Generated by: scripts/coccinelle/iterators/for_each_child.cocci
> 
> Fixes: 82c2d81361ec ("coccinelle: iterators: Add for_each_child.cocci script")
> CC: Sumera Priyadarsini 
> Reported-by: kernel test robot 
> Signed-off-by: kernel test robot 
> Signed-off-by: Julia Lawall 
> ---
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   812da4d39463a060738008a46cfc9f775e4bfcf6
> commit: 82c2d81361ecd142a54e84a9da1e287113314a4f coccinelle: iterators: Add 
> for_each_child.cocci script
> :: branch date: 13 hours ago
> :: commit date: 5 months ago
> 
>  overlay.c |1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -796,6 +796,7 @@ static int init_overlay_changeset(struct
>   if (!fragment->target) {
>   of_node_put(fragment->overlay);
>   ret = -EINVAL;
> + of_node_put(node);
>   goto err_free_fragments;
>   }
> 

Reviewed-by: Frank Rowand 
Tested-by: Frank Rowand 

While reading through the code touched by the patch I noticed that
the clean up at label err_free_fragments does not do the required
of_node_put() calls.  I'll add creating a patch to fix that to my
todo list.

-Frank


Re: [PATCH] ftrace: shut up -Wcast-function-type warning for ftrace_ops_no_ops

2021-03-22 Thread Steven Rostedt
On Mon, 22 Mar 2021 22:49:58 +0100
Arnd Bergmann  wrote:

> From: Arnd Bergmann 
> 
> With 'make W=1', gcc warns about casts between incompatible function
> types:
> 
> kernel/trace/ftrace.c:128:31: error: cast between incompatible function types 
> from 'void (*)(long unsigned int,  long unsigned int)' to 'void (*)(long 
> unsigned int,  long unsigned int,  struct ftrace_ops *, struct ftrace_regs 
> *)' [-Werror=cast-function-type]
>   128 | #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
>   |   ^
> 
> As the commet here explains, this one was intentional, so shut up the
> warning harder by using a double cast.

Bonus points for reading the comment ;-)

I'll take this patch for the next merge window, thanks!

-- Steve


[PATCH] mm, thp: Relax the VM_DENYWRITE constraint on file-backed THPs

2021-03-22 Thread Collin Fijalkovich
Transparent huge pages are supported for read-only non-shmem filesystems,
but are only used for vmas with VM_DENYWRITE. This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY).  Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open(). Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().

Systems that make heavy use of shared libraries (e.g. Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.

This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()). It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.

Restricting the use of THPs to executable mappings eliminates the risk that
a read-only file later opened for write would encounter significant
latencies due to page cache truncation.

The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout. The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping to
be backed with THPs.

Signed-off-by: Collin Fijalkovich 
---
 fs/open.c   | 13 +++--
 mm/khugepaged.c | 16 +++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..f76e960d10ea 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f,
 * XXX: Huge page cache doesn't support writing yet. Drop all page
 * cache for this file before processing writes.
 */
-   if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
-   truncate_pagecache(inode, 0);
+   if (f->f_mode & FMODE_WRITE) {
+   /*
+* Paired with smp_mb() in collapse_file() to ensure nr_thps
+* is up to date and the update to i_writecount by
+* get_write_access() is visible. Ensures subsequent insertion
+* of THPs into the page cache will fail.
+*/
+   smp_mb();
+   if (filemap_nr_thps(inode->i_mapping))
+   truncate_pagecache(inode, 0);
+   }
 
return 0;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a7d6cb912b05..4c7cc877d5e3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 
/* Read-only file mappings need to be aligned for THP to work. */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-   (vm_flags & VM_DENYWRITE)) {
+   !inode_is_open_for_write(vma->vm_file->f_inode) &&
+   (vm_flags & VM_EXEC)) {
return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
HPAGE_PMD_NR);
}
@@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm,
else {
__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
filemap_nr_thps_inc(mapping);
+   /*
+* Paired with smp_mb() in do_dentry_open() to ensure
+* i_writecount is up to date and the update to nr_thps is
+* visible. Ensures the page cache will be truncated if the
+* file is opened writable.
+*/
+   smp_mb();
+   if (inode_is_open_for_write(mapping->host)) {
+   result = SCAN_FAIL;
+   __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
+   filemap_nr_thps_dec(mapping);
+   goto xa_locked;
+   }
}
 
if (nr_none) {
-- 
2.31.0.rc2.261.g7f71774620-goog



Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-22 Thread John Ogness
On 2021-03-22, Petr Mladek  wrote:
> On Mon 2021-03-22 12:16:15, John Ogness wrote:
>> On 2021-03-21, Sergey Senozhatsky  wrote:
>> >> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, 
>> >> va_list args)
>> >>* Use the main logbuf even in NMI. But avoid calling console
>> >>* drivers that might have their own locks.
>> >>*/
>> >> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
>> >> + if (this_cpu_read(printk_context) &
>> >> + (PRINTK_NMI_DIRECT_CONTEXT_MASK |
>> >> +  PRINTK_NMI_CONTEXT_MASK |
>> >> +  PRINTK_SAFE_CONTEXT_MASK)) {
>> >
>> > Do we need printk_nmi_direct_enter/exit() and
>> > PRINTK_NMI_DIRECT_CONTEXT_MASK?  Seems like all printk_safe() paths
>> > are now DIRECT - we store messages to the prb, but don't call console
>> > drivers.
>>
>> I was planning on waiting until the kthreads are introduced, in which
>> case printk_safe.c is completely removed.
>
> You want to keep printk_safe() context because it prevents calling
> consoles even in normal context. Namely, it prevents deadlock by
> recursively taking, for example, sem->lock in console_lock() or
> console_owner_lock in console_trylock_spinning(). Am I right?

Correct.

>> But I suppose I could switch
>> the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
>> PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
>> 4th patch of the series.
>
> Yes, please unify the PRINTK_NMI_CONTEXT. One is enough.

Agreed. (But I'll go even further. See below.)

> I wonder if it would make sense to go even further at this stage.
> There will still be 4 contexts that modify the printk behavior after
> this patchset:
>
>   + printk_count set by printk_enter()/exit()
>   + prevents: infinite recursion
>   + context: any context
>   + action: skips entire printk at 3rd recursion level
>
>   + prink_context set by printk_safe_enter()/exit()
>   + prevents: dead lock caused by recursion into some
>   console code in any context
>   + context: any
>   + action: skips console call at 1st recursion level

Technically, at this point printk_safe_enter() behavior is identical to
printk_nmi_enter(). Namely, prevent any recursive printk calls from
calling into the console code.

>   + printk_context set by printk_nmi_enter()/exit()
>   + prevents: dead lock caused by any console lock recursion
>   + context: NMI
>   + action: skips console calls at 0th recursion level
>
>   + kdb_trap_printk
>   + redirects printk() to kdb_printk() in kdb context
>
>
> What is possible?
>
> 1. We could get rid of printk_nmi_enter()/exit() and
>PRINTK_NMI_CONTEXT completely already now. It is enough
>to check in_nmi() in printk_func().
>
>printk_nmi_enter() was added by the commit 42a0bb3f71383b457a7db362
>("printk/nmi: generic solution for safe printk in NMI"). It was
>really needed to modify @printk_func pointer.
>
>We did not remove it later when printk_function became a real
>function. The idea was to track all printk contexts in a single
>variable. But we never added kdb context.
>
>It might make sense to remove it now. Peter Zijstra would be happy.
>There already were some churns with tracking printk_context in NMI.
>For example, see
>https://lore.kernel.org/r/20200219150744.428764...@infradead.org
>
>IMHO, it does not make sense to wait until the entire console-stuff
>rework is done in this case.

Agreed. in_nmi() within vprintk_emit() is enough to detect if the
console code should be skipped:

if (!in_sched && !in_nmi()) {
...
}

> 2. I thought about unifying printk_safe_enter()/exit() and
>printk_enter()/exit(). They both count recursion with
>IRQs disabled, have similar name. But they are used
>different way.
>
>But better might be to rename printk_safe_enter()/exit() to
>console_enter()/exit() or to printk_deferred_enter()/exit().
>It would make more clear what it does now. And it might help
>to better distinguish it from the new printk_enter()/exit().
>
>This patchset actually splits the original printk_safe()
>functionality into two:
>
>+ printk_count prevents infinite recursion
>+ printk_deferred_enter() deffers console handling.
>
>I am not sure if it is worth it. But it might help people (even me)
>when digging into the printk history. Different name will help to
>understand the functionality at the given time.

I am also not sure if it is worth the extra "noise" just to give the
function a more appropriate name. The plan is to remove it completely
soon anyway. My vote is to leave the name as it is.

But I am willing to do the rename in an addtional patch if you
want. printk_deferred_enter() sounds fine to me. Please confirm if you
want me to do this.

John Ogness


Re: [Ksummit-discuss] RFC: create mailing list "linux-issues" focussed on issues/bugs and regressions

2021-03-22 Thread Theodore Ts'o
On Mon, Mar 22, 2021 at 08:25:15PM +0100, Thorsten Leemhuis wrote:
> I agree to the last point and yeah, maybe regressions are the more
> important problem we should work on – at least from the perspective of
> kernel development.  But from the users perspective (and
> reporting-issues.rst is written for that perspective) it feel a bit
> unsatisfying to not have a solution to query for existing report,
> regressions or not. H...

First of all, thanks for working on reporting-issues.rst.  If we can
actually get users to *read* it, I think it's going to save kernel
developers a huge amount of time and frustration.

I wonder if it might be useful to have a form which users could be
encouraged to fill out so that (a) the information is available in a
structured format so it's easier for developers to find the relevant
information, (b) so it is easier for programs to parse, for easier
reporting or indexing, and (c) as a nudge so that users remember to
report critical bits of information such as the hardware
configuration, the exact kernel version, which distribution userspace
was in use, etc.

There could also be something in the text form which would make it
easier for lore.kernel.org searches to identify bug reports.  (e.g.,
"LINUX KERNEL BUG REPORTER TEMPLATE")

- Ted


[tip:core/entry] BUILD SUCCESS 97258ce902d1e1c396a4d7c38f6ae7085adb73c5

2021-03-22 Thread kernel test robot
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386   tinyconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a004-20210322
i386 randconfig-a003-20210322
i386 randconfig-a001-20210322
i386 randconfig-a002-20210322
i386 randconfig-a006-20210322
i386 randconfig-a005-20210322
x86_64   randconfig-a012-20210322
x86_64   randconfig-a015-20210322
x86_64   randconfig-a013-20210322
x86_64   randconfig-a014-20210322
x86_64   randconfig-a016-20210322
x86_64   randconfig-a011-20210322
i386 randconfig-a014-20210322
i386 randconfig-a011-20210322
i386 randconfig-a015-20210322
i386 randconfig-a016-20210322
i386 randconfig-a012-20210322
i386 randconfig-a013-20210322
riscvnommu_k210_defconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a002-20210322
x86_64   randconfig-a003-20210322
x86_64   randconfig-a001-20210322
x86_64   randconfig-a006-20210322
x86_64   randconfig-a004-20210322
x86_64   randconfig-a005-20210322

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 4.4 00/14] 4.4.263-rc1 review

2021-03-22 Thread Guenter Roeck
On Mon, Mar 22, 2021 at 01:28:54PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.263 release.
> There are 14 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 24 Mar 2021 12:19:09 +.
> Anything received after that time might be too late.
> 

Build results:
total: 163 pass: 163 fail: 0
Qemu test results:
total: 329 pass: 329 fail: 0

Tested-by: Guenter Roeck 

Guenter


Re: [PATCH 4.9 00/25] 4.9.263-rc1 review

2021-03-22 Thread Guenter Roeck
On Mon, Mar 22, 2021 at 01:28:50PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.263 release.
> There are 25 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 24 Mar 2021 12:19:09 +.
> Anything received after that time might be too late.
> 

Build results:
total: 166 pass: 166 fail: 0
Qemu test results:
total: 382 pass: 382 fail: 0

Tested-by: Guenter Roeck 

Guenter


Re: [PATCH 4.14 00/43] 4.14.227-rc1 review

2021-03-22 Thread Guenter Roeck
On Mon, Mar 22, 2021 at 01:28:41PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.227 release.
> There are 43 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 24 Mar 2021 12:19:09 +.
> Anything received after that time might be too late.
> 

Build results:
total: 168 pass: 168 fail: 0
Qemu test results:
total: 406 pass: 406 fail: 0

Tested-by: Guenter Roeck 

Guenter


<    1   2   3   4   5   6   7   8   9   10   >