Re: [PATCH v3 1/3] clk: analogbits: add Wide-Range PLL library

2019-04-29 Thread Paul Walmsley
On Mon, 29 Apr 2019, Stephen Boyd wrote:

> Quoting Paul Walmsley (2019-04-29 12:42:07)
> > On Fri, 26 Apr 2019, Paul Walmsley wrote:
> > > On Fri, 26 Apr 2019, Stephen Boyd wrote:
> > > 
> > > > Quoting Paul Walmsley (2019-04-11 01:27:32)
> > > > > Add common library code for the Analog Bits Wide-Range PLL (WRPLL) IP
> > > > > block, as implemented in TSMC CLN28HPC.
> > > > 
> > > > I haven't deeply reviewed at all, but I already get two problems when
> > > > compile testing these patches. I can fix them up if nothing else needs
> > > > fixing.
> > > > 
> > > > drivers/clk/analogbits/wrpll-cln28hpc.c:165 __wrpll_calc_divq() warn: 
> > > > should 'target_rate << divq' be a 64 bit type?
> > > > drivers/clk/sifive/fu540-prci.c:214:16: error: return expression in 
> > > > void function
> > > 
> > > Hmm, that's odd.  I will definitely take a look and repost.
> > 
> > I'm not able to reproduce these problems.  The configs tried here were:
> > 
> > - 64-bit RISC-V defconfig w/ PRCI driver enabled (gcc 8.2.0 built with 
> >   crosstool-NG 1.24.0)
> > 
> > - 32-bit ARM defconfig w/ PRCI driver enabled (gcc 8.3.0 built with 
> >   crosstool-NG 1.24.0)
> > 
> > - 32-bit i386 defconfig w/ PRCI driver enabled (gcc 
> >   5.4.0-6ubuntu1~16.04.11)
> > 
> > Could you post the toolchain and kernel config you're using?
> > 
> 
> I'm running sparse and smatch too.

OK.  I was able to reproduce the __wrpll_calc_divq() warning.  It's been 
resolved in the upcoming revision.  

But I don't see the second error with either sparse or smatch.  (This is 
with sparse at commit 2b96cd804dc7 and smatch at commit f0092daff69d.)
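
For reference, the kind of truncation the first warning points at can be sketched as follows. This is a minimal illustration that assumes a 32-bit target_rate; the helper names are hypothetical and this is not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from wrpll-cln28hpc.c: scale a 32-bit
 * rate by 2^divq without losing the high bits.
 */
static uint64_t scale_rate(uint32_t target_rate, unsigned int divq)
{
	assert(divq < 32);
	/* Widen first, so the shift is carried out in 64 bits. */
	return (uint64_t)target_rate << divq;
}

/* The same shift done in 32 bits: the high bits are silently discarded. */
static uint32_t scale_rate_truncated(uint32_t target_rate, unsigned int divq)
{
	assert(divq < 32);
	return target_rate << divq;
}
```

With a 3 GHz rate and divq = 2, only the widened version yields 12000000000; the 32-bit shift wraps around.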


- Paul


Re: [tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Viresh Kumar
On 29-04-19, 22:52, tip-bot for Tobin C. Harding wrote:
> Commit-ID:  8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> Gitweb: 
> https://git.kernel.org/tip/8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
> Author: Tobin C. Harding 
> AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
> Committer:  Ingo Molnar 
> CommitDate: Tue, 30 Apr 2019 06:24:09 +0200
> 
> sched/cpufreq: Fix kobject memleak
> 
> Currently the error return path from kobject_init_and_add() is not
> followed by a call to kobject_put() - which means we are leaking
> the kobject.
> 
> Fix it by adding a call to kobject_put() in the error path of
> kobject_init_and_add().
> 
> Signed-off-by: Tobin C. Harding 
> Add call to kobject_put() in error path of kobject_init_and_add().

Shouldn't this line have come before the Signed-off-by?

> Cc: Greg Kroah-Hartman 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Rafael J. Wysocki 
> Cc: Thomas Gleixner 
> Cc: Tobin C. Harding 
> Cc: Vincent Guittot 
> Cc: Viresh Kumar 
> Link: http://lkml.kernel.org/r/20190430001144.24890-1-to...@kernel.org
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/cpufreq_schedutil.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index 5c41ea367422..3638d2377e3c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -771,6 +771,7 @@ out:
>   return 0;
>  
>  fail:
> + kobject_put(&tunables->attr_set.kobj);
>   policy->governor_data = NULL;
>   sugov_tunables_free(tunables);
>  

-- 
viresh


Re: linux-next: build warning after merge of the clk tree

2019-04-29 Thread Stephen Rothwell
Hi Anson,

On Tue, 30 Apr 2019 01:44:58 +0000 Anson Huang wrote:
>
>   Thanks for notice.
>   As it is intentional, I will send out a patch to add "/* fall through 
> */" to avoid this build warning,

Excellent, thanks.

-- 
Cheers,
Stephen Rothwell



[tip:sched/urgent] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread tip-bot for Tobin C. Harding
Commit-ID:  8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
Gitweb: https://git.kernel.org/tip/8bf7ab9c79f3d1a5f02ebac369f656de9ec0aca8
Author: Tobin C. Harding 
AuthorDate: Tue, 30 Apr 2019 10:11:44 +1000
Committer:  Ingo Molnar 
CommitDate: Tue, 30 Apr 2019 06:24:09 +0200

sched/cpufreq: Fix kobject memleak

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Signed-off-by: Tobin C. Harding 
Add call to kobject_put() in error path of kobject_init_and_add().
Cc: Greg Kroah-Hartman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Thomas Gleixner 
Cc: Tobin C. Harding 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Link: http://lkml.kernel.org/r/20190430001144.24890-1-to...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpufreq_schedutil.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 5c41ea367422..3638d2377e3c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -771,6 +771,7 @@ out:
 	return 0;
 
 fail:
+	kobject_put(&tunables->attr_set.kobj);
 	policy->governor_data = NULL;
 	sugov_tunables_free(tunables);
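
The leak pattern described in the commit message can be shown outside the kernel with a toy reference-counted object. This is only a userspace sketch: the toy_* names are hypothetical stand-ins, not the kobject API, and it assumes the init step leaves one reference held even when the later "add" step fails:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a kobject: a reference count plus a release hook. */
struct toy_obj {
	int refcount;
	void (*release)(struct toy_obj *obj);
	char *name;		/* allocated during init, freed by release */
};

static int toy_released;	/* counts release calls, for demonstration only */

static void toy_release(struct toy_obj *obj)
{
	free(obj->name);
	obj->name = NULL;
	toy_released++;
}

/* Drop one reference; the release hook runs when the count reaches zero. */
static void toy_put(struct toy_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}

/*
 * Initialise the object, then attempt a registration step that may fail.
 * On failure the object still holds its initial reference and its name,
 * so the caller has to call toy_put() to get them released.
 */
static int toy_init_and_add(struct toy_obj *obj, int fail_add)
{
	obj->refcount = 1;
	obj->release = toy_release;
	obj->name = malloc(16);
	if (!obj->name)
		return -1;
	return fail_add ? -1 : 0;
}
```

A caller that returns on error without toy_put() never reaches toy_release(), which is the shape of the leak the patch closes.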
 


Re: [PATCH] RISC-V: Add an Image header that boot loader can parse.

2019-04-29 Thread Atish Patra

On 4/29/19 4:40 PM, Palmer Dabbelt wrote:
> On Tue, 23 Apr 2019 16:25:06 PDT (-0700), atish.pa...@wdc.com wrote:
>> Currently, last stage boot loaders such as U-Boot can accept only
>> uImage which is an unnecessary additional step in automating boot flows.
>>
>> Add a simple image header that boot loaders can parse and directly
>> load kernel flat Image. The existing booting methods will continue to
>> work as it is.
>>
>> Tested on both QEMU and HiFive Unleashed using OpenSBI + U-Boot + Linux.
>>
>> Signed-off-by: Atish Patra 
>> ---
>>   arch/riscv/include/asm/image.h | 32 
>>   arch/riscv/kernel/head.S   | 28 
>>   2 files changed, 60 insertions(+)
>>   create mode 100644 arch/riscv/include/asm/image.h
>>
>> diff --git a/arch/riscv/include/asm/image.h b/arch/riscv/include/asm/image.h
>> new file mode 100644
>> index ..76a7e0d4068a
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/image.h
>> @@ -0,0 +1,32 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __ASM_IMAGE_H
>> +#define __ASM_IMAGE_H
>> +
>> +#define RISCV_IMAGE_MAGIC  "RISCV"
>> +
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * struct riscv_image_header - riscv kernel image header
>> + *
>> + * @code0: Executable code
>> + * @code1: Executable code
>> + * @text_offset:   Image load offset
>> + * @image_size:Effective Image size
>> + * @reserved:  reserved
>> + * @magic: Magic number
>> + * @reserved:  reserved
>> + */
>> +
>> +struct riscv_image_header {
>> +   u32 code0;
>> +   u32 code1;
>> +   u64 text_offset;
>> +   u64 image_size;
>> +   u64 res1;
>> +   u64 magic;
>> +   u32 res2;
>> +   u32 res3;
>> +};
>
> I don't want to invent our own file format.  Is there a reason we can't just
> use something standard?  Off the top of my head I can think of ELF files and
> multiboot.

Additional header is required to accommodate PE header format. 
Currently, this is only used for booti command but it will be reused for 
EFI headers as well. Linux kernel Image can pretend as an EFI 
application if PE/COFF header is present. This removes the need of an 
explicit EFI boot loader and EFI firmware can directly load Linux 
(obviously after EFI stub implementation for RISC-V).


ARM64 follows the similar header format as well.
https://www.kernel.org/doc/Documentation/arm64/booting.txt
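
As a rough illustration of what a boot loader would do with this header, here is a minimal userspace sketch. The struct layout is copied from the patch; parse_image_header() is a hypothetical helper, and the sketch assumes the loader runs with the same byte order as the image:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field layout copied from struct riscv_image_header in the patch. */
struct riscv_image_header {
	uint32_t code0;
	uint32_t code1;
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t res1;
	uint64_t magic;
	uint32_t res2;
	uint32_t res3;
};

#define RISCV_IMAGE_MAGIC "RISCV"

/*
 * Hypothetical loader-side check: copy the header out of the image buffer
 * and compare the leading bytes of the magic field. Returns 0 on success.
 */
static int parse_image_header(const void *image, size_t len,
			      struct riscv_image_header *hdr)
{
	if (len < sizeof(*hdr))
		return -1;
	memcpy(hdr, image, sizeof(*hdr));	/* avoids unaligned access */
	if (memcmp(&hdr->magic, RISCV_IMAGE_MAGIC, strlen(RISCV_IMAGE_MAGIC)))
		return -1;
	return 0;
}
```

On success the loader can read text_offset and image_size from the copied header to decide where to place the Image and how much memory to reserve.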

Regards,
Atish


>> +#endif /* __ASSEMBLY__ */
>> +#endif /* __ASM_IMAGE_H */
>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>> index fe884cd69abd..154647395601 100644
>> --- a/arch/riscv/kernel/head.S
>> +++ b/arch/riscv/kernel/head.S
>> @@ -19,9 +19,37 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>
>>   __INIT
>>   ENTRY(_start)
>> +   /*
>> +* Image header expected by Linux boot-loaders. The image header data
>> +* structure is described in asm/image.h.
>> +* Do not modify it without modifying the structure and all bootloaders
>> +* that expects this header format!!
>> +*/
>> +   /* jump to start kernel */
>> +   j _start_kernel
>> +   /* reserved */
>> +   .word 0
>> +   .balign 8
>> +#if __riscv_xlen == 64
>> +   /* Image load offset(2MB) from start of RAM */
>> +   .dword 0x200000
>> +#else
>> +   /* Image load offset(4MB) from start of RAM */
>> +   .dword 0x400000
>> +#endif
>> +   /* Effective size of kernel image */
>> +   .dword _end - _start
>> +   .dword 0
>> +   .asciz RISCV_IMAGE_MAGIC
>> +   .word 0
>> +   .word 0
>> +
>> +.global _start_kernel
>> +_start_kernel:
>>     /* Mask all interrupts */
>>     csrw sie, zero


Re: sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:undefined reference to `followparent_recalc'

2019-04-29 Thread Randy Dunlap
On 4/29/19 9:48 PM, kbuild test robot wrote:
> Hi Randy,
> 
> It's probably a bug fix that unveils the link errors.

Yoshinori Sato (cc-ed) has a patch for this.  I guess that it's not in the
arch/sh git tree yet (or wherever arch/sh changes come from).



> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   83a50840e72a5a964b4704fcdc2fbb2d771015ab
> commit: acaf892ecbf5be7710ae05a61fd43c668f68ad95 sh: fix multiple function 
> definition build errors
> date:   3 weeks ago
> config: sh-allmodconfig (attached as .config)
> compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout acaf892ecbf5be7710ae05a61fd43c668f68ad95
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=sh 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> 
> All errors (new ones prefixed by >>):
> 
>>> sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.data+0x1c): 
>>> undefined reference to `followparent_recalc'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 


-- 
~Randy


Re: [PATCH 7/7] dmaengine: sprd: Add interrupt support for 2-stage transfer

2019-04-29 Thread Baolin Wang
On Mon, 29 Apr 2019 at 22:10, Vinod Koul  wrote:
>
> On 29-04-19, 20:11, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 20:01, Vinod Koul  wrote:
> > > On 15-04-19, 20:15, Baolin Wang wrote:
>
> > > > @@ -429,6 +433,9 @@ static int sprd_dma_set_2stage_config(struct 
> > > > sprd_dma_chn *schan)
> > > >   val = chn & SPRD_DMA_GLB_SRC_CHN_MASK;
> > > >   val |= BIT(schan->trg_mode - 1) << 
> > > > SPRD_DMA_GLB_TRG_OFFSET;
> > > >   val |= SPRD_DMA_GLB_2STAGE_EN;
> > > > + if (schan->int_type != SPRD_DMA_NO_INT)
> > >
> > > Who configure int_type?
> >
> > The int_type is configured through the flags of
> > sprd_dma_prep_slave_sg() by users, see:
> > https://elixir.bootlin.com/linux/v5.1-rc6/source/include/linux/dma/sprd-dma.h#L9
>
> Please use DMA_PREP_INTERRUPT flag instead!

We cannot use the DMA_PREP_INTERRUPT flag, since we have some
Spreadtrum-specific DMA interrupt flags configured by users; I think we
reached a consensus on this before. See:
https://elixir.bootlin.com/linux/v5.1-rc6/source/include/linux/dma/sprd-dma.h#L105

-- 
Baolin Wang
Best Regards


[PATCH] pid: Remove unneeded hash header file

2019-04-29 Thread Timmy Li
Hash functions are no longer needed now that idr is used.
Remove the hash header file as a cleanup.

Signed-off-by: Timmy Li 
---
 kernel/pid.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 20881598bdfa..89548d35eefb 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.17.1



Re: [PATCH 4/7] dmaengine: sprd: Add device validation to support multiple controllers

2019-04-29 Thread Baolin Wang
On Mon, 29 Apr 2019 at 22:05, Vinod Koul  wrote:
>
> On 29-04-19, 20:20, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 19:57, Vinod Koul  wrote:
> > >
> > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > From: Eric Long 
> > > >
> > > > Since we can support multiple DMA engine controllers, we should add
> > > > device validation in filter function to check if the correct controller
> > > > to be requested.
> > > >
> > > > Signed-off-by: Eric Long 
> > > > Signed-off-by: Baolin Wang 
> > > > ---
> > > >  drivers/dma/sprd-dma.c |5 +
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > index 0f92e60..9f99d4b 100644
> > > > --- a/drivers/dma/sprd-dma.c
> > > > +++ b/drivers/dma/sprd-dma.c
> > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct 
> > > > virt_dma_desc *vd)
> > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > >  {
> > > >   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > + struct of_phandle_args *dma_spec =
> > > > + container_of(param, struct of_phandle_args, args[0]);
> > > >   u32 slave_id = *(u32 *)param;
> > > >
> > > > + if (chan->device->dev->of_node != dma_spec->np)
> > >
> > > Are you not using of_dma_find_controller() that does this, so this would
> > > be useless!
> >
> > Yes, we can use of_dma_find_controller(), but that will be a little
> > complicated than current solution. Since we need introduce one
> > structure to save the node to validate in the filter function like
> > below, which seems make things complicated. But if you still like to
> > use of_dma_find_controller(), I can change to use it in next version.
>
> Sorry I should have clarified more..
>
> of_dma_find_controller() is called by xlate, so you already run this
> check, so why use this :)

of_dma_find_controller() can save the requested device node into
dma_spec, and of_dma_simple_xlate() then calls dma_request_channel()
to request a channel. But dma_request_channel() does not validate the
device node to find the corresponding DMA device. So our filter function
should validate the channel's device node against the device node
specified by dma_spec. I hope that makes things clear.

-- 
Baolin Wang
Best Regards


Re: [PATCH v4] panic: add an option to replay all the printk message in buffer

2019-04-29 Thread Sergey Senozhatsky
On (04/29/19 13:44), Petr Mladek wrote:
> On Sat 2019-04-27 02:16:40, Sergey Senozhatsky wrote:
> > On (04/27/19 01:43), Sergey Senozhatsky wrote:
> > [..]
> > > > The console waiter logic is effective but it does not always
> > > > work. The current console owner must be calling the console
> > > > drivers.
> > > >
> > > > >   Hmm, we might have a bit of a problem here, maybe.
> > > >
> > > > Hmm, the printk() might wait forever when NMI stopped
> > > > the current console owner in the console driver code
> > > > or with the logbuf_lock taken.
> > > 
> > > I guess this is why we re-init logbuf lock from panic,
> > > however, we don't do anything with the console_owner.
> 
> > > > The console waiter logic might get solved by clearing
> > > > the console_owner in console_flush_on_panic(). It can't
> > > > be much worse, we already ignore console_lock() there, ...
> > 
> > Hmm, or maybe we are fine... console_waiter logic should work
> > before we send out stop IPI/NMI from panic CPU. When we call
> > flush_on_panic() console_unlock() clears console_owner, so
> > panic_print_sys_info() should not deadlock on console_owner.
> 
> Good point!
> 
> > It's probably only problematic if we kill a console_owner
> > CPU and then try to printk() (from smp_send_stop()) before
> > we do flush_on_panic()->console_unlock().
> 
> Yup. There are called several functions between smp_send_stop()
> and console_flush_on_panic().
> 
> The question is if it is worth a code complication. We could
> never 100% guarantee that printk() would work in panic().
> I more and more understand what Peter Zijlstra means
> by the duct taping.

Agreed.

-ss


Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Andreas Dilger

> On Apr 29, 2019, at 10:26 PM, Al Viro  wrote:
> 
> On Mon, Apr 29, 2019 at 10:18:04PM -0600, Andreas Dilger wrote:
>>> 
>>> void *i_private; /* fs or device private pointer */
>>> +   void (*free_inode)(struct inode *);
>> 
>> It seems like a waste to increase the size of every struct inode just to
>> access a static pointer.  Is this the only place that ->free_inode() is
>> called?  Why not move the ->free_inode() pointer into
>> inode->i_fop->free_inode() so that it is still directly accessible at this
>> point.
> 
> i_op, surely?

Yes, i_op is what I was thinking.

> In any case, increasing sizeof(struct inode) is not a problem -

> if anything, I'd turn ->i_fop into an anon union with that.  As in,
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index fb45590d284e..627e1766503a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -211,8 +211,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
> static void i_callback(struct rcu_head *head)
> {
>   struct inode *inode = container_of(head, struct inode, i_rcu);
> - if (inode->i_sb->s_op->free_inode)
> - inode->i_sb->s_op->free_inode(inode);
> + if (inode->free_inode)
> + inode->free_inode(inode);
>   else
>   free_inode_nonrcu(inode);
> }
> @@ -236,6 +236,7 @@ static struct inode *alloc_inode(struct super_block *sb)
>   if (!ops->free_inode)
>   return NULL;
>   }
> + inode->free_inode = ops->free_inode;
>   i_callback(&inode->i_rcu);
>   return NULL;
>   }

> @@ -276,6 +277,7 @@ static void destroy_inode(struct inode *inode)
>   if (!ops->free_inode)
>   return;
>   }
> + inode->free_inode = ops->free_inode;
>   call_rcu(&inode->i_rcu, i_callback);
> }

This seems like kind of a hack.  I guess your goal is to have ->free_inode
accessible regardless of whether the filesystem has installed its own ->i_op
methods or not, and i_fop is no longer used by this point.

That said, this seems better than increasing the size of struct inode.

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2e9b9f87caca..92732286b748 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -694,7 +694,10 @@ struct inode {
> #ifdef CONFIG_IMA
>   atomic_t i_readcount; /* struct files open RO */
> #endif
> - const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
> + union {
> + const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
> + void (*free_inode)(struct inode *);
> + };


Cheers, Andreas









RE: [PATCH v3 1/1] Add support for IPMB driver

2019-04-29 Thread Vadim Pasternak



> -Original Message-
> From: Asmaa Mnebhi 
> Sent: Tuesday, April 30, 2019 12:57 AM
> To: miny...@acm.org; w...@the-dreams.de; Vadim Pasternak
> ; Michael Shych 
> Cc: Asmaa Mnebhi ; linux-kernel@vger.kernel.org;
> linux-...@vger.kernel.org
> Subject: [PATCH v3 1/1] Add support for IPMB driver
> 
> Support receiving IPMB requests on a Satellite MC from the BMC.
> Once a response is ready, this driver will send back a response to the BMC via
> the IPMB channel.

Hi Asmaa,

A few general questions.

You define this driver as "Mellanox BlueField IPMB driver".
What makes it Mellanox BlueField specific?

Which HW configuration did you use for testing? Could
you please explain the connectivity scheme between the main BMC and
the satellite BMCs?

How is this module supposed to be activated?
Don't you need to add DTS/ACPI records?

Also, a few comments below.

> 
> Signed-off-by: Asmaa Mnebhi 
> ---
>  drivers/char/ipmi/Kconfig|   8 +
>  drivers/char/ipmi/Makefile   |   1 +
>  drivers/char/ipmi/ipmb_dev_int.c | 386
> +++
>  3 files changed, 395 insertions(+)
>  create mode 100644 drivers/char/ipmi/ipmb_dev_int.c
> 
> diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig index
> 94719fc..12fe8f2 100644
> --- a/drivers/char/ipmi/Kconfig
> +++ b/drivers/char/ipmi/Kconfig
> @@ -74,6 +74,14 @@ config IPMI_SSIF
>have a driver that must be accessed over an I2C bus instead of a
>standard interface.  This module requires I2C support.
> 
> +config IPMB_DEVICE_INTERFACE
> +   tristate 'IPMB Interface handler'
> +   depends on I2C && I2C_SLAVE
> +   help
> + Provides a driver for a device (Satellite MC) to
> + receive requests and send responses back to the BMC via
> + the IPMB interface. This module requires I2C support.
> +
>  config IPMI_POWERNV
> depends on PPC_POWERNV
> tristate 'POWERNV (OPAL firmware) IPMI interface'
> diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile index
> 3f06b20..0822adc 100644
> --- a/drivers/char/ipmi/Makefile
> +++ b/drivers/char/ipmi/Makefile
> @@ -26,3 +26,4 @@ obj-$(CONFIG_IPMI_KCS_BMC) += kcs_bmc.o
>  obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += bt-bmc.o
>  obj-$(CONFIG_ASPEED_KCS_IPMI_BMC) += kcs_bmc_aspeed.o
>  obj-$(CONFIG_NPCM7XX_KCS_IPMI_BMC) += kcs_bmc_npcm7xx.o
> +obj-$(CONFIG_IPMB_DEVICE_INTERFACE) += ipmb_dev_int.o
> diff --git a/drivers/char/ipmi/ipmb_dev_int.c
> b/drivers/char/ipmi/ipmb_dev_int.c
> new file mode 100644
> index 000..63122c3
> --- /dev/null
> +++ b/drivers/char/ipmi/ipmb_dev_int.c
> @@ -0,0 +1,386 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Mellanox IPMB driver to receive a request and send a response
> + *
> + * Copyright (C) 2018 Mellanox Techologies, Ltd.
> + *
> + * This was inspired by Brendan Higgins' ipmi-bmc-bt-i2c driver.
> + */
> +
> +#define  pr_fmt(fmt) "ipmb_dev_int: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define  MAX_MSG_LEN            128
> +#define  IPMB_REQUEST_LEN_MIN   7
> +#define  NETFN_RSP_BIT_MASK     0x4
> +#define  REQUEST_QUEUE_MAX_LEN  256
> +
> +#define  IPMB_MSG_LEN_IDX       0
> +#define  RQ_SA_8BIT_IDX         1
> +#define  NETFN_LUN_IDX          2
> +
> +#define  IPMB_MSG_PAYLOAD_LEN_MAX (MAX_MSG_LEN - IPMB_REQUEST_LEN_MIN - 1)
> +
> +struct ipmb_msg {
> + u8 len;
> + u8 rs_sa;
> + u8 netfn_rs_lun;
> + u8 checksum1;
> + u8 rq_sa;
> + u8 rq_seq_rq_lun;
> + u8 cmd;
> + u8 payload[IPMB_MSG_PAYLOAD_LEN_MAX];
> + /* checksum2 is included in payload */ } __packed;
> +
> +static u32 ipmb_msg_len(struct ipmb_msg *ipmb_msg) {
> + return ipmb_msg->len + 1;
> +}

Do you really need this as a function?

> +
> +struct ipmb_request_elem {
> + struct list_head list;
> + struct ipmb_msg request;
> +};
> +
> +struct ipmb_dev {
> + struct i2c_client *client;
> + struct miscdevice miscdev;
> + struct ipmb_msg request;
> + struct list_head request_queue;
> + atomic_t request_queue_len;
> + struct ipmb_msg response;

Where are you using the 'response' field?

> + size_t msg_idx;
> + spinlock_t lock;
> + wait_queue_head_t wait_queue;
> + struct mutex file_mutex;
> +};
> +
> +static int receive_ipmb_request(struct ipmb_dev *ipmb_dev_p,
> + bool non_blocking,
> + struct ipmb_msg *ipmb_request)
> +{
> + struct ipmb_request_elem *queue_elem;
> + unsigned long flags;
> + int res;
> +
> + spin_lock_irqsave(&ipmb_dev_p->lock, flags);
> +
> + while (!atomic_read(&ipmb_dev_p->request_queue_len)) {
> + spin_unlock_irqrestore(&ipmb_dev_p->lock, flags);
> + if (non_blocking)
> + return -EAGAIN;
> +
> + res = wait_event_interruptible(ipmb_dev_p->wait_queue,
> + 

Re: [PATCH RESEND] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Tobin C. Harding
On Tue, Apr 30, 2019 at 06:24:43AM +0200, Ingo Molnar wrote:
> 
> * Tobin C. Harding  wrote:
> 
> > Currently error return from kobject_init_and_add() is not followed by a
> > call to kobject_put().  This means there is a memory leak.
> > 
> > Add call to kobject_put() in error path of kobject_init_and_add().
> > 
> > Signed-off-by: Tobin C. Harding 
> > ---
> > 
> > Resend with SOB tag.
> 
> Please ignore my previous mail :-)

Cheers Ingo, caught myself not checkpatching :(

thanks,
Tobin.



[PATCH v1] mmc: dt: add DT bindings for ls1028a eSDHC host controller

2019-04-29 Thread Yinbo Zhu
From: Yinbo Zhu 

Add "fsl,ls1028a-esdhc" bindings for ls1028a eSDHC host controller

Signed-off-by: Yinbo Zhu 
---
 .../devicetree/bindings/mmc/fsl-esdhc.txt  |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt 
b/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
index 99c5cf8..a7250b9 100644
--- a/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
+++ b/Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
@@ -21,6 +21,7 @@ Required properties:
"fsl,ls1043a-esdhc"
"fsl,ls1046a-esdhc"
"fsl,ls2080a-esdhc"
+   "fsl,ls1028a-esdhc"
   - clock-frequency : specifies eSDHC base clock frequency.
 
 Optional properties:
-- 
1.7.1



Re: [PATCH v2 17/19] iommu: Add max num of cache and granu types

2019-04-29 Thread Auger Eric
Hi Jacob,

On 4/29/19 6:17 PM, Jacob Pan wrote:
> On Fri, 26 Apr 2019 18:22:46 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 4/24/19 1:31 AM, Jacob Pan wrote:
>>> To convert to/from cache types and granularities between generic and
>>> VT-d specific counterparts, a 2D arrary is used. Introduce the
>>> limits  
>> array
>>> to help define the converstion array size.  
>> conversion
>>>
> will fix, thanks
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  include/uapi/linux/iommu.h | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>>> index 5c95905..2d8fac8 100644
>>> --- a/include/uapi/linux/iommu.h
>>> +++ b/include/uapi/linux/iommu.h
>>> @@ -197,6 +197,7 @@ struct iommu_inv_addr_info {
>>> __u64   granule_size;
>>> __u64   nb_granules;
>>>  };
>>> +#define NR_IOMMU_CACHE_INVAL_GRANU (3)
>>>  
>>>  /**
>>>   * First level/stage invalidation information
>>> @@ -235,6 +236,7 @@ struct iommu_cache_invalidate_info {
>>> struct iommu_inv_addr_info addr_info;
>>> };
>>>  };
>>> +#define NR_IOMMU_CACHE_TYPE (3)
>>>  /**
>>>   * struct gpasid_bind_data - Information about device and guest
>>> PASID binding
>>>   * @gcr3:  Guest CR3 value from guest mm
>>>   
>> Is it really something that needs to be exposed in the uapi?
>>
> I put it in uapi since the related definitions for granularity and
> cache type are in the same file.
> Maybe putting them close together like this? I was thinking you can just
> fold it into your next series as one patch for introducing cache
> invalidation.
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 2d8fac8..4ff6929 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -164,6 +164,7 @@ enum iommu_inv_granularity {
> IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
> IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
> +   NR_IOMMU_INVAL_GRANU,   /* number of invalidation granularities
> */ };
>  
>  /**
> @@ -228,6 +229,7 @@ struct iommu_cache_invalidate_info {
>  #define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
>  #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
>  #define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */
> +#define NR_IOMMU_CACHE_TYPE (3)

OK I will add this.

Thanks

Eric
> __u8 cache;
> __u8 granularity;
> 
>> Thanks
>>
>> Eric
> 
> [Jacob Pan]
> 
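
The idea discussed above, using a trailing enum sentinel and a type count to size a 2D conversion array, can be sketched like this. It is a generic illustration only: the identifiers and the table values are hypothetical and do not reflect the actual VT-d mapping:

```c
#include <assert.h>

enum inv_granularity {
	INV_GRANU_DOMAIN,
	INV_GRANU_PASID,
	INV_GRANU_ADDR,
	NR_INVAL_GRANU,		/* sentinel: number of granularities */
};

#define NR_CACHE_TYPE 3		/* number of cache types, as a plain count */

/* Hypothetical vendor-specific codes; -1 marks an unsupported combination. */
static const int conv_table[NR_CACHE_TYPE][NR_INVAL_GRANU] = {
	{ 10, 11, 12 },
	{ 20, -1, 22 },
	{ -1, 31, -1 },
};

/* Look up the vendor code, rejecting out-of-range indices. */
static int to_vendor_granu(unsigned int cache_idx, enum inv_granularity g)
{
	if (cache_idx >= NR_CACHE_TYPE || (unsigned int)g >= NR_INVAL_GRANU)
		return -1;
	return conv_table[cache_idx][g];
}
```

Because the sentinel sits at the end of the enum, adding a new granularity automatically grows the table dimension, whereas a separate #define has to be kept in sync by hand.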


Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

2019-04-29 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> On Sat, Apr 27, 2019 at 3:46 AM Ingo Molnar  wrote:

> > So I'm wondering whether there's a 4th choice as well, which avoids
> > control flow corruption *before* it happens:
> >
> >  - A C language runtime that is a subset of current C syntax and
> >semantics used in the kernel, and which doesn't allow access outside
> >of existing objects and thus creates a strictly enforced separation
> >between memory used for data, and memory used for code and control
> >flow.
> >
> >  - This would involve, at minimum:
> >
> > - tracking every type and object and its inherent length and valid
> >   access patterns, and never losing track of its type.
> >
> > - being a lot more organized about initialization, i.e. no
> >   uninitialized variables/fields.
> >
> > - being a lot more strict about type conversions and pointers in
> >   general.
> 
> You're not the only one to suggest this.  There are at least a few
> things that make this extremely difficult if not impossible.  For
> example, consider this code:
> 
> void maybe_buggy(void)
> {
>   int a, b;
>   int *p = &a;
>   int *q = (int *)some_function((unsigned long)p);
>   *q = 1;
> }
> 
> If some_function() returns &a, then all is well.  But if
> some_function() returns &b or even a valid address of some unrelated
> kernel object, then the code might be entirely valid and correct C,
> but I don't see how the runtime checks are supposed to tell whether
> the resulting address is valid or is a bug.  This type of code is, I
> think, quite common in the kernel -- it happens in every data
> structure where we have unions of pointers and integers or where we
> steal some known-zero bits of a pointer to store something else.

So the thing is, for the infinitely large state space of "valid C code" 
we already disallow infinitely many versions in the Linux kernel.

We have complicated rules that disallow certain C syntactical and 
semantical constructs, both on the tooling (build failure/warning) and on 
the review (style/taste) level.

So the question IMHO isn't whether it's "valid C", because we already 
have the Linux kernel's own C syntax variant and are enforcing it with 
varying degrees of success.

The question is whether the example you gave can be written in a strongly 
typed fashion, whether it makes sense to do so, and what the costs are.

I think it's evident that it can be written with strongly typed 
constructs, by separating pointers from embedded error codes - with 
negative side effects to code generation: for example it increases 
structure sizes and error return paths.
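
The trade-off in the paragraph above can be made concrete with a small userspace sketch. The err_ptr()/is_err()/ptr_err() helpers below mimic the style of the kernel's ERR_PTR() family but are written from scratch for illustration, and struct lookup_result is a hypothetical strongly typed alternative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encoded style: a small negative errno is folded into the pointer value. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* The top MAX_ERRNO addresses are treated as error codes, not pointers. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Strongly typed style: pointer and error code are separate fields. */
struct lookup_result {
	void *ptr;
	int err;
};
```

The separated form never needs to reinterpret a pointer as an integer, but sizeof(struct lookup_result) is larger than sizeof(void *), which is the structure-size cost mentioned above.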

I think there are four main costs of converting such a pattern to strongly 
typed constructs:

 - memory/cache footprint:  there's a nonzero cost there.
 - performance:             this will hurt too.
 - code readability:        this will probably improve.
 - code robustness:         this will improve too.

So I think the proper question to ask is not whether there's common C 
syntax within the kernel that would have to be rewritten, but whether the 
total sum of memory and runtime overhead of strongly typed C programming 
(if it's possible/desirable) is larger than the total sum of a typical 
Linux distro enabling the various current and proposed kernel hardening 
features that have a runtime overhead:

 - the SMAP/SMEP overhead of STAC/CLAC for every single user copy

 - other usercopy hardening features

 - stackprotector

 - KASLR

 - compiler plugins against information leaks

 - proposed KASLR extension to implement module randomization and -PIE overhead

 - proposed function call integrity checks

 - proposed per system call kernel stack offset randomization

 - ( and I'm sure I forgot about a few more, and it's all still only 
 reactive security, not proactive security. )

That's death by a thousand cuts and CR3 switching during system calls is 
also throwing a hand grenade into the fight ;-)

So if people are also proposing to do CR3 switches in every system call, 
I'm pretty sure the answer is "yes, even a managed C runtime is probably 
faster than *THAT* sum of a performance mess" - at least with the current 
CR3 switching x86-uarch cost structure...

Thanks,

Ingo


Re: [PATCH v3 1/4] include: dt-bindings: add Performance Monitoring Unit for Exynos

2019-04-29 Thread Chanwoo Choi
Hi,

I agree with this patch, but I have a few minor comments.

If you edit them according to my comment, feel free to add my following tag:
Acked-by: Chanwoo Choi 

On 19. 4. 19. 10:48 PM, Lukasz Luba wrote:
> This patch add support of a new feature which can be used in DT:
> Performance Monitoring Unit with defined event data type.
> In this patch the event data types are defined for Exynos PPMU.
> The patch also updates the MAINTAINERS file accordingly and
> adds the header file to devfreq event subsystem.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  MAINTAINERS   |  1 +
>  include/dt-bindings/pmu/exynos_ppmu.h | 26 ++
>  2 files changed, 27 insertions(+)
>  create mode 100644 include/dt-bindings/pmu/exynos_ppmu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3671fde..1ba4b9b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4560,6 +4560,7 @@ T:  git 
> git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git
>  S:   Supported
>  F:   drivers/devfreq/event/
>  F:   drivers/devfreq/devfreq-event.c
> +F:   include/dt-bindings/pmu/exynos_ppmu.h
>  F:   include/linux/devfreq-event.h
>  F:   Documentation/devicetree/bindings/devfreq/event/
>  
> diff --git a/include/dt-bindings/pmu/exynos_ppmu.h 
> b/include/dt-bindings/pmu/exynos_ppmu.h
> new file mode 100644
> index 000..08fdce9
> --- /dev/null
> +++ b/include/dt-bindings/pmu/exynos_ppmu.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Samsung Exynos PPMU event types for counting in regs
> + *
> + * Copyright (c) 2019, Samsung

Maybe use "Samsung Electronics" instead of "Samsung".

> + * Author: Lukasz Luba 
> + */
> +
> +#ifndef __DT_BINDINGS_PMU_EXYNOS_PPMU_H
> +#define __DT_BINDINGS_PMU_EXYNOS_PPMU_H
> +
> +

Remove unneeded blank line.

> +#define PPMU_RO_BUSY_CYCLE_CNT   0x0
> +#define PPMU_WO_BUSY_CYCLE_CNT   0x1
> +#define PPMU_RW_BUSY_CYCLE_CNT   0x2
> +#define PPMU_RO_REQUEST_CNT  0x3
> +#define PPMU_WO_REQUEST_CNT  0x4
> +#define PPMU_RO_DATA_CNT 0x5
> +#define PPMU_WO_DATA_CNT 0x6
> +#define PPMU_RO_LATENCY  0x12
> +#define PPMU_WO_LATENCY  0x16
> +#define PPMU_V2_RO_DATA_CNT  0x4
> +#define PPMU_V2_WO_DATA_CNT  0x5
> +#define PPMU_V2_EVT3_RW_DATA_CNT 0x22
> +
> +#endif
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:undefined reference to `followparent_recalc'

2019-04-29 Thread kbuild test robot
Hi Randy,

It's probably a bug fix that unveils the link errors.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   83a50840e72a5a964b4704fcdc2fbb2d771015ab
commit: acaf892ecbf5be7710ae05a61fd43c668f68ad95 sh: fix multiple function 
definition build errors
date:   3 weeks ago
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout acaf892ecbf5be7710ae05a61fd43c668f68ad95
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

>> sh4-linux-gnu-ld: arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.data+0x1c): 
>> undefined reference to `followparent_recalc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH v6 01/10] clk: samsung: add needed IDs for DMC clocks in Exynos5420

2019-04-29 Thread Chanwoo Choi
Hi,

On 19. 4. 19. 11:19 PM, Lukasz Luba wrote:
> Define new IDs for clocks used by Dynamic Memory Controller in
> Exynos5422 SoC.
> 
> Acked-by: Rob Herring 
> Signed-off-by: Lukasz Luba 
> ---
>  include/dt-bindings/clock/exynos5420.h | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/dt-bindings/clock/exynos5420.h 
> b/include/dt-bindings/clock/exynos5420.h
> index 355f469..abb1842 100644
> --- a/include/dt-bindings/clock/exynos5420.h
> +++ b/include/dt-bindings/clock/exynos5420.h
> @@ -60,6 +60,7 @@
>  #define CLK_MAU_EPLL            159
>  #define CLK_SCLK_HSIC_12M       160
>  #define CLK_SCLK_MPHY_IXTAL24   161
> +#define CLK_SCLK_BPLL           162
>  
>  /* gate clocks */
>  #define CLK_UART0               257
> @@ -195,6 +196,18 @@
>  #define CLK_ACLK432_CAM         518
>  #define CLK_ACLK_FL1550_CAM     519
>  #define CLK_ACLK550_CAM         520
> +#define CLK_CLKM_PHY0           521
> +#define CLK_CLKM_PHY1           522
> +#define CLK_ACLK_PPMU_DREX0_0   523
> +#define CLK_ACLK_PPMU_DREX0_1   524
> +#define CLK_ACLK_PPMU_DREX1_0   525
> +#define CLK_ACLK_PPMU_DREX1_1   526
> +#define CLK_PCLK_PPMU_DREX0_0   527
> +#define CLK_PCLK_PPMU_DREX0_1   528
> +#define CLK_PCLK_PPMU_DREX1_0   529
> +#define CLK_PCLK_PPMU_DREX1_1   530
> +#define CLK_CDREX_PAUSE         531
> +#define CLK_CDREX_TIMING_SET    532

I cannot find any use of CLK_CDREX_PAUSE or CLK_CDREX_TIMING_SET
in this patchset.

Please remove them.

(snip)

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


[PATCH 1/2] i2c: imx: I2C Driver doesn't consider I2C_IPGCLK_SEL RCW bit when using ls1046a SoC

2019-04-29 Thread Chuanhua Han
The current kernel driver does not consider I2C_IPGCLK_SEL (bit 424
of the RCW) when deciding i2c_clk_rate in i2c_imx_set_clk()
(0: platform clock/4, 1: platform clock/2).

On the ls1046a SoC, this puts an incorrect value in the IBFD register
if I2C_IPGCLK_SEL = 0, which generates half of the desired clock.

Therefore, if the ls1046a SoC is used, we need to set the i2c clock
according to the corresponding RCW bit.

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 drivers/i2c/busses/i2c-imx.c | 64 
 1 file changed, 64 insertions(+)

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 422f1a445b55..7186cf3c7d24 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -45,6 +45,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* This will be the driver name the kernel reports */
 #define DRIVER_NAME "imx-i2c"
@@ -109,6 +111,21 @@
 
 #define I2C_PM_TIMEOUT 10 /* ms */
 
+/* 14-1 Since array index starts from 0 */
+#define RCW_I2C_IPGCLK_WORD (14 - 1)
+/*
+ * Set mask for RCW 424th bit, reading from DCFG_CCSR RCW Status Registers
+ * Since this register in RM depicted as big endian,
+ * so consider 31st bit as LSB for creating the mask.
+ */
+#define RCW_I2C_IPGCLK_MASK0x80
+int i2c_ipgclk_sel = 1;
+
+static const struct soc_device_attribute ls1046a_soc[] = {
+  {.family = "QorIQ LS1046A"},
+  { /* sentinel */ }
+};
+
 /*
  * sorted list of clock divider, register value pairs
  * taken from table 26-5, p.26-9, Freescale i.MX
@@ -304,6 +321,11 @@ static const struct platform_device_id imx_i2c_devtype[] = 
{
 };
 MODULE_DEVICE_TABLE(platform, imx_i2c_devtype);
 
+static const struct of_device_id guts_device_ids[] = {
+   { .compatible = "fsl,qoriq-device-config", },
+   {}
+};
+
 static const struct of_device_id i2c_imx_dt_ids[] = {
{ .compatible = "fsl,imx1-i2c", .data = &imx1_i2c_hwdata, },
{ .compatible = "fsl,imx21-i2c", .data = &imx21_i2c_hwdata, },
@@ -533,6 +555,9 @@ static void i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx,
unsigned int div;
int i;
 
+   if (!i2c_ipgclk_sel)
+   i2c_clk_rate = i2c_clk_rate / 2;
+
/* Divider value calculation */
if (i2c_imx->cur_clk == i2c_clk_rate)
return;
@@ -551,6 +576,10 @@ static void i2c_imx_set_clk(struct imx_i2c_struct *i2c_imx,
/* Store divider value */
i2c_imx->ifdr = i2c_clk_div[i].val;
 
+   pr_alert("[%s] CLK Rate=%u Bitrate =%u Div =%u Value =%d\n",
+__func__, i2c_clk_rate, i2c_imx->bitrate,
+div, i2c_clk_div[i].val);
+
/*
 * There dummy delay is calculated.
 * It should be about one I2C clock period long.
@@ -1116,6 +1145,9 @@ static int i2c_imx_probe(struct platform_device *pdev)
int irq, ret;
dma_addr_t phy_addr;
u32 mul_value;
+   struct device_node *guts_node;
+   static struct ccsr_guts __iomem *guts_regs;
+   u32 rcw_reg;
 
dev_dbg(&pdev->dev, "<%s>\n", __func__);
 
@@ -1135,6 +1167,38 @@ static int i2c_imx_probe(struct platform_device *pdev)
if (!i2c_imx)
return -ENOMEM;
 
+   if (soc_device_match(ls1046a_soc)) {
+   /*
+* Make device node for GUTS/DCFG (global utilities block)
+* to read RCW.
+*/
+   guts_node = of_find_matching_node(NULL, guts_device_ids);
+   if (!guts_node) {
+   dev_err(&pdev->dev, "Could not find GUTS node\n");
+   return -ENODEV;
+   }
+   /*
+* Memory (IO)  MAP the DCFG registers(for RCW) to
+* be used in kernel virtual address space.
+*/
+   guts_regs = of_iomap(guts_node, 0);
+   of_node_put(guts_node);
+   if (!guts_regs) {
+   dev_err(&pdev->dev, "IOREMAP of GUTS node failed\n");
+   return -ENOMEM;
+   }
+   /* Read rcw bit 424 (starting from 0) */
+   rcw_reg = ioread32be(&guts_regs->rcwsr[RCW_I2C_IPGCLK_WORD]);
+   pr_alert("RCW REG[%d]=0x%x\n", RCW_I2C_IPGCLK_WORD, rcw_reg);
+   if (rcw_reg & RCW_I2C_IPGCLK_MASK) {
+   pr_alert("Div by 2 Case Detected in RCW\n");
+   i2c_ipgclk_sel = 1;
+   } else {
+   pr_alert("Div by 4 Case Detected in RCW\n");
+   i2c_ipgclk_sel = 0;
+   }
+   }
+
if (of_id) {
i2c_imx->hwdata = of_id->data;
ret = of_property_read_u32(pdev->dev.of_node,
-- 
2.17.1
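
For readers following the RCW arithmetic in the patch above, here is a
minimal userspace sketch of how a word index and mask follow from an RCW
bit number, and of the halving that i2c_imx_set_clk() applies when the bit
is clear. It assumes 32-bit RCW status words and the big-endian bit
numbering described in the patch comment (bit 0 of each word is the MSB).
The helper names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the 32-bit RCW status word that holds a given RCW bit. */
static inline unsigned int rcw_word_index(unsigned int rcw_bit)
{
	return rcw_bit / 32;
}

/*
 * Mask for an RCW bit inside its word, assuming big-endian bit numbering:
 * bit 0 of the word is the most significant bit, bit 31 the least.
 */
static inline uint32_t rcw_bit_mask(unsigned int rcw_bit)
{
	return UINT32_C(1) << (31 - (rcw_bit % 32));
}

/*
 * Rate used for the divider calculation. Mirrors the patch: the rate the
 * driver was given is kept when I2C_IPGCLK_SEL is 1 and halved when it is 0.
 */
static inline unsigned int i2c_input_rate(unsigned int clk_rate, int ipgclk_sel)
{
	assert(ipgclk_sel == 0 || ipgclk_sel == 1);
	return ipgclk_sel ? clk_rate : clk_rate / 2;
}
```

Under those assumptions, bit 424 lands in word 13 (matching
RCW_I2C_IPGCLK_WORD = 14 - 1), at position 8 from the MSB of that word.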



Re: [PATCH v6 06/10] dt-bindings: memory-controllers: add Exynos5422 DMC device description

2019-04-29 Thread Chanwoo Choi
On 19. 4. 19. 11:19 PM, Lukasz Luba wrote:
> The patch adds description for DT binding for a new Exynos5422 Dynamic
> Memory Controller device.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  .../bindings/memory-controllers/exynos5422-dmc.txt | 73 
> ++
>  1 file changed, 73 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt 
> b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> new file mode 100644
> index 000..133b3cc
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/memory-controllers/exynos5422-dmc.txt
> @@ -0,0 +1,73 @@
> +* Exynos5422 frequency and voltage scaling for Dynamic Memory Controller 
> device
> +
> +The Samsung Exynos5422 SoC has DMC (Dynamic Memory Controller) to which the 
> DRAM
> +memory chips are connected. The driver is to monitor the controller in 
> runtime
> +and switch frequency and voltage. To monitor the usage of the controller in
> +runtime, the driver uses the PPMU (Platform Performance Monitoring Unit), 
> which
> +is able to measure the current load of the memory.
> +When 'userspace' governor is used for the driver, an application is able to
> +switch the DMC and memory frequency.
> +
> +Required properties for DMC device for Exynos5422:
> +- compatible: Should be "samsung,exynos5422-bus".

As I have already mentioned many times, this is still not fixed.
Please fix it as follows:
- exynos5422-bus -> exynos5422-dmc

> +- clock-names : the name of clock used by the bus, "bus".

The example below doesn't contain the 'bus' clock name.

> +- clocks : phandles for clock specified in "clock-names" property.
> +- devfreq-events : phandles for PPMU devices connected to this DMC.
> +- vdd-supply : phandle for voltage regulator which is connected.
> +- reg : registers of two CDREX controllers, chip information, clocks 
> subsystem.
> +- operating-points-v2 : phandle for OPPs described in v2 definition.
> +- device-handle : phandle of the connected DRAM memory device. For more
> + information please refer to Documentation
> +- devfreq-events : phandles of the PPMU events used by the controller.
> +
> +Example:
> +
> + ppmu_dmc0_0: ppmu@10d0 {
> + compatible = "samsung,exynos-ppmu";
> + reg = <0x10d0 0x2000>;
> + clocks = < CLK_PCLK_PPMU_DREX0_0>;
> + clock-names = "ppmu";
> + status = "okay";
> + events {
> + ppmu_event_dmc0_0: ppmu-event3-dmc0_0 {
> + event-name = "ppmu-event3-dmc0_0";
> + };
> + };
> + };
> +
> + dmc: memory-controller@10c2 {
> + compatible = "samsung,exynos5422-dmc";
> + reg = <0x10c2 0x1>, <0x10c3 0x1>,
> + <0x1000 0x1000>, <0x1003 0x1000>;
> + clocks =< CLK_FOUT_SPLL>,
> + < CLK_MOUT_SCLK_SPLL>,
> + < CLK_FF_DOUT_SPLL2>,
> + < CLK_FOUT_BPLL>,
> + < CLK_MOUT_BPLL>,
> + < CLK_SCLK_BPLL>,
> + < CLK_MOUT_MX_MSPLL_CCORE>,
> + < CLK_MOUT_MX_MSPLL_CCORE_PHY>,
> + < CLK_MOUT_MCLK_CDREX>,
> + < CLK_DOUT_CLK2X_PHY0>,
> + < CLK_CLKM_PHY0>,
> + < CLK_CLKM_PHY1>;
> + clock-names =   "fout_spll",
> + "mout_sclk_spll",
> + "ff_dout_spll2",
> + "fout_bpll",
> + "mout_bpll",
> + "sclk_bpll",
> + "mout_mx_mspll_ccore",
> + "mout_mx_mspll_ccore_phy",
> + "mout_mclk_cdrex",
> + "dout_clk2x_phy0",
> + "clkm_phy0",
> + "clkm_phy1";
> + status = "okay";
> + operating-points-v2 = <_opp_table>;
> + devfreq-events = <_event3_dmc0_0>, <_event3_dmc0_1>,
> + <_event3_dmc1_0>, <_event3_dmc1_1>;
> + operating-points-v2 = <_opp_table>;
> + device-handle = <_K3QF2F20DB>;
> + vdd-supply = <_reg>;
> + };
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


[PATCH 2/2] arm64: dts: fsl: ls1046a: Add the guts node in dts

2019-04-29 Thread Chuanhua Han
For NXP ls1046a SoC, the i2c clock needs to be configured with the
appropriate bit of RCW, so we add the guts node (GUTS/DCFG global
utilities block) for the driver to read.

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 373310e4c0ea..f88599df18bb 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -205,6 +205,11 @@
status = "disabled";
};
 
+   guts: global-utilities@1ee {
+   compatible = "fsl,qoriq-device-config";
+   reg = <0x0 0x1ee 0x0 0x1000>;
+   };
+
qspi: spi@155 {
compatible = "fsl,ls1021a-qspi";
#address-cells = <1>;
-- 
2.17.1



Re: [RFC PATCH v2 00/17] Core scheduling v2

2019-04-29 Thread Ingo Molnar


* Aubrey Li  wrote:

> On Tue, Apr 30, 2019 at 12:01 AM Ingo Molnar  wrote:
> > * Li, Aubrey  wrote:
> >
> > > > I.e. showing the approximate CPU thread-load figure column would be
> > > > very useful too, where '50%' shows half-loaded, '100%' fully-loaded,
> > > > '200%' over-saturated, etc. - for each row?
> > >
> > > See below, hope this helps.
> > > .--.
> > > |NA/AVX vanilla-SMT [std% / sem%] cpu% |coresched-SMT   [std% / 
> > > sem%] +/- cpu% |  no-SMT [std% / sem%]   +/-  cpu% |
> > > |--|
> > > |  1/1508.5 [ 0.2%/ 0.0%] 2.1% |504.7   [ 1.1%/ 
> > > 0.1%]-0.8%2.1% |   509.0 [ 0.2%/ 0.0%]   0.1% 4.3% |
> > > |  2/2   1000.2 [ 1.4%/ 0.1%] 4.1% |   1004.1   [ 1.6%/ 
> > > 0.2%] 0.4%4.1% |   997.6 [ 1.2%/ 0.1%]  -0.3% 8.1% |
> > > |  4/4   1912.1 [ 1.0%/ 0.1%] 7.9% |   1904.2   [ 1.1%/ 
> > > 0.1%]-0.4%7.9% |  1914.9 [ 1.3%/ 0.1%]   0.1%15.1% |
> > > |  8/8   3753.5 [ 0.3%/ 0.0%]14.9% |   3748.2   [ 0.3%/ 
> > > 0.0%]-0.1%   14.9% |  3751.3 [ 0.4%/ 0.0%]  -0.1%30.5% |
> > > | 16/16  7139.3 [ 2.4%/ 0.2%]30.3% |   7137.9   [ 1.8%/ 
> > > 0.2%]-0.0%   30.3% |  7049.2 [ 2.4%/ 0.2%]  -1.3%60.4% |
> > > | 32/32 10899.0 [ 4.2%/ 0.4%]60.3% |  10780.3   [ 4.4%/ 
> > > 0.4%]-1.1%   55.9% | 10339.2 [ 9.6%/ 0.9%]  -5.1%97.7% |
> > > | 64/64 15086.1 [11.5%/ 1.2%]97.7% |  14262.0   [ 8.2%/ 
> > > 0.8%]-5.5%   82.0% | 11168.7 [22.2%/ 1.7%] -26.0%   100.0% |
> > > |128/12815371.9 [22.0%/ 2.2%]   100.0% |  14675.8   [14.4%/ 
> > > 1.4%]-4.5%   82.8% | 10963.9 [18.5%/ 1.4%] -28.7%   100.0% |
> > > |256/25615990.8 [22.0%/ 2.2%]   100.0% |  12227.9   [10.3%/ 
> > > 1.0%]   -23.5%   73.2% | 10469.9 [19.6%/ 1.7%] -34.5%   100.0% |
> > > '--'
> >
> > Very nice, thank you!
> >
> > What's interesting is how in the over-saturated case (the last three
> > rows: 128, 256 and 512 total threads) coresched-SMT leaves 20-30% CPU
> > performance on the floor according to the load figures.
> 
> Yeah, I found the next focus.
> 
> > Is this true idle time (which shows up as 'id' during 'top'), or some 
> > load average artifact?
> 
> vmstat periodically reported intermediate CPU utilization in one 
> second, it was running simultaneously when the benchmarks run. The cpu% 
> is computed by the average of (100-idle) series.

Ok - so 'vmstat' uses /proc/stat, which uses cpustat[CPUTIME_IDLE] (or 
its NOHZ work-alike), so this should be true idle time - to the extent 
the HZ process clock's sampling is accurate.

So I guess the answer to my question is "yes". ;-)

BTW., for robustness sake you might want to add iowait to idle time (it's 
the 'wa' field of vmstat) - it shouldn't matter for this particular 
benchmark which doesn't do much IO, but it might for others.

Both CPUTIME_IDLE and CPUTIME_IOWAIT are idle states when a CPU is not 
utilized.
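
A minimal sketch of the accounting suggested above, assuming two samples
of the aggregate "cpu" line of /proc/stat (tick counters in the usual
user, nice, system, idle, iowait, irq, softirq, steal order). The struct
and function names are made up for illustration. Idle and iowait are both
counted as "not utilized":

```c
#include <assert.h>

/* One sample of the aggregate "cpu" line of /proc/stat, in clock ticks. */
struct cpu_sample {
	unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Sum of all tick counters in a sample. */
static inline unsigned long long sample_total(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle +
	       s->iowait + s->irq + s->softirq + s->steal;
}

/*
 * Utilization in percent between two samples, treating both idle and
 * iowait ticks as time in which the CPU was not utilized.
 */
static inline double cpu_util_percent(const struct cpu_sample *prev,
				      const struct cpu_sample *cur)
{
	unsigned long long total = sample_total(cur) - sample_total(prev);
	unsigned long long unused = (cur->idle - prev->idle) +
				    (cur->iowait - prev->iowait);

	assert(total > 0 && unused <= total);
	return 100.0 * (double)(total - unused) / (double)total;
}
```

With only idle subtracted, an IO-heavy benchmark would look busier than it
is; for the benchmark discussed here the two variants should be close.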

[ Side note: we should really implement precise idle time accounting when 
  CONFIG_IRQ_TIME_ACCOUNTING=y is enabled. We pay all the costs of the 
  timestamps, but AFAICS we don't propagate that into the idle cputime
  metrics. ]

Thanks,

Ingo


Re: [PATCH v3 2/2] dt-bindings: cpufreq: Document allwinner,cpu-operating-points-v2

2019-04-29 Thread Viresh Kumar
On 29-04-19, 11:18, Rob Herring wrote:
> On Sun, Apr 28, 2019 at 4:53 AM Frank Lee  wrote:
> >
> > On Sat, Apr 27, 2019 at 5:15 AM Rob Herring  wrote:
> > >
> > > On Wed, Apr 10, 2019 at 01:41:39PM -0400, Yangtao Li wrote:
> > > > Allwinner Process Voltage Scaling Tables defines the voltage and
> > > > frequency value based on the speedbin blown in the efuse combination.
> > > > The sunxi-cpufreq-nvmem driver reads the efuse value from the SoC to
> > > > provide the OPP framework with required information.
> > > > This is used to determine the voltage and frequency value for each
> > > > OPP of operating-points-v2 table when it is parsed by the OPP framework.
> > > >
> > > > The "allwinner,cpu-operating-points-v2" DT extends the 
> > > > "operating-points-v2"
> > > > with following parameters:
> > > > - nvmem-cells (NVMEM area containig the speedbin information)
> > > > - opp-microvolt-: voltage in micro Volts.
> > > >   At runtime, the platform can pick a  and matching
> > > >   opp-microvolt- property.
> > > >   HW: :
> > > >   sun50iw-h6  speed0 speed1 speed2
> > >
> > > We already have at least one way to support speed bins with QC kryo
> > > binding. Why do we need a different way?
> >
> > For some SOCs, for some reason (making the CPU have approximate 
> > performance),
> > they use the same frequency but different voltage. In the case where
> > this speed bin
> > is not a lot and opp uses the same frequency, too many repeated opp
> > nodes are a bit
> > redundant and not intuitive enough.
> >
> > So, I think it's worth the new method.
> 
> Well, I don't.
> 
> We can't have every SoC vendor doing their own thing just because they
> want to. If there are technical reasons why existing bindings don't
> work, then maybe we need to do something different. But I haven't
> heard any reasons.

Well, there is a good reason for attempting the new bindings, and I wasn't sure
whether updating the earlier bindings or adding another one for the platform is
correct, as we aren't really adding new bindings, just documentation around them.

So there are two ways the OPP core supports this:

- opp-supported-hw: This is a better fit if we have a smaller group of
  frequencies to select from a bigger group, so we disable non-required OPPs
  completely. This is what Qcom did as they wanted to select different
  frequencies altogether.

- opp-microvolt-<name>: This is a better fit if the frequencies remain the same
  and only a few of the properties like voltage/current have a different value.
  So we don't disable any OPPs but just select the right voltage/current for
  those frequencies. This avoids unnecessary duplication of the OPPs in DT and
  that's what the allwinner guys want.

The kryo nvmem bindings currently support opp-supported-hw; maybe we can
mention support for the second one in the same file and rename it appropriately.
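
To make the second option concrete, here is a small userspace sketch of
the idea: one table of frequencies, with one voltage per speed bin for
each entry, instead of one duplicated table per bin. This is only an
illustration, not the OPP core's implementation; the table contents, the
number of bins and the helper name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SPEED_BINS 3

/* One OPP: a single frequency with one voltage per speed bin. */
struct opp_entry {
	unsigned long freq_hz;
	unsigned long microvolt[NUM_SPEED_BINS];
};

/* Hypothetical values, chosen only to show the layout. */
static const struct opp_entry opp_table[] = {
	{  480000000UL, {  880000,  820000,  800000 } },
	{ 1080000000UL, { 1060000,  880000,  840000 } },
	{ 1488000000UL, { 1160000, 1000000,  960000 } },
};

/*
 * Voltage in microvolts for a frequency on a given speed bin,
 * or 0 if the frequency is not in the table.
 */
static inline unsigned long opp_voltage(unsigned long freq_hz,
					unsigned int speed_bin)
{
	size_t i;

	assert(speed_bin < NUM_SPEED_BINS);
	for (i = 0; i < sizeof(opp_table) / sizeof(opp_table[0]); i++)
		if (opp_table[i].freq_hz == freq_hz)
			return opp_table[i].microvolt[speed_bin];
	return 0;
}
```

Every bin sees the same set of frequencies; only the column that is read
changes, which is the duplication the per-bin property avoids.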

-- 
viresh


[PATCH 1/3] dt-bindings: i2c: add optional mul-value property to binding

2019-04-29 Thread Chuanhua Han
NXP Layerscape SoCs have up to three MUL options available for all
divider values. The choice of MUL determines the internal monitoring rate
of the I2C bus (SCL and SDA signals):
A lower MUL value results in a higher sampling rate of the I2C signals.
A higher MUL value results in a lower sampling rate of the I2C signals.

So we add a custom mul-value property to the optional properties of the
binding, to select which MUL option an i2c controller node in the device
tree uses.

Signed-off-by: Chuanhua Han 
---
 Documentation/devicetree/bindings/i2c/i2c-imx.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.txt 
b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
index b967544590e8..ba8e7b7b3fa8 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
@@ -18,6 +18,9 @@ Optional properties:
 - sda-gpios: specify the gpio related to SDA pin
 - pinctrl: add extra pinctrl to configure i2c pins to gpio function for i2c
   bus recovery, call it "gpio" state
+- mul-value: NXP Layerscape SoCs have up to three MUL options available for
+all I2C divider values. This property selects which MUL the driver uses;
+the value should be 1, 2 or 4.
 
 Examples:
 
-- 
2.17.1
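
A small sketch of how a driver might validate the property value
described above. It assumes, judging only from the divider tables in
patch 2/3 of this series (MUL=2 register values start at 0x40, MUL=4
values at 0x80), that the MUL choice shows up in the top two bits of the
divider register value. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Map a "mul-value" of 1, 2 or 4 to the assumed base of the matching
 * divider register values (0x00, 0x40, 0x80). Any other value is
 * rejected with -1 so the caller can fall back to a default.
 */
static inline int mul_to_reg_base(unsigned int mul_value)
{
	switch (mul_value) {
	case 1:
		return 0x00;
	case 2:
		return 0x40;
	case 4:
		return 0x80;
	default:
		return -1;
	}
}
```

Rejecting anything other than 1, 2 or 4 keeps a typo in a device tree
from silently selecting the wrong divider table.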



[PATCH 2/3] i2c: imx: I2C Driver IBC and SCL Divider for MUL=2 and MUL=4

2019-04-29 Thread Chuanhua Han
NXP Layerscape SoCs have up to three MUL options available for all
divider values. The choice of MUL determines the internal monitoring rate
of the I2C bus (SCL and SDA signals).

The current kernel driver supports MUL=1 by default, but doesn't have
the IBC and SCL Divider entries in vf610_i2c_clk_div for MUL=2 and
MUL=4, so we need to add the corresponding support.
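
As an illustration of how a divider table of (divider, register value)
pairs is typically consumed, here is a userspace sketch of a
round-up-and-search lookup. It assumes the table is sorted by ascending
divider, which a linear search like this relies on. The sample table
reuses the first row of the MUL=2 entries below; the function and type
names are hypothetical and this is not claimed to be the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* A (divider, register value) pair. */
struct clk_pair {
	unsigned short div;
	unsigned short val;
};

/* Sample entries, sorted by ascending divider. */
static const struct clk_pair sample_div[] = {
	{ 40, 0x40 }, { 44, 0x41 }, { 48, 0x42 }, { 52, 0x43 },
	{ 56, 0x44 }, { 60, 0x45 }, { 68, 0x46 }, { 80, 0x47 },
};

/*
 * Pick the register value for the smallest divider that is >= the
 * requested ratio, so the resulting bus clock never exceeds 'bitrate'.
 * Requests beyond the table are clamped to the last entry.
 */
static inline unsigned short pick_divider(const struct clk_pair *tbl, size_t n,
					  unsigned int clk_rate,
					  unsigned int bitrate)
{
	unsigned int div;
	size_t i;

	assert(n > 0 && bitrate > 0);
	div = (clk_rate + bitrate - 1) / bitrate;	/* round up */
	for (i = 0; i < n - 1 && tbl[i].div < div; i++)
		;
	return tbl[i].val;
}
```

If the table is not sorted, the search stops at the first entry that is
large enough, which may not be the closest match.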

Signed-off-by: Sumit Batra 
Signed-off-by: Chuanhua Han 
---
 drivers/i2c/busses/i2c-imx.c | 71 +++-
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 42fed40198a0..ac5a334b7339 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -156,6 +157,44 @@ static struct imx_i2c_clk_pair vf610_i2c_clk_div[] = {
{ 3840, 0x3F }, { 4096, 0x7B }, { 5120, 0x7D }, { 6144, 0x7E },
 };
 
+static struct imx_i2c_clk_pair mul2_i2c_clk_div[] = {
+   { 40,   0x40 }, { 44,   0x41 }, { 48,   0x42 }, { 52,   0x43 },
+   { 56,   0x44 }, { 60,   0x45 }, { 68,   0x46 }, { 80,   0x47 },
+   { 56,   0x48 }, { 64,   0x49 }, { 72,   0x4A }, { 80,   0x4B },
+   { 88,   0x4C }, { 96,   0x4D }, { 112,  0x4E }, { 136,  0x4F },
+   { 96,   0x50 }, { 112,  0x51 }, { 128,  0x52 }, { 144,  0x53 },
+   { 160,  0x54 }, { 176,  0x55 }, { 208,  0x56 }, { 256,  0x57 },
+   { 160,  0x58 }, { 192,  0x59 }, { 224,  0x5A }, { 256,  0x5B },
+   { 288,  0x5C }, { 320,  0x5D }, { 384,  0x5E }, { 480,  0x5F },
+   { 320,  0x60 }, { 384,  0x61 }, { 448,  0x62 }, { 512,  0x63 },
+   { 576,  0x64 }, { 640,  0x65 }, { 768,  0x66 }, { 960,  0x67 },
+   { 640,  0x68 }, { 768,  0x69 }, { 896,  0x6A }, { 1024, 0x6B },
+   { 1152, 0x6C }, { 1280, 0x6D }, { 1536, 0x6E }, { 1920, 0x6F },
+   { 1280, 0x70 }, { 1536, 0x71 }, { 1792, 0x72 }, { 2048, 0x73 },
+   { 2304, 0x74 }, { 2560, 0x75 }, { 3072, 0x76 }, { 3840, 0x77 },
+   { 2560, 0x78 }, { 3072, 0x79 }, { 3584, 0x7A }, { 4096, 0x7B },
+   { 4608, 0x7C }, { 5120, 0x7D }, { 6144, 0x7E }, { 7680, 0x7F },
+};
+
+static struct imx_i2c_clk_pair mul4_i2c_clk_div[] = {
+   { 80,    0x80 }, { 88,    0x81 }, { 96,    0x82 }, { 104,   0x83 },
+   { 112,   0x84 }, { 120,   0x85 }, { 136,   0x86 }, { 160,   0x87 },
+   { 112,   0x88 }, { 128,   0x89 }, { 144,   0x8A }, { 160,   0x8B },
+   { 176,   0x8C }, { 192,   0x8D }, { 224,   0x8E }, { 272,   0x8F },
+   { 192,   0x90 }, { 224,   0x91 }, { 256,   0x92 }, { 288,   0x93 },
+   { 320,   0x94 }, { 352,   0x95 }, { 416,   0x96 }, { 512,   0x97 },
+   { 320,   0x98 }, { 384,   0x99 }, { 448,   0x9A }, { 512,   0x9B },
+   { 576,   0x9C }, { 640,   0x9D }, { 768,   0x9E }, { 960,   0x9F },
+   { 640,   0xA0 }, { 768,   0xA1 }, { 896,   0xA2 }, { 1024,  0xA3 },
+   { 1152,  0xA4 }, { 1280,  0xA5 }, { 1536,  0xA6 }, { 1792,  0xAA },
+   { 1280,  0xA8 }, { 1536,  0xA9 }, { 1920,  0xA7 }, { 2048,  0xAB },
+   { 2304,  0xAC }, { 2560,  0xAD }, { 3072,  0xAE }, { 3584,  0xB2 },
+   { 2560,  0xB0 }, { 3072,  0xB1 }, { 3820,  0xAF }, { 4096,  0xB3 },
+   { 4608,  0xB4 }, { 5120,  0xB5 }, { 6144,  0xB6 }, { 7680,  0xB7 },
+   { 5120,  0xB8 }, { 6144,  0xB9 }, { 7168,  0xBA }, { 8192,  0xBB },
+   { 9216,  0xBC }, { 10240, 0xBD }, { 12288, 0xBE }, { 15360, 0xBF },
+};
+
 enum imx_i2c_type {
IMX1_I2C,
IMX21_I2C,
@@ -234,6 +273,24 @@ static struct imx_i2c_hwdata vf610_i2c_hwdata = {
 
 };
 
+static struct imx_i2c_hwdata mul2_i2c_hwdata = {
+   .devtype         = VF610_I2C,
+   .regshift        = VF610_I2C_REGSHIFT,
+   .clk_div         = mul2_i2c_clk_div,
+   .ndivs           = ARRAY_SIZE(mul2_i2c_clk_div),
+   .i2sr_clr_opcode = I2SR_CLR_OPCODE_W1C,
+   .i2cr_ien_opcode = I2CR_IEN_OPCODE_0,
+};
+
+static struct imx_i2c_hwdata mul4_i2c_hwdata = {
+   .devtype         = VF610_I2C,
+   .regshift        = VF610_I2C_REGSHIFT,
+   .clk_div         = mul4_i2c_clk_div,
+   .ndivs           = ARRAY_SIZE(mul4_i2c_clk_div),
+   .i2sr_clr_opcode = I2SR_CLR_OPCODE_W1C,
+   .i2cr_ien_opcode = I2CR_IEN_OPCODE_0,
+};
+
 static const struct platform_device_id imx_i2c_devtype[] = {
{
.name = "imx1-i2c",
@@ -1058,6 +1115,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
void __iomem *base;
int irq, ret;
dma_addr_t phy_addr;
+   u32 mul_value;
 
dev_dbg(&pdev->dev, "<%s>\n", __func__);
 
@@ -1077,11 +1135,20 @@ static int i2c_imx_probe(struct platform_device *pdev)
if (!i2c_imx)
return -ENOMEM;
 
-   if (of_id)
+   if (of_id) {
i2c_imx->hwdata = of_id->data;
-   else
+   ret = of_property_read_u32(pdev->dev.of_node,
+  

[PATCH 3/3] arm64: dts: fsl: ls1046a: Add mul-value property of the i2c controller nodes

2019-04-29 Thread Chuanhua Han
According to the LS1046A Reference Manual, the i2c controller has
up to three MUL options available for all divider values. Therefore, we
need to specify in the device tree which MUL the driver should use.

The "mul-value" property specifies which MUL our driver uses.

Signed-off-by: Chuanhua Han 
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index b0ef08b090dd..373310e4c0ea 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -385,6 +385,7 @@
dmas = < 1 39>,
   < 1 38>;
dma-names = "tx", "rx";
+   mul-value = <4>;
status = "disabled";
};
 
@@ -395,6 +396,7 @@
reg = <0x0 0x219 0x0 0x1>;
interrupts = ;
clocks = < 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
@@ -405,6 +407,7 @@
reg = <0x0 0x21a 0x0 0x1>;
interrupts = ;
clocks = < 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
@@ -415,6 +418,7 @@
reg = <0x0 0x21b 0x0 0x1>;
interrupts = ;
clocks = < 4 1>;
+   mul-value = <4>;
status = "disabled";
};
 
-- 
2.17.1



PROBLEM: Elan touchpad regression on Kernel 5.0.10

2019-04-29 Thread Outvi V
Hello,

[1.] One line summary of the problem: Elan touchpad regression on Kernel 5.0.10

[2.] Full description of the problem/report:
  Elan touchpad does not work on 5.0.10 while working on 5.0.9

[3.] Keywords: elan_i2c_core elan i2c touchpad 5.0.10

[4.] Kernel information
[4.1.] Kernel version:
  Linux version 5.0.10-arch1-1-ARCH (builduser@heftig-2592) (gcc version 8.3.0 
(GCC)) #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019
[4.2.] Kernel .config file:
  I'm not sure, but I think it may be referring to
  
https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
[5.] Most recent kernel version which did not have the bug: 5.0.9

[6.] Output of Oops.. message (if applicable) with symbolic information
 resolved (Not applicable)
[7.] A small shell script or example program which triggers the
 problem: (Not applicable)

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
  
Linux sheltty 5.0.10-arch1-1-ARCH #1 SMP PREEMPT Sat Apr 27 20:06:45 UTC 2019 
x86_64 GNU/Linux

GNU C                   8.3.0
GNU Make                4.2.1
Binutils                2.32
Util-linux              2.33.2
Mount                   2.33.2
Module-init-tools       26
E2fsprogs               1.45.0
Jfsutils                1.1.15
Reiserfsprogs           3.6.27
Xfsprogs                4.20.0
PPP                     2.4.7
Linux C Library         2.29
Dynamic linker (ldd)    2.29
Linux C++ Library       6.0.25
Procps                  3.3.15
Kbd                     2.0.4
Console-tools           2.0.4
Sh-utils                8.31
Udev                    242
Modules Loaded  8021q 8250_dw ac ac97_bus acpi_thermal_rel aesni_intel 
aes_x86_64 agpgart ahci arc4 atkbd battery bbswitch bluetooth btbcm btintel 
btrtl btusb cfg80211 coretemp crc16 crc32c_generic crc32c_intel crc32_pclmul 
crct10dif_pclmul cryptd crypto_simd crypto_user drm drm_kms_helper ecdh_generic 
elan_i2c evdev ext4 fat fb_sys_fops fscrypto garp ghash_clmulni_intel 
glue_helper hid hid_generic i2c_algo_bit i2c_hid i2c_i801 i8042 i915 idma64 
input_leds int3400_thermal int3403_thermal int340x_thermal_zone intel_cstate 
intel_gtt intel_lpss intel_lpss_pci intel_pch_thermal intel_powerclamp 
intel_rapl intel_rapl_perf intel_soc_dts_iosf intel_uncore 
intel_wmi_thunderbolt ip_tables irqbypass iTCO_vendor_support iTCO_wdt jbd2 
joydev kvm kvmgt kvm_intel ledtrig_audio libahci libata libphy libps2 llc 
mac80211 mac_hid mbcache mdev media mei mei_me mousedev mrp nls_cp437 
nls_iso8859_1 pcc_cpufreq processor_thermal_device r8169 r8822be realtek rfkill 
rng_core scsi_mod serio serio_raw snd snd_compress snd_hda_codec 
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_core 
snd_hda_ext_core snd_hda_intel snd_hwdep snd_pcm snd_pcm_dmaengine snd_soc_acpi 
snd_soc_acpi_intel_match snd_soc_core snd_soc_hdac_hda snd_soc_skl 
snd_soc_skl_ipc snd_soc_sst_dsp snd_soc_sst_ipc snd_timer soundcore stp 
syscopyarea sysfillrect sysimgblt tpm tpm_crb tpm_tis tpm_tis_core typec 
typec_ucsi ucsi_acpi usbhid uvcvideo vfat vfio vfio_iommu_type1 vfio_mdev 
videobuf2_common videobuf2_memops videobuf2_v4l2 videobuf2_vmalloc videodev wmi 
wmi_bmof x86_pkg_temp_thermal xhci_hcd xhci_pci x_tables

[8.2.] Processor information (from /proc/cpuinfo): (Maybe not applicable)
[8.3.] Module information (from /proc/modules): 

(Parts related to i2c and elan:)

i2c_algo_bit 16384 1 i915, Live 0x
i2c_hid 32768 0 - Live 0x
hid 147456 3 hid_generic,usbhid,i2c_hid, Live 0x
elan_i2c 49152 0 - Live 0x
i2c_i801 36864 0 - Live 0x

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

/proc/ioports:
- : PCI Bus :00
  - : dma1
  - : pic1
  - : iTCO_wdt
  - : timer0
  - : timer1
  - : keyboard
  - : PNP0C09:00
- : EC data
  - : keyboard
  - : PNP0C09:00
- : EC cmd
  - : rtc0
  - : dma page reg
  - : pic2
  - : dma2
  - : fpu
- : PNP0C04:00
  - : iTCO_wdt
  - : pnp 00:02
- : PCI conf1
- : PCI Bus :00
  - : pnp 00:02
  - : pnp 00:00
- : ACPI PM1a_EVT_BLK
- : ACPI PM1a_CNT_BLK
- : ACPI PM_TMR
- : ACPI CPU throttle
- : ACPI PM2_CNT_BLK
- : pnp 00:04
- : ACPI GPE0_BLK
  - : pnp 00:01
  - : PCI Bus :08
- : :08:00.0
  - : PCI Bus :07
- : :07:00.0
  - : r8822be
  - : PCI Bus :01
- : :01:00.0
  - : :00:02.0
  - : :00:1f.4
- : i801_smbus
  - : :00:17.0
- : ahci
  - : :00:17.0
- : ahci
  - : :00:17.0
- : ahci


[8.5.] PCI information
  It seems to be long 

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro
On Mon, Apr 29, 2019 at 10:18:04PM -0600, Andreas Dilger wrote:
> > 
> > void*i_private; /* fs or device private pointer */
> > +   void (*free_inode)(struct inode *);
> 
> It seems like a waste to increase the size of every struct inode just to 
> access
> a static pointer.  Is this the only place that ->free_inode() is called?  Why
> not move the ->free_inode() pointer into inode->i_fop->free_inode() so that it
> is still directly accessible at this point.

i_op, surely?  In any case, increasing sizeof(struct inode) is not a problem -
if anything, I'd turn ->i_fop into an anon union with that.  As in,

diff --git a/Documentation/filesystems/porting 
b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
  treated as NULL/free_inode_nonrcu, to preserve the 
compatibility.
+
+   Note that the callback (be it via ->free_inode() or explicit call_rcu()
+   in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+   as the matter of fact, the superblock and all associated structures
+   might be already gone.  The filesystem driver is guaranteed to be still
+   there, but that's it.  Freeing memory in the callback is fine; doing
+   more than that is possible, but requires a lot of care and is best
+   avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..627e1766503a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -211,8 +211,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
 static void i_callback(struct rcu_head *head)
 {
struct inode *inode = container_of(head, struct inode, i_rcu);
-   if (inode->i_sb->s_op->free_inode)
-   inode->i_sb->s_op->free_inode(inode);
+   if (inode->free_inode)
+   inode->free_inode(inode);
else
free_inode_nonrcu(inode);
 }
@@ -236,6 +236,7 @@ static struct inode *alloc_inode(struct super_block *sb)
if (!ops->free_inode)
return NULL;
}
+   inode->free_inode = ops->free_inode;
i_callback(&inode->i_rcu);
return NULL;
}
@@ -276,6 +277,7 @@ static void destroy_inode(struct inode *inode)
if (!ops->free_inode)
return;
}
+   inode->free_inode = ops->free_inode;
call_rcu(&inode->i_rcu, i_callback);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..92732286b748 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -694,7 +694,10 @@ struct inode {
 #ifdef CONFIG_IMA
    atomic_t i_readcount; /* struct files open RO */
 #endif
-   const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
+   union {
+       const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
+       void (*free_inode)(struct inode *);
+   };
    struct file_lock_context *i_flctx;
    struct address_space i_data;
    struct list_head i_devices;
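
For illustration only, here is a userspace sketch of the anon-union idea above. It is not kernel code: the demo_* types and retire_inode() are hypothetical stand-ins. It assumes that data pointers and function pointers have the same size (true on the platforms Linux runs on), in which case overlaying the callback on ->i_fop leaves the structure size unchanged.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file_operations. */
struct demo_file_operations {
	int unused;
};

/* Layout without the overlay: the slot only ever holds i_fop. */
struct demo_inode_plain {
	void *i_private;
	const struct demo_file_operations *i_fop;
};

/* Layout with the overlay: i_fop and free_inode share one slot. */
struct demo_inode {
	void *i_private;
	union {
		const struct demo_file_operations *i_fop;
		void (*free_inode)(struct demo_inode *);
	};
};

static int demo_freed;

static void demo_free_inode(struct demo_inode *inode)
{
	(void)inode;
	demo_freed++;
}

/*
 * Mimics the destroy path: i_fop is no longer needed, so its slot is
 * overwritten with the callback, which is then run. The kernel would
 * defer the call through call_rcu() rather than invoke it directly.
 */
static void retire_inode(struct demo_inode *inode,
			 void (*cb)(struct demo_inode *))
{
	inode->free_inode = cb;
	if (inode->free_inode)
		inode->free_inode(inode);
}
```

The trade-off is that ->i_fop must not be read once the callback has been stored, which is why the store happens only on the destroy path.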


Re: [PATCH RESEND] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Ingo Molnar


* Tobin C. Harding  wrote:

> Currently error return from kobject_init_and_add() is not followed by a
> call to kobject_put().  This means there is a memory leak.
> 
> Add call to kobject_put() in error path of kobject_init_and_add().
> 
> Signed-off-by: Tobin C. Harding 
> ---
> 
> Resend with SOB tag.

Please ignore my previous mail :-)

Thanks,

Ingo


Re: [PATCH] sched/cpufreq: Fix kobject memleak

2019-04-29 Thread Ingo Molnar


* Tobin C. Harding  wrote:

> Currently error return from kobject_init_and_add() is not followed by a
> call to kobject_put().  This means there is a memory leak.
> 
> Add call to kobject_put() in error path of kobject_init_and_add().
> ---
>  kernel/sched/cpufreq_schedutil.c | 1 +
>  1 file changed, 1 insertion(+)

I've added your:

   Signed-off-by: Tobin C. Harding 

Which I suppose you intended to include?

Thanks,

Ingo
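
As a rough userspace analogy of the rule in the quoted commit message (not the kobject API itself; every demo_* name below is hypothetical): once the init step has taken the initial reference, a failure of the add step has to be answered with a put, otherwise the object is never released.

```c
#include <stdlib.h>

struct demo_obj {
	int refcount;
	void (*release)(struct demo_obj *);
};

static int demo_released;

static void demo_release(struct demo_obj *obj)
{
	demo_released++;
	free(obj);
}

/* Drop one reference; the release callback runs when it hits zero. */
static void demo_put(struct demo_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}

/*
 * Takes the initial reference unconditionally, then "adds" the object.
 * 'fail' forces the add step to fail so the error path can be exercised.
 */
static int demo_init_and_add(struct demo_obj *obj, int fail)
{
	obj->refcount = 1;
	obj->release = demo_release;
	return fail ? -1 : 0;
}

static int demo_create(int fail, struct demo_obj **out)
{
	struct demo_obj *obj = calloc(1, sizeof(*obj));
	int ret;

	*out = NULL;
	if (!obj)
		return -1;

	ret = demo_init_and_add(obj, fail);
	if (ret) {
		/* The reference taken by init must be dropped here. */
		demo_put(obj);
		return ret;
	}

	*out = obj;
	return 0;
}
```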


Re: [PATCH 1/2] RISC-V: Add DT documentation for SiFive L2 Cache Controller

2019-04-29 Thread Yash Shah
On Fri, Apr 26, 2019 at 3:04 PM Sudeep Holla  wrote:
>
> On Fri, Apr 26, 2019 at 11:20:17AM +0530, Yash Shah wrote:
> > On Thu, Apr 25, 2019 at 3:43 PM Sudeep Holla  wrote:
> > >
> > > On Thu, Apr 25, 2019 at 11:24:55AM +0530, Yash Shah wrote:
> > > > Add device tree bindings for SiFive FU540 L2 cache controller driver
> > > >
> > > > Signed-off-by: Yash Shah 
> > > > ---
> > > >  .../devicetree/bindings/riscv/sifive-l2-cache.txt  | 53 ++
> > > >  1 file changed, 53 insertions(+)
> > > >  create mode 100644 Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > > new file mode 100644
> > > > index 000..15132e2
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
> > > > @@ -0,0 +1,53 @@
> > > > +SiFive L2 Cache Controller
> > > > +--
> > > > +The SiFive Level 2 Cache Controller is used to provide access to fast copies
> > > > +of memory for masters in a Core Complex. The Level 2 Cache Controller also
> > > > +acts as directory-based coherency manager.
> > > > +
> > > > +Required Properties:
> > > > +
> > > > +- compatible: Should be "sifive,fu540-c000-ccache"
> > > > +
> > > > +- cache-block-size: Specifies the block size in bytes of the cache
> > > > +
> > > > +- cache-level: Should be set to 2 for a level 2 cache
> > > > +
> > > > +- cache-sets: Specifies the number of associativity sets of the cache
> > > > +
> > > > +- cache-size: Specifies the size in bytes of the cache
> > > > +
> > > > +- cache-unified: Specifies the cache is a unified cache
> > > > +
> > > > +- interrupt-parent: Must be core interrupt controller
> > > > +
> > > > +- interrupts: Must contain 3 entries (DirError, DataError and DataFail signals)
> > > > +
> > > > +- reg: Physical base address and size of L2 cache controller registers map
> > > > +
> > > > +- reg-names: Should be "control"
> > > > +
> > >
> > > It would be good if you mark the properties that are present in DT
> > > specification and those that are added for sifive,fu540-c000-ccache
> >
> > I believe there isn't any property which is added explicitly for
> > sifive,fu540-c000-ccache.
> >
>
> reg and interrupts are generally optional for normal cache and may be
> required for cache controller like this. DT specification[1] covers
> only caches and not cache controllers.

Are you suggesting something like this:

Required Properties:

Standard Properties:
- compatible: Should be "sifive,-ccache"
  Supported compatible strings are:
  "sifive,fu540-c000-ccache" and "sifive,fu740-c000-ccache"

- cache-block-size: Specifies the block size in bytes of the cache

- cache-level: Should be set to 2 for a level 2 cache

- cache-sets: Specifies the number of associativity sets of the cache

- cache-size: Specifies the size in bytes of the cache

- cache-unified: Specifies the cache is a unified cache

Non-Standard Properties:
- interrupt-parent: Must be core interrupt controller

- interrupts: Must contain 3 entries for FU540 (DirError, DataError and
  DataFail signals) or 4 entries for other chips (DirError, DirFail, DataError,
  DataFail signals)

- reg: Physical base address and size of L2 cache controller registers map

- reg-names: Should be "control"

- Yash
>
> --
> Regards,
> Sudeep
>
> [1] 
> https://github.com/devicetree-org/devicetree-specification/releases/download/v0.2/devicetree-specification-v0.2.pdf


Re: [PATCH v4 1/7] ocxl: Split pci.c

2019-04-29 Thread Andrew Donnellan

On 27/3/19 4:31 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

In preparation for making core code available for external drivers,
move the core code out of pci.c and into core.c

Signed-off-by: Alastair D'Silva 


There doesn't seem to be much left in pci.c, is there?

Acked-by: Andrew Donnellan 


---
  drivers/misc/ocxl/Makefile|   1 +
  drivers/misc/ocxl/core.c  | 517 +
  drivers/misc/ocxl/ocxl_internal.h |   5 +
  drivers/misc/ocxl/pci.c   | 519 +-
  4 files changed, 524 insertions(+), 518 deletions(-)
  create mode 100644 drivers/misc/ocxl/core.c

diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
index 5229dcda8297..bc4e39bfda7b 100644
--- a/drivers/misc/ocxl/Makefile
+++ b/drivers/misc/ocxl/Makefile
@@ -3,6 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
  
  ocxl-y += main.o pci.o config.o file.o pasid.o
  ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+ocxl-y += core.o
  obj-$(CONFIG_OCXL)+= ocxl.o
  
  # For tracepoints to include our trace.h from tracepoint infrastructure:

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
new file mode 100644
index ..1a4411b72d35
--- /dev/null
+++ b/drivers/misc/ocxl/core.c
@@ -0,0 +1,517 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2019 IBM Corp.
+#include 
+#include "ocxl_internal.h"
+
+static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
+{
+   return (get_device(&fn->dev) == NULL) ? NULL : fn;
+}
+
+static void ocxl_fn_put(struct ocxl_fn *fn)
+{
+   put_device(&fn->dev);
+}
+
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
+{
+   return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+
+void ocxl_afu_put(struct ocxl_afu *afu)
+{
+   put_device(&afu->dev);
+}
+
+static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
+{
+   struct ocxl_afu *afu;
+
+   afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
+   if (!afu)
+   return NULL;
+
+   mutex_init(&afu->contexts_lock);
+   mutex_init(&afu->afu_control_lock);
+   idr_init(&afu->contexts_idr);
+   afu->fn = fn;
+   ocxl_fn_get(fn);
+   return afu;
+}
+
+static void free_afu(struct ocxl_afu *afu)
+{
+   idr_destroy(&afu->contexts_idr);
+   ocxl_fn_put(afu->fn);
+   kfree(afu);
+}
+
+static void free_afu_dev(struct device *dev)
+{
+   struct ocxl_afu *afu = to_ocxl_afu(dev);
+
+   ocxl_unregister_afu(afu);
+   free_afu(afu);
+}
+
+static int set_afu_device(struct ocxl_afu *afu, const char *location)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int rc;
+
+   afu->dev.parent = &fn->dev;
+   afu->dev.release = free_afu_dev;
+   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
+   afu->config.idx);
+   return rc;
+}
+
+static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int actag_count, actag_offset;
+
+   /*
+* if there were not enough actags for the function, each afu
+* reduces its count as well
+*/
+   actag_count = afu->config.actag_supported *
+   fn->actag_enabled / fn->actag_supported;
+   actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
+   if (actag_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d actags for AFU: %d\n",
+   actag_count, actag_offset);
+   return actag_offset;
+   }
+   afu->actag_base = fn->actag_base + actag_offset;
+   afu->actag_enabled = actag_count;
+
+   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   afu->actag_base, afu->actag_enabled);
+   dev_dbg(&afu->dev, "actag base=%d enabled=%d\n",
+   afu->actag_base, afu->actag_enabled);
+   return 0;
+}
+
+static void reclaim_afu_actag(struct ocxl_afu *afu)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int start_offset, size;
+
+   start_offset = afu->actag_base - fn->actag_base;
+   size = afu->actag_enabled;
+   ocxl_actag_afu_free(afu->fn, start_offset, size);
+}
+
+static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int pasid_count, pasid_offset;
+
+   /*
+* We only support the case where the function configuration
+* requested enough PASIDs to cover all AFUs.
+*/
+   pasid_count = 1 << afu->config.pasid_supported_log;
+   pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
+   if (pasid_offset < 0) {
+   dev_err(&afu->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   pasid_count, pasid_offset);
+   return pasid_offset;
+   }
+   afu->pasid_base = fn->pasid_base + pasid_offset;
+   afu->pasid_count = 0;
+   afu->pasid_max = pasid_count;
+
+   ocxl_config_set_afu_pasid(dev, 

Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Al Viro
On Tue, Apr 30, 2019 at 05:33:10AM +0200, Nicholas Mc Guire wrote:

> ok - my bad then - I had assumed that using __force is reasonable
> if the handling is correct and it's a localized conversion only
> like var = be16_to_cpu(var) which avoided introducing additional
> variables just to have different types but no different function.

If compiler can't recognize that in

T1 v1;
T2 v2;

code using v1, but not v2
v2 = f(v1);
code using v2, but not v1

it can use the same memory for v1 and v2, file a bug against the
compiler.  Or stop using that toy altogether - that kind of
optimizations is early 60s stuff and any real compiler will
handle that.  Both gcc and clang certainly do handle that.

Another thing they handle is figuring out that be16_to_cpu()
et.al. are pure functions, so

f(be16_to_cpu(n));
no modifications of n
g(be16_to_cpu(n));

doesn't need to have be16_to_cpu() recalculated.  IOW, that particular
code could as well have been
dev_info(dev, "Fieldbus type: %04X", be16_to_cpu(fieldbus_type));
...
cd->client->fieldbus_type = be16_to_cpu(fieldbus_type);

... not that there's much sense keeping ->fieldbus_type in host-endian,
while we are at it.
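
A small userspace sketch of that point, with the caveat that demo_be16_to_host() is a made-up helper and not the kernel's be16_to_cpu(): the value stays in wire (big-endian) form and is converted at each point of use. Since the helper is a pure function of its argument, a compiler is free to compute it once for repeated uses of an unmodified value.

```c
#include <stdint.h>
#include <string.h>

/*
 * Interpret a 16-bit value whose bytes are laid out in big-endian
 * order, independent of the host byte order.
 */
static uint16_t demo_be16_to_host(uint16_t be_value)
{
	unsigned char bytes[2];

	memcpy(bytes, &be_value, sizeof(bytes));
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* Load a raw big-endian field from a wire buffer without converting. */
static uint16_t demo_load_raw16(const unsigned char *wire)
{
	uint16_t raw;

	memcpy(&raw, wire, sizeof(raw));
	return raw;
}
```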


Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Andreas Dilger
On Apr 29, 2019, at 9:09 PM, Al Viro  wrote:
> 
> On Tue, Apr 16, 2019 at 11:01:16AM -0700, Linus Torvalds wrote:
>> 
>> I only skimmed through the actual filesystem (and one networking)
>> patches, but they looked like trivial conversions to a better
>> interface.
> 
> ... except that this callback can (and always could) get executed after
> freeing struct super_block.  So we can't just dereference ->i_sb->s_op
> and expect to survive; the table ->s_op pointed to will still be there,
> but ->i_sb might very well have been freed, with all its contents overwritten.
> We need to copy the callback into struct inode itself, unfortunately.
> The following incremental fixes it; I'm going to fold it into the first
> commit in there.
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index fb45590d284e..855dad43b11d 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -164,6 +164,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>   inode->i_wb_frn_avg_time = 0;
>   inode->i_wb_frn_history = 0;
> #endif
> + inode->free_inode = sb->s_op->free_inode;
> 
>   if (security_inode_alloc(inode))
>   goto out;
> @@ -211,8 +212,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
> static void i_callback(struct rcu_head *head)
> {
>   struct inode *inode = container_of(head, struct inode, i_rcu);
> - if (inode->i_sb->s_op->free_inode)
> - inode->i_sb->s_op->free_inode(inode);
> + if (inode->free_inode)
> + inode->free_inode(inode);
>   else
>   free_inode_nonrcu(inode);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2e9b9f87caca..5ed6b39e588e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -718,6 +718,7 @@ struct inode {
> #endif
> 
>   void*i_private; /* fs or device private pointer */
> + void (*free_inode)(struct inode *);

It seems like a waste to increase the size of every struct inode just to access
a static pointer.  Is this the only place that ->free_inode() is called?  Why
not move the ->free_inode() pointer into inode->i_fop->free_inode() so that it
is still directly accessible at this point.

Cheers, Andreas








Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro
On Mon, Apr 29, 2019 at 08:37:29PM -0700, Linus Torvalds wrote:
> On Mon, Apr 29, 2019, 20:09 Al Viro  wrote:
> 
> >
> > ... except that this callback can (and always could) get executed after
> > freeing struct super_block.
> >
> 
> Ugh.
> 
> That food looks nasty. Shouldn't the super block freeing wait for the
> filesystem to be all done instead? Do a rcu synchronization or something?
> 
> Adding that pointer looks really wrong to me. I'd much rather delay the sb
> freeing. Is there some reason that can't be done that I'm missing?

Where would you put that synchronize_rcu()?  Doing that before ->put_super()
is too early - inode references might be dropped in there.  OTOH, doing
that after that point means that while struct super_block itself will be
there, any number of data structures hanging from it might be not.

So we are still very limited in what we can do inside ->free_inode()
instance *and* we get bunch of synchronize_rcu() for no good reason.

Note that for normal lockless accesses (lockless ->d_revalidate(), ->d_hash(),
etc.) we are just fine with having struct super_block freeing RCU-delayed
(along with any data structures we might need) - the superblock had
been seen at some point after we'd taken rcu_read_lock(), so its
freeing won't happen until we drop it.  So we don't need synchronize_rcu()
for that.

Here the problem is that we are dealing with another RCU callback;
synchronize_rcu() would be needed for it, but it will only protect that
intermediate dereference of ->i_sb; any rcu-delayed stuff scheduled
from inside ->put_super() would not be ordered wrt ->free_inode().
And if we are doing that just for the sake of that one dereference,
we might as well do it before scheduling i_callback().

PS: we *are* guaranteed that module will still be there (unregister_filesystem()
does synchronize_rcu() and rcu_barrier() is done before kmem_cache_destroy()
in assorted exit_foo_fs()).
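
To make the ordering problem concrete, here is a single-threaded userspace model. It is only a sketch: the demo_* names are hypothetical and superblock teardown is simulated by clearing fields instead of freeing memory. The callback pointer is copied into the inode while the superblock is still valid, so the deferred callback never has to go through ->i_sb.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_inode;

struct demo_super_ops {
	void (*free_inode)(struct demo_inode *);
};

struct demo_super_block {
	const struct demo_super_ops *s_op;
};

struct demo_inode {
	struct demo_super_block *i_sb;
	void (*free_inode)(struct demo_inode *); /* copied at queue time */
	bool released;
};

static void demo_fs_free_inode(struct demo_inode *inode)
{
	inode->released = true;
}

/* The ops table belongs to the filesystem driver and outlives the sb. */
static const struct demo_super_ops demo_fs_ops = {
	.free_inode = demo_fs_free_inode,
};

/* Queue time: the superblock is still valid, so copy the pointer now. */
static void demo_queue_destroy(struct demo_inode *inode)
{
	inode->free_inode = inode->i_sb->s_op->free_inode;
}

/* Deferred callback: must not dereference inode->i_sb any more. */
static void demo_deferred_callback(struct demo_inode *inode)
{
	if (inode->free_inode)
		inode->free_inode(inode);
}

static bool demo_run_scenario(void)
{
	struct demo_super_block sb = { .s_op = &demo_fs_ops };
	struct demo_inode inode = { .i_sb = &sb };

	demo_queue_destroy(&inode);
	sb.s_op = NULL;		/* simulate superblock teardown */
	inode.i_sb = NULL;	/* any later use would now be a bug */
	demo_deferred_callback(&inode);
	return inode.released;
}
```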


linux-next: manual merge of the mlx5-next tree with the rdma tree

2019-04-29 Thread Stephen Rothwell
Hi Leon,

Today's linux-next merge of the mlx5-next tree got a conflict in:

  drivers/infiniband/hw/mlx5/main.c

between commit:

  35b0aa67b298 ("RDMA/mlx5: Refactor netdev affinity code")

from the rdma tree and commit:

  c42260f19545 ("net/mlx5: Separate and generalize dma device from pci device")

from the mlx5-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/infiniband/hw/mlx5/main.c
index 6135a0b285de,fae6a6a1fbea..
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@@ -200,12 -172,18 +200,12 @@@ static int mlx5_netdev_event(struct not
  
switch (event) {
case NETDEV_REGISTER:
 +  /* Should already be registered during the load */
 +  if (ibdev->is_rep)
 +  break;
write_lock(&roce->netdev_lock);
-   if (ndev->dev.parent == &mdev->pdev->dev)
 -  if (ibdev->rep) {
 -  struct mlx5_eswitch *esw = ibdev->mdev->priv.eswitch;
 -  struct net_device *rep_ndev;
 -
 -  rep_ndev = mlx5_ib_get_rep_netdev(esw,
 -ibdev->rep->vport);
 -  if (rep_ndev == ndev)
 -  roce->netdev = ndev;
 -  } else if (ndev->dev.parent == mdev->device) {
++  if (ndev->dev.parent == mdev->device)
roce->netdev = ndev;
 -  }
write_unlock(&roce->netdev_lock);
break;
  






linux-next: build warning after merge of the thermal tree

2019-04-29 Thread Stephen Rothwell
Hi Zhang,

After merging the thermal tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

boolean symbol THERMAL tested for 'm'? test forced to 'n'

Introduced by commit

  be33e4fbbea5 ("thermal/drivers/core: Remove the module Kconfig's option")

There is a test for =m in drivers/net/ethernet/mellanox/mlxsw/Kconfig.

-- 
Cheers,
Stephen Rothwell




[PATCH v6 0/4] x86: Add the support of ACRN guest under x86

2019-04-29 Thread Zhao Yakui
ACRN is a flexible, lightweight reference hypervisor, built with real-time
and safety-criticality in mind, optimized to streamline embedded development
through an open source platform. It is built for embedded IoT with a small
footprint and real-time features. More details can be found
at https://projectacrn.org/

This patch set allows Linux to run as a guest on the ACRN hypervisor. It can
be combined with a follow-up patch set to manage Linux guests on ACRN. It
covers detection of the ACRN hypervisor, the upcall notification vector from
the hypervisor, and hypercalls. Hypervisor detection is similar to that of
Xen/VMware/Hyper-V. ACRN also uses an upcall notification mechanism similar
to the one in Xen/Microsoft Hyper-V when it needs to notify the Linux guest.
The hypercall provides the mechanism by which the Linux guest can query and
configure the ACRN hypervisor.

Following this patch set, we will send the ACRN driver part, which provides
the interface used to manage the virtualized CPU/memory/device/interrupt for
other guest OSes once the ACRN hypervisor is detected.

v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
        understand.
        Remove the export of x86_hyper_acrn.
        Remove the unused API definition of acrn_setup_intr_handler and
        acrn_remove_intr_handler.
        Adjust the order of header file
        Add the declaration of acrn_hv_vector_handler and tracing
        definition of acrn_hv_callback_vector.
        Refine the comments for the function of acrn_hypercall0/1/2

v2->v3: Add one new config symbol to unify the conditional definition
        of hv_irq_callback_count
        Use the "vmcall" mnemonic to replace the hard-coded byte definition
        Remove the unnecessary dependency of CONFIG_PARAVIRT for ACRN_GUEST

v3->v4: Rename the file name of acrnhyper.h to acrn.h
        Refine the commit log and some other minor changes (more comments and
        redundant ifdef in acrn.h, sorting the header file in acrn.c)

v4->v5: Minor changes of comments/commit log in patch 04
        Use _ASM_X86_ACRN_HYPERCALL_H instead of _ASM_X86_ACRNHYPERCALL_H.
        Use the "VMCALL" mnemonic in comment/commit log.
        Uppercase r8/rdi/rsi/rax for hypercall parameter register in comment.

v5->v6: Remove the explicit register variable for inline assembly
        Add the "extern" for the function declaration in acrn.h
        Add comments about acking ACPI EOI in acrn_hv_callback_handler
        Minor changes for comments/commit log in patch 03/04


Zhao Yakui (4):
  x86/Kconfig: Add new config symbol to unify conditional definition of
hv_irq_callback_count
  x86: Add the support of Linux guest on ACRN hypervisor
  x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
  x86/acrn: Add hypercall for ACRN guest

 arch/x86/Kconfig  | 16 +++
 arch/x86/entry/entry_64.S |  5 +++
 arch/x86/include/asm/acrn.h   | 11 +
 arch/x86/include/asm/acrn_hypercall.h | 84 +++
 arch/x86/include/asm/hardirq.h|  2 +-
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 68 
 arch/x86/kernel/cpu/hypervisor.c  |  4 ++
 arch/x86/kernel/irq.c |  2 +-
 arch/x86/xen/Kconfig  |  1 +
 drivers/hv/Kconfig|  1 +
 12 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/acrn.h
 create mode 100644 arch/x86/include/asm/acrn_hypercall.h
 create mode 100644 arch/x86/kernel/cpu/acrn.c

-- 
2.7.4



[PATCH v6 1/4] x86/Kconfig: Add new config symbol to unify conditional definition of hv_irq_callback_count

2019-04-29 Thread Zhao Yakui
Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests
using the hypervisor interrupt callback counter can select and thus
enable that counter. Select it when xen or hyperv support is enabled.
No functional changes.

Signed-off-by: Zhao Yakui 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
v3->v4: Follow the comments to refine the commit log.
---
 arch/x86/Kconfig   | 3 +++
 arch/x86/include/asm/hardirq.h | 2 +-
 arch/x86/kernel/irq.c  | 2 +-
 arch/x86/xen/Kconfig   | 1 +
 drivers/hv/Kconfig | 1 +
 5 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62fc3fd..2fc9297 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -791,6 +791,9 @@ config QUEUED_LOCK_STAT
  behavior of paravirtualized queued spinlocks and report
  them on debugfs.
 
+config X86_HV_CALLBACK_VECTOR
+   def_bool n
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_GUEST
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index d9069bb..0753379 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -37,7 +37,7 @@ typedef struct {
 #ifdef CONFIG_X86_MCE_AMD
unsigned int irq_deferred_error_count;
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
unsigned int irq_hv_callback_count;
 #endif
 #if IS_ENABLED(CONFIG_HYPERV)
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 59b5f2e..a147826 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -134,7 +134,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
seq_puts(p, "  Machine check polls\n");
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
if (test_bit(HYPERVISOR_CALLBACK_VECTOR, system_vectors)) {
seq_printf(p, "%*s: ", prec, "HYP");
for_each_online_cpu(j)
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index e07abef..ba5a418 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
bool "Xen guest support"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select X86_HV_CALLBACK_VECTOR
depends on X86_64 || (X86_32 && X86_PAE)
depends on X86_LOCAL_APIC && X86_TSC
help
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 1c1a251..cafcb97 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -6,6 +6,7 @@ config HYPERV
tristate "Microsoft Hyper-V client drivers"
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
+   select X86_HV_CALLBACK_VECTOR
help
  Select this option to run Linux as a Hyper-V client operating
  system.
-- 
2.7.4



[PATCH v6 4/4] x86/acrn: Add hypercall for ACRN guest

2019-04-29 Thread Zhao Yakui
When the ACRN hypervisor is detected, the hypercall is needed so that the
ACRN guest can query/config some settings. For example: it can be used
to query the resources in hypervisor and manage the CPU/memory/device/
interrupt for guest operating system.

Add the hypercall so that the ACRN guest can communicate with the
low-level ACRN hypervisor. On x86 it is implemented with the VMCALL
instruction.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
V1->V2: Refine the comments for the function of acrn_hypercall0/1/2
v2->v3: Use the "vmcall" mnemonic to replace hard-code byte definition
v4->v5: Use _ASM_X86_ACRN_HYPERCALL_H instead of _ASM_X86_ACRNHYPERCALL_H.
Use the "VMCALL" mnemonic in comment/commit log.
Uppercase r8/rdi/rsi/rax for hypercall parameter register in comment.
v5->v6: Remove explicit local register variable for inline assembly
---
 arch/x86/include/asm/acrn_hypercall.h | 84 +++
 1 file changed, 84 insertions(+)
 create mode 100644 arch/x86/include/asm/acrn_hypercall.h

diff --git a/arch/x86/include/asm/acrn_hypercall.h b/arch/x86/include/asm/acrn_hypercall.h
new file mode 100644
index 000..5cb438e
--- /dev/null
+++ b/arch/x86/include/asm/acrn_hypercall.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_X86_ACRN_HYPERCALL_H
+#define _ASM_X86_ACRN_HYPERCALL_H
+
+#include 
+
+#ifdef CONFIG_ACRN_GUEST
+
+/*
+ * Hypercalls for ACRN guest
+ *
+ * Hypercall number is passed in R8 register.
+ * Up to 2 arguments are passed in RDI, RSI.
+ * Return value will be placed in RAX.
+ */
+
+static inline long acrn_hypercall0(unsigned long hcall_id)
+{
+   long result;
+
+   /* the hypercall is implemented with the VMCALL instruction.
+* volatile qualifier is added to avoid that it is dropped
+* because of compiler optimization.
+*/
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id)
+: "r8");
+
+   return result;
+}
+
+static inline long acrn_hypercall1(unsigned long hcall_id,
+  unsigned long param1)
+{
+   long result;
+
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id), "D" (param1)
+: "r8");
+
+   return result;
+}
+
+static inline long acrn_hypercall2(unsigned long hcall_id,
+  unsigned long param1,
+  unsigned long param2)
+{
+   long result;
+
+   asm volatile("movq %[hcall_id], %%r8\n\t"
+"vmcall\n\t"
+: "=a" (result)
+: [hcall_id] "g" (hcall_id), "D" (param1), "S" (param2)
+: "r8");
+
+   return result;
+}
+
+#else
+
+static inline long acrn_hypercall0(unsigned long hcall_id)
+{
+   return -ENOTSUPP;
+}
+
+static inline long acrn_hypercall1(unsigned long hcall_id,
+  unsigned long param1)
+{
+   return -ENOTSUPP;
+}
+
+static inline long acrn_hypercall2(unsigned long hcall_id,
+  unsigned long param1,
+  unsigned long param2)
+{
+   return -ENOTSUPP;
+}
+#endif /* CONFIG_ACRN_GUEST */
+#endif /* _ASM_X86_ACRN_HYPERCALL_H */
-- 
2.7.4
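
For readers unfamiliar with the inline-assembly shape used in acrn_hypercall0/1/2, the following is a userspace sketch only. VMCALL cannot be issued from an ordinary process, so a harmless register move stands in for it, and demo_roundtrip_via_r8() is a hypothetical name. It keeps the same ingredients: a named input operand, an explicit move into R8, and R8 in the clobber list. It uses an "r" input constraint rather than the "g" used in the patch, and falls back to plain C on anything that is not x86-64 with GCC or Clang.

```c
/*
 * Move 'value' into R8 and read it back, mirroring how the patch
 * loads the hypercall number into R8 before VMCALL.
 */
static inline unsigned long demo_roundtrip_via_r8(unsigned long value)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	unsigned long result;

	/*
	 * volatile keeps the compiler from dropping or reordering the
	 * statement; "r8" tells it the register is overwritten.
	 */
	__asm__ volatile("movq %[value], %%r8\n\t"
			 "movq %%r8, %[result]\n\t"
			 : [result] "=r" (result)
			 : [value] "r" (value)
			 : "r8");
	return result;
#else
	/* Portable fallback so the sketch builds on other targets. */
	return value;
#endif
}
```

Listing R8 as a clobber instead of binding it through a local register variable leaves register allocation for the remaining operands to the compiler, which matches the v5->v6 changelog entry.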



[PATCH v6 2/4] x86: Add the support of Linux guest on ACRN hypervisor

2019-04-29 Thread Zhao Yakui
ACRN is an open-source hypervisor maintained by the Linux Foundation.
It is built for embedded IoT with a small footprint and real-time features.
Add ACRN guest support so that Linux can be booted under the ACRN
hypervisor. Follow-up patches will set up the upcall notification vector,
enable hypercalls and provide the interface that is used to manage the
virtualized CPU/memory/device/interrupt for other guest OSes.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
understand.
Remove the export of x86_hyper_acrn.

v2->v3: Remove the unnecessary dependency of PARAVIRT
v3->v4: Refine the commit log and add more meaningful description in Kconfig
v4->v5: No change
v5->v6: No change
---
 arch/x86/Kconfig  | 12 
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 39 +++
 arch/x86/kernel/cpu/hypervisor.c  |  4 
 5 files changed, 57 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/acrn.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2fc9297..8dc4200 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -845,6 +845,18 @@ config JAILHOUSE_GUEST
  cell. You can leave this option disabled if you only want to start
  Jailhouse and run Linux afterwards in the root cell.
 
+config ACRN_GUEST
+   bool "ACRN Guest support"
+   depends on X86_64
+   help
+ This option allows to run Linux as guest in ACRN hypervisor. Enabling
+ this will allow the kernel to boot in virtualized environment under
+ the ACRN hypervisor.
+ ACRN is a flexible, lightweight reference open-source hypervisor, built
+ with real-time and safety-criticality in mind. It is built for embedded
+ IOT with small footprint and real-time features. More details can be
+ found in https://projectacrn.org/
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h
index 8c5aaba..50a30f6 100644
--- a/arch/x86/include/asm/hypervisor.h
+++ b/arch/x86/include/asm/hypervisor.h
@@ -29,6 +29,7 @@ enum x86_hypervisor_type {
X86_HYPER_XEN_HVM,
X86_HYPER_KVM,
X86_HYPER_JAILHOUSE,
+   X86_HYPER_ACRN,
 };
 
 #ifdef CONFIG_HYPERVISOR_GUEST
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index cfd24f9..17a7cdf 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/
 obj-$(CONFIG_X86_LOCAL_APIC)   += perfctr-watchdog.o
 
 obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o
+obj-$(CONFIG_ACRN_GUEST)   += acrn.o
 
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
new file mode 100644
index 000..f556640
--- /dev/null
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ACRN detection support
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ 
+ * Zhao Yakui 
+ *
+ */
+
+#include 
+
+static uint32_t __init acrn_detect(void)
+{
+   return hypervisor_cpuid_base("ACRNACRNACRN\0\0", 0);
+}
+
+static void __init acrn_init_platform(void)
+{
+}
+
+static bool acrn_x2apic_available(void)
+{
+   /* x2apic is not supported now.
+* Later it needs to check the X86_FEATURE_X2APIC bit of cpu info
+* returned by CPUID to determine whether the x2apic is
+* supported in Linux guest.
+*/
+   return false;
+}
+
+const __initconst struct hypervisor_x86 x86_hyper_acrn = {
+   .name   = "ACRN",
+   .detect = acrn_detect,
+   .type   = X86_HYPER_ACRN,
+   .init.init_platform = acrn_init_platform,
+   .init.x2apic_available  = acrn_x2apic_available,
+};
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 479ca47..87e39ad 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -32,6 +32,7 @@ extern const struct hypervisor_x86 x86_hyper_xen_pv;
 extern const struct hypervisor_x86 x86_hyper_xen_hvm;
 extern const struct hypervisor_x86 x86_hyper_kvm;
 extern const struct hypervisor_x86 x86_hyper_jailhouse;
+extern const struct hypervisor_x86 x86_hyper_acrn;
 
 static const __initconst struct hypervisor_x86 * const hypervisors[] =
 {
@@ -49,6 +50,9 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] =
 #ifdef CONFIG_JAILHOUSE_GUEST
    &x86_hyper_jailhouse,
 #endif
+#ifdef CONFIG_ACRN_GUEST
+   &x86_hyper_acrn,
+#endif
 };
 
 enum x86_hypervisor_type x86_hyper_type;
-- 
2.7.4
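
For readers unfamiliar with how acrn_detect() finds the hypervisor: as far
as I understand it, hypervisor_cpuid_base() walks the hypervisor CPUID
leaves and compares a 12-byte vendor signature.  Below is a minimal
userspace sketch of that scan.  The cpuid_fn callback, the two mock_cpuid_*
functions and find_hypervisor_base() are hypothetical names, introduced so
the logic can run without a real hypervisor; the leaf range and stride are
my assumptions about the kernel helper, not something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the CPUID instruction: regs = eax, ebx, ecx, edx. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/* Mock: pretends an ACRN signature is exposed at leaf 0x40000000. */
void mock_cpuid_acrn(uint32_t leaf, uint32_t regs[4])
{
	memset(regs, 0, 4 * sizeof(regs[0]));
	if (leaf == 0x40000000u) {
		regs[0] = 0x40000001u;                /* highest leaf */
		memcpy(&regs[1], "ACRNACRNACRN", 12); /* ebx, ecx, edx */
	}
}

/* Mock: no hypervisor signature at any leaf. */
void mock_cpuid_none(uint32_t leaf, uint32_t regs[4])
{
	(void)leaf;
	memset(regs, 0, 4 * sizeof(regs[0]));
}

/*
 * Scan the (assumed) hypervisor leaf range for a 12-byte signature.
 * Returns the matching base leaf, or 0 when nothing matches.
 */
uint32_t find_hypervisor_base(cpuid_fn cpuid, const char *sig, uint32_t leaves)
{
	uint32_t base, regs[4];

	assert(cpuid && sig);
	for (base = 0x40000000u; base < 0x40010000u; base += 0x100u) {
		cpuid(base, regs);
		if (!memcmp(sig, &regs[1], 12) &&
		    (leaves == 0 || regs[0] - base >= leaves))
			return base;
	}
	return 0;
}
```

Passing the CPUID access in as a callback is only a testing convenience;
the kernel helper reads the real instruction directly.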



[PATCH v6 3/4] x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

2019-04-29 Thread Zhao Yakui
The Linux kernel uses HYPERVISOR_CALLBACK_VECTOR as the hypervisor upcall
vector. It is already used for Xen and Hyper-V.
After the ACRN hypervisor is detected, it will also use this vector to
notify the ACRN guest.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
V1->V2: Remove the unused API definition of acrn_setup_intr_handler and
acrn_remove_intr_handler.
Adjust the order of header file
Add the declaration of acrn_hv_vector_handler and tracing
definition of acrn_hv_callback_vector.

v2->v3: No change
v3->v4: Refine the file name of acrnhyper.h to acrn.h
v5->v6: Add the "extern" for the function declarations in header file
Add some comments for calling entering_ack_irq
Some other minor changes (unnecessary splitting of two lines
and minor change in commit log)
---
 arch/x86/Kconfig|  1 +
 arch/x86/entry/entry_64.S   |  5 +
 arch/x86/include/asm/acrn.h | 11 +++
 arch/x86/kernel/cpu/acrn.c  | 29 +
 4 files changed, 46 insertions(+)
 create mode 100644 arch/x86/include/asm/acrn.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8dc4200..d7a10f6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -848,6 +848,7 @@ config JAILHOUSE_GUEST
 config ACRN_GUEST
bool "ACRN Guest support"
depends on X86_64
+   select X86_HV_CALLBACK_VECTOR
help
  This option allows to run Linux as guest in ACRN hypervisor. Enabling
  this will allow the kernel to boot in virtualized environment under
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb..d1b8ad3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,6 +1129,11 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
+#if IS_ENABLED(CONFIG_ACRN_GUEST)
+apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
+   acrn_hv_callback_vector acrn_hv_vector_handler
+#endif
+
 idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
 idtentry int3  do_int3 has_error_code=0
 idtentry stack_segment do_stack_segment has_error_code=1
diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
new file mode 100644
index 000..4adb13f
--- /dev/null
+++ b/arch/x86/include/asm/acrn.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ACRN_H
+#define _ASM_X86_ACRN_H
+
+extern void acrn_hv_callback_vector(void);
+#ifdef CONFIG_TRACING
+#define trace_acrn_hv_callback_vector acrn_hv_callback_vector
+#endif
+
+extern void acrn_hv_vector_handler(struct pt_regs *regs);
+#endif /* _ASM_X86_ACRN_H */
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index f556640..ce88d2d 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -9,7 +9,11 @@
  *
  */
 
+#include 
+#include 
+#include 
 #include 
+#include 
 
 static uint32_t __init acrn_detect(void)
 {
@@ -18,6 +22,8 @@ static uint32_t __init acrn_detect(void)
 
 static void __init acrn_init_platform(void)
 {
+   /* Setup the IDT for ACRN hypervisor callback */
+   alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
 }
 
 static bool acrn_x2apic_available(void)
@@ -30,6 +36,29 @@ static bool acrn_x2apic_available(void)
return false;
 }
 
+static void (*acrn_intr_handler)(void);
+
+__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
+{
+   struct pt_regs *old_regs = set_irq_regs(regs);
+
+   /*
+* The hypervisor requires that the APIC EOI should be acked.
+* If the APIC EOI is not acked, the APIC ISR bit for the
+* HYPERVISOR_CALLBACK_VECTOR will not be cleared and then it
+* will block the interrupt whose vector is lower than
+* HYPERVISOR_CALLBACK_VECTOR.
+*/
+   entering_ack_irq();
+   inc_irq_stat(irq_hv_callback_count);
+
+   if (acrn_intr_handler)
+   acrn_intr_handler();
+
+   exiting_irq();
+   set_irq_regs(old_regs);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_acrn = {
.name   = "ACRN",
.detect = acrn_detect,
-- 
2.7.4
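
The optional-handler pattern used by acrn_hv_vector_handler() can be shown
in isolation.  This is a userspace sketch only: upcall_handler,
register_upcall_handler(), deliver_upcall() and the counters are
hypothetical names, and the EOI and irq bookkeeping done by
entering_ack_irq()/exiting_irq() is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: every upcall is counted, dispatch is optional. */
static void (*upcall_handler)(void);
static unsigned long upcall_count;
static unsigned long handled_count;

/* Example handler that just records that it ran. */
void sample_handler(void)
{
	handled_count++;
}

void register_upcall_handler(void (*fn)(void))
{
	assert(fn != NULL); /* sketch: real code would return an error */
	upcall_handler = fn;
}

/* Same shape as the vector handler: account first, then dispatch if set. */
unsigned long deliver_upcall(void)
{
	upcall_count++;
	if (upcall_handler)
		upcall_handler();
	return upcall_count;
}

unsigned long handled_upcalls(void)
{
	return handled_count;
}
```

An upcall that arrives before any handler is registered is still counted,
which matches the unconditional inc_irq_stat() in the patch.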



[PATCH] drivers: thermal: processor_thermal: Read PPCC on resume

2019-04-29 Thread Srinivas Pandruvada
Read PPCC power limits on system resume in case those limits changed
while the system was suspended.

Signed-off-by: Srinivas Pandruvada 
---
 .../int340x_thermal/processor_thermal_device.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 436c256f111d..acb22157b9ac 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -465,6 +465,18 @@ static void  proc_thermal_pci_remove(struct pci_dev *pdev)
pci_disable_device(pdev);
 }
 
+static int proc_thermal_resume(struct device *dev)
+{
+   struct proc_thermal_device *proc_dev;
+
+   proc_dev = dev_get_drvdata(dev);
+   proc_thermal_read_ppcc(proc_dev);
+
+   return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(proc_thermal_pm, NULL, proc_thermal_resume);
+
 static const struct pci_device_id proc_thermal_pci_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_BDW_THERMAL)},
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PROC_HSB_THERMAL)},
@@ -489,6 +501,7 @@ static struct pci_driver proc_thermal_pci_driver = {
.probe  = proc_thermal_pci_probe,
.remove = proc_thermal_pci_remove,
.id_table   = proc_thermal_pci_ids,
+   .driver.pm  = &proc_thermal_pm,
 };
 
 static const struct acpi_device_id int3401_device_ids[] = {
@@ -503,6 +516,7 @@ static struct platform_driver int3401_driver = {
.driver = {
.name = "int3401 thermal",
.acpi_match_table = int3401_device_ids,
+   .pm = &proc_thermal_pm,
},
 };
 
-- 
2.17.2



[PATCH] drivers: thermal: processor_thermal: Downgrade error message

2019-04-29 Thread Srinivas Pandruvada
Downgrade "Unsupported event" message from dev_err to dev_dbg. Otherwise
the log is flooded with this message on some platforms.

Signed-off-by: Srinivas Pandruvada 
---
 .../thermal/intel/int340x_thermal/processor_thermal_device.c| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 4b206b594825..436c256f111d 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -275,7 +275,7 @@ static void proc_thermal_notify(acpi_handle handle, u32 event, void *data)
THERMAL_DEVICE_POWER_CAPABILITY_CHANGED);
break;
default:
-   dev_err(proc_priv->dev, "Unsupported event [0x%x]\n", event);
+   dev_dbg(proc_priv->dev, "Unsupported event [0x%x]\n", event);
break;
}
 }
-- 
2.17.2



Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Nicholas Mc Guire
On Tue, Apr 30, 2019 at 04:02:23AM +0100, Al Viro wrote:
> On Tue, Apr 30, 2019 at 04:22:38AM +0200, Nicholas Mc Guire wrote:
> > On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> > > On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire wrote:
> > > >
> > > > V2: As requested by Sven Van Asbroeck  make the
> > > > impact of the patch clear in the commit message.
> > > 
> > > Thank you, but did you miss my comment about creating a local variable
> > > instead? See:
> > > https://lkml.org/lkml/2019/4/28/97
> > 
> > Did not miss it - I just don't think that makes it any more
> > understandable - the __force __be16 makes it clear I believe
> > that this is correct, sparse does not like this though - so tell
> > sparse.
> 
> ... to STFU, 'cause you know better.  The trouble is, how do we
> (or yourself a year or two later) know *why* it is correct?
> Worse, how do we (or yourself, etc.) know if a change about to be
> done to the code won't invalidate the proof of yours?
> 
> > The local variable would need to be explained as it is
> > functionally not necessary - therefore I find it more confusing
> > than using __force here.
> 
> What's confusing is mixing host- and fixed-endian values in the
> same variable at different times.  Treat those as unrelated
> types that happen to have the same sizeof.
> 
> Quite a few of __force instances in the tree should be taken out
> and shot.  Don't add to their number.

ok - my bad then - I had assumed that using __force is reasonable
if the handling is correct and it's a localized conversion only,
like var = be16_to_cpu(var), which avoided introducing additional
variables just to have different types but no different function.
But the long-term issue of hiding bugs with __force makes sense to
me - will give it another shot at scripting this in coccinelle.

thx!
hofrat
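
Al's suggestion of treating host-endian and fixed-endian values as
unrelated types can be sketched in plain C.  The wire_be16 type and the
helper names below are hypothetical stand-ins for the kernel's
sparse-checked __be16 and be16_to_cpu()/cpu_to_be16(); this is not the
anybus-s code.  The point is only that the big-endian value and the host
value live in different variables of different types, so no __force cast
is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-endian type: a struct, so it cannot be mixed with uint16_t. */
typedef struct { uint8_t b[2]; } wire_be16;

/* Big-endian wire value -> host integer. */
uint16_t wire_be16_to_host(wire_be16 v)
{
	return (uint16_t)(((unsigned)v.b[0] << 8) | v.b[1]);
}

/* Host integer -> big-endian wire value. */
wire_be16 host_to_wire_be16(uint16_t v)
{
	wire_be16 w = { { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) } };

	return w;
}

/* Separate variables for separate representations. */
uint16_t next_sequence(wire_be16 seq_on_wire)
{
	uint16_t seq = wire_be16_to_host(seq_on_wire); /* host-endian copy */

	assert(seq != UINT16_MAX); /* sketch only: no wrap-around handling */
	return (uint16_t)(seq + 1);
}
```

With this shape the compiler itself rejects an accidental assignment of a
wire value to a host variable, which is the job sparse does for __be16.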


Re: [PATCH 2/2] memcg, fsnotify: no oom-kill for remote memcg charging

2019-04-29 Thread Shakeel Butt
On Mon, Apr 29, 2019 at 5:41 PM Michal Hocko  wrote:
>
> On Mon 29-04-19 10:13:32, Shakeel Butt wrote:
> [...]
> >   /*
> >* For queues with unlimited length lost events are not expected and
> >* can possibly have security implications. Avoid losing events when
> >* memory is short.
> > +  *
> > +  * Note: __GFP_NOFAIL takes precedence over __GFP_RETRY_MAYFAIL.
> >*/
>
> No, there is no rule like that. Combining the two is undefined
> currently and I do not think we want to legitimize it. What does it even
> mean?
>

Actually the code is doing that but I agree this is not documented and
weird. I will fix this.

Shakeel


Re: [PATCH] riscv: Support non-coherency memory model

2019-04-29 Thread Guo Ren
On Mon, Apr 29, 2019 at 01:11:43PM -0700, Palmer Dabbelt wrote:
> On Mon, 22 Apr 2019 08:44:30 PDT (-0700), guo...@kernel.org wrote:
> >From: Guo Ren 
> >
> >The current riscv linux implementation requires SOC system to support
> >memory coherence between all I/O devices and CPUs. But some SOC systems
> >cannot maintain the coherence and they need support cache clean/invalid
> >operations to synchronize data.
> >
> >The current implementation is no problem for the SiFive FU540, because
> >the FU540 keeps all IO devices and DMA master devices coherent with the
> >CPU. But a traditional SOC vendor may already have a stable non-coherent
> >SOC system; the need is simply to replace the CPU with an RV CPU, and
> >rebuilding the whole system with IO coherency is very expensive.
> >
> >So we should make riscv linux also support non-coherency memory model.
> >Here are the two points that riscv linux needs to be modified:
> >
> > - Add _PAGE_COHERENCY bit in current page table entry attributes. The bit
> >   designates a coherence for this page mapping. Software set the bit to
> >   tell the hardware that the region of the page's memory area must be
> >   coherent with IOs devices in SOC system by PMA settings.
> >   If IOs and CPU are already coherent in SOC system, CPU just ignore
> >   this bit.
> >
> >   PTE format:
> >   | XLEN-1  10 |  9  |  8  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
> >   |    PFN     |  C  | RSW | D | A | G | U | X | W | R | V |
> >                   ^
> >   BIT(9): Coherence attribute bit
> >      0: hardware needn't keep the page coherent and software will
> >         maintain the coherence with cache clean/invalidate operations.
> >      1: hardware must keep the page coherent and software needn't
> >         maintain the coherence.
> >   BIT(8): Reserved for software and now it's _PAGE_SPECIAL in linux
> >
> >   Add a new hardware bit in PTE also need to modify Privileged
> >   Architecture Supervisor-Level ISA:
> >   https://github.com/riscv/riscv-isa-manual/pull/374
> 
> This is a RISC-V ISA modification, which isn't really appropriate to suggest on
> the kernel mailing lists.  The right place to talk about this is at the RISC-V
> foundation, which owns the ISA -- we can't change the hardware with a patch to
> Linux :).
I just want a discussion and a wide discussion is good for all of us :)

> 
> > - Add SBI_FENCE_DMA 9 in riscv-sbi.
> >   sbi_fence_dma(start, size, dir) could synchronize CPU cache data with
> >   DMA device in non-coherency memory model. The third param's definition
> >   is the same with linux's in include/linux/dma-direction.h:
> >
> >   enum dma_data_direction {
> > DMA_BIDIRECTIONAL = 0,
> > DMA_TO_DEVICE = 1,
> > DMA_FROM_DEVICE = 2,
> > DMA_NONE = 3,
> >   };
> >
> >   The first param:start must be physical address which could be handled
> >   in M-state.
> >
> >   Here is a pull request to the riscv-sbi-doc:
> >   https://github.com/riscv/riscv-sbi-doc/pull/15
> >
> >We have tested the patch on our FPGA SOC system, whose network controller
> >is connected to a non-cache-coherent interconnect, and it couldn't work
> >without the patch.
> >
> >There is no side effect for the FU540, whose CPU doesn't care about
> >_PAGE_COHERENCY in the PTE, but the FU540's bbl also needs to implement a
> >simple sbi_fence_dma that returns directly. In fact, if you give a correct
> >configuration for dev_is_dma_coherent(), the linux dma framework wouldn't
> >call sbi_fence_dma any more.
> 
> Non-coherent fences also need to be discussed as part of a RISC-V ISA
   ^^
  fences instructions? not page attributes?
> extension.  
> I know people have expressed interest, but I don't know of a
> working group that's already been set up.
Does that mean the current RISC-V ISA forces the SOC to use a coherent
memory model?

Best Regards
 Guo Ren
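
To make the proposed layout concrete, here is a small sketch of the bit
definitions.  PTE_C at bit 9 is the proposal from the quoted patch, not part
of the ratified privileged specification, and the PTE_* names and helper
functions are hypothetical, chosen for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_V   (UINT64_C(1) << 0) /* valid */
#define PTE_R   (UINT64_C(1) << 1) /* readable */
#define PTE_W   (UINT64_C(1) << 2) /* writable */
#define PTE_X   (UINT64_C(1) << 3) /* executable */
#define PTE_U   (UINT64_C(1) << 4) /* user */
#define PTE_G   (UINT64_C(1) << 5) /* global */
#define PTE_A   (UINT64_C(1) << 6) /* accessed */
#define PTE_D   (UINT64_C(1) << 7) /* dirty */
#define PTE_RSW (UINT64_C(1) << 8) /* software bit (_PAGE_SPECIAL per the patch) */
#define PTE_C   (UINT64_C(1) << 9) /* proposed coherence attribute */
#define PTE_PFN_SHIFT 10

/* Combine a page frame number with attribute flags. */
uint64_t make_pte(uint64_t pfn, uint64_t flags)
{
	assert((flags >> PTE_PFN_SHIFT) == 0); /* flags must sit below the PFN */
	return (pfn << PTE_PFN_SHIFT) | flags;
}

uint64_t pte_pfn(uint64_t pte)
{
	return pte >> PTE_PFN_SHIFT;
}

/* C == 0 on a valid PTE: software maintains coherence with cache ops. */
bool pte_needs_sw_cache_maintenance(uint64_t pte)
{
	return (pte & PTE_V) && !(pte & PTE_C);
}
```

Note that taking bit 9 for hardware leaves only bit 8 of the two RSW bits
for software, which is part of why the change needs ISA-level agreement.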


Re: INFO: task hung in __get_super

2019-04-29 Thread Al Viro
On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:

> Yeah, you're right. And if we push the patch a bit further to not take
> loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> can send a fix.

Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
get to its call on invalid ioctl numbers.  What am I missing here?


[RFC PATCH v4 15/15] dcache: Add CONFIG_DCACHE_SMO

2019-04-29 Thread Tobin C. Harding
In an attempt to make the SMO patchset as non-invasive as possible add a
config option CONFIG_DCACHE_SMO (under "Memory Management options") for
enabling SMO for the DCACHE.  Without this option the dcache constructor
is used but no other code is built in; with this option enabled, slab
mobility is enabled and the isolate/migrate functions are built in.

Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
Slab Movable Objects infrastructure.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 4 
 mm/Kconfig  | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3f9daba1cc78..9edce104613b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3068,6 +3068,7 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+#ifdef CONFIG_DCACHE_SMO
 /*
  * d_isolate() - Dentry isolation callback function.
  * @s: The dentry cache.
@@ -3140,6 +3141,7 @@ static void d_partial_shrink(struct kmem_cache *s, void **_unused, int __unused,
 
kfree(private);
 }
+#endif /* CONFIG_DCACHE_SMO */
 
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
@@ -3186,7 +3188,9 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+#ifdef CONFIG_DCACHE_SMO
kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+#endif
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff --git a/mm/Kconfig b/mm/Kconfig
index 47040d939f3b..92fc27ad3472 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -265,6 +265,13 @@ config SMO_NODE
help
  On NUMA systems enable moving objects to and from a specified node.
 
+config DCACHE_SMO
+   bool "Enable Slab Movable Objects for the dcache"
+   depends on SLUB
+   help
+ Under memory pressure we can try to free dentry slab cache objects from
+ the partial slab list if this is enabled.
+
 config PHYS_ADDR_T_64BIT
def_bool 64BIT
 
-- 
2.21.0



[RFC PATCH v4 13/15] dcache: Provide a dentry constructor

2019-04-29 Thread Tobin C. Harding
In order to support object migration on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.

Provide a dentry constructor.

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index aac41adf4743..3d6cc06eca56 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1603,6 +1603,16 @@ void d_invalidate(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
+static void dcache_ctor(void *p)
+{
+   struct dentry *dentry = p;
+
+   /* Mimic lockref_mark_dead() */
+   dentry->d_lockref.count = -128;
+
+   spin_lock_init(&dentry->d_lock);
+}
+
 /**
  * __d_alloc   -   allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1658,7 +1668,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 
dentry->d_lockref.count = 1;
dentry->d_flags = 0;
-   spin_lock_init(&dentry->d_lock);
    seqcount_init(&dentry->d_seq);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
@@ -3091,14 +3100,17 @@ static void __init dcache_init_early(void)
 
 static void __init dcache_init(void)
 {
-   /*
-* A constructor could be added for stable state like the lists,
-* but it is probably not worth it because of the cache nature
-* of the dcache.
-*/
-   dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-   d_iname);
+   slab_flags_t flags =
+   SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_MEM_SPREAD | SLAB_ACCOUNT;
+
+   dentry_cache =
+   kmem_cache_create_usercopy("dentry",
+  sizeof(struct dentry),
+  __alignof__(struct dentry),
+  flags,
+  offsetof(struct dentry, d_iname),
+  sizeof_field(struct dentry, d_iname),
+  dcache_ctor);
 
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
-- 
2.21.0



[RFC PATCH v4 11/15] slub: Enable moving objects to/from specific nodes

2019-04-29 Thread Tobin C. Harding
We have just implemented Slab Movable Objects (object migration).
Currently object migration is used to defrag a cache.  On NUMA systems
it would be nice to be able to control the source and destination nodes
when moving objects.

Add CONFIG_SMO_NODE to guard this feature.  CONFIG_SMO_NODE depends on
CONFIG_SLUB_DEBUG because we use the full list.  Leave it like this for
the RFC because the patch will be less cluttered to review, separate
full list out of CONFIG_DEBUG before doing a PATCH version.

Implement moving all objects (including those in full slabs) to a
specific node.  Expose this functionality to userspace via a sysfs entry.

Add sysfs entry:

   /sysfs/kernel/slab//move

With this users get access to the following functionality:

 - Move all objects to specified node.

echo "N1" > move

 - Move all objects from specified node to other specified
   node (from N1 -> to N2):

echo "N1 N2" > move

This also enables shrinking slabs on a specific node:

echo "N1 N1" > move
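
The two accepted input forms can be parsed with a single sscanf() call.
The sketch below is a userspace illustration, not the move_store() code
from this patch; struct move_request and parse_move_request() are
hypothetical names, and using -1 to mean "any source node" is my own
convention.  Node numbers are not range-checked here.

```c
#include <assert.h>
#include <stdio.h>

struct move_request {
	int from; /* source node, or -1 for "all nodes" */
	int to;   /* destination node */
};

/* Returns 0 on success, -1 if the buffer matches neither form. */
int parse_move_request(const char *buf, struct move_request *req)
{
	int a, b, n;

	assert(buf && req);
	n = sscanf(buf, " N%d N%d", &a, &b);
	if (n == 2) {          /* "N1 N2": move from N1 to N2 */
		req->from = a;
		req->to = b;
	} else if (n == 1) {   /* "N1": move everything to N1 */
		req->from = -1;
		req->to = a;
	} else {
		return -1;
	}
	return 0;
}
```

With this parser, the "N1 N1" form comes out with from == to, which is the
case the commit message describes as shrinking slabs on a single node.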

Signed-off-by: Tobin C. Harding 
---
 mm/Kconfig |   7 ++
 mm/slub.c  | 249 +
 2 files changed, 256 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..47040d939f3b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -258,6 +258,13 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config SMO_NODE
+   bool "Enable per node control of Slab Movable Objects"
+   depends on SLUB && SYSFS
+   select SLUB_DEBUG
+   help
+ On NUMA systems enable moving objects to and from a specified node.
+
 config PHYS_ADDR_T_64BIT
def_bool 64BIT
 
diff --git a/mm/slub.c b/mm/slub.c
index e601c804ed79..e4f3dde443f5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4345,6 +4345,106 @@ static void move_slab_page(struct page *page, void *scratch, int node)
s->migrate(s, vector, count, node, private);
 }
 
+#ifdef CONFIG_SMO_NODE
+/*
+ * kmem_cache_move() - Attempt to move all slab objects.
+ * @s: The cache we are working on.
+ * @node: The node to move objects away from.
+ * @target_node: The node to move objects on to.
+ *
+ * Attempts to move all objects (partial slabs and full slabs) to target
+ * node.
+ *
+ * Context: Takes the list_lock.
+ * Return: The number of slabs remaining on node.
+ */
+static unsigned long kmem_cache_move(struct kmem_cache *s,
+int node, int target_node)
+{
+   struct kmem_cache_node *n = get_node(s, node);
+   LIST_HEAD(move_list);
+   struct page *page, *page2;
+   unsigned long flags;
+   void **scratch;
+
+   if (!s->migrate) {
+   pr_warn("%s SMO not enabled, cannot move objects\n", s->name);
+   goto out;
+   }
+
+   scratch = alloc_scratch(s);
+   if (!scratch)
+   goto out;
+
+   spin_lock_irqsave(&n->list_lock, flags);
+
+   list_for_each_entry_safe(page, page2, &n->partial, lru) {
+   if (!slab_trylock(page))
+   /* Busy slab. Get out of the way */
+   continue;
+
+   if (page->inuse) {
+   list_move(&page->lru, &move_list);
+   /* Stop page being considered for allocations */
+   n->nr_partial--;
+   page->frozen = 1;
+
+   slab_unlock(page);
+   } else {/* Empty slab page */
+   list_del(&page->lru);
+   n->nr_partial--;
+   slab_unlock(page);
+   discard_slab(s, page);
+   }
+   }
+   list_for_each_entry_safe(page, page2, &n->full, lru) {
+   if (!slab_trylock(page))
+   continue;
+
+   list_move(&page->lru, &move_list);
+   page->frozen = 1;
+   slab_unlock(page);
+   }
+
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   list_for_each_entry(page, &move_list, lru) {
+   if (page->inuse)
+   move_slab_page(page, scratch, target_node);
+   }
+   kfree(scratch);
+
+   /* Bail here to save taking the list_lock */
+   if (list_empty(&move_list))
+   goto out;
+
+   /* Inspect results and dispose of pages */
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &move_list, lru) {
+   list_del(&page->lru);
+   slab_lock(page);
+   page->frozen = 0;
+
+   if (page->inuse) {
+   if (page->inuse == page->objects) {
+   list_add(&page->lru, &n->full);
+   slab_unlock(page);
+   } else {
+   n->nr_partial++;
+   list_add_tail(&page->lru, &n->partial);
+   slab_unlock(page);
+   }
+   } else {
+

[RFC PATCH v4 12/15] slub: Enable balancing slabs across nodes

2019-04-29 Thread Tobin C. Harding
We have just implemented Slab Movable Objects (SMO).  On NUMA systems
slabs can become unbalanced i.e. many slabs on one node while other
nodes have few slabs.  Using SMO we can balance the slabs across all
the nodes.

The algorithm used is as follows:

 1. Move all objects to node 0 (this has the effect of defragmenting the
cache).

 2. Calculate the desired number of slabs for each node (this is done
using the approximation nr_slabs / nr_nodes).

 3. Loop over the nodes moving the desired number of slabs from node 0
to the node.
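
The arithmetic in steps 2 and 3 can be sketched as follows.  plan_balance()
is a hypothetical helper, not part of the patch; it only computes how many
slabs each node would hold if every move succeeded, with node 0 keeping
the remainder of the integer division.

```c
#include <assert.h>

/*
 * Fill per_node[0..nr_nodes-1] with the slab count each node would end
 * up with, assuming all nr_slabs start on node 0 and every move succeeds.
 * Returns the per-node target (nr_slabs / nr_nodes).
 */
unsigned long plan_balance(unsigned long nr_slabs, int nr_nodes,
			   unsigned long *per_node)
{
	unsigned long desired;
	int nid;

	assert(nr_nodes > 0 && per_node);
	desired = nr_slabs / (unsigned long)nr_nodes;

	per_node[0] = nr_slabs;
	for (nid = 1; nid < nr_nodes; nid++) {
		per_node[nid] = desired;  /* slabs moved from node 0 to nid */
		per_node[0] -= desired;
	}
	return desired;
}
```

For example, 10 slabs over 4 nodes gives a target of 2, so node 0 is left
with 4 slabs.  When there are fewer slabs than nodes the target is 0 and
nothing moves off node 0.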

This feature is conditionally built in with CONFIG_SMO_NODE because we
need the full list (we enable SLUB_DEBUG to get this).  A future version
may separate the full list out of SLUB_DEBUG.

Expose this functionality to userspace via a sysfs entry.  Add sysfs
entry:

   /sysfs/kernel/slab//balance

Write of '1' to this file triggers balance, no other value accepted.

This feature relies on SMO being enabled for the cache.  This is done
with the following call, made after the isolate/migrate functions have
been defined:

kmem_cache_setup_mobility(s, isolate, migrate)

Signed-off-by: Tobin C. Harding 
---
 mm/slub.c | 120 ++
 1 file changed, 120 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index e4f3dde443f5..a5c48c41d72b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4583,6 +4583,109 @@ static unsigned long kmem_cache_move_to_node(struct kmem_cache *s, int node)
 
return left;
 }
+
+/*
+ * kmem_cache_move_slabs() - Attempt to move @num slabs to target_node,
+ * @s: The cache we are working on.
+ * @node: The node to move objects from.
+ * @target_node: The node to move objects to.
+ * @num: The number of slabs to move.
+ *
+ * Attempts to move @num slabs from @node to @target_node.  This is done
+ * by migrating objects from slabs on the full_list.
+ *
+ * Return: The number of slabs moved or error code.
+ */
+static long kmem_cache_move_slabs(struct kmem_cache *s,
+ int node, int target_node, long num)
+{
+   struct kmem_cache_node *n = get_node(s, node);
+   LIST_HEAD(move_list);
+   struct page *page, *page2;
+   unsigned long flags;
+   void **scratch;
+   long done = 0;
+
+   if (node == target_node)
+   return -EINVAL;
+
+   scratch = alloc_scratch(s);
+   if (!scratch)
+   return -ENOMEM;
+
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &n->full, lru) {
+   if (!slab_trylock(page))
+   /* Busy slab. Get out of the way */
+   continue;
+
+   list_move(&page->lru, &move_list);
+   page->frozen = 1;
+   slab_unlock(page);
+
+   if (++done >= num)
+   break;
+   }
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   list_for_each_entry(page, &move_list, lru) {
+   if (page->inuse)
+   move_slab_page(page, scratch, target_node);
+   }
+   kfree(scratch);
+
+   /* Inspect results and dispose of pages */
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry_safe(page, page2, &move_list, lru) {
+   list_del(&page->lru);
+   slab_lock(page);
+   page->frozen = 0;
+
+   if (page->inuse) {
+   /*
+* This is best effort only, if slab still has
+* objects just put it back on the partial list.
+*/
+   n->nr_partial++;
+   list_add_tail(&page->lru, &n->partial);
+   slab_unlock(page);
+   } else {
+   slab_unlock(page);
+   discard_slab(s, page);
+   }
+   }
+   spin_unlock_irqrestore(&n->list_lock, flags);
+
+   return done;
+}
+
+/*
+ * kmem_cache_balance_nodes() - Balance slabs across nodes.
+ * @s: The cache we are working on.
+ */
+static void kmem_cache_balance_nodes(struct kmem_cache *s)
+{
+   struct kmem_cache_node *n = get_node(s, 0);
+   unsigned long desired_nr_slabs_per_node;
+   unsigned long nr_slabs;
+   int nr_nodes = 0;
+   int nid;
+
+   (void)kmem_cache_move_to_node(s, 0);
+
+   for_each_node_state(nid, N_NORMAL_MEMORY)
+   nr_nodes++;
+
+   nr_slabs = atomic_long_read(&n->nr_slabs);
+   desired_nr_slabs_per_node = nr_slabs / nr_nodes;
+
+   for_each_node_state(nid, N_NORMAL_MEMORY) {
+   if (nid == 0)
+   continue;
+
+   kmem_cache_move_slabs(s, 0, nid, desired_nr_slabs_per_node);
+   }
+}
 #endif
 
 /**
@@ -5847,6 +5950,22 @@ static ssize_t move_store(struct kmem_cache *s, const char *buf, size_t length)
return length;
 }
 SLAB_ATTR(move);
+
+static ssize_t balance_show(struct kmem_cache *s, char *buf)
+{
+   return 0;
+}
+
+static 

[RFC PATCH v4 14/15] dcache: Implement partial shrink via Slab Movable Objects

2019-04-29 Thread Tobin C. Harding
The dentry slab cache is susceptible to internal fragmentation.  Now
that we have Slab Movable Objects we can attempt to defragment the
dcache.  Dentry objects are inherently _not_ relocatable; however, under
some conditions they can be free'd.  This is the same as shrinking the
dcache but instead of shrinking the whole cache we only attempt to free
those objects that are located in partially full slab pages.  There is
no guarantee that this will reduce the memory usage of the system, it is
a compromise between fragmented memory and total cache shrinkage with
the hope that some memory pressure can be alleviated.

This is implemented using the newly added Slab Movable Objects
infrastructure.  The dcache 'migration' function is intentionally _not_
called 'd_migrate' because we only free, we do not migrate.  Call it
'd_partial_shrink' to make explicit that no reallocation is done.

Implement isolate and 'migrate' functions for the dentry slab cache.
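
The selection rule applied during isolation (skip anything still
referenced or already on a shrink list) can be shown with a simplified
model.  struct fake_dentry, the FAKE_* flags and select_for_shrink() are
hypothetical; locking, the dispose list itself and the real dentry layout
are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_SHRINK_LIST 0x1u /* already queued for disposal */
#define FAKE_LRU_LIST    0x2u /* sitting on the LRU */

struct fake_dentry {
	int count;          /* reference count */
	unsigned int flags;
};

/*
 * Queue every unreferenced object that is not already on a shrink list.
 * Returns the number of objects selected.
 */
size_t select_for_shrink(struct fake_dentry **v, size_t nr)
{
	size_t i, picked = 0;

	assert(v || nr == 0);
	for (i = 0; i < nr; i++) {
		struct fake_dentry *d = v[i];

		if (d->count > 0 || (d->flags & FAKE_SHRINK_LIST))
			continue; /* in use, or someone else owns it */

		d->flags &= ~FAKE_LRU_LIST;   /* models d_lru_del() */
		d->flags |= FAKE_SHRINK_LIST; /* models d_shrink_add() */
		picked++;
	}
	return picked;
}
```

Objects that fail the test are simply left where they are, which is why the
commit message stresses that freeing the slab page is not guaranteed.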

Signed-off-by: Tobin C. Harding 
---
 fs/dcache.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3d6cc06eca56..3f9daba1cc78 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "mount.h"
 
@@ -3067,6 +3068,79 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+/*
+ * d_isolate() - Dentry isolation callback function.
+ * @s: The dentry cache.
+ * @v: Vector of pointers to the objects to isolate.
+ * @nr: Number of objects in @v.
+ *
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *d_isolate(struct kmem_cache *s, void **v, int nr)
+{
+   struct list_head *dispose;
+   struct dentry *dentry;
+   int i;
+
+   dispose = kmalloc(sizeof(*dispose), GFP_KERNEL);
+   if (!dispose)
+   return NULL;
+
+   INIT_LIST_HEAD(dispose);
+
+   for (i = 0; i < nr; i++) {
+   dentry = v[i];
+   spin_lock(&dentry->d_lock);
+
+   if (dentry->d_lockref.count > 0 ||
+   dentry->d_flags & DCACHE_SHRINK_LIST) {
+   spin_unlock(&dentry->d_lock);
+   continue;
+   }
+
+   if (dentry->d_flags & DCACHE_LRU_LIST)
+   d_lru_del(dentry);
+
+   d_shrink_add(dentry, dispose);
+   spin_unlock(&dentry->d_lock);
+   }
+
+   return dispose;
+}
+
+/*
+ * d_partial_shrink() - Dentry migration callback function.
+ * @s: The dentry cache.
+ * @_unused: We do not access the vector.
+ * @__unused: No need for length of vector.
+ * @___unused: We do not do any allocation.
+ * @private: list_head pointer representing the shrink list.
+ *
+ * Dispose of the shrink list created during isolation function.
+ *
+ * Dentry objects can _not_ be relocated and shrinking the whole dcache
+ * can be expensive.  This is an effort to free dentry objects that are
+ * stopping slab pages from being free'd without clearing the whole dcache.
+ *
+ * This callback is called from the SLUB allocator object migration
+ * infrastructure in attempt to free up slab pages by freeing dentry
+ * objects from partially full slabs.
+ */
+static void d_partial_shrink(struct kmem_cache *s, void **_unused, int __unused,
+int ___unused, void *private)
+{
+   struct list_head *dispose = private;
+
+   if (!private)   /* kmalloc error during isolate. */
+   return;
+
+   if (!list_empty(dispose))
+   shrink_dentry_list(dispose);
+
+   kfree(private);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3112,6 +3186,8 @@ static void __init dcache_init(void)
   sizeof_field(struct dentry, d_iname),
   dcache_ctor);
 
+   kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
return;
-- 
2.21.0



[RFC PATCH v4 10/15] tools/testing/slab: Add XArray movable objects tests

2019-04-29 Thread Tobin C. Harding
We just implemented movable objects for the XArray.  Let's test it
in-tree.

Add test module for the XArray's movable objects implementation.

Functionality of the XArray Slab Movable Object implementation can
usually be seen simply by using `slabinfo` on a running machine, since
the radix tree is typically in use on a running machine and will have
partial slabs.  For repeated testing we can use the test module to
simulate a workload on the XArray and then use `slabinfo` to check that
object migration is functioning.

If testing on a freshly spun up VM (low radix tree workload) it may be
necessary to load/unload the module a number of times to create partial
slabs.

Example test session


Relevant /proc/slabinfo column headers:

  name   <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>

Prior to testing slabinfo report for radix_tree_node:

  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8352
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     497   Sanity Checks : On   Total: 8142848
  SlabObj:     912  Full   :     473   Redzoning     : On   Used : 4810752
  SlabSiz:   16384  Partial:      24   Poisoning     : On   Loss : 3332096
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2806272
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  437360

Here you can see the kernel was built with Slab Movable Objects enabled
for the XArray (XArray uses the radix tree below the surface).

After inserting the test module (note we have triggered allocation of a
number of radix tree nodes increasing the object count but decreasing the
number of partial slabs):

  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8442
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     499   Sanity Checks : On   Total: 8175616
  SlabObj:     912  Full   :     484   Redzoning     : On   Used : 4862592
  SlabSiz:   16384  Partial:      15   Poisoning     : On   Loss : 3313024
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2836512
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  439120

Now we can shrink the radix_tree_node cache:

  # slabinfo radix_tree_node --shrink
  # slabinfo radix_tree_node --report

  Slabcache: radix_tree_node  Aliases:  0 Order :  2 Objects: 8515
  ** Reclaim accounting active
  ** Defragmentation at 30%

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     576  Total  :     501   Sanity Checks : On   Total: 8208384
  SlabObj:     912  Full   :     500   Redzoning     : On   Used : 4904640
  SlabSiz:   16384  Partial:       1   Poisoning     : On   Loss : 3303744
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2861040
  Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  440880

Note the single remaining partial slab.

Signed-off-by: Tobin C. Harding 
---
 tools/testing/slab/Makefile |   2 +-
 tools/testing/slab/slub_defrag_xarray.c | 211 
 2 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

diff --git a/tools/testing/slab/Makefile b/tools/testing/slab/Makefile
index 440c2e3e356f..44c18d9a4d52 100644
--- a/tools/testing/slab/Makefile
+++ b/tools/testing/slab/Makefile
@@ -1,4 +1,4 @@
-obj-m += slub_defrag.o
+obj-m += slub_defrag.o slub_defrag_xarray.o
 
 KTREE=../../..
 
diff --git a/tools/testing/slab/slub_defrag_xarray.c b/tools/testing/slab/slub_defrag_xarray.c
new file mode 100644
index ..41143f73256c
--- /dev/null
+++ b/tools/testing/slab/slub_defrag_xarray.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SMOX_CACHE_NAME "smox_test"
+static struct kmem_cache *cachep;
+
+/*
+ * Declare XArrays globally so we can clean them up on module unload.
+ */
+
+/* Used by test_smo_xarray()*/
+DEFINE_XARRAY(things);
+
+/* Thing to store pointers to in the XArray */
+struct smox_thing {
+   long id;
+};
+
+/* It's up to the caller to ensure id is unique */
+static struct smox_thing *alloc_thing(int id)
+{
+   struct smox_thing *thing;
+
+   thing = kmem_cache_alloc(cachep, GFP_KERNEL);
+   if (!thing)
+   return ERR_PTR(-ENOMEM);
+
+   thing->id = id;
+   return thing;
+}
+
+/**
+ * smox_object_ctor() - SMO object constructor function.
+ * @ptr: Pointer to memory where the object should be constructed.
+ */
+void 

[RFC PATCH v4 09/15] xarray: Implement migration function for objects

2019-04-29 Thread Tobin C. Harding
Implement functions to migrate objects. This is based on initial code by
Matthew Wilcox and was modified to work with slab object migration.

This patch can not be merged until all radix tree & IDR users are
converted to the XArray because xa_nodes and radix tree nodes share the
same slab cache (thanks Matthew).

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 lib/radix-tree.c | 13 +
 lib/xarray.c | 49 
 2 files changed, 62 insertions(+)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 14d51548bea6..9412c2853726 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1613,6 +1613,17 @@ static int radix_tree_cpu_dead(unsigned int cpu)
return 0;
 }
 
+extern void xa_object_migrate(void *tree_node, int numa_node);
+
+static void radix_tree_migrate(struct kmem_cache *s, void **objects, int nr,
+  int node, void *private)
+{
+   int i;
+
+   for (i = 0; i < nr; i++)
+   xa_object_migrate(objects[i], node);
+}
+
 void __init radix_tree_init(void)
 {
int ret;
@@ -1627,4 +1638,6 @@ void __init radix_tree_init(void)
ret = cpuhp_setup_state_nocalls(CPUHP_RADIX_DEAD, "lib/radix:dead",
NULL, radix_tree_cpu_dead);
WARN_ON(ret < 0);
+   kmem_cache_setup_mobility(radix_tree_node_cachep, NULL,
+ radix_tree_migrate);
 }
diff --git a/lib/xarray.c b/lib/xarray.c
index 6be3acbb861f..731dd3d8ddb8 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1971,6 +1971,55 @@ void xa_destroy(struct xarray *xa)
 }
 EXPORT_SYMBOL(xa_destroy);
 
+void xa_object_migrate(struct xa_node *node, int numa_node)
+{
+   struct xarray *xa = READ_ONCE(node->array);
+   void __rcu **slot;
+   struct xa_node *new_node;
+   int i;
+
+   /* Freed or not yet in tree then skip */
+   if (!xa || xa == XA_RCU_FREE)
+   return;
+
+   new_node = kmem_cache_alloc_node(radix_tree_node_cachep,
+GFP_KERNEL, numa_node);
+   if (!new_node)
+   return;
+
+   xa_lock_irq(xa);
+
+   /* Check again. */
+   if (xa != node->array) {
+   node = new_node;
+   goto unlock;
+   }
+
+   memcpy(new_node, node, sizeof(struct xa_node));
+
+   if (list_empty(&node->private_list))
+   INIT_LIST_HEAD(&new_node->private_list);
+   else
+   list_replace(&node->private_list, &new_node->private_list);
+
+   for (i = 0; i < XA_CHUNK_SIZE; i++) {
+   void *x = xa_entry_locked(xa, new_node, i);
+
+   if (xa_is_node(x))
+   rcu_assign_pointer(xa_to_node(x)->parent, new_node);
+   }
+   if (!new_node->parent)
+   slot = &xa->xa_head;
+   else
+   slot = &xa_parent_locked(xa, new_node)->slots[new_node->offset];
+   rcu_assign_pointer(*slot, xa_mk_node(new_node));
+
+unlock:
+   xa_unlock_irq(xa);
+   xa_node_free(node);
+   rcu_barrier();
+}
+
 #ifdef XA_DEBUG
 void xa_dump_node(const struct xa_node *node)
 {
-- 
2.21.0



[RFC PATCH v4 08/15] tools/testing/slab: Add object migration test suite

2019-04-29 Thread Tobin C. Harding
We just added a module that enables testing the SLUB allocator's ability
to defrag/shrink caches via movable objects.  Tests are better when they
are automated.

Add automated testing via a python script for SLUB movable objects.

Example output:

  $ cd path/to/linux/tools/testing/slab
  $ ./slub_defrag.py
  Please run script as root

  $ sudo ./slub_defrag.py
  

  $ sudo ./slub_defrag.py --debug
  Loading module ...
  Slab cache smo_test created
  Objects per slab: 20
  Running sanity checks ...

  Running module stress test (see dmesg for additional test output) ...
  Removing module slub_defrag ...
  Loading module ...
  Slab cache smo_test created

  Running test non-movable ...
  testing slab 'smo_test' prior to enabling movable objects ...
  verified non-movable slabs are NOT shrinkable

  Running test movable ...
  testing slab 'smo_test' after enabling movable objects ...
  verified movable slabs are shrinkable

  Removing module slub_defrag ...

Signed-off-by: Tobin C. Harding 
---
 tools/testing/slab/slub_defrag.c  |   1 +
 tools/testing/slab/slub_defrag.py | 451 ++
 2 files changed, 452 insertions(+)
 create mode 100755 tools/testing/slab/slub_defrag.py

diff --git a/tools/testing/slab/slub_defrag.c b/tools/testing/slab/slub_defrag.c
index 4a5c24394b96..8332e69ee868 100644
--- a/tools/testing/slab/slub_defrag.c
+++ b/tools/testing/slab/slub_defrag.c
@@ -337,6 +337,7 @@ static int smo_run_module_tests(int nr_objs, int keep)
 
 /*
  * struct functions() - Map command to a function pointer.
+ * If you update this please update the documentation in slub_defrag.py
  */
 struct functions {
char *fn_name;
diff --git a/tools/testing/slab/slub_defrag.py b/tools/testing/slab/slub_defrag.py
new file mode 100755
index ..41747c0db39b
--- /dev/null
+++ b/tools/testing/slab/slub_defrag.py
@@ -0,0 +1,451 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+import sys
+from os import path
+
+# SLUB Movable Objects test suite.
+#
+# Requirements:
+#  - CONFIG_SLUB=y
+#  - CONFIG_SLUB_DEBUG=y
+#  - The slub_defrag module in this directory.
+
+# Test SMO using a kernel module that enables triggering arbitrary
+# kernel code from userspace via a debugfs file.
+#
+# Module code is in ./slub_defrag.c, basically the functionality is as
+# follows:
+#
+#  - Creates debugfs file /sys/kernel/debugfs/smo/callfn
+#  - Writes to 'callfn' are parsed as a command string and the function
+#associated with command is called.
+#  - Defines 4 commands (all commands operate on smo_test cache):
+# - 'test': Runs module stress tests.
+# - 'alloc N': Allocates N slub objects
+# - 'free N POS': Frees N objects starting at POS (see below)
+# - 'enable': Enables SLUB Movable Objects
+#
+# The module maintains a list of allocated objects.  Allocation adds
+# objects to the tail of the list.  Free'ing frees from the head of the
+# list.  This has the effect of creating free slots in the slab.  For
+# finer grained control over where in the cache slots are free'd POS
+# (position) argument may be used.
+
+# The main() function is reasonably readable; the test suite does the
+# following:
+#
+# 1. Runs the module stress tests.
+# 2. Tests the cache without movable objects enabled.
+#- Creates multiple partial slabs as explained above.
+#- Verifies that partial slabs are _not_ removed by shrink (see below).
+# 3. Tests the cache with movable objects enabled.
+#- Creates multiple partial slabs as explained above.
+#- Verifies that partial slabs _are_ removed by shrink (see below).
+
+# The sysfs file /sys/kernel/slab//shrink enables calling the
+# function kmem_cache_shrink() (see mm/slab_common.c and mm/slub.c).
+# Shrinking a cache attempts to consolidate all partial slabs by moving
+# objects if object migration is enabled for the cache, otherwise
+# shrinking a cache simply re-orders the partial list so that the most
+# densely populated slabs are at the head of the list.
+
+# Enable/disable debugging output (also enabled via -d | --debug).
+debug = False
+
+# Used in debug messages and when running `insmod`.
+MODULE_NAME = "slub_defrag"
+
+# Slab cache created by the test module.
+CACHE_NAME = "smo_test"
+
+# Set by get_slab_config()
+objects_per_slab = 0
+pages_per_slab = 0
+debugfs_mounted = False # Set to true if we mount debugfs.
+
+
+def eprint(*args, **kwargs):
+print(*args, file=sys.stderr, **kwargs)
+
+
+def dprint(*args, **kwargs):
+if debug:
+print(*args, file=sys.stderr, **kwargs)
+
+
+def run_shell(cmd):
+return subprocess.call([cmd], shell=True)
+
+
+def run_shell_get_stdout(cmd):
+return subprocess.check_output([cmd], shell=True)
+
+
+def assert_root():
+user = run_shell_get_stdout('whoami')
+if user != b'root\n':
+eprint("Please run script as root")
+sys.exit(1)
+
+
+def mount_debugfs():
+mounted = False
+
+# Check if debugfs is mounted at a known 

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-29 Thread Al Viro
On Tue, Apr 16, 2019 at 11:01:16AM -0700, Linus Torvalds wrote:
> On Tue, Apr 16, 2019 at 10:49 AM Al Viro  wrote:
> >
> >  83 files changed, 241 insertions(+), 516 deletions(-)
> 
> I think this single line is pretty convincing on its own. Ignoring
> docs and fs/inode.c, we have
> 
>  80 files changed, 190 insertions(+), 494 deletions(-)
> 
> IOW, just over 300 lines of boiler plate code removed.
> 
> The additions are
> 
>  - Ten more lines of actual code in fs/inode.c (and that's not
> actually added complexity, it looks simpler if anything - most of it
> is the new "i_callback()" helper function)
> 
>  - 19 lines of doc updates.
> 
> So it absolutely looks fine to me.
> 
> I only skimmed through the actual filesystem (and one networking)
> patches, but they looked like trivial conversions to a better
> interface.

... except that this callback can (and always could) get executed after
freeing struct super_block.  So we can't just dereference ->i_sb->s_op
and expect to survive; the table ->s_op pointed to will still be there,
but ->i_sb might very well have been freed, with all its contents overwritten.
We need to copy the callback into struct inode itself, unfortunately.
The following incremental fixes it; I'm going to fold it into the first
commit in there.

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
  treated as NULL/free_inode_nonrcu, to preserve the 
compatibility.
+
+   Note that the callback (be it via ->free_inode() or explicit call_rcu()
+   in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+   as the matter of fact, the superblock and all associated structures
+   might be already gone.  The filesystem driver is guaranteed to be still
+   there, but that's it.  Freeing memory in the callback is fine; doing
+   more than that is possible, but requires a lot of care and is best
+   avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..855dad43b11d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -164,6 +164,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
inode->i_wb_frn_avg_time = 0;
inode->i_wb_frn_history = 0;
 #endif
+   inode->free_inode = sb->s_op->free_inode;
 
if (security_inode_alloc(inode))
goto out;
@@ -211,8 +212,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
 static void i_callback(struct rcu_head *head)
 {
struct inode *inode = container_of(head, struct inode, i_rcu);
-   if (inode->i_sb->s_op->free_inode)
-   inode->i_sb->s_op->free_inode(inode);
+   if (inode->free_inode)
+   inode->free_inode(inode);
else
free_inode_nonrcu(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..5ed6b39e588e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -718,6 +718,7 @@ struct inode {
 #endif
 
void*i_private; /* fs or device private pointer */
+   void (*free_inode)(struct inode *);
 } __randomize_layout;
 
 static inline unsigned int i_blocksize(const struct inode *node)


[RFC PATCH v4 07/15] tools/testing/slab: Add object migration test module

2019-04-29 Thread Tobin C. Harding
We just implemented slab movable objects for the SLUB allocator.  We
should test that code.  In order to do so we need to be able to do a
number of things

 - Create a cache
 - Enable Slab Movable Objects for the cache
 - Allocate objects to the cache
 - Free objects from within specific slabs of the cache

We can do all this via a loadable module.

Add a module that defines functions that can be triggered from userspace
via a debugfs entry. From the source:

  /*
   * SLUB defragmentation a.k.a. Slab Movable Objects (SMO).
   *
   * This module is used for testing the SLUB allocator.  Enables
   * userspace to run kernel functions via a debugfs file.
   *
   *   debugfs: /sys/kernel/debugfs/smo/callfn (write only)
   *
   * String written to `callfn` is parsed by the module and associated
   * function is called.  See fn_tab for mapping of strings to functions.
   */

References to allocated objects are kept by the module in a linked list
so that userspace can control which object to free.

We introduce the following four functions via the function table

  "enable": Enables object migration for the test cache.
  "alloc X": Allocates X objects
  "free X [Y]": Frees X objects starting at list position Y (default Y==0)
  "test": Runs [stress] tests from within the module (see below).

   {"enable", smo_enable_cache_mobility},
   {"alloc", smo_alloc_objects},
   {"free", smo_free_object},
   {"test", smo_run_module_tests},
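
The command handling described above can be modelled in a few lines of
Python.  This is only an illustrative sketch, not the module's code: the
class `SmoModel` and its method names are hypothetical, objects are
represented by plain integers, and the "test" command is left out.

```python
class SmoModel:
    """Toy model of the module's command parsing and object list."""

    def __init__(self):
        self.objects = []      # stands in for the module's linked list
        self.next_id = 0
        self.movable = False
        # Hypothetical counterpart of the module's function table.
        self.fn_tab = {
            "enable": self.enable,
            "alloc": self.alloc,
            "free": self.free,
        }

    def enable(self):
        self.movable = True

    def alloc(self, n):
        # Allocation adds objects to the tail of the list.
        for _ in range(n):
            self.objects.append(self.next_id)
            self.next_id += 1

    def free(self, n, pos=0):
        # Free n objects starting at list position pos (default 0).
        del self.objects[pos:pos + n]

    def call(self, line):
        """Parse a command string as if it had been written to `callfn`."""
        parts = line.split()
        if not parts or parts[0] not in self.fn_tab:
            raise ValueError("unknown command: %r" % line)
        args = [int(arg) for arg in parts[1:]]
        self.fn_tab[parts[0]](*args)
```

For example, `call("alloc 21")` followed by `call("free 1")` leaves 20
objects on the list, with the oldest one gone.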

Freeing from the start of the list creates a hole in the slab being
freed from (i.e. creates a partial slab).  The results of running these
commands can be seen using `slabinfo` (available in tools/vm/):

make -o slabinfo tools/vm/slabinfo.c

Stress tests can be run from within the module.  These tests are
internal to the module because we verify that object references are
still good after object migration.  These are called 'stress' tests
because it is intended that they create/free a lot of objects.
Userspace can control the number of objects to create, default is 1000.

Example test session


Relevant /proc/slabinfo column headers:

  name   <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>

  # mount -t debugfs none /sys/kernel/debug/
  $ cd path/to/linux/tools/testing/slab; make
  ...

  # insmod slub_defrag.ko
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test               0      0    392   20    2

From this we can see that the module created cache 'smo_test' with 20
objects per slab and 2 pages per slab (and cache is currently empty).

We can play with the slab allocator manually:

  # insmod slub_defrag.ko
  # echo 'alloc 21' > callfn
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test              21     40    392   20    2

We see here that 21 active objects have been allocated creating 2
slabs (40 total objects).
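
The arithmetic behind those numbers can be sketched as follows.  The
helper `slab_layout` is a hypothetical name used only for illustration,
and it assumes objects are packed densely (one slab is filled before the
next is started), which holds right after the 'alloc 21' above but not
once objects have been freed from the middle of the cache.

```python
def slab_layout(active_objs, objs_per_slab):
    """Return (slabs, total_objs, full_slabs, partial_slabs).

    Assumes dense packing: every slab except possibly the last is full.
    """
    if objs_per_slab <= 0:
        raise ValueError("objs_per_slab must be positive")
    # Ceiling division without floating point.
    slabs = -(-active_objs // objs_per_slab)
    total_objs = slabs * objs_per_slab
    full_slabs = active_objs // objs_per_slab
    partial_slabs = slabs - full_slabs
    return slabs, total_objs, full_slabs, partial_slabs
```

With 21 active objects and 20 objects per slab this gives 2 slabs, 40
object slots, 1 full slab and 1 partial slab.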

  # slabinfo smo_test --report

  Slabcache: smo_test Aliases:  0 Order :  1 Objects: 21

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :      56  Total  :       2   Sanity Checks : On   Total:   16384
  SlabObj:     392  Full   :       1   Redzoning     : On   Used :    1176
  SlabSiz:    8192  Partial:       1   Poisoning     : On   Loss :   15208
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig:    7056
  Align  :       8  Objects:      20   Tracing       : Off  Lpadd:     704

Now free an object from the first slot of the first slab

  # echo 'free 1' > callfn
  # cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
  smo_test              20     40    392   20    2

  # slabinfo smo_test --report

  Slabcache: smo_test Aliases:  0 Order :  1 Objects: 20

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :      56  Total  :       2   Sanity Checks : On   Total:   16384
  SlabObj:     392  Full   :       0   Redzoning     : On   Used :    1120
  SlabSiz:    8192  Partial:       2   Poisoning     : On   Loss :   15264
  Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig:    6720
  Align  :       8  Objects:      20   Tracing       : Off  Lpadd:     704

Calling shrink now on the cache does nothing because object migration is
not enabled (output omitted).  If we enable object migration and then
shrink the cache, we expect the object from the second slab to be moved
to the first slot in the first slab and the second slab to be removed
from the partial list.

  # echo 'enable' > callfn
  # slabinfo smo_test --shrink
  # slabinfo smo_test --report

  Slabcache: smo_test Aliases:  0 Order :  1 Objects: 20
  ** Defragmentation at 30%

  Sizes (bytes) Slabs  DebugMemory
  
  Object :  56  Total  :   1   Sanity Checks : On   Total:8192
  SlabObj: 392  Full   :   1   

[RFC PATCH v4 06/15] tools/vm/slabinfo: Add defrag_used_ratio output

2019-04-29 Thread Tobin C. Harding
Add output for the newly added defrag_used_ratio sysfs knob.

Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index d2c22f9ee2d8..ef4ff93df4cc 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -34,6 +34,7 @@ struct slabinfo {
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
int movable, ctor;
+   int defrag_used_ratio;
int remote_node_defrag_ratio;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
@@ -549,6 +550,8 @@ static void report(struct slabinfo *s)
printf("** Slabs are destroyed via RCU\n");
if (s->reclaim_account)
printf("** Reclaim accounting active\n");
+   if (s->movable)
+   printf("** Defragmentation at %d%%\n", s->defrag_used_ratio);
 
printf("\nSizes (bytes) Slabs  Debug
Memory\n");

printf("\n");
@@ -1279,6 +1282,7 @@ static void read_slab_dir(void)
slab->deactivate_bypass = get_obj("deactivate_bypass");
slab->remote_node_defrag_ratio =
get_obj("remote_node_defrag_ratio");
+   slab->defrag_used_ratio = get_obj("defrag_used_ratio");
chdir("..");
if (read_slab_obj(slab, "ops")) {
if (strstr(buffer, "ctor :"))
-- 
2.21.0



[RFC PATCH v4 05/15] tools/vm/slabinfo: Add remote node defrag ratio output

2019-04-29 Thread Tobin C. Harding
Add output line for NUMA remote node defrag ratio.

Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index cbfc56c44c2f..d2c22f9ee2d8 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -34,6 +34,7 @@ struct slabinfo {
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
int movable, ctor;
+   int remote_node_defrag_ratio;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
unsigned long free_fastpath, free_slowpath;
@@ -377,6 +378,10 @@ static void slab_numa(struct slabinfo *s, int mode)
if (skip_zero && !s->slabs)
return;
 
+   if (mode) {
+   printf("\nNUMA remote node defrag ratio: %3d\n",
+  s->remote_node_defrag_ratio);
+   }
if (!line) {
printf("\n%-21s:", mode ? "NUMA nodes" : "Slab");
for(node = 0; node <= highest_node; node++)
@@ -1272,6 +1277,8 @@ static void read_slab_dir(void)
slab->cpu_partial_free = get_obj("cpu_partial_free");
slab->alloc_node_mismatch = 
get_obj("alloc_node_mismatch");
slab->deactivate_bypass = get_obj("deactivate_bypass");
+   slab->remote_node_defrag_ratio =
+   get_obj("remote_node_defrag_ratio");
chdir("..");
if (read_slab_obj(slab, "ops")) {
if (strstr(buffer, "ctor :"))
-- 
2.21.0



[RFC PATCH v4 04/15] slub: Slab defrag core

2019-04-29 Thread Tobin C. Harding
Internal fragmentation can occur within pages used by the slub
allocator.  Under some workloads large numbers of pages can be used by
partial slab pages.  This under-utilisation is bad simply because it
wastes memory but also because if the system is under memory pressure
higher order allocations may become difficult to satisfy.  If we can
defrag slab caches we can alleviate these problems.

Implement Slab Movable Objects in order to defragment slab caches.

Slab defragmentation may occur:

1. Unconditionally when __kmem_cache_shrink() is called on a slab cache
   by the kernel calling kmem_cache_shrink().

2. Unconditionally through the use of the slabinfo command.

slabinfo  -s

3. Conditionally via the use of kmem_defrag_slabs()

- Use Slab Movable Objects when shrinking cache.

Currently when the kernel calls kmem_cache_shrink() we curate the
partial slabs list.  We still do this if object migration is not enabled
for the cache.  If, however, SMO is enabled, we attempt to move objects
in partially full slabs in order to defragment the cache.  Shrink
attempts to move all objects in order to reduce the cache to a single
partial slab for each node.

- Add conditional per node defrag via new function:

kmem_defrag_slabs(int node).

kmem_defrag_slabs() attempts to defragment all slab caches for node.
 Defragmentation is done conditionally dependent on MAX_PARTIAL _AND_
 defrag_used_ratio.

   Caches are only considered for defragmentation if the number of
   partial slabs exceeds MAX_PARTIAL (per node).

   Also, defragmentation only occurs if the usage ratio of the slab is
   lower than the configured percentage (sysfs field added in this
   patch).  Fragmentation ratios are measured by calculating the
   percentage of objects in use compared to the total number of objects
   that the slab page can accommodate.

   The scanning of slab caches is optimized because the defragmentable
   slabs come first on the list. Thus we can terminate scans on the
   first slab encountered that does not support defragmentation.

   kmem_defrag_slabs() takes a node parameter. This can either be -1 if
   defragmentation should be performed on all nodes, or a node number.

   Defragmentation may be disabled by setting defrag ratio to 0

echo 0 > /sys/kernel/slab//defrag_used_ratio

- Add a defrag ratio sysfs field and set it to 30% by default. A limit
of 30% specifies that more than 3 out of 10 available slots for objects
need to be in use otherwise slab defragmentation will be attempted on
the remaining objects.
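
As a rough illustration of the usage-ratio check, the following Python
sketch models the per-slab decision.  The function names are
hypothetical and this is not the code in mm/slub.c.  It assumes the
"less than this percentage" wording of the sysfs documentation added in
this patch, so a slab sitting exactly at the ratio is left alone; that
boundary choice is an assumption of the sketch.

```python
def usage_percent(objects_in_use, objects_per_slab):
    """Percentage of a slab page's object slots that are in use."""
    if objects_per_slab <= 0:
        raise ValueError("objects_per_slab must be positive")
    return (objects_in_use * 100) // objects_per_slab


def should_defrag_slab(objects_in_use, objects_per_slab,
                       defrag_used_ratio=30):
    """Hypothetical model of the defrag_used_ratio decision.

    A ratio of 0 disables defragmentation; otherwise a slab is a
    candidate when its usage is below the configured percentage.
    """
    if defrag_used_ratio == 0:
        return False
    return usage_percent(objects_in_use, objects_per_slab) < defrag_used_ratio
```

With the default of 30, a slab holding 2 of 10 possible objects would be
a candidate, while one holding 3 of 10 would not.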

In order for a cache to be defragmentable the cache must support object
migration (SMO).  Enabling SMO for a cache is done via a call to the
recently added function:

void kmem_cache_setup_mobility(struct kmem_cache *,
   kmem_cache_isolate_func,
   kmem_cache_migrate_func);

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 include/linux/slab.h|   1 +
 include/linux/slub_def.h|   7 +
 mm/slub.c   | 385 
 4 files changed, 334 insertions(+), 73 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
index 29601d93a1c2..7770c03be6b4 100644
--- a/Documentation/ABI/testing/sysfs-kernel-slab
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -180,6 +180,20 @@ Description:
list.  It can be written to clear the current count.
Available when CONFIG_SLUB_STATS is enabled.
 
+What:  /sys/kernel/slab/cache/defrag_used_ratio
+Date:  February 2019
+KernelVersion: 5.0
+Contact:   Christoph Lameter 
+   Pekka Enberg ,
+Description:
+   The defrag_used_ratio file allows the control of how aggressive
+   slab fragmentation reduction works at reclaiming objects from
+   sparsely populated slabs. This is a percentage. If a slab has
+   less than this percentage of objects allocated then reclaim will
+   attempt to reclaim objects so that the whole slab page can be
+   freed. 0% specifies no reclaim attempt (defrag disabled), 100%
+   specifies attempt to reclaim all pages.  The default is 30%.
+
 What:  /sys/kernel/slab/cache/deactivate_to_tail
 Date:  February 2008
 KernelVersion: 2.6.25
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 886fc130334d..4bf381b34829 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -149,6 +149,7 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+unsigned long kmem_defrag_slabs(int node);
 
 void memcg_create_kmem_cache(struct 

[RFC PATCH v4 02/15] tools/vm/slabinfo: Add support for -C and -M options

2019-04-29 Thread Tobin C. Harding
-C lists caches that use a ctor.

-M lists caches that support object migration.

Add command line options to show caches with a constructor and caches
that are movable (i.e. have migrate function).

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 tools/vm/slabinfo.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 73818f1b2ef8..cbfc56c44c2f 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -33,6 +33,7 @@ struct slabinfo {
unsigned int hwcache_align, object_size, objs_per_slab;
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
+   int movable, ctor;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
unsigned long free_fastpath, free_slowpath;
@@ -67,6 +68,8 @@ int show_report;
 int show_alias;
 int show_slab;
 int skip_zero = 1;
+int show_movable;
+int show_ctor;
 int show_numa;
 int show_track;
 int show_first_alias;
@@ -109,11 +112,13 @@ static void fatal(const char *x, ...)
 
 static void usage(void)
 {
-   printf("slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.\n\n"
-   "slabinfo [-aADefhilnosrStTvz1LXBU] [N=K] [-dafzput] 
[slab-regexp]\n"
+   printf("slabinfo 4/15/2017. (c) 2007 sgi/(c) 2011 Linux Foundation/(c) 
2017 Jump Trading LLC.\n\n"
+  "slabinfo [-aACDefhilMnosrStTvz1LXBU] [N=K] [-dafzput] 
[slab-regexp]\n"
+
"-a|--aliases   Show aliases\n"
"-A|--activity  Most active slabs first\n"
"-B|--Bytes Show size in bytes\n"
+   "-C|--ctor  Show slabs with ctors\n"
"-D|--display-activeSwitch line format to activity\n"
"-e|--empty Show empty slabs\n"
"-f|--first-alias   Show first alias\n"
@@ -121,6 +126,7 @@ static void usage(void)
"-i|--inverted  Inverted list\n"
"-l|--slabs Show slabs\n"
"-L|--Loss  Sort by loss\n"
+   "-M|--movable   Show caches that support movable 
objects\n"
"-n|--numa  Show NUMA information\n"
"-N|--lines=K   Show the first K slabs\n"
"-o|--ops   Show kmem_cache_ops\n"
@@ -588,6 +594,12 @@ static void slabcache(struct slabinfo *s)
if (show_empty && s->slabs)
return;
 
+   if (show_ctor && !s->ctor)
+   return;
+
+   if (show_movable && !s->movable)
+   return;
+
if (sort_loss == 0)
store_size(size_str, slab_size(s));
else
@@ -602,6 +614,10 @@ static void slabcache(struct slabinfo *s)
*p++ = '*';
if (s->cache_dma)
*p++ = 'd';
+   if (s->ctor)
+   *p++ = 'C';
+   if (s->movable)
+   *p++ = 'M';
if (s->hwcache_align)
*p++ = 'A';
if (s->poison)
@@ -636,7 +652,8 @@ static void slabcache(struct slabinfo *s)
printf("%-21s %8ld %7d %15s %14s %4d %1d %3ld %3ld %s\n",
s->name, s->objects, s->object_size, size_str, dist_str,
s->objs_per_slab, s->order,
-   s->slabs ? (s->partial * 100) / s->slabs : 100,
+   s->slabs ? (s->partial * 100) /
+   (s->slabs * s->objs_per_slab) : 100,
s->slabs ? (s->objects * s->object_size * 100) /
(s->slabs * (page_size << s->order)) : 100,
flags);
@@ -1256,6 +1273,13 @@ static void read_slab_dir(void)
slab->alloc_node_mismatch = 
get_obj("alloc_node_mismatch");
slab->deactivate_bypass = get_obj("deactivate_bypass");
chdir("..");
+   if (read_slab_obj(slab, "ops")) {
+   if (strstr(buffer, "ctor :"))
+   slab->ctor = 1;
+   if (strstr(buffer, "migrate :"))
+   slab->movable = 1;
+   }
+
if (slab->name[0] == ':')
alias_targets++;
slab++;
@@ -1332,6 +1356,8 @@ static void xtotals(void)
 }
 
 struct option opts[] = {
+   { "ctor", no_argument, NULL, 'C' },
+   { "movable", no_argument, NULL, 'M' },
{ "aliases", no_argument, NULL, 'a' },
{ "activity", no_argument, NULL, 'A' },
{ "debug", optional_argument, NULL, 'd' },
@@ -1367,7 +1393,7 @@ int main(int argc, char *argv[])
 
page_size = getpagesize();
 
- 

[RFC PATCH v4 03/15] slub: Sort slab cache list

2019-04-29 Thread Tobin C. Harding
It is advantageous to have all defragmentable slabs together at the
beginning of the list of slabs so that there is no need to scan the
complete list. Put defragmentable caches first when adding a slab cache
and others last.
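
The effect of this ordering can be sketched in Python.  This is a toy
model under stated assumptions, not kernel code: `Cache`, `add_cache`,
`setup_mobility` and `defragmentable` are hypothetical names, and a
plain Python list stands in for the kernel's list of slab caches.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    name: str
    movable: bool = False


def add_cache(slab_caches, cache):
    """New caches go to the tail of the list (cf. list_add_tail())."""
    slab_caches.append(cache)


def setup_mobility(slab_caches, cache):
    """Mark a cache movable and move it to the head (cf. list_move())."""
    cache.movable = True
    slab_caches.remove(cache)
    slab_caches.insert(0, cache)


def defragmentable(slab_caches):
    """Scan from the head and stop at the first non-movable cache."""
    names = []
    for cache in slab_caches:
        if not cache.movable:
            break
        names.append(cache.name)
    return names
```

Because movable caches are always moved to the head, the scan never has
to look past the first cache that does not support migration.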

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 mm/slab_common.c | 2 +-
 mm/slub.c| 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..db5e9a0b1535 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -393,7 +393,7 @@ static struct kmem_cache *create_cache(const char *name,
goto out_free_cache;
 
s->refcount = 1;
-   list_add(&s->list, &slab_caches);
+   list_add_tail(&s->list, &slab_caches);
memcg_link_cache(s);
 out:
if (err)
diff --git a/mm/slub.c b/mm/slub.c
index ae44d640b8c1..f6b0e4a395ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4342,6 +4342,8 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
return;
}
 
+   mutex_lock(&slab_mutex);
+
s->isolate = isolate;
s->migrate = migrate;
 
@@ -4350,6 +4352,10 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
 * to disable fast cmpxchg based processing.
 */
s->flags &= ~__CMPXCHG_DOUBLE;
+
+   list_move(&s->list, &slab_caches);  /* Move to top */
+
+   mutex_unlock(&slab_mutex);
 }
 EXPORT_SYMBOL(kmem_cache_setup_mobility);
 
-- 
2.21.0
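
The ordering described in this patch can be sketched in userspace C. This is a minimal illustration only, assuming a hand-rolled circular list; the names demo_cache, demo_add_tail, demo_move_front and demo_count_movable are hypothetical and are not the kernel's list API. Ordinary caches are appended at the tail, a cache that gains mobility callbacks is moved to the head, and a scan for movable caches can then stop at the first non-movable entry. Locking (slab_mutex in the patch) is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache; only what the sketch needs. */
struct demo_cache {
	const char *name;
	int movable;
	struct demo_cache *prev, *next;
};

/* Circular list with a sentinel head, similar in spirit to list_head. */
struct demo_cache_list {
	struct demo_cache head;
};

static void demo_list_init(struct demo_cache_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void demo_unlink(struct demo_cache *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

static void demo_insert_after(struct demo_cache *pos, struct demo_cache *c)
{
	c->prev = pos;
	c->next = pos->next;
	pos->next->prev = c;
	pos->next = c;
}

/* New caches go last (the list_add_tail() change in slab_common.c). */
static void demo_add_tail(struct demo_cache_list *l, struct demo_cache *c)
{
	c->movable = 0;
	demo_insert_after(l->head.prev, c);
}

/* A cache that registers callbacks moves to the front (the list_move() change). */
static void demo_move_front(struct demo_cache_list *l, struct demo_cache *c)
{
	c->movable = 1;
	demo_unlink(c);
	demo_insert_after(&l->head, c);
}

/* Count movable caches, stopping at the first non-movable entry. */
static int demo_count_movable(const struct demo_cache_list *l)
{
	const struct demo_cache *c;
	int n = 0;

	for (c = l->head.next; c != &l->head && c->movable; c = c->next)
		n++;
	return n;
}
```

The early exit in demo_count_movable() is the point of the ordering: the walk never visits the non-movable tail of the list.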



[RFC PATCH v4 01/15] slub: Add isolate() and migrate() methods

2019-04-29 Thread Tobin C. Harding
Add the two methods needed for moving objects and enable the display of
the callbacks via the /sys/kernel/slab interface.

Add documentation explaining the use of these methods and the prototypes
for slab.h. Add functions to set up the callback methods for a slab
cache.

Add empty functions for SLAB/SLOB. The API is generic so it could be
theoretically implemented for these allocators as well.

Change sysfs 'ctor' field to be 'ops' to contain all the callback
operations defined for a slab cache.  Display the existing 'ctor'
callback in the ops fields contents along with 'isolate' and 'migrate'
callbacks.

Co-developed-by: Christoph Lameter 
Signed-off-by: Tobin C. Harding 
---
 include/linux/slab.h | 70 
 include/linux/slub_def.h |  3 ++
 mm/slub.c| 59 +
 3 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9449b19c5f10..886fc130334d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -154,6 +154,76 @@ void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
 void memcg_deactivate_kmem_caches(struct mem_cgroup *);
 void memcg_destroy_kmem_caches(struct mem_cgroup *);
 
+/*
+ * Function prototypes passed to kmem_cache_setup_mobility() to enable
+ * mobile objects and targeted reclaim in slab caches.
+ */
+
+/**
+ * typedef kmem_cache_isolate_func - Object isolation callback function.
+ * @s: The cache we are working on.
+ * @ptr: Pointer to an array of pointers to the objects to isolate.
+ * @nr: Number of objects in @ptr array.
+ *
+ * The purpose of kmem_cache_isolate_func() is to pin each object so that
+ * they cannot be freed until kmem_cache_migrate_func() has processed
+ * them. This may be accomplished by increasing the refcount or setting
+ * a flag.
+ *
+ * The object pointer array passed is also passed to
+ * kmem_cache_migrate_func().  The function may remove objects from the
+ * array by setting pointers to %NULL. This is useful if we can
+ * determine that an object is being freed because
+ * kmem_cache_isolate_func() was called when the subsystem was calling
+ * kmem_cache_free().  In that case it is not necessary to increase the
+ * refcount or specially mark the object because the release of the slab
+ * lock will lead to the immediate freeing of the object.
+ *
+ * Context: Called with locks held so that the slab objects cannot be
+ *  freed.  We are in an atomic context and no slab operations
+ *  may be performed.
+ * Return: A pointer that is passed to the migrate function. If any
+ * objects cannot be touched at this point then the pointer may
+ * indicate a failure and then the migration function can simply
+ * remove the references that were already obtained. The private
+ * data could be used to track the objects that were already pinned.
+ */
+typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **ptr, int nr);
+
+/**
+ * typedef kmem_cache_migrate_func - Object migration callback function.
+ * @s: The cache we are working on.
+ * @ptr: Pointer to an array of pointers to the objects to migrate.
+ * @nr: Number of objects in @ptr array.
+ * @node: The NUMA node where the object should be allocated.
+ * @private: The pointer returned by kmem_cache_isolate_func().
+ *
+ * This function is responsible for migrating objects.  Typically, for
+ * each object in the input array you will want to allocate a new
+ * object, copy the original object, update any pointers, and free the
+ * old object.
+ *
+ * After this function returns all pointers to the old object should now
+ * point to the new object.
+ *
+ * Context: Called with no locks held and interrupts enabled.  Sleeping
+ *  is possible.  Any operation may be performed.
+ */
+typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **ptr,
+int nr, int node, void *private);
+
+/*
+ * kmem_cache_setup_mobility() is used to setup callbacks for a slab cache.
+ */
+#ifdef CONFIG_SLUB
+void kmem_cache_setup_mobility(struct kmem_cache *, kmem_cache_isolate_func,
+  kmem_cache_migrate_func);
+#else
+static inline void
+kmem_cache_setup_mobility(struct kmem_cache *s, kmem_cache_isolate_func isolate,
+ kmem_cache_migrate_func migrate) {}
+#endif
+
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..2879a2f5f8eb 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,9 @@ struct kmem_cache {
gfp_t allocflags;   /* gfp flags to use on each alloc */
int refcount;   /* Refcount for slab cache destroy */
void (*ctor)(void *);
+   kmem_cache_isolate_func *isolate;
+   

[RFC PATCH v4 00/15] Slab Movable Objects (SMO)

2019-04-29 Thread Tobin C. Harding
Hi,

Another iteration of the SMO patch set, updates to this version are
restricted to the dcache patch #14.

Applies on top of Linus' tree (tag: v5.1-rc6).

This is a patch set implementing movable objects within the SLUB
allocator.  This is work based on Christoph Lameter's patch set:

 https://lore.kernel.org/patchwork/project/lkml/list/?series=377335

The original code logic is from that set and implemented by Christoph.
Clean up, refactoring, documentation, and additional features by myself.
Responsibility for any bugs remaining falls solely with myself.

Changes to this version:

Re-write the dcache Slab Movable Objects isolate/migrate functions.
Based on review/suggestions by Alexander on the last version.

In this version the isolate function loops over the object vector and
builds a shrink list for all objects that have refcount==0 AND are NOT
on anyone else's shrink list.  A pointer to this list is returned from
the isolate function and passed to the migrate function (by the SMO
infrastructure).  The dentry migration function d_partial_shrink()
simply calls shrink_dentry_list() on the received shrink list pointer
and frees the memory associated with the list_head.

Hopefully if this is all ok I can move on to violating the inode
slab cache :)

FWIW testing on a VM in Qemu brings this mild benefit to the dentry slab
cache with no _apparent_ negatives.

CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_SMO_NODE=y
CONFIG_DCACHE_SMO=y

[root@vm ~]# slabinfo  dentry -r | head -n 13

Slabcache: dentry   Aliases:  0 Order :  1 Objects: 38585
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory

Object :     192  Total  :    2582   Sanity Checks : On   Total: 21151744
SlabObj:     528  Full   :    2547   Redzoning     : On   Used :  7408320
SlabSiz:    8192  Partial:      35   Poisoning     : On   Loss : 13743424
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 12964560
Align  :       8  Objects:      15   Tracing       : Off  Lpadd:   702304

[root@vm ~]# slabinfo  dentry --shrink
[root@vm ~]# slabinfo  dentry -r | head -n 13

Slabcache: dentry   Aliases:  0 Order :  1 Objects: 38426
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory

Object :     192  Total  :    2578   Sanity Checks : On   Total: 21118976
SlabObj:     528  Full   :    2547   Redzoning     : On   Used :  7377792
SlabSiz:    8192  Partial:      31   Poisoning     : On   Loss : 13741184
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 12911136
Align  :       8  Objects:      15   Tracing       : Off  Lpadd:   701216


Please note, this dentry shrink implementation is 'best effort', results
vary.  This is as expected.  We are trying to unobtrusively shrink
the dentry cache.

thanks,
Tobin.


Tobin C. Harding (15):
  slub: Add isolate() and migrate() methods
  tools/vm/slabinfo: Add support for -C and -M options
  slub: Sort slab cache list
  slub: Slab defrag core
  tools/vm/slabinfo: Add remote node defrag ratio output
  tools/vm/slabinfo: Add defrag_used_ratio output
  tools/testing/slab: Add object migration test module
  tools/testing/slab: Add object migration test suite
  xarray: Implement migration function for objects
  tools/testing/slab: Add XArray movable objects tests
  slub: Enable moving objects to/from specific nodes
  slub: Enable balancing slabs across nodes
  dcache: Provide a dentry constructor
  dcache: Implement partial shrink via Slab Movable Objects
  dcache: Add CONFIG_DCACHE_SMO

 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 fs/dcache.c | 110 ++-
 include/linux/slab.h|  71 ++
 include/linux/slub_def.h|  10 +
 lib/radix-tree.c|  13 +
 lib/xarray.c|  49 ++
 mm/Kconfig  |  14 +
 mm/slab_common.c|   2 +-
 mm/slub.c   | 819 ++--
 tools/testing/slab/Makefile |  10 +
 tools/testing/slab/slub_defrag.c| 567 ++
 tools/testing/slab/slub_defrag.py   | 451 +++
 tools/testing/slab/slub_defrag_xarray.c | 211 +
 tools/vm/slabinfo.c |  51 +-
 14 files changed, 2299 insertions(+), 93 deletions(-)
 create mode 100644 tools/testing/slab/Makefile
 create mode 100644 tools/testing/slab/slub_defrag.c
 create mode 100755 tools/testing/slab/slub_defrag.py
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

-- 
2.21.0
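
The two-phase isolate/migrate protocol described in this cover letter can be sketched in userspace C. This is only an illustration under stated assumptions: the demo_* names are hypothetical, the cache and NUMA node arguments of the real callbacks are dropped, and "pinning" is reduced to bumping a counter. The isolate step walks the object vector, removes objects that are still referenced by setting their slot to NULL, pins the rest, and returns a private pointer; the migrate step later receives the same vector and that pointer.

```c
#include <stddef.h>

/* Hypothetical object type; the real callbacks operate on slab objects. */
struct demo_obj {
	int refcount;	/* 0 means nobody else holds the object */
	int moved;	/* set by the migrate step, for the demo only */
};

/* Hypothetical private data handed from isolate to migrate. */
struct demo_private {
	int pinned;
};

typedef void *demo_isolate_fn(void **ptr, int nr);
typedef void demo_migrate_fn(void **ptr, int nr, void *private);

/* Static storage, so the isolate step does not allocate anything. */
static struct demo_private demo_state;

static void *demo_isolate(void **ptr, int nr)
{
	int i;

	demo_state.pinned = 0;
	for (i = 0; i < nr; i++) {
		struct demo_obj *o = ptr[i];

		if (o->refcount != 0) {
			ptr[i] = NULL;	/* still in use: drop it from the vector */
			continue;
		}
		o->refcount = 1;	/* pin until the migrate step has run */
		demo_state.pinned++;
	}
	return &demo_state;
}

static void demo_migrate(void **ptr, int nr, void *private)
{
	struct demo_private *p = private;
	int i;

	for (i = 0; i < nr; i++) {
		struct demo_obj *o = ptr[i];

		if (!o)
			continue;	/* removed by the isolate step */
		o->moved = 1;		/* stand-in for allocate, copy, free */
		o->refcount = 0;	/* unpin */
		p->pinned--;
	}
}

/* Stand-in for the infrastructure: isolate first, then migrate. */
static int demo_run(demo_isolate_fn *isolate, demo_migrate_fn *migrate,
		    void **ptr, int nr)
{
	void *private = isolate(ptr, nr);
	int i, handled = 0;

	migrate(ptr, nr, private);
	for (i = 0; i < nr; i++)
		if (ptr[i])
			handled++;
	return handled;
}
```

Passing the isolate result through an opaque pointer keeps the driver ignorant of what the subsystem tracks, which mirrors how the shrink list pointer is described as travelling from the isolate function to the migrate function.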



Re: [PATCH -next] ASoC: sprd: Fix to use list_for_each_entry_safe() when delete items

2019-04-29 Thread Baolin Wang
Hi,

On Mon, 29 Apr 2019 at 20:27, Wei Yongjun  wrote:
>
> Since we will remove items off the list using list_del() we need
> to use a safe version of the list_for_each_entry() macro aptly named
> list_for_each_entry_safe().
>
> Fixes: d7bff893e04f ("ASoC: sprd: Add Spreadtrum multi-channel data transfer support")
> Signed-off-by: Wei Yongjun 

Yes, thanks for your fixes.
Reviewed-by: Baolin Wang 

> ---
>  sound/soc/sprd/sprd-mcdt.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/sound/soc/sprd/sprd-mcdt.c b/sound/soc/sprd/sprd-mcdt.c
> index 28f5e649733d..df250f7f2b6f 100644
> --- a/sound/soc/sprd/sprd-mcdt.c
> +++ b/sound/soc/sprd/sprd-mcdt.c
> @@ -978,12 +978,12 @@ static int sprd_mcdt_probe(struct platform_device *pdev)
>
>  static int sprd_mcdt_remove(struct platform_device *pdev)
>  {
> -   struct sprd_mcdt_chan *temp;
> +   struct sprd_mcdt_chan *chan, *temp;
>
> mutex_lock(&sprd_mcdt_list_mutex);
>
> -   list_for_each_entry(temp, &sprd_mcdt_chan_list, list)
> -   list_del(&temp->list);
> +   list_for_each_entry_safe(chan, temp, &sprd_mcdt_chan_list, list)
> +   list_del(&chan->list);
>
> mutex_unlock(&sprd_mcdt_list_mutex);
>
>
>


-- 
Baolin Wang
Best Regards
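
Why the safe variant is needed can be shown with a small userspace sketch. It assumes a hand-written circular list; the demo_* names are hypothetical and only mimic the kernel's list helpers. Deleting a node clears its links (the kernel's list_del() poisons them), so a loop that reads pos->next after the deletion would follow a dead pointer. The safe pattern reads the successor before the current node is deleted.

```c
#include <stddef.h>

struct demo_node {
	struct demo_node *prev, *next;
};

static void demo_init(struct demo_node *head)
{
	head->prev = head->next = head;
}

static void demo_add(struct demo_node *head, struct demo_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink a node and clear its links, so stale use is easy to spot. */
static void demo_del(struct demo_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/*
 * Remove every node. The successor is saved in tmp before pos is
 * deleted, which is what list_for_each_entry_safe() does with its
 * extra cursor argument. Returns the number of nodes removed.
 */
static int demo_remove_all(struct demo_node *head)
{
	struct demo_node *pos, *tmp;
	int removed = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		demo_del(pos);
		removed++;
	}
	return removed;
}
```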


Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Al Viro
On Tue, Apr 30, 2019 at 04:22:38AM +0200, Nicholas Mc Guire wrote:
> On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> > On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire  wrote:
> > >
> > > V2: As requested by Sven Van Asbroeck  make the
> > > impact of the patch clear in the commit message.
> > 
> > Thank you, but did you miss my comment about creating a local variable
> > instead? See:
> > https://lkml.org/lkml/2019/4/28/97
> 
> Did not miss it - I just don't think that makes it any more
> understandable - the __force __be16 makes it clear I believe
> that this is correct, sparse does not like this though - so tell
> sparse.

... to STFU, 'cause you know better.  The trouble is, how do we
(or yourself a year or two later) know *why* it is correct?
Worse, how do we (or yourself, etc.) know if a change about to be
done to the code won't invalidate the proof of yours?

> The local variable would need to be explained as it is
> functionally not necessary - therefore I find it more confusing
> than using  __force here.

What's confusing is mixing host- and fixed-endian values in the
same variable at different times.  Treat those as unrelated
types that happen to have the same sizeof.

Quite a few of __force instances in the tree should be taken out
and shot.  Don't add to their number.
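
The advice to treat host-endian and fixed-endian values as unrelated types can be sketched in plain C. In the kernel, sparse enforces this through __bitwise annotations; the sketch below instead assumes a struct wrapper so that an ordinary compiler rejects mixing the two. The names demo_be16, demo_cpu_to_be16 and demo_be16_to_cpu are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A big-endian 16-bit value; the wrapper blocks accidental arithmetic. */
typedef struct {
	uint16_t raw;
} demo_be16;

/* Host order to wire order, independent of the host's byte order. */
static demo_be16 demo_cpu_to_be16(uint16_t host)
{
	uint8_t bytes[2] = { (uint8_t)(host >> 8), (uint8_t)(host & 0xff) };
	demo_be16 v;

	memcpy(&v.raw, bytes, sizeof(bytes));
	return v;
}

/* Wire order back to host order. */
static uint16_t demo_be16_to_cpu(demo_be16 v)
{
	uint8_t bytes[2];

	memcpy(bytes, &v.raw, sizeof(bytes));
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

With this layout a host-order value lives in a uint16_t local and the wire value in a demo_be16 local, so the same variable never holds both representations at different times.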


RE: [PATCH] clk: imx: pllv3: Fix fall through build warning

2019-04-29 Thread Aisheng Dong
> From: Anson Huang
> Sent: Tuesday, April 30, 2019 9:55 AM
> Subject: [PATCH] clk: imx: pllv3: Fix fall through build warning
> 
> Fix below fall through build warning:
> 
> drivers/clk/imx/clk-pllv3.c:453:21: warning:
> this statement may fall through [-Wimplicit-fallthrough=]
> 
>pll->denom_offset = PLL_IMX7_DENOM_OFFSET;
>  ^
> drivers/clk/imx/clk-pllv3.c:454:2: note: here
>   case IMX_PLLV3_AV:
>   ^~~~
> 
> Signed-off-by: Anson Huang 

Reviewed-by: Dong Aisheng 

Regards
Dong Aisheng


Re: [PATCH -next] ASoC: sprd: Fix return value check in sprd_mcdt_probe()

2019-04-29 Thread Baolin Wang
On Mon, 29 Apr 2019 at 20:15, Wei Yongjun  wrote:
>
> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
>
> Fixes: d7bff893e04f ("ASoC: sprd: Add Spreadtrum multi-channel data transfer support")
> Signed-off-by: Wei Yongjun 

Thanks for fixing my mistake.
Reviewed-by: Baolin Wang 

> ---
>  sound/soc/sprd/sprd-mcdt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sound/soc/sprd/sprd-mcdt.c b/sound/soc/sprd/sprd-mcdt.c
> index 28f5e649733d..e9318d7a4810 100644
> --- a/sound/soc/sprd/sprd-mcdt.c
> +++ b/sound/soc/sprd/sprd-mcdt.c
> @@ -951,8 +951,8 @@ static int sprd_mcdt_probe(struct platform_device *pdev)
>
> res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> mcdt->base = devm_ioremap_resource(&pdev->dev, res);
> -   if (!mcdt->base)
> -   return -ENOMEM;
> +   if (IS_ERR(mcdt->base))
> +   return PTR_ERR(mcdt->base);
>
> mcdt->dev = &pdev->dev;
> spin_lock_init(&mcdt->lock);
>
>
>


-- 
Baolin Wang
Best Regards
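
The reason a NULL test misses this failure can be shown with a userspace re-creation of the error-pointer encoding. This is a sketch that mimics the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers under hypothetical demo_* names; it is not the kernel header. Small negative errno values are cast to pointers, which places them in the last 4095 addresses of the address space, so an error pointer is never NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```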


Re: INFO: task hung in __get_super

2019-04-29 Thread Jan Kara
On Sun 28-04-19 19:51:09, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:14:06AM -0700, syzbot wrote:
> >  down_read+0x49/0x90 kernel/locking/rwsem.c:26
> >  __get_super.part.0+0x203/0x2e0 fs/super.c:788
> >  __get_super include/linux/spinlock.h:329 [inline]
> >  get_super+0x2e/0x50 fs/super.c:817
> >  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
> >  invalidate_partition+0x36/0x60 block/genhd.c:1581
> >  drop_partitions block/partition-generic.c:443 [inline]
> >  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
> >  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
> >  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
> >  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
> >  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
> >  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
> >  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
> >  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> >  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
> >  block_ioctl+0xee/0x130 fs/block_dev.c:1933
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  file_ioctl fs/ioctl.c:509 [inline]
> >  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
> >  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
> >  __do_sys_ioctl fs/ioctl.c:720 [inline]
> >  __se_sys_ioctl fs/ioctl.c:718 [inline]
> >  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
> >  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appear to be
> these:
> 
> > 2 locks held by syz-executor274/11716:
> >  #0: a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> > 2 locks held by syz-executor274/11717:
> >  #0: e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> ... and that's bollocks.  ->s_umount held there is that on freshly allocated
> superblock.  It *MUST* be in mount(2); no other syscall should be able to
> call alloc_super() in the first place.  So what the hell is that doing
> trying to call lo_ioctl() inside mount(2)?  Something like isofs attempting
> cdrom ioctls on the underlying device?

Actually UDF also calls CDROMMULTISESSION ioctl during mount. So I could
see how we get to lo_simple_ioctl() and indeed that would acquire
loop_ctl_mutex under s_umount which is the other way around than in
BLKRRPART ioctl. 

> Why do we have loop_func_table->ioctl(), BTW?  All in-tree instances are
> either NULL or return -EINVAL unconditionally.  Considering that the
> caller is
> err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> we could bloody well just get rid of cryptoloop_ioctl() (the only
> non-NULL instance) and get rid of calling lo_simple_ioctl() in
> lo_ioctl() switch's default.

Yeah, you're right. And if we push the patch a bit further to not take
loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
can send a fix.

Honza

> 
> Something like this:
> 
> diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
> index 254ee7d54e91..f16468a562f5 100644
> --- a/drivers/block/cryptoloop.c
> +++ b/drivers/block/cryptoloop.c
> @@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
>  }
>  
>  static int
> -cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
> -{
> - return -EINVAL;
> -}
> -
> -static int
>  cryptoloop_release(struct loop_device *lo)
>  {
>   struct crypto_sync_skcipher *tfm = lo->key_data;
> @@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
>  static struct loop_func_table cryptoloop_funcs = {
>   .number = LO_CRYPT_CRYPTOAPI,
>   .init = cryptoloop_init,
> - .ioctl = cryptoloop_ioctl,
>   .transfer = cryptoloop_transfer,
>   .release = cryptoloop_release,
>   .owner = THIS_MODULE
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bf1c61cab8eb..2ec162b80562 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>   lo->lo_flags = lo_flags;
>   lo->lo_backing_file = file;
>   lo->transfer = NULL;
> - lo->ioctl = NULL;
>   lo->lo_sizelimit = 0;
>   lo->old_gfp_mask = mapping_gfp_mask(mapping);
>   mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> @@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  
>   

Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

2019-04-29 Thread Linus Torvalds
On Mon, Apr 29, 2019 at 5:45 PM Sean Christopherson
 wrote:
>
> On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> >
> > It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> > Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> > was based on P54C, though I'm struggling to recall exactly what the
> > Larrabee weirdness was.
>
> Aha!  Found an ancient comment that explicitly states P5 does not block
> NMI/SMI in the STI shadow, while P6 does block NMI/SMI.

Ok, so the STI shadow really wouldn't be reliable on those machines. Scary.

Of course, the good news is that hopefully nobody has them any more,
and if they do, they presumably don't use fancy NMI profiling etc, so
any actual NMI's are probably relegated purely to largely rare and
effectively fatal errors anyway (ie memory parity errors).

 Linus


Re: [PATCH] quota: set init_needed flag only when successfully getting dquot

2019-04-29 Thread cgxu519

On 4/30/19 5:49 AM, Jan Kara wrote:

> On Sun 28-04-19 13:39:21, Chengguang Xu wrote:
> > Set init_needed flag only when successfully getting dquot,
> > so that we can skip unnecessary subsequent operation.
> >
> > Signed-off-by: Chengguang Xu
>
> Thanks for the patch but I don't think it's really useful. It will be very
> rare that we race with quotaoff or dqget() fails due to error. So the
> additional overhead of iterating over dquots doesn't really matter in that
> case.


Hi Jan,

Thanks for the comment, I got it.

Chengguang.



Re: [PATCH V2] staging: fieldbus: anybus-s: force endiannes annotation

2019-04-29 Thread Nicholas Mc Guire
On Mon, Apr 29, 2019 at 10:03:36AM -0400, Sven Van Asbroeck wrote:
> On Mon, Apr 29, 2019 at 2:11 AM Nicholas Mc Guire  wrote:
> >
> > V2: As requested by Sven Van Asbroeck  make the
> > impact of the patch clear in the commit message.
> 
> Thank you, but did you miss my comment about creating a local variable
> instead? See:
> https://lkml.org/lkml/2019/4/28/97

Did not miss it - I just don't think that makes it any more
understandable - the __force __be16 makes it clear I believe
that this is correct, sparse does not like this though - so tell
sparse. The local variable would need to be explained as it is
functionally not necessary - therefore I find it more confusing
than using  __force here.

If that rationale is wrong let me know.

thx!
hofrat



[PATCH] treewide: fix awk regexp over-escaping

2019-04-29 Thread Alex Xu (Hello71)
Fix "warning: regexp escape sequence is not a known regexp operator" on
gawk 5.0.0.

Results found by:

- grepping '\\[^\[\\^$.|?*+()a-z]' on *.awk
- grepping 'awk.*\\[^\[\\^$.|?*+()a-z]'
- running awk --lint -f /dev/null on *.awk

Signed-off-by: Alex Xu (Hello71) 
---
 Documentation/arm/Samsung/clksrc-change-registers.awk  | 2 +-
 arch/x86/tools/gen-insn-attr-x86.awk   | 4 ++--
 lib/raid6/unroll.awk   | 2 +-
 tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk | 4 ++--
 tools/perf/arch/x86/tests/gen-insn-x86-dat.awk | 2 +-
 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk
index 7be1b8aa7cd9..d853f750c861 100755
--- a/Documentation/arm/Samsung/clksrc-change-registers.awk
+++ b/Documentation/arm/Samsung/clksrc-change-registers.awk
@@ -67,7 +67,7 @@ BEGIN {
 # to replace and create an associative array of values
 
 while (getline line < ARGV[1] > 0) {
-   if (line ~ /\#define.*_MASK/ &&
+   if (line ~ /#define.*_MASK/ &&
!(line ~ /USB_SIG_MASK/)) {
splitdefine(line, fields)
name = fields[0]
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index b02a36b2c14f..a42015b305f4 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\)\\)"
+   lprefix3_expr = "\\((F2|!F3|66)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
diff --git a/lib/raid6/unroll.awk b/lib/raid6/unroll.awk
index c6aa03631df8..0809805a7e23 100644
--- a/lib/raid6/unroll.awk
+++ b/lib/raid6/unroll.awk
@@ -13,7 +13,7 @@ BEGIN {
for (i = 0; i < rep; ++i) {
tmp = $0
gsub(/\$\$/, i, tmp)
-   gsub(/\$\#/, n, tmp)
+   gsub(/\$#/, n, tmp)
gsub(/\$\*/, "$", tmp)
print tmp
}
diff --git a/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk b/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
index b02a36b2c14f..a42015b305f4 100644
--- a/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\)\\)"
+   lprefix3_expr = "\\((F2|!F3|66)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
diff --git a/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk b/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
index a21454835cd4..27585d032ee6 100644
--- a/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
+++ b/tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
@@ -31,7 +31,7 @@ BEGIN {
going = 0
 }
 
-/^\s*[0-9a-fA-F]+\:/ {
+/^\s*[0-9a-fA-F]+:/ {
if (going) {
colon_pos = index($0, ":")
useful_line = substr($0, colon_pos + 1)
diff --git a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
index ddd5c4c21129..606ccd154392 100644
--- a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
+++ b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
@@ -69,7 +69,7 @@ BEGIN {
 
lprefix1_expr = "\\((66|!F3)\\)"
lprefix2_expr = "\\(F3\\)"
-   lprefix3_expr = "\\((F2|!F3|66\\)\\)"
+   lprefix3_expr = "\\((F2|!F3|66)\\)"
lprefix_expr = "\\((66|F2|F3)\\)"
max_lprefix = 4
 
@@ -257,7 +257,7 @@ function convert_operands(count,opnd,   i,j,imm,mod)
return add_flags(imm, mod)
 }
 
-/^[0-9a-f]+\:/ {
+/^[0-9a-f]+:/ {
if (NR == 1)
next
# get index
-- 
2.21.0



Re: [PATCH v3 5/8] iommu/vt-d: Implement def_domain_type iommu ops entry

2019-04-29 Thread Lu Baolu

Hi Christoph,

On 4/30/19 4:03 AM, Christoph Hellwig wrote:

> > @@ -3631,35 +3607,30 @@ static int iommu_no_mapping(struct device *dev)
> > if (iommu_dummy(dev))
> > return 1;
> >
> > -   if (!iommu_identity_mapping)
> > -   return 0;
> > -
>
> FYI, iommu_no_mapping has been refactored in for-next:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/vt-d=48b2c937ea37a3bece0094b46450ed5267525289


Oh, yes! Thanks for letting me know this. Will rebase the code.




> > found = identity_mapping(dev);
> > if (found) {
> > +   /*
> > +* If the device's dma_mask is less than the system's memory
> > +* size then this is not a candidate for identity mapping.
> > +*/
> > +   u64 dma_mask = *dev->dma_mask;
> > +
> > +   if (dev->coherent_dma_mask &&
> > +   dev->coherent_dma_mask < dma_mask)
> > +   dma_mask = dev->coherent_dma_mask;
> > +
> > +   if (dma_mask < dma_get_required_mask(dev)) {
>
> I know this is mostly existing code moved around, but it really needs
> some fixing.  For one, dma_get_required_mask is supposed to return the
> mask required to not bounce for the given device.  E.g. for a device
> behind an iommu it should always just return 32-bit.  If you really
> want to check vs system memory please call dma_direct_get_required_mask
> without the dma_ops indirection.
>
> Second, I don't even think we need to check the coherent_dma_mask,
> dma_direct is pretty good at always finding memory even without
> an iommu.
>
> Third, this doesn't take the bus_dma_mask into account.
>
> This probably should just be:
>
> if (min(*dev->dma_mask, dev->bus_dma_mask) <
> dma_direct_get_required_mask(dev)) {


Agreed and will add this in the next version.

Best regards,
Lu Baolu


Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]

2019-04-29 Thread Linus Torvalds
On Mon, Apr 29, 2019 at 5:39 PM Jann Horn  wrote:
>
> ... uuuh, whoops. Turns out I don't know what I'm talking about.

Well, apparently there's some odd libc issue according to Florian, so
there *might* be something to it.

> Nevermind. For some reason I thought vfork() was just
> CLONE_VFORK|SIGCHLD, but now I see I got that completely wrong.

Well, inside the kernel, that's actually *very* close to what vfork() is:

  SYSCALL_DEFINE0(vfork)
  {
return _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
0, NULL, NULL, 0);
  }

but that's just an internal implementation detail. It's a real vfork()
and should act as the traditional BSD "share everything" without any
address space copying. The CLONE_VFORK flag is what does the "wait for
child to exit or execve" magic.

Note that vfork() is "exciting" for the compiler in much the same way
"setjmp/longjmp()" is, because of the shared stack use in the child
and the parent. It is *very* easy to get this wrong and cause massive
and subtle memory corruption issues because the parent returns to
something that has been messed up by the child.

That may be why some libc might end up just using "fork()", because it
ends up avoiding bugs in user space.
(In fact, if I recall correctly, the _reason_ we have an explicit
'vfork()' entry point rather than using clone() with magic parameters
was that the lack of arguments meant that you didn't have to
save/restore any registers in user space, which made the whole stack
issue simpler. But it's been two decades, so my memory is bitrotting).

Also, particularly if you have a big address space, vfork()+execve()
can be quite a bit faster than fork()+execve(). Linux fork() is pretty
efficient, but if you have gigabytes of VM space to copy, it's going
to take time even if you do it fairly well.

   Linus
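
The vfork()+execve() pattern discussed above can be sketched as a small POSIX helper. This is an illustration only; the helper name demo_run_with_vfork is hypothetical. Because the child borrows the parent's address space and stack until it execs or exits, the child branch does nothing except call execv() and, on failure, _exit().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a program with vfork()+execv() and return its exit status,
 * or -1 if the fork or wait failed or the child did not exit normally.
 */
static int demo_run_with_vfork(const char *path, char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: shares the parent's memory, so touch nothing. */
		execv(path, argv);
		_exit(127);	/* only reached if execv() failed */
	}

	/* Parent: resumes once the child has called execv() or _exit(). */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a path and a NULL-terminated argument vector, for example "/bin/sh" with the arguments "sh", "-c", "exit 7", and get 7 back on a system where /bin/sh exists.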


Re: [PATCH v3] pinctrl:intel: Retain HOSTSW_OWN for requested gpio pin

2019-04-29 Thread Chris Chiu
On Fri, Apr 26, 2019 at 8:50 PM Andriy Shevchenko
 wrote:
>
> On Tue, Apr 23, 2019 at 12:38:17PM +0200, Linus Walleij wrote:
> > On Mon, Apr 15, 2019 at 7:54 AM Chris Chiu  wrote:
> >
> > > The touchpad of the ASUS laptops E403NA, X540NA, X541NA are not
> > > responsive after suspend/resume. The following error message
> > > shows after resume.
> > >  i2c_hid i2c-ELAN1200:00: failed to reset device.
> > >
> > > On these laptops, the touchpad interrupt is connected via a GPIO
> > > pin which is controlled by Intel pinctrl. After system resumes,
> > > the GPIO is in ACPI mode and no longer works as an IRQ.
> > >
> > > This commit saves the HOSTSW_OWN value during suspend, make sure
> > > the HOSTSW_OWN mode remains the same after resume.
> > >
> > > Signed-off-by: Chris Chiu 
> >
> > This v3 patch applied with Mika's ACK.
>
> Hmm... It's supposed to go along with our PR.

Anything I can help with?

Chris
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


[PATCH] NTB: correct ntb_dev_ops and ntb_dev comment typos

2019-04-29 Thread Wesley Sheng
The comment for ntb_dev_ops and ntb_dev incorrectly referred to
ntb_ctx_ops and ntb_device.

Signed-off-by: Wesley Sheng 
Reviewed-by: Logan Gunthorpe 
---
 include/linux/ntb.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 56a92e3..604abc8 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -205,7 +205,7 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
 }
 
 /**
- * struct ntb_ctx_ops - ntb device operations
+ * struct ntb_dev_ops - ntb device operations
  * @port_number:   See ntb_port_number().
  * @peer_port_count:   See ntb_peer_port_count().
  * @peer_port_number:  See ntb_peer_port_number().
@@ -404,7 +404,7 @@ struct ntb_client {
 #define drv_ntb_client(__drv) container_of((__drv), struct ntb_client, drv)
 
 /**
- * struct ntb_device - ntb device
+ * struct ntb_dev - ntb device
  * @dev:   Linux device object.
  * @pdev:  PCI device entry of the ntb.
  * @topo:  Detected topology of the ntb.
-- 
2.7.4



Re: [RFC] Bluetooth: Retry configure request if result is L2CAP_CONF_UNKNOWN

2019-04-29 Thread Andrey Smirnov
On Tue, Apr 23, 2019 at 1:08 PM Marcel Holtmann  wrote:
>
> Hi Andrey,
>
> > Due to:
> >
> > - current implementation of l2cap_config_rsp() dropping BT
> >   connection if sender of configuration response replied with unknown
> >   option failure (Result=0x0003/L2CAP_CONF_UNKNOWN)
> >
> > - current implementation of l2cap_build_conf_req() adding
> >   L2CAP_CONF_RFC(0x04) option to initial configure request sent by
> >   the Linux host.
> >
> > devices that do not recognize L2CAP_CONF_RFC, such as Xbox One S
> > controllers, will get stuck in endless connect -> configure ->
> > disconnect loop, never connect and be generally unusable.
> >
> > To avoid this problem add code to do the following:
> >
> > 1. Store a mask of supported conf option types per connection
> >
> > 2. Parse the body of response L2CAP_CONF_UNKNOWN and adjust
> >connection's supported conf option types mask
> >
> > 3. Retry configuration step the same way it's done for
> >L2CAP_CONF_UNACCEPT
> >
> > Signed-off-by: Andrey Smirnov 
> > Cc: Pierre-Loup A. Griffais 
> > Cc: Florian Dollinger 
> > Cc: Marcel Holtmann 
> > Cc: Johan Hedberg 
> > Cc: linux-blueto...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >
> > Everyone:
> >
> > I marked this as an RFC, since I don't have a lot of experience with
> > the Bluetooth subsystem and don't have a high degree of confidence in
> > the choices made in this patch. I do, however, think it is good enough to
> > start a discussion about the problem.
> >
> > Thanks,
> > Andrey Smirnov
>
> so it seems that the remote side claims to support Streaming Mode and that is 
> why we are trying to set it up.
>
> > ACL Data RX: Handle 12 flags 0x02 dlen 16
>   L2CAP: Information Response (0x0b) ident 1 len 8
> Type: Extended features supported (0x0002)
> Result: Success (0x0000)
> Features: 0x0010
>   Streaming Mode
>
> And that is why we do this.
>
> < ACL Data TX: Handle 12 flags 0x00 dlen 23
>   L2CAP: Configure Request (0x04) ident 2 len 15
> Destination CID: 64
> Flags: 0x0000
> Option: Retransmission and Flow Control (0x04) [mandatory]
>   Mode: Basic (0x00)
>   TX window size: 0
>   Max transmit: 0
>   Retransmission timeout: 0
>   Monitor timeout: 0
>   Maximum PDU size: 0
>
> > ACL Data RX: Handle 12 flags 0x02 dlen 15
>   L2CAP: Configure Response (0x05) ident 2 len 7
> Source CID: 64
> Flags: 0x0000
> Result: Failure - unknown options (0x0003)
> 04
>
> So btmon needs a patch to decode the failed option octet here. We really want
> to provide a human-readable description of the failed option.
>

I'll see if that's an easy thing to add. Can't promise anything though.

> >
> > include/net/bluetooth/l2cap.h |  1 +
> > net/bluetooth/l2cap_core.c| 58 ++-
> > 2 files changed, 51 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
> > index 093aedebdf0c..6898bba5d9a8 100644
> > --- a/include/net/bluetooth/l2cap.h
> > +++ b/include/net/bluetooth/l2cap.h
> > @@ -632,6 +632,7 @@ struct l2cap_conn {
> >   unsigned intmtu;
> >
> >   __u32   feat_mask;
> > + __u32   known_options;
> >   __u8remote_fixed_chan;
> >   __u8local_fixed_chan;
> >
> > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > index f17e393b43b4..49be98b6de72 100644
> > --- a/net/bluetooth/l2cap_core.c
> > +++ b/net/bluetooth/l2cap_core.c
> > @@ -3243,8 +3243,10 @@ static int l2cap_build_conf_req(struct l2cap_chan 
> > *chan, void *data, size_t data
> >   rfc.monitor_timeout = 0;
> >   rfc.max_pdu_size= 0;
> >
> > - l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc),
> > -(unsigned long) &rfc, endptr - ptr);
> > + if (chan->conn->known_options & BIT(L2CAP_CONF_RFC)) {
> > + l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc),
> > +(unsigned long) &rfc, endptr - ptr);
> > + }
> >   break;
> >
> >   case L2CAP_MODE_ERTM:
> > @@ -3263,8 +3265,10 @@ static int l2cap_build_conf_req(struct l2cap_chan 
> > *chan, void *data, size_t data
> >   rfc.txwin_size = min_t(u16, chan->tx_win,
> >  L2CAP_DEFAULT_TX_WINDOW);
> >
> > - l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc),
> > -(unsigned long) &rfc, endptr - ptr);
> > + if (chan->conn->known_options & BIT(L2CAP_CONF_RFC)) {
> > + l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc),
> > +(unsigned long) &rfc, endptr - ptr);
> > + }
> >
> >   if (test_bit(FLAG_EFS_ENABLE, &chan->flags))
> >   
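
For readers following the approach in the quoted patch description, here is a minimal userspace sketch of the per-connection option mask idea: start with every option type assumed supported, clear the bit for each type the peer reports as unknown, and consult the mask before adding an option to the next configure request. Every demo_/DEMO_ name is hypothetical and is not the kernel's. The value 0x04 for the RFC option comes from the text above; 0x01 for MTU and the 0x80 hint bit are assumptions about the L2CAP option encoding, and treating the response body as a list of one-octet option types is an assumption based on the "04" body in the quoted trace.

```c
#include <stddef.h>
#include <stdint.h>

/* Option type values; 0x04 for RFC is taken from the quoted text. */
#define DEMO_CONF_MTU	0x01	/* assumed value */
#define DEMO_CONF_RFC	0x04

#define DEMO_BIT(n)	(UINT32_C(1) << (n))

/* Hypothetical per-connection state: one bit per option type. */
struct demo_conn {
	uint32_t known_options;
};

static void demo_conn_init(struct demo_conn *conn)
{
	/* Start out assuming the peer understands every option type. */
	conn->known_options = UINT32_MAX;
}

/*
 * Handle the body of an "unknown options" response.  This sketch assumes
 * the body is a plain list of one-octet option types; the top (hint) bit
 * of each octet is masked off before the type is looked up.
 */
static void demo_handle_conf_unknown(struct demo_conn *conn,
				     const uint8_t *body, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t type = body[i] & 0x7f;

		if (type < 32)
			conn->known_options &= ~DEMO_BIT(type);
	}
}

/* Non-zero if the option type may still be sent to this peer. */
static int demo_option_allowed(const struct demo_conn *conn, uint8_t type)
{
	return type < 32 && (conn->known_options & DEMO_BIT(type)) != 0;
}
```

A request builder would call demo_option_allowed() before appending each option, so a retry after an "unknown options" response simply omits whatever the peer rejected.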

[PATCH] clk: imx: pllv3: Fix fall through build warning

2019-04-29 Thread Anson Huang
Fix the fall-through build warning below:

drivers/clk/imx/clk-pllv3.c:453:21: warning:
this statement may fall through [-Wimplicit-fallthrough=]

   pll->denom_offset = PLL_IMX7_DENOM_OFFSET;
 ^
drivers/clk/imx/clk-pllv3.c:454:2: note: here
  case IMX_PLLV3_AV:
  ^~~~

Signed-off-by: Anson Huang 
---
 drivers/clk/imx/clk-pllv3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/clk/imx/clk-pllv3.c b/drivers/clk/imx/clk-pllv3.c
index e892b9a..fbe4fe0 100644
--- a/drivers/clk/imx/clk-pllv3.c
+++ b/drivers/clk/imx/clk-pllv3.c
@@ -451,6 +451,7 @@ struct clk *imx_clk_pllv3(enum imx_pllv3_type type, const 
char *name,
case IMX_PLLV3_AV_IMX7:
pll->num_offset = PLL_IMX7_NUM_OFFSET;
pll->denom_offset = PLL_IMX7_DENOM_OFFSET;
+   /* fall through */
case IMX_PLLV3_AV:
ops = &clk_pllv3_av_ops;
break;
-- 
2.7.4



RE: linux-next: build warning after merge of the clk tree

2019-04-29 Thread Anson Huang
Hi, Stephen
Thanks for the notice.
As the fall through is intentional, I will send out a patch that adds a
"/* fall through */" comment to avoid this build warning.

Anson.

> -Original Message-
> From: Stephen Rothwell [mailto:s...@canb.auug.org.au]
> Sent: Tuesday, April 30, 2019 8:20 AM
> To: Mike Turquette ; Stephen Boyd
> 
> Cc: Linux Next Mailing List ; Linux Kernel Mailing
> List ; Anson Huang ;
> Gustavo A. R. Silva ; Kees Cook
> 
> Subject: linux-next: build warning after merge of the clk tree
> 
> Hi all,
> 
> After merging the clk tree, today's linux-next build (arm
> multi_v7_defconfig) produced this warning:
> 
> drivers/clk/imx/clk-pllv3.c:453:21: warning: this statement may fall through [-Wimplicit-fallthrough=]
>pll->denom_offset = PLL_IMX7_DENOM_OFFSET;
>  ^
> drivers/clk/imx/clk-pllv3.c:454:2: note: here
>   case IMX_PLLV3_AV:
>   ^~~~
> 
> Introduced by commit
> 
>   01d0a541ff4b ("clk: imx: correct i.MX7D AV PLL num/denom offset")
> 
> I get this warning because I am building with -Wimplicit-fallthrough in an
> attempt to catch new additions early.  The gcc warning can be turned off by
> adding a /* fall through */ comment at the point the fall through happens
> (assuming that the fall through is intentional).
> 
> --
> Cheers,
> Stephen Rothwell
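
As an illustration of the annotation described above, here is a small standalone sketch of a switch where one case deliberately continues into the next. The enum, struct, and offset values are hypothetical placeholders that only loosely mirror the shape of the quoted driver code; the only point is where the comment sits, directly before the next case label.

```c
/* Hypothetical PLL variants, loosely modelled on the quoted switch. */
enum demo_pll_type { DEMO_PLL_GENERIC, DEMO_PLL_AV, DEMO_PLL_AV_IMX7 };

struct demo_pll {
	int uses_av_ops;	/* 1 if the AV operations were selected */
	int num_offset;
	int denom_offset;
};

static void demo_pll_setup(struct demo_pll *pll, enum demo_pll_type type)
{
	/* Placeholder defaults, not real register offsets. */
	pll->uses_av_ops = 0;
	pll->num_offset = 0x10;
	pll->denom_offset = 0x20;

	switch (type) {
	case DEMO_PLL_AV_IMX7:
		/* Variant-specific offsets, then share the AV setup below. */
		pll->num_offset = 0x20;
		pll->denom_offset = 0x30;
		/* fall through */
	case DEMO_PLL_AV:
		pll->uses_av_ops = 1;
		break;
	default:
		break;
	}
}
```

With the comment in place, gcc treats the missing break as intentional; without it, the same code triggers the warning when -Wimplicit-fallthrough is enabled.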


Re: [RFC PATCH v2 00/17] Core scheduling v2

2019-04-29 Thread Aubrey Li
On Tue, Apr 30, 2019 at 12:01 AM Ingo Molnar  wrote:
> * Li, Aubrey  wrote:
>
> > > I.e. showing the approximate CPU thread-load figure column would be
> > > very useful too, where '50%' shows half-loaded, '100%' fully-loaded,
> > > '200%' over-saturated, etc. - for each row?
> >
> > See below, hope this helps.
> > .--.
> > |NA/AVX vanilla-SMT [std% / sem%] cpu% |coresched-SMT   [std% / 
> > sem%] +/- cpu% |  no-SMT [std% / sem%]   +/-  cpu% |
> > |--|
> > |  1/1508.5 [ 0.2%/ 0.0%] 2.1% |504.7   [ 1.1%/ 
> > 0.1%]-0.8%2.1% |   509.0 [ 0.2%/ 0.0%]   0.1% 4.3% |
> > |  2/2   1000.2 [ 1.4%/ 0.1%] 4.1% |   1004.1   [ 1.6%/ 
> > 0.2%] 0.4%4.1% |   997.6 [ 1.2%/ 0.1%]  -0.3% 8.1% |
> > |  4/4   1912.1 [ 1.0%/ 0.1%] 7.9% |   1904.2   [ 1.1%/ 
> > 0.1%]-0.4%7.9% |  1914.9 [ 1.3%/ 0.1%]   0.1%15.1% |
> > |  8/8   3753.5 [ 0.3%/ 0.0%]14.9% |   3748.2   [ 0.3%/ 
> > 0.0%]-0.1%   14.9% |  3751.3 [ 0.4%/ 0.0%]  -0.1%30.5% |
> > | 16/16  7139.3 [ 2.4%/ 0.2%]30.3% |   7137.9   [ 1.8%/ 
> > 0.2%]-0.0%   30.3% |  7049.2 [ 2.4%/ 0.2%]  -1.3%60.4% |
> > | 32/32 10899.0 [ 4.2%/ 0.4%]60.3% |  10780.3   [ 4.4%/ 
> > 0.4%]-1.1%   55.9% | 10339.2 [ 9.6%/ 0.9%]  -5.1%97.7% |
> > | 64/64 15086.1 [11.5%/ 1.2%]97.7% |  14262.0   [ 8.2%/ 
> > 0.8%]-5.5%   82.0% | 11168.7 [22.2%/ 1.7%] -26.0%   100.0% |
> > |128/12815371.9 [22.0%/ 2.2%]   100.0% |  14675.8   [14.4%/ 
> > 1.4%]-4.5%   82.8% | 10963.9 [18.5%/ 1.4%] -28.7%   100.0% |
> > |256/25615990.8 [22.0%/ 2.2%]   100.0% |  12227.9   [10.3%/ 
> > 1.0%]   -23.5%   73.2% | 10469.9 [19.6%/ 1.7%] -34.5%   100.0% |
> > '--'
>
> Very nice, thank you!
>
> What's interesting is how in the over-saturated case (the last three
> rows: 128, 256 and 512 total threads) coresched-SMT leaves 20-30% CPU
> performance on the floor according to the load figures.

Yeah, I found the next focus.

>
> Is this true idle time (which shows up as 'id' during 'top'), or some
> load average artifact?
>

vmstat ran alongside the benchmarks and reported CPU utilization at
one-second intervals. The cpu% figure is the average of the
(100 - idle) series.

Thanks,
-Aubrey
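
The cpu% calculation described above can be sketched as a short helper. This is only an illustration of averaging (100 - idle) over a series of per-interval idle percentages; the function name and the idea of passing the vmstat "id" column as an integer array are assumptions made for the example, not part of the benchmark scripts.

```c
#include <stddef.h>

/*
 * Average CPU utilization over a series of vmstat-style idle samples.
 * Each sample is the idle percentage for one interval; the utilization
 * for that interval is taken as 100 - idle.
 * Returns 0.0 for an empty series.
 */
static double cpu_util_percent(const unsigned int *idle, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;

	for (i = 0; i < n; i++)
		sum += 100.0 - (double)idle[i];

	return sum / (double)n;
}
```

Because every interval contributes equally, a run that is idle for part of its duration pulls the reported cpu% down even if the remaining intervals are fully loaded.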


Re: [PATCH v6 02/10] clk: samsung: add new clocks for DMC for Exynos5422 SoC

2019-04-29 Thread Chanwoo Choi
Hi Lukasz,

I have no objection to this patch.
However, as I commented on v4 [1], to reduce the confusion caused by
multiple clock definitions that share the same bit range of DIV_CDREX0,
you need to add an explanatory comment, and it would be better to
define the three clocks next to each other in this driver
(CLKDIV_PCLK_CDREX, CLKDIV_PCLK_DREX0, CLKDIV_PCLK_DREX1).
If they are scattered, it is difficult to understand
why they were implemented like this.

[1] [v4,2/8] clk: samsung: add new clocks for DMC for Exynos5422 SoC
- https://lkml.org/lkml/2019/2/12/12

Regards,
Chanwoo Choi


On 19. 4. 19. 오후 11:19, Lukasz Luba wrote:
> This patch provides support for clocks needed for Dynamic Memory Controller
> in Exynos5422 SoC. It adds CDREX base register addresses, new DIV, MUX and
> GATE entries.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  drivers/clk/samsung/clk-exynos5420.c | 46 
> 
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/clk/samsung/clk-exynos5420.c 
> b/drivers/clk/samsung/clk-exynos5420.c
> index 34cce3c..d9e6653 100644
> --- a/drivers/clk/samsung/clk-exynos5420.c
> +++ b/drivers/clk/samsung/clk-exynos5420.c
> @@ -134,6 +134,8 @@
>  #define SRC_CDREX        0x20200
>  #define DIV_CDREX0       0x20500
>  #define DIV_CDREX1       0x20504
> +#define GATE_BUS_CDREX0  0x20700
> +#define GATE_BUS_CDREX1  0x20704
>  #define KPLL_LOCK        0x28000
>  #define KPLL_CON0        0x28100
>  #define SRC_KFC          0x28200
> @@ -248,6 +250,8 @@ static const unsigned long exynos5x_clk_regs[] 
> __initconst = {
>   DIV_CDREX1,
>   SRC_KFC,
>   DIV_KFC0,
> + GATE_BUS_CDREX0,
> + GATE_BUS_CDREX1,
>  };
>  
>  static const unsigned long exynos5800_clk_regs[] __initconst = {
> @@ -425,6 +429,9 @@ PNAME(mout_group13_5800_p)= { "dout_osc_div", 
> "mout_sw_aclkfl1_550_cam" };
>  PNAME(mout_group14_5800_p)   = { "dout_aclk550_cam", "dout_sclk_sw" };
>  PNAME(mout_group15_5800_p)   = { "dout_osc_div", "mout_sw_aclk550_cam" };
>  PNAME(mout_group16_5800_p)   = { "dout_osc_div", "mout_mau_epll_clk" };
> +PNAME(mout_mx_mspll_ccore_phy_p) = { "sclk_bpll", "mout_sclk_dpll",
> + "mout_sclk_mpll", "ff_dout_spll2",
> + "mout_sclk_spll", "mout_sclk_epll"};
>  
>  /* fixed rate clocks generated outside the soc */
>  static struct samsung_fixed_rate_clock
> @@ -450,7 +457,7 @@ static const struct samsung_fixed_factor_clock
>  static const struct samsung_fixed_factor_clock
>   exynos5800_fixed_factor_clks[] __initconst = {
>   FFACTOR(0, "ff_dout_epll2", "mout_sclk_epll", 1, 2, 0),
> - FFACTOR(0, "ff_dout_spll2", "mout_sclk_spll", 1, 2, 0),
> + FFACTOR(CLK_FF_DOUT_SPLL2, "ff_dout_spll2", "mout_sclk_spll", 1, 2, 0),
>  };
>  
>  static const struct samsung_mux_clock exynos5800_mux_clks[] __initconst = {
> @@ -472,11 +479,14 @@ static const struct samsung_mux_clock 
> exynos5800_mux_clks[] __initconst = {
>   MUX(0, "mout_aclk300_disp1", mout_group5_5800_p, SRC_TOP2, 24, 2),
>   MUX(0, "mout_aclk300_gscl", mout_group5_5800_p, SRC_TOP2, 28, 2),
>  
> + MUX(CLK_MOUT_MX_MSPLL_CCORE_PHY, "mout_mx_mspll_ccore_phy",
> + mout_mx_mspll_ccore_phy_p, SRC_TOP7, 0, 3),
> +
>   MUX(CLK_MOUT_MX_MSPLL_CCORE, "mout_mx_mspll_ccore",
> - mout_mx_mspll_ccore_p, SRC_TOP7, 16, 2),
> + mout_mx_mspll_ccore_p, SRC_TOP7, 16, 3),
>   MUX_F(CLK_MOUT_MAU_EPLL, "mout_mau_epll_clk", mout_mau_epll_clk_5800_p,
>   SRC_TOP7, 20, 2, CLK_SET_RATE_PARENT, 0),
> - MUX(0, "sclk_bpll", mout_bpll_p, SRC_TOP7, 24, 1),
> + MUX(CLK_SCLK_BPLL, "sclk_bpll", mout_bpll_p, SRC_TOP7, 24, 1),
>   MUX(0, "mout_epll2", mout_epll2_5800_p, SRC_TOP7, 28, 1),
>  
>   MUX(0, "mout_aclk550_cam", mout_group3_5800_p, SRC_TOP8, 16, 3),
> @@ -648,7 +658,7 @@ static const struct samsung_mux_clock exynos5x_mux_clks[] 
> __initconst = {
>  
>   MUX(0, "mout_sclk_mpll", mout_mpll_p, SRC_TOP6, 0, 1),
>   MUX(CLK_MOUT_VPLL, "mout_sclk_vpll", mout_vpll_p, SRC_TOP6, 4, 1),
> - MUX(0, "mout_sclk_spll", mout_spll_p, SRC_TOP6, 8, 1),
> + MUX(CLK_MOUT_SCLK_SPLL, "mout_sclk_spll", mout_spll_p, SRC_TOP6, 8, 1),
>   MUX(0, "mout_sclk_ipll", mout_ipll_p, SRC_TOP6, 12, 1),
>   MUX(0, "mout_sclk_rpll", mout_rpll_p, SRC_TOP6, 16, 1),
>   MUX_F(CLK_MOUT_EPLL, "mout_sclk_epll", mout_epll_p, SRC_TOP6, 20, 1,
> @@ -817,6 +827,8 @@ static const struct samsung_div_clock exynos5x_div_clks[] 
> __initconst = {
>   DIV(CLK_DOUT_CLK2X_PHY0, "dout_clk2x_phy0", "dout_sclk_cdrex",
>   DIV_CDREX0, 3, 5),
>  
> + DIV(0, "dout_pclk_drex0", "dout_cclk_drex0", DIV_CDREX0, 28, 3),
> +
>   DIV(CLK_DOUT_PCLK_CORE_MEM, "dout_pclk_core_mem", "mout_mclk_cdrex",
>   DIV_CDREX1, 8, 3),
>  
> @@ -1170,6 +1182,32 @@ static const 

Re: [RFC PATCH v2 00/17] Core scheduling v2

2019-04-29 Thread Aubrey Li
On Mon, Apr 29, 2019 at 11:39 PM Phil Auld  wrote:
>
> On Mon, Apr 29, 2019 at 09:25:35PM +0800 Li, Aubrey wrote:
> > .------------------------------------------------------------------------------------------------------------------------------.
> > | NA/AVX  | vanilla-SMT [std% / sem%]   cpu% | coresched-SMT [std% / sem%]    +/-   cpu% |  no-SMT [std% / sem%]    +/-   cpu% |
> > |------------------------------------------------------------------------------------------------------------------------------|
> > |     1/1 |       508.5 [ 0.2%/ 0.0%]   2.1% |         504.7 [ 1.1%/ 0.1%]  -0.8%   2.1% |   509.0 [ 0.2%/ 0.0%]   0.1%   4.3% |
> > |     2/2 |      1000.2 [ 1.4%/ 0.1%]   4.1% |        1004.1 [ 1.6%/ 0.2%]   0.4%   4.1% |   997.6 [ 1.2%/ 0.1%]  -0.3%   8.1% |
> > |     4/4 |      1912.1 [ 1.0%/ 0.1%]   7.9% |        1904.2 [ 1.1%/ 0.1%]  -0.4%   7.9% |  1914.9 [ 1.3%/ 0.1%]   0.1%  15.1% |
> > |     8/8 |      3753.5 [ 0.3%/ 0.0%]  14.9% |        3748.2 [ 0.3%/ 0.0%]  -0.1%  14.9% |  3751.3 [ 0.4%/ 0.0%]  -0.1%  30.5% |
> > |   16/16 |      7139.3 [ 2.4%/ 0.2%]  30.3% |        7137.9 [ 1.8%/ 0.2%]  -0.0%  30.3% |  7049.2 [ 2.4%/ 0.2%]  -1.3%  60.4% |
> > |   32/32 |     10899.0 [ 4.2%/ 0.4%]  60.3% |       10780.3 [ 4.4%/ 0.4%]  -1.1%  55.9% | 10339.2 [ 9.6%/ 0.9%]  -5.1%  97.7% |
> > |   64/64 |     15086.1 [11.5%/ 1.2%]  97.7% |       14262.0 [ 8.2%/ 0.8%]  -5.5%  82.0% | 11168.7 [22.2%/ 1.7%] -26.0% 100.0% |
> > | 128/128 |     15371.9 [22.0%/ 2.2%] 100.0% |       14675.8 [14.4%/ 1.4%]  -4.5%  82.8% | 10963.9 [18.5%/ 1.4%] -28.7% 100.0% |
> > | 256/256 |     15990.8 [22.0%/ 2.2%] 100.0% |       12227.9 [10.3%/ 1.0%] -23.5%  73.2% | 10469.9 [19.6%/ 1.7%] -34.5% 100.0% |
> > '------------------------------------------------------------------------------------------------------------------------------'
> >
>
> That's really nice and clear.
>
> We start to see the penalty for the coresched at 32/32, leaving some cpus
> more idle than otherwise.
> But it's pretty good overall, for this benchmark at least.
>
> Is this with stock v2 or with any of the fixes posted after? I wonder how
> much the fixes for the race that violates the rule affect this, for example.
>

Yeah, this data is based on v2 without any of the later fixes.
I also tried some of the fixes that could affect performance, but no luck so far.
Please let me know if I missed anything.

Thanks,
-Aubrey


Re: [PATCH v3 12/16] PM / devfreq: tegra: Reconfigure hardware on governor's restart

2019-04-29 Thread Chanwoo Choi
Hi,

On 19. 4. 18. 오전 7:29, Dmitry Osipenko wrote:
> Move hardware configuration to governor's start/resume methods.
> This allows re-initializing the hardware counters and reconfiguring
> cleanly if the governor was stopped/paused. That is needed because we
> are not aware of all hardware changes that happened while the governor
> was stopped and the paused state may get out of sync with reality,
> hence it's better to start with a clean slate after the pause. As
> a result, there is no memory bandwidth starvation after resume from
> suspend-to-ram; that starvation caused the display controller to underflow
> on resume because of an improper decision made by devfreq about
> the required memory frequency. This change also cleans up the code a tad
> by moving hardware-configuration code into a single location.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/devfreq/tegra-devfreq.c | 98 ++---
>  1 file changed, 40 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/devfreq/tegra-devfreq.c b/drivers/devfreq/tegra-devfreq.c
> index 62f35e818122..e9ab49394d35 100644
> --- a/drivers/devfreq/tegra-devfreq.c
> +++ b/drivers/devfreq/tegra-devfreq.c
> @@ -392,55 +392,6 @@ static int tegra_actmon_rate_notify_cb(struct 
> notifier_block *nb,
>   return NOTIFY_OK;
>  }
>  
> -static void tegra_actmon_enable_interrupts(struct tegra_devfreq *tegra)
> -{
> - struct tegra_devfreq_device *dev;
> - u32 val;
> - unsigned int i;
> -
> - for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
> - dev = &tegra->devices[i];
> -
> - val = device_readl(dev, ACTMON_DEV_CTRL);
> - val |= ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN;
> - val |= ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN;
> - val |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
> - val |= ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN;
> -
> - device_writel(dev, val, ACTMON_DEV_CTRL);
> - }
> -
> - actmon_write_barrier(tegra);
> -}
> -
> -static void tegra_actmon_disable_interrupts(struct tegra_devfreq *tegra)
> -{
> - struct tegra_devfreq_device *dev;
> - u32 val;
> - unsigned int i;
> -
> - disable_irq(tegra->irq);
> -
> - for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
> - dev = &tegra->devices[i];
> -
> - val = device_readl(dev, ACTMON_DEV_CTRL);
> - val &= ~ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN;
> - val &= ~ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN;
> - val &= ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
> - val &= ~ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN;
> -
> - device_writel(dev, val, ACTMON_DEV_CTRL);
> -
> - device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
> -   ACTMON_DEV_INTR_STATUS);
> - }
> -
> - actmon_write_barrier(tegra);
> -
> - enable_irq(tegra->irq);
> -}
> -
>  static void tegra_actmon_configure_device(struct tegra_devfreq *tegra,
> struct tegra_devfreq_device *dev)
>  {
> @@ -464,11 +415,47 @@ static void tegra_actmon_configure_device(struct 
> tegra_devfreq *tegra,
>   << ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_NUM_SHIFT;
>   val |= (ACTMON_ABOVE_WMARK_WINDOW - 1)
>   << ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_NUM_SHIFT;
> + val |= ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN;
> + val |= ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN;
> + val |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
> + val |= ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN;
>   val |= ACTMON_DEV_CTRL_ENB;
>  
>   device_writel(dev, val, ACTMON_DEV_CTRL);
> +}
> +
> +static void tegra_actmon_start(struct tegra_devfreq *tegra)
> +{
> + unsigned int i;
> +
> + disable_irq(tegra->irq);
> +
> + actmon_writel(tegra, ACTMON_SAMPLING_PERIOD - 1,
> +   ACTMON_GLB_PERIOD_CTRL);
> +
> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++)
> + tegra_actmon_configure_device(tegra, &tegra->devices[i]);

Nitpick:
I agree with this patch.

To make it simpler, I think that you could remove the
tegra_actmon_configure_device() function and just do the operations
inside the for loop in tegra_actmon_start(), to keep a
similar style to tegra_actmon_stop(). But there is no perfect solution.
If you agree, change it in the next version of the patch. If you think
it is not necessary, just keep this code.

> +
> + actmon_write_barrier(tegra);
> +
> + enable_irq(tegra->irq);
> +}
> +
> +static void tegra_actmon_stop(struct tegra_devfreq *tegra)
> +{
> + unsigned int i;
> +
> + disable_irq(tegra->irq);
> +
> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
> + device_writel(&tegra->devices[i], 0x00000000, ACTMON_DEV_CTRL);
> + device_writel(&tegra->devices[i], ACTMON_INTR_STATUS_CLEAR,
> +   ACTMON_DEV_INTR_STATUS);
> + }
>  
>   actmon_write_barrier(tegra);
> +
> + enable_irq(tegra->irq);
>  }
>  
>  static int 

Re: [PATCH v3 2/3] power: supply: Add driver for Microchip UCS1002

2019-04-29 Thread Andrey Smirnov
On Mon, Apr 29, 2019 at 1:36 PM Guenter Roeck  wrote:
>
> On Mon, Apr 29, 2019 at 12:53:48PM -0700, Andrey Smirnov wrote:
> > Add driver for Microchip UCS1002 Programmable USB Port Power
> > Controller with Charger Emulation. The driver exposes a power supply
> > device to control/monitor various parameters of the device as well as a
> > regulator to allow controlling the VBUS line.
> >
> > Signed-off-by: Enric Balletbo Serra 
> > Signed-off-by: Andrey Smirnov 
> > Cc: Chris Healy 
> > Cc: Lucas Stach 
> > Cc: Fabio Estevam 
> > Cc: Guenter Roeck 
> > Cc: Sebastian Reichel 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@vger.kernel.org
> > ---
> >  drivers/power/supply/Kconfig |   9 +
> >  drivers/power/supply/Makefile|   1 +
> >  drivers/power/supply/ucs1002_power.c | 646 +++
> >  3 files changed, 656 insertions(+)
> >  create mode 100644 drivers/power/supply/ucs1002_power.c
> >
> > diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig
> > index e901b9879e7e..c614c8a196f3 100644
> > --- a/drivers/power/supply/Kconfig
> > +++ b/drivers/power/supply/Kconfig
> > @@ -660,4 +660,13 @@ config FUEL_GAUGE_SC27XX
> >Say Y here to enable support for fuel gauge with SC27XX
> >PMIC chips.
> >
> > +config CHARGER_UCS1002
> > +tristate "Microchip UCS1002 USB Port Power Controller"
> > + depends on I2C
> > + depends on OF
> > + select REGMAP_I2C
> > + help
> > +   Say Y to enable support for Microchip UCS1002 Programmable
> > +   USB Port Power Controller with Charger Emulation.
> > +
> >  endif # POWER_SUPPLY
> > diff --git a/drivers/power/supply/Makefile b/drivers/power/supply/Makefile
> > index b731c2a9b695..c56803a9e4fe 100644
> > --- a/drivers/power/supply/Makefile
> > +++ b/drivers/power/supply/Makefile
> > @@ -87,3 +87,4 @@ obj-$(CONFIG_AXP288_CHARGER)+= axp288_charger.o
> >  obj-$(CONFIG_CHARGER_CROS_USBPD) += cros_usbpd-charger.o
> >  obj-$(CONFIG_CHARGER_SC2731) += sc2731_charger.o
> >  obj-$(CONFIG_FUEL_GAUGE_SC27XX)  += sc27xx_fuel_gauge.o
> > +obj-$(CONFIG_CHARGER_UCS1002)+= ucs1002_power.o
> > diff --git a/drivers/power/supply/ucs1002_power.c 
> > b/drivers/power/supply/ucs1002_power.c
> > new file mode 100644
> > index ..677f20a4d76f
> > --- /dev/null
> > +++ b/drivers/power/supply/ucs1002_power.c
> > @@ -0,0 +1,646 @@
> ...
> > +
> > +static enum power_supply_usb_type ucs1002_usb_types[] = {
> > + POWER_SUPPLY_USB_TYPE_PD,
> > + POWER_SUPPLY_USB_TYPE_SDP,
> > + POWER_SUPPLY_USB_TYPE_DCP,
> > + POWER_SUPPLY_USB_TYPE_CDP,
> > + POWER_SUPPLY_USB_TYPE_UNKNOWN,
> > +};
> > +
> > +static int ucs1002_set_usb_type(struct ucs1002_info *info, int val)
> > +{
> > + unsigned int mode;
> > +
> > + if (val >= ARRAY_SIZE(ucs1002_usb_types))
> > + return -EINVAL;
> > +
> I hate to bring it up this late, but I don't see a check
> against val being negative anywhere in the calling code.
>

Sure, I'll add it in v4

Thanks,
Andrey Smirnov
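
To make the point about negative values concrete, here is a small standalone sketch of validating a signed, caller-supplied index before using it to look something up in a table. The table contents, the DEMO_ARRAY_SIZE macro, and the function name are hypothetical stand-ins rather than the driver's code. Note that when an int is compared against a size_t, C's usual arithmetic conversions typically turn a negative value into a large unsigned one, so an upper-bound check alone may already reject it; writing the lower bound explicitly just makes the intent visible and keeps the check correct if the types change.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver's table of USB types. */
static const int demo_usb_types[] = { 10, 20, 30, 40, 50 };

/*
 * Look up a table entry by a signed index supplied by the caller.
 * Returns 0 and stores the entry in *out on success, or -EINVAL if
 * the index is negative or past the end of the table.
 */
static int demo_lookup_usb_type(int val, int *out)
{
	if (val < 0 || (size_t)val >= DEMO_ARRAY_SIZE(demo_usb_types))
		return -EINVAL;

	*out = demo_usb_types[val];
	return 0;
}
```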


Re: [PATCH 1/2] dt-bindings: Add CDTech S050WV43-CT5 panel bindings

2019-04-29 Thread Rob Herring
On Thu, 18 Apr 2019 00:38:44 +0100, Florent TOMASIN wrote:
> Add documentation for S050WV43-CT5 panel
> 
> Signed-off-by: Florent TOMASIN 
> ---
>  .../bindings/display/panel/cdtech,s050wv43-ct5.txt   | 12 
>  1 file changed, 12 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/panel/cdtech,s050wv43-ct5.txt
> 

Reviewed-by: Rob Herring 


Re: [PATCH v3 1/3] clk: analogbits: add Wide-Range PLL library

2019-04-29 Thread Paul Walmsley
On Mon, 29 Apr 2019, Stephen Boyd wrote:

> Quoting Paul Walmsley (2019-04-11 01:27:32)
> > diff --git a/drivers/clk/analogbits/Kconfig b/drivers/clk/analogbits/Kconfig
> > new file mode 100644
> > index 000000000000..b5fd60c7f136
> > --- /dev/null
> > +++ b/drivers/clk/analogbits/Kconfig
> > @@ -0,0 +1,2 @@
> 
> Add SPDX for this file?

Done.

> > +config CLK_ANALOGBITS_WRPLL_CLN28HPC
> > +   bool
> > diff --git a/drivers/clk/analogbits/Makefile 
> > b/drivers/clk/analogbits/Makefile
> > new file mode 100644
> > index ..bb51a3ae77a7
> > --- /dev/null
> > +++ b/drivers/clk/analogbits/Makefile
> > @@ -0,0 +1 @@
> 
> Add SPDX for this file?

Done.

> > +obj-$(CONFIG_CLK_ANALOGBITS_WRPLL_CLN28HPC)+= wrpll-cln28hpc.o
> > diff --git a/drivers/clk/analogbits/wrpll-cln28hpc.c 
> > b/drivers/clk/analogbits/wrpll-cln28hpc.c
> > new file mode 100644
> > index ..2027872719e1
> > --- /dev/null
> > +++ b/drivers/clk/analogbits/wrpll-cln28hpc.c
> > @@ -0,0 +1,360 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018-2019 SiFive, Inc.
> > + * Wesley Terpstra
> > + * Paul Walmsley
> > + *
> > + * This library supports configuration parsing and reprogramming of
> > + * the CLN28HPC variant of the Analog Bits Wide Range PLL.  The
> > + * intention is for this library to be reusable for any device that
> > + * integrates this PLL; thus the register structure and programming
> > + * details are expected to be provided by a separate IP block driver.
> > + *
> > + * The bulk of this code is primarily useful for clock configurations
> > + * that must operate at arbitrary rates, as opposed to clock configurations
> > + * that are restricted by software or manufacturer guidance to a small,
> > + * pre-determined set of performance points.
> > + *
> > + * References:
> > + * - Analog Bits "Wide Range PLL Datasheet", version 2015.10.01
> > + * - SiFive FU540-C000 Manual v1p0, Chapter 7 "Clocking and Reset"
> > + *   https://static.dev.sifive.com/FU540-C000-v1.0.pdf
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/* MIN_INPUT_FREQ: minimum input clock frequency, in Hz (Fref_min) */
> > +#define MIN_INPUT_FREQ 7000000
> > +
> > +/* MAX_INPUT_FREQ: maximum input clock frequency, in Hz (Fref_max) */
> > +#define MAX_INPUT_FREQ 600000000
> > +
> > +/* MIN_POST_DIVIDE_REF_FREQ: minimum post-divider reference frequency, in Hz */
> > +#define MIN_POST_DIVR_FREQ 7000000
> > +
> > +/* MAX_POST_DIVIDE_REF_FREQ: maximum post-divider reference frequency, in Hz */
> > +#define MAX_POST_DIVR_FREQ 200000000
> > +
> > +/* MIN_VCO_FREQ: minimum VCO frequency, in Hz (Fvco_min) */
> > +#define MIN_VCO_FREQ   2400000000UL
> > +
> > +/* MAX_VCO_FREQ: maximum VCO frequency, in Hz (Fvco_max) */
> > +#define MAX_VCO_FREQ   4800000000ULL
> > +
> > +/* MAX_DIVQ_DIVISOR: maximum output divisor.  Selected by DIVQ = 6 */
> > +#define MAX_DIVQ_DIVISOR   64
> > +
> > +/* MAX_DIVR_DIVISOR: maximum reference divisor.  Selected by DIVR = 63 */
> > +#define MAX_DIVR_DIVISOR   64
> > +
> > +/* MAX_LOCK_US: maximum PLL lock time, in microseconds (tLOCK_max) */
> > +#define MAX_LOCK_US70
> > +
> > +/*
> > + * ROUND_SHIFT: number of bits to shift to avoid precision loss in the 
> > rounding
> > + *  algorithm
> > + */
> > +#define ROUND_SHIFT20
> > +
> > +/*
> > + * Private functions
> > + */
> > +
> > +/**
> > + * __wrpll_calc_filter_range() - determine PLL loop filter bandwidth
> > + * @post_divr_freq: input clock rate after the R divider
> > + *
> > + * Select the value to be presented to the PLL RANGE input signals, based
> > + * on the input clock frequency after the post-R-divider @post_divr_freq.
> > + * This code follows the recommendations in the PLL datasheet for filter
> > + * range selection.
> > + *
> > + * Return: The RANGE value to be presented to the PLL configuration inputs,
> > + * or -1 upon error.
> > + */
> > +static int __wrpll_calc_filter_range(unsigned long post_divr_freq)
> > +{
> > +   u8 range;
> > +
> > +   if (post_divr_freq < MIN_POST_DIVR_FREQ ||
> > +   post_divr_freq > MAX_POST_DIVR_FREQ) {
> > +   WARN(1, "%s: post-divider reference freq out of range: %lu",
> > +__func__, post_divr_freq);
> > +   return -1;
> > +   }
> > +
> > +   if (post_divr_freq < 11000000)
> > +   range = 1;
> > +   else if (post_divr_freq < 18000000)
> > +   range = 2;
> > +   else if (post_divr_freq < 30000000)
> > +   range = 3;
> > +   else if (post_divr_freq < 50000000)
> > +   range = 4;
> > +   else if (post_divr_freq < 80000000)
> > +   range = 5;
> > +   else if (post_divr_freq < 130000000)
> > +   range = 6;
> > 

RE: [EXT] Re: [PATCHv5 1/6] PCI: mobiveil: Refactor Mobiveil PCIe Host Bridge IP driver

2019-04-29 Thread Z.q. Hou
Hi Subbu,

> -Original Message-
> From: Subrahmanya Lingappa [mailto:l.subrahma...@mobiveil.co.in]
> Sent: 2019年4月24日 13:36
> To: Z.q. Hou 
> Cc: linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ;
> lorenzo.pieral...@arm.com; catalin.mari...@arm.com;
> will.dea...@arm.com; Mingkai Hu ; M.h. Lian
> ; Xiaowei Bao 
> Subject: [EXT] Re: [PATCHv5 1/6] PCI: mobiveil: Refactor Mobiveil PCIe Host
> Bridge IP driver
> 
> ZQ,
> 
> On Fri, Apr 12, 2019 at 3:22 PM Z.q. Hou  wrote:
> >
> > From: Hou Zhiqiang 
> >
> > Refactor the Mobiveil PCIe Host Bridge IP driver to make
> > it easier to add support for both RC and EP mode drivers.
> > This patch moves the Mobiveil driver to a new directory
> > 'drivers/pci/controller/mobiveil' and refactors it according
> > to the RC and EP abstraction.
> >
> > Signed-off-by: Hou Zhiqiang 
> > Reviewed-by: Minghuan Lian 
> > Reviewed-by: Subrahmanya Lingappa 
> > ---
> > V5:
> >  - Regenerated this patch on the new base.
> >  - Retouched the changelog.
> >  - Updated the Copyright.
> >
> >  MAINTAINERS   |   2 +-
> >  drivers/pci/controller/Kconfig|  11 +-
> >  drivers/pci/controller/Makefile   |   2 +-
> >  drivers/pci/controller/mobiveil/Kconfig   |  24 +
> >  drivers/pci/controller/mobiveil/Makefile  |   4 +
> >  .../pcie-mobiveil-host.c} | 570 +++---
> >  .../controller/mobiveil/pcie-mobiveil-plat.c  |  56 ++
> >  .../pci/controller/mobiveil/pcie-mobiveil.c   | 248 
> >  .../pci/controller/mobiveil/pcie-mobiveil.h   | 211 +++
> >  9 files changed, 636 insertions(+), 492 deletions(-)
> >  create mode 100644 drivers/pci/controller/mobiveil/Kconfig
> >  create mode 100644 drivers/pci/controller/mobiveil/Makefile
> >  rename drivers/pci/controller/{pcie-mobiveil.c =>
> mobiveil/pcie-mobiveil-host.c} (53%)
> >  create mode 100644 drivers/pci/controller/mobiveil/pcie-mobiveil-plat.c
> >  create mode 100644 drivers/pci/controller/mobiveil/pcie-mobiveil.c
> >  create mode 100644 drivers/pci/controller/mobiveil/pcie-mobiveil.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 1e64279f338a..1013e74b14f2 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -11877,7 +11877,7 @@ M:  Subrahmanya Lingappa
> 
> >  L: linux-...@vger.kernel.org
> >  S: Supported
> >  F: Documentation/devicetree/bindings/pci/mobiveil-pcie.txt
> > -F: drivers/pci/controller/pcie-mobiveil.c
> > +F: drivers/pci/controller/mobiveil/pcie-mobiveil*
> >
> 
> Please add yourself as co-maintainer of the mobiveil driver.

Thanks for your invite, will add in v6.

Regards,
Zhiqiang


RE: [PATCH] rtc: snvs: Use __maybe_unused instead of #if CONFIG_PM_SLEEP

2019-04-29 Thread Anson Huang
Hi, Trent

> -Original Message-
> From: Trent Piepho [mailto:tpie...@impinj.com]
> Sent: Tuesday, April 30, 2019 1:13 AM
> To: linux-...@vger.kernel.org; Anson Huang ;
> a.zu...@towertech.it; linux-kernel@vger.kernel.org;
> alexandre.bell...@bootlin.com
> Cc: dl-linux-imx 
> Subject: Re: [PATCH] rtc: snvs: Use __maybe_unused instead of #if
> CONFIG_PM_SLEEP
> 
> On Mon, 2019-04-29 at 07:02 +0000, Anson Huang wrote:
> > Use __maybe_unused for power management related functions instead of
> > #if CONFIG_PM_SLEEP to simplify the code.
> >
> > Signed-off-by: Anson Huang 
> 
> This will result in the functions always being included, even if PM_SLEEP is
> off...
> 
> >
> > @@ -387,14 +385,6 @@ static const struct dev_pm_ops snvs_rtc_pm_ops
> = {
> > .resume_noirq = snvs_rtc_resume_noirq,  };
> 
> ...because they will always be used by the definition of snvs_rtc_pm_ops
> here.

You are right, I missed this part. I have sent out a V2 patch that uses
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() to define the ops; please help review it.

Thanks,
Anson.

> 
> In order for this to work, SIMPLE_DEV_PM_OPS() needs to be used, so that
> the dev_pm_ops struct is empty when PM is off and the functions don't get
> referenced. See:
> https://lkml.org/lkml/2019/1/17/376
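
The interaction described in the quoted review can be illustrated in plain userspace C. This is a hedged sketch, not kernel code: demo_maybe_unused stands in for __maybe_unused (it relies on the GCC/Clang unused attribute), DEMO_SET_NOIRQ_PM_OPS mimics only the general shape of the SET_*_PM_OPS() helpers, and DEMO_PM_SLEEP plays the role of the config option. All of these names, and the struct layout, are hypothetical.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's __maybe_unused annotation. */
#define demo_maybe_unused __attribute__((unused))

struct demo_pm_ops {
	const char *name;
	int (*suspend_noirq)(void *dev);
	int (*resume_noirq)(void *dev);
};

/*
 * Expands to the callback initializers when DEMO_PM_SLEEP is defined
 * and to nothing otherwise, so the callbacks are only referenced when
 * sleep support is configured in.
 */
#ifdef DEMO_PM_SLEEP
#define DEMO_SET_NOIRQ_PM_OPS(s, r) .suspend_noirq = (s), .resume_noirq = (r),
#else
#define DEMO_SET_NOIRQ_PM_OPS(s, r)
#endif

/* Marked "maybe unused" so no warning appears when nothing refers to them. */
static int demo_maybe_unused demo_suspend_noirq(void *dev)
{
	(void)dev;
	return 0;
}

static int demo_maybe_unused demo_resume_noirq(void *dev)
{
	(void)dev;
	return 0;
}

static const struct demo_pm_ops demo_ops = {
	.name = "demo-rtc",
	DEMO_SET_NOIRQ_PM_OPS(demo_suspend_noirq, demo_resume_noirq)
};
```

If the callbacks were instead assigned unconditionally in the initializer, they would always be referenced and therefore always built in, which is the concern raised in the review.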


Re: [PATCH] Revert "PCI/LINK: Report degraded links via link bandwidth notification"

2019-04-29 Thread Alex G

On 4/29/19 1:56 PM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

This reverts commit e8303bb7a75c113388badcc49b2a84b4121c1b3e.

e8303bb7a75c added logging whenever a link changed speed or width to a
state that is considered degraded.  Unfortunately, it cannot differentiate
signal integrity-related link changes from those intentionally initiated by
an endpoint driver, including drivers that may live in userspace or VMs
when making use of vfio-pci.  Some GPU drivers actively manage the link
state to save power, which generates a stream of messages like this:

   vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 
GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)

We really *do* want to be alerted when the link bandwidth is reduced
because of hardware failures, but degradation by intentional link state
management is probably far more common, so the signal-to-noise ratio is
currently low.
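
The figures in the log line above follow from the link parameters. As a rough sketch (the helper name is hypothetical, and it assumes the 8b/10b line encoding used at 2.5 GT/s and 5 GT/s, so it does not apply to faster link speeds):

```c
#include <stdint.h>

/*
 * Usable bandwidth in Mb/s for a link running at rate_mtps mega-transfers
 * per second per lane, assuming 8b/10b encoding (8 payload bits are
 * carried for every 10 bits on the wire).
 */
uint32_t demo_pcie_bw_mbps(uint32_t rate_mtps, uint32_t lanes)
{
	return rate_mtps * lanes * 8 / 10;
}
```

With these assumptions, a x16 link at 2.5 GT/s gives 32000 Mb/s and the same link at 5 GT/s gives 64000 Mb/s, matching the 32.000 Gb/s and 64.000 Gb/s in the message.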

Until we figure out a way to identify the real problems or silence the
intentional situations, revert the following commits, which include the
initial implementation (e8303bb7a75c) and subsequent fixes:


I think we're overreacting to a bit of perceived verbosity in the system 
log. Intentional degradation does not seem to me to be as common as 
advertised. I have not observed this with either radeon, nouveau, or 
amdgpu, and the proper mechanism to save power at the link level is 
ASPM. I stand to be corrected and we have on CC some very knowledgeable 
fellows that I am certain will jump at the opportunity to do so.


What it seems like to me is that a proprietary driver running in a VM is 
initiating these changes. And if that is the case then it seems this is 
a virtualization problem. A quick glance over GPU drivers in linux did 
not reveal any obvious places where we intentionally downgrade a link.


I'm not convinced a revert is the best call.

Alex


 e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth 
notification")
 3e82a7f9031f ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are 
acked")
 55397ce8df48 ("PCI/LINK: Clear bandwidth notification interrupt before enabling 
it")
 0fa635aec9ab ("PCI/LINK: Deduplicate bandwidth reports for multi-function 
devices")

Link: 
https://lore.kernel.org/lkml/155597243666.19387.1205950870601742062.st...@gimli.home
Link: 
https://lore.kernel.org/lkml/155605909349.3575.13433421148215616375.st...@gimli.home
Signed-off-by: Bjorn Helgaas 
CC: Alexandru Gagniuc 
CC: Lukas Wunner 
CC: Alex Williamson 
---
  drivers/pci/pci.h  |   1 -
  drivers/pci/pcie/Makefile  |   1 -
  drivers/pci/pcie/bw_notification.c | 121 -
  drivers/pci/pcie/portdrv.h |   6 +-
  drivers/pci/pcie/portdrv_core.c|  17 ++--
  drivers/pci/pcie/portdrv_pci.c |   1 -
  drivers/pci/probe.c|   2 +-
  7 files changed, 7 insertions(+), 142 deletions(-)
  delete mode 100644 drivers/pci/pcie/bw_notification.c

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d994839a3e24..224d88634115 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -273,7 +273,6 @@ enum pcie_link_width pcie_get_width_cap(struct pci_dev 
*dev);
  u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
   enum pcie_link_width *width);
  void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
-void pcie_report_downtraining(struct pci_dev *dev);
  
  /* Single Root I/O Virtualization */

  struct pci_sriov {
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index f1d7bc1e5efa..ab514083d5d4 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -3,7 +3,6 @@
  # Makefile for PCI Express features and port driver
  
  pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o

-pcieportdrv-y  += bw_notification.o
  
  obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
  
diff --git a/drivers/pci/pcie/bw_notification.c b/drivers/pci/pcie/bw_notification.c

deleted file mode 100644
index 4fa9e3523ee1..
--- a/drivers/pci/pcie/bw_notification.c
+++ /dev/null
@@ -1,121 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * PCI Express Link Bandwidth Notification services driver
- * Author: Alexandru Gagniuc 
- *
- * Copyright (C) 2019, Dell Inc
- *
- * The PCIe Link Bandwidth Notification provides a way to notify the
- * operating system when the link width or data rate changes.  This
- * capability is required for all root ports and downstream ports
- * supporting links wider than x1 and/or multiple link speeds.
- *
- * This service port driver hooks into the bandwidth notification interrupt
- * and warns when links become degraded in operation.
- */
-
-#include "../pci.h"
-#include "portdrv.h"
-
-static bool pcie_link_bandwidth_notification_supported(struct pci_dev *dev)
-{
-   int ret;
-   u32 lnk_cap;
-
-   ret = pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnk_cap);

Re: [PATCH v1 1/2] perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet

2019-04-29 Thread Arnaldo Carvalho de Melo
Em Sun, Apr 28, 2019 at 04:32:27PM +0800, Leo Yan escreveu:
> Robert Walker reported that a segmentation fault is observed when
> processing CoreSight trace data; this issue can be easily reproduced by
> the command 'perf report --itrace=i1000i' for decoding tracing data.
> 
> If neither the 'b' flag (synthesize branch events) nor the 'l' flag
> (synthesize last branch entries) is specified to option '--itrace',
> cs_etm_queue::prev_packet will not be initialised.  After merging
> the code to support exception packets and sample flags, a number of
> uses of cs_etm_queue::prev_packet were introduced without
> checking whether it is valid; in these cases any access to the
> uninitialised prev_packet will cause a crash.
> 
> As cs_etm_queue::prev_packet is used more widely now and it's already
> hard to follow which functions have been called in a context where the
> validity of cs_etm_queue::prev_packet has been checked, this patch
> always allocates memory for cs_etm_queue::prev_packet.
> 
> Reported-by: Robert Walker 
> Suggested-by: Robert Walker 
> Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception 
> packet")

Thanks, applied both to perf/urgent, testing them now in the containers.

- Arnaldo

> Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush 
> packet")
> Signed-off-by: Leo Yan 
> ---
>  tools/perf/util/cs-etm.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 110804936fc3..054b480aab04 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -422,11 +422,9 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct 
> cs_etm_auxtrace *etm)
>   if (!etmq->packet)
>   goto out_free;
>  
> - if (etm->synth_opts.last_branch || etm->sample_branches) {
> - etmq->prev_packet = zalloc(szp);
> - if (!etmq->prev_packet)
> - goto out_free;
> - }
> + etmq->prev_packet = zalloc(szp);
> + if (!etmq->prev_packet)
> + goto out_free;
>  
>   if (etm->synth_opts.last_branch) {
>   size_t sz = sizeof(struct branch_stack);
> -- 
> 2.17.1

-- 

- Arnaldo
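
For readers skimming the thread, the shape of the fix can be sketched in plain userspace C. This is an illustration only: the demo_* names and the struct layout are hypothetical, and calloc() stands in for perf's zalloc(). The point is that prev_packet is allocated unconditionally, so later code can dereference it without first checking which --itrace flags were given.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the perf structures. */
struct demo_packet {
	unsigned long start_addr;
	unsigned long end_addr;
};

struct demo_queue {
	struct demo_packet *packet;
	struct demo_packet *prev_packet;
};

/* Releases a queue and whatever members were allocated; NULL is accepted. */
void demo_free_queue(struct demo_queue *q)
{
	if (!q)
		return;
	free(q->prev_packet);
	free(q->packet);
	free(q);
}

/* Allocates both packets unconditionally; returns NULL on any failure. */
struct demo_queue *demo_alloc_queue(void)
{
	struct demo_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;

	q->packet = calloc(1, sizeof(*q->packet));
	if (!q->packet)
		goto out_free;

	/* No flag check here: prev_packet always exists after a successful call. */
	q->prev_packet = calloc(1, sizeof(*q->prev_packet));
	if (!q->prev_packet)
		goto out_free;

	return q;

out_free:
	demo_free_queue(q);
	return NULL;
}
```

The cost is one extra zeroed packet per queue when branch synthesis is disabled, in exchange for removing a class of NULL dereferences.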


[PATCH V2] rtc: snvs: Use __maybe_unused instead of #if CONFIG_PM_SLEEP

2019-04-29 Thread Anson Huang
Use __maybe_unused for power management related functions
instead of #if CONFIG_PM_SLEEP to simplify the code.

Signed-off-by: Anson Huang 
Reviewed-by: Dong Aisheng 
---
Changes since V1:
- use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() to make sure snvs_rtc_pm_ops is 
empty when PM is off.
---
 drivers/rtc/rtc-snvs.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/rtc/rtc-snvs.c b/drivers/rtc/rtc-snvs.c
index e0edd594..7ee673a2 100644
--- a/drivers/rtc/rtc-snvs.c
+++ b/drivers/rtc/rtc-snvs.c
@@ -360,9 +360,7 @@ static int snvs_rtc_probe(struct platform_device *pdev)
return ret;
 }
 
-#ifdef CONFIG_PM_SLEEP
-
-static int snvs_rtc_suspend_noirq(struct device *dev)
+static int __maybe_unused snvs_rtc_suspend_noirq(struct device *dev)
 {
struct snvs_rtc_data *data = dev_get_drvdata(dev);
 
@@ -372,7 +370,7 @@ static int snvs_rtc_suspend_noirq(struct device *dev)
return 0;
 }
 
-static int snvs_rtc_resume_noirq(struct device *dev)
+static int __maybe_unused snvs_rtc_resume_noirq(struct device *dev)
 {
struct snvs_rtc_data *data = dev_get_drvdata(dev);
 
@@ -383,18 +381,9 @@ static int snvs_rtc_resume_noirq(struct device *dev)
 }
 
 static const struct dev_pm_ops snvs_rtc_pm_ops = {
-   .suspend_noirq = snvs_rtc_suspend_noirq,
-   .resume_noirq = snvs_rtc_resume_noirq,
+   SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(snvs_rtc_suspend_noirq, 
snvs_rtc_resume_noirq)
 };
 
-#define SNVS_RTC_PM_OPS	(&snvs_rtc_pm_ops)
-
-#else
-
-#define SNVS_RTC_PM_OPS	NULL
-
-#endif
-
 static const struct of_device_id snvs_dt_ids[] = {
{ .compatible = "fsl,sec-v4.0-mon-rtc-lp", },
{ /* sentinel */ }
@@ -404,7 +393,7 @@ MODULE_DEVICE_TABLE(of, snvs_dt_ids);
 static struct platform_driver snvs_rtc_driver = {
.driver = {
.name   = "snvs_rtc",
-   .pm = SNVS_RTC_PM_OPS,
+   .pm = &snvs_rtc_pm_ops,
.of_match_table = snvs_dt_ids,
},
.probe  = snvs_rtc_probe,
-- 
2.7.4



[PATCH] kbuild: Enable -Wsometimes-uninitialized

2019-04-29 Thread Nathan Chancellor
This is Clang's version of GCC's -Wmaybe-uninitialized. Up to this
point, it has not been used because -Wuninitialized has been disabled,
which also turns off -Wsometimes-uninitialized, meaning that we miss out
on finding some bugs [1]. In my experience, it appears to be more
accurate than GCC and catch some things that GCC can't.

All of these warnings have now been fixed in -next across arm, arm64,
and x86_64 defconfig/allyesconfig so this should be enabled for everyone
to prevent more from easily creeping in.

As of next-20190429:

$ git log --oneline --grep="sometimes-uninitialized" | wc -l
45

[1]: https://lore.kernel.org/lkml/86649ee4-9794-77a3-502c-f4cd10019...@lca.pw/

Link: https://github.com/ClangBuiltLinux/linux/issues/381
Signed-off-by: Nathan Chancellor 
---

Masahiro, I am not sure how you want to handle merging this with regards
to all of the patches floating around in -next but I wanted to send this
out to let everyone know this is ready to be turned on.

Arnd, are there many remaining -Wsometimes-uninitialized warnings in
randconfigs?

 scripts/Makefile.extrawarn | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 768306add591..f4332981ea85 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -72,5 +72,6 @@ KBUILD_CFLAGS += $(call cc-disable-warning, format)
 KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
 KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
+KBUILD_CFLAGS += $(call cc-option, -Wsometimes-uninitialized)
 endif
 endif
-- 
2.21.0
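
As an illustration of the kind of code this warning targets, here is a small made-up example (function names are hypothetical and not taken from any kernel source). The first function is kept out of the build with #if 0; it leaves 'ret' unassigned on one path, which is the pattern the flag is meant to report. The second shows one common fix.

```c
/*
 * Buggy pattern, excluded from the build: 'ret' is assigned only when
 * mode is 0 or 1, so any other mode returns an indeterminate value.
 */
#if 0
int demo_lookup_buggy(int mode)
{
	int ret;

	if (mode == 0)
		ret = 10;
	else if (mode == 1)
		ret = 20;

	return ret;
}
#endif

/* Fixed version: every path through the function assigns 'ret'. */
int demo_lookup(int mode)
{
	int ret = -1;	/* default for unknown modes */

	if (mode == 0)
		ret = 10;
	else if (mode == 1)
		ret = 20;

	return ret;
}
```

Whether to fix such a case with a default initializer or an explicit else branch is a judgment call; a blanket initializer can also hide a genuinely missing assignment.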



[PATCH V2] clk: imx: pllv4: add fractional-N pll support

2019-04-29 Thread Anson Huang
The pllv4 supports a fractional-N function; the formula is:

PLL output freq = input * (mult + num/denom)

This patch adds fractional-N support, covering round rate,
recalc rate and set rate. With this patch, the clock rate of
APLL in the clock tree is more accurate than before:

Without fraction:
apll_pre_sel  1112400  0
 0  5
   apll_pre_div   1122400  0
 0  5
  apll112   52800  0
 0  5
 apll_pfd3000   79200  0
 0  5
 apll_pfd2000   339428571  0
 0  5
 apll_pfd1000   35200  0
 0  5
usdhc0000   35200  0
 0  5
 apll_pfd0111   35200  0
 0  5

With fraction:
apll_pre_sel  1112400  0
 0  5
   apll_pre_div   1122400  0
 0  5
  apll112   52920  0
 0  5
 apll_pfd3000   79380  0
 0  5
 apll_pfd2000   34020  0
 0  5
 apll_pfd1000   35280  0
 0  5
usdhc0000   35280  0
 0  5
 apll_pfd0111   35280  0
 0  5

Signed-off-by: Anson Huang 
Reviewed-by: Dong Aisheng 
---
 drivers/clk/imx/clk-pllv4.c | 72 +++--
 1 file changed, 63 insertions(+), 9 deletions(-)

diff --git a/drivers/clk/imx/clk-pllv4.c b/drivers/clk/imx/clk-pllv4.c
index d38bc9f..d7e62c3 100644
--- a/drivers/clk/imx/clk-pllv4.c
+++ b/drivers/clk/imx/clk-pllv4.c
@@ -30,6 +30,9 @@
 /* PLL Denominator Register (xPLLDENOM) */
 #define PLL_DENOM_OFFSET   0x14
 
+#define MAX_MFD0x3fff
+#define DEFAULT_MFD100
+
 struct clk_pllv4 {
struct clk_hw   hw;
void __iomem*base;
@@ -64,13 +67,20 @@ static unsigned long clk_pllv4_recalc_rate(struct clk_hw 
*hw,
   unsigned long parent_rate)
 {
struct clk_pllv4 *pll = to_clk_pllv4(hw);
-   u32 div;
+   u32 mult, mfn, mfd;
+   u64 temp64;
+
+   mult = readl_relaxed(pll->base + PLL_CFG_OFFSET);
+   mult &= BM_PLL_MULT;
+   mult >>= BP_PLL_MULT;
 
-   div = readl_relaxed(pll->base + PLL_CFG_OFFSET);
-   div &= BM_PLL_MULT;
-   div >>= BP_PLL_MULT;
+   mfn = readl_relaxed(pll->base + PLL_NUM_OFFSET);
+   mfd = readl_relaxed(pll->base + PLL_DENOM_OFFSET);
+   temp64 = parent_rate;
+   temp64 *= mfn;
+   do_div(temp64, mfd);
 
-   return parent_rate * div;
+   return (parent_rate * mult) + (u32)temp64;
 }
 
 static long clk_pllv4_round_rate(struct clk_hw *hw, unsigned long rate,
@@ -78,14 +88,46 @@ static long clk_pllv4_round_rate(struct clk_hw *hw, 
unsigned long rate,
 {
unsigned long parent_rate = *prate;
unsigned long round_rate, i;
+   u32 mfn, mfd = DEFAULT_MFD;
+   bool found = false;
+   u64 temp64;
 
for (i = 0; i < ARRAY_SIZE(pllv4_mult_table); i++) {
round_rate = parent_rate * pllv4_mult_table[i];
-   if (rate >= round_rate)
-   return round_rate;
+   if (rate >= round_rate) {
+   found = true;
+   break;
+   }
+   }
+
+   if (!found) {
+   pr_warn("%s: unable to round rate %lu, parent rate %lu\n",
+   clk_hw_get_name(hw), rate, parent_rate);
+   return 0;
}
 
-   return round_rate;
+   if (parent_rate <= MAX_MFD)
+   mfd = parent_rate;
+
+   temp64 = (u64)(rate - round_rate);
+   temp64 *= mfd;
+   do_div(temp64, parent_rate);
+   mfn = temp64;
+
+   /*
+* NOTE: The value of numerator must always be configured to be
+* less than the value of the denominator. If we can't get a proper
+* pair of mfn/mfd, we simply return the round_rate without using
+* the frac part.
+*/
+   if (mfn >= mfd)
+   return round_rate;
+
+   temp64 = (u64)parent_rate;
+   temp64 *= mfn;
+   do_div(temp64, mfd);
+
+   return round_rate + (u32)temp64;
 }
 
 static bool clk_pllv4_is_valid_mult(unsigned int mult)
@@ -105,18 +147,30 @@ static int clk_pllv4_set_rate(struct clk_hw *hw, unsigned 
long rate,
  unsigned long parent_rate)
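
The fractional-N arithmetic from the commit message, output = input * (mult + num/denom), can be tried in userspace with the sketch below. The function names and the zero-denominator guards are assumptions added for illustration and are not part of the patch; the sample values in the note that follows are likewise illustrative.

```c
#include <stdint.h>

/*
 * Output rate for a given integer multiplier and numerator/denominator
 * pair. The fractional part is computed in 64 bits so the intermediate
 * product parent_rate * mfn cannot overflow 32 bits.
 */
uint64_t demo_pllv4_rate(uint64_t parent_rate, uint32_t mult,
			 uint32_t mfn, uint32_t mfd)
{
	uint64_t frac = 0;

	if (mfd != 0)	/* guard added for the sketch only */
		frac = parent_rate * mfn / mfd;

	return parent_rate * mult + frac;
}

/*
 * Numerator that approximates the remainder (rate - round_rate) for a
 * chosen denominator, where round_rate is parent_rate times the integer
 * multiplier. Returns 0 when no fractional part is needed or possible.
 */
uint32_t demo_pllv4_mfn(uint64_t rate, uint64_t round_rate,
			uint64_t parent_rate, uint32_t mfd)
{
	if (parent_rate == 0 || rate <= round_rate)
		return 0;

	return (uint32_t)((rate - round_rate) * mfd / parent_rate);
}
```

For example, with an assumed 24 MHz parent, a multiplier of 22 and a denominator of 1000000, a target of 529200000 Hz yields a numerator of 50000, and feeding that back into demo_pllv4_rate() gives 529200000 rather than the integer-only 528000000.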
 

Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

2019-04-29 Thread Sean Christopherson
On Mon, Apr 29, 2019 at 05:08:46PM -0700, Sean Christopherson wrote:
> On Mon, Apr 29, 2019 at 03:22:09PM -0700, Linus Torvalds wrote:
> > On Mon, Apr 29, 2019 at 3:08 PM Sean Christopherson
> >  wrote:
> > >
> > > FWIW, Lakemont (Quark) doesn't block NMI/SMI in the STI shadow, but I'm
> > > not sure that counters the "horrible errata" statement ;-).  SMI+RSM saves
> > > and restores STI blocking in that case, but AFAICT NMI has no such
> > > protection and will effectively break the shadow on its IRET.
> > 
> > Ugh. I can't say I care deeply about Quark (ie never seemed to go
> > anywhere), but it's odd. I thought it was based on a Pentium core (or
> > i486+?). Are you saying those didn't do it either?
> 
> It's 486 based, but either way I suspect the answer is "yes".  IIRC,
> Knights Corner, a.k.a. Larrabee, also had funkiness around SMM and that
> was based on P54C, though I'm struggling to recall exactly what the
> Larrabee weirdness was.

Aha!  Found an ancient comment that explicitly states P5 does not block
NMI/SMI in the STI shadow, while P6 does block NMI/SMI.


Re: [PATCH 1/2] dt-bindings: Add ir38064 as a trivial device

2019-04-29 Thread Rob Herring
On Tue, Apr 16, 2019 at 08:41:38AM -0700, Patrick Venture wrote:
> The ir38064 is a voltage regulator from Infineon.
> 
> Signed-off-by: Patrick Venture 
> ---
>  Documentation/devicetree/bindings/trivial-devices.yaml | 2 ++
>  1 file changed, 2 insertions(+)

Patch 1 and 2 applied.

Rob


Re: [PATCH v2 2/8] dt-bindings: remoteproc: add bindings for stm32 remote processor driver

2019-04-29 Thread Rob Herring
On Tue, Apr 16, 2019 at 04:58:13PM +0200, Fabien Dessenne wrote:
> Add the device tree bindings document for the stm32 remoteproc devices.
> 
> Signed-off-by: Fabien Dessenne 
> ---
>  .../devicetree/bindings/remoteproc/stm32-rproc.txt | 64 
> ++
>  1 file changed, 64 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
> 
> diff --git a/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt 
> b/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
> new file mode 100644
> index 000..430132c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
> @@ -0,0 +1,64 @@
> +STMicroelectronics STM32 Remoteproc
> +---
> +This document defines the binding for the remoteproc component that loads and
> +boots firmwares on the STM32MP family chipset.
> +
> +Required properties:
> +- compatible:Must be "st,stm32mp1-m4"
> +- reg:   Address ranges of the remote processor dedicated 
> memories.
> + The parent node should provide an appropriate ranges property
> + for properly translating these into bus addresses.

dma-ranges, but that's independent of 'reg'.

It needs to list how many reg regions and what they are.

> +- resets:Reference to a reset controller asserting the remote processor.
> +- st,syscfg-holdboot: Reference to the system configuration which holds the
> + remote processor reset hold boot
> + 1st cell: phandle of syscon block
> + 2nd cell: register offset containing the hold boot setting
> + 3rd cell: register bitmask for the hold boot field
> +- st,syscfg-tz: Reference to the system configuration which holds the RCC 
> trust
> + zone mode
> + 1st cell: phandle to syscon block
> + 2nd cell: register offset containing the RCC trust zone mode setting
> + 3rd cell: register bitmask for the RCC trust zone mode bit
> +
> +Optional properties:
> +- interrupts:Should contain the watchdog interrupt
> +- mboxes:This property is required only if the rpmsg/virtio functionality
> + is used. List of phandle and mailbox channel specifiers:
> + - a channel (a) used to communicate through virtqueues with the
> +   remote proc.
> +   Bi-directional channel:
> +   - from local to remote = send message
> +   - from remote to local = send message ack
> + - a channel (b) working the opposite direction of channel (a)
> + - a channel (c) used by the local proc to notify the remote proc
> +   that it is about to be shut down.
> +   Unidirectional channel:
> +   - from local to remote, where ACK from the remote means
> + that it is ready for shutdown
> +- mbox-names:This property is required if the mboxes property is 
> used.
> + - must be "vq0" for channel (a)
> + - must be "vq1" for channel (b)
> + - must be "shutdown" for channel (c)
> +- memory-region: List of phandles to the reserved memory regions associated 
> with
> + the remoteproc device. This is variable and describes the
> + memories shared with the remote processor (eg: remoteproc
> + firmware and carveouts, rpmsg vrings, ...).
> + (see ../reserved-memory/reserved-memory.txt)
> +- st,syscfg-pdds: Reference to the system configuration which holds the 
> remote
> + processor deep sleep setting
> + 1st cell: phandle to syscon block
> + 2nd cell: register offset containing the deep sleep setting
> + 3rd cell: register bitmask for the deep sleep bit
> +- auto_boot: If defined, when remoteproc is probed, it loads the default
> + firmware and starts the remote processor.

st,auto-boot

> +
> +Example:
> + m4_rproc: m4@0 {
> + compatible = "st,stm32mp1-m4";
> + reg = <0x 0x1>,
> +   <0x1000 0x4>,
> +   <0x3000 0x4>;
> + resets = < MCU_R>;
> + st,syscfg-holdboot = < 0x10C 0x1>;
> + st,syscfg-tz = < 0x000 0x1>;
> + };
> -- 
> 2.7.4
> 

