Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-23 Thread Thomas Gleixner
On Wed, Sep 23 2020 at 17:12, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 22:55:54 +0200
> Then scratch the idea of having anonymous local_lock() and just bring
> local_lock in directly? Then have a kmap local lock, which would only
> block those that need to do a kmap.

That's still going to end up in lock ordering nightmares and you lose
the ability to use kmap_local from arbitrary contexts which was again
one of the goals of this exercise.

Aside from that, you're imposing reentrancy protections on something which
does not need them in the first place.

> Now as for migration disabled nesting, at least now we would have
> groupings of this, and perhaps the theorists can handle that. I mean,
> how is this much different that having a bunch of tasks blocked on a
> mutex with the owner is pinned on a CPU?
>
> migrate_disable() is a BKL of pinning affinity.

No. That's just wrong. preempt disable is a concurrency control,
i.e. it protects against reentrancy on a given CPU. But it's a CPU-global
protection, which means that it's not protecting a specific code path.

Contrary to preempt disable, migrate disable is not protecting against
reentrancy on a given CPU. It's a temporary restriction to the scheduler
on placement.

The fact that disabling preemption implicitly disables migration does
not make them semantically equivalent.
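
A minimal sketch of the difference (illustrative only, not taken from this
series; migrate_disable()/migrate_enable() as provided by the scheduler/RT
code):

#include <linux/preempt.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(int, example_counter);

static void example(void)
{
        preempt_disable();      /* concurrency control: nothing else runs
                                 * on this CPU, so the non-atomic per-CPU
                                 * op below is safe against reentrancy */
        __this_cpu_inc(example_counter);
        preempt_enable();

        migrate_disable();      /* placement control only: the task stays
                                 * on this CPU but can still be preempted,
                                 * so there is no reentrancy protection */
        /* ... code which merely has to stay on the current CPU ... */
        migrate_enable();
}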

> If we only have local_lock() available (even on !RT), then it makes
> the blocking in groups. At least this way you could grep for all the
> different local_locks in the system and plug that into the algorithm
> for WCS, just like one would with a bunch of mutexes.

You cannot do that on RT at all where migrate disable is substituting
preempt disable in spin and rw locks. The result would be the same as
with a !RT kernel just with horribly bad performance.

That means the stacking problem has to be solved anyway.

So why on earth do you want to create yet another special duct tape case
for kmap_local() which proliferates inconsistency instead of aiming for
consistency across all preemption models?

Thanks,

tglx


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-23 Thread Huang, Ying
Rafael Aquini  writes:

>> 
>> If there's a race, we should fix the race.  But the code path for
>> swapcache insertion is,
>> 
>> add_to_swap()
>>   get_swap_page() /* Return if fails to allocate */
>>   add_to_swap_cache()
>> SetPageSwapCache()
>> 
>> While the code path to split THP is,
>> 
>> split_huge_page_to_list()
>>   if PageSwapCache()
>> split_swap_cluster()
>> 
>> Both code paths are protected by the page lock.  So there should be some
>> other reasons to trigger the bug.
>
> As mentioned above, no they seem to not be protected (at least, not the
> same page, depending on the case). While add_to_swap() will assure a 
> page_lock on the compound head, split_huge_page_to_list() does not.
>

int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
unsigned long flags;
pgoff_t end;

VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
VM_BUG_ON_PAGE(!PageLocked(head), head);

I found that there is a page lock check in split_huge_page_to_list().

Best Regards,
Huang, Ying


Re: [PATCH v13 2/2] Add PWM fan controller driver for LGM SoC

2020-09-23 Thread Uwe Kleine-König
Hello,

(hmm, Thierry already announced that he has taken this patch, so my review
is late.)

On Tue, Sep 15, 2020 at 04:23:37PM +0800, Rahul Tanwar wrote:
> Intel Lightning Mountain(LGM) SoC contains a PWM fan controller.
> This PWM controller does not have any other consumer, it is a
> dedicated PWM controller for fan attached to the system. Add
> driver for this PWM fan controller.
> 
> Signed-off-by: Rahul Tanwar 
> Reviewed-by: Andy Shevchenko 
> ---
>  drivers/pwm/Kconfig |  11 ++
>  drivers/pwm/Makefile|   1 +
>  drivers/pwm/pwm-intel-lgm.c | 246 
> 
>  3 files changed, 258 insertions(+)
>  create mode 100644 drivers/pwm/pwm-intel-lgm.c
> 
> diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> index 7dbcf6973d33..4949c51fe90b 100644
> --- a/drivers/pwm/Kconfig
> +++ b/drivers/pwm/Kconfig
> @@ -232,6 +232,17 @@ config PWM_IMX_TPM
> To compile this driver as a module, choose M here: the module
> will be called pwm-imx-tpm.
>  
> +config PWM_INTEL_LGM
> + tristate "Intel LGM PWM support"
> + depends on HAS_IOMEM
> + depends on (OF && X86) || COMPILE_TEST
> + select REGMAP_MMIO
> + help
> +   Generic PWM fan controller driver for LGM SoC.
> +
> +   To compile this driver as a module, choose M here: the module
> +   will be called pwm-intel-lgm.
> +
>  config PWM_IQS620A
>   tristate "Azoteq IQS620A PWM support"
>   depends on MFD_IQS62X || COMPILE_TEST
> diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
> index 2c2ba0a03557..e9431b151694 100644
> --- a/drivers/pwm/Makefile
> +++ b/drivers/pwm/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_PWM_IMG)   += pwm-img.o
>  obj-$(CONFIG_PWM_IMX1)   += pwm-imx1.o
>  obj-$(CONFIG_PWM_IMX27)  += pwm-imx27.o
>  obj-$(CONFIG_PWM_IMX_TPM)+= pwm-imx-tpm.o
> +obj-$(CONFIG_PWM_INTEL_LGM)  += pwm-intel-lgm.o
>  obj-$(CONFIG_PWM_IQS620A)+= pwm-iqs620a.o
>  obj-$(CONFIG_PWM_JZ4740) += pwm-jz4740.o
>  obj-$(CONFIG_PWM_LP3943) += pwm-lp3943.o
> diff --git a/drivers/pwm/pwm-intel-lgm.c b/drivers/pwm/pwm-intel-lgm.c
> new file mode 100644
> index ..ea3df75a5971
> --- /dev/null
> +++ b/drivers/pwm/pwm-intel-lgm.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 Intel Corporation.
> + *
> + * Limitations:
> + * - The hardware supports fixed period which is dependent on 2/3 or 4
> + *   wire fan mode.

The driver now hardcodes 2-wire mode. IMHO that is worth mentioning.

> +static void lgm_clk_disable(void *data)
> +{
> + struct lgm_pwm_chip *pc = data;
> +
> + clk_disable_unprepare(pc->clk);
> +}
> +
> +static int lgm_clk_enable(struct device *dev, struct lgm_pwm_chip *pc)
> +{
> + int ret;
> +
> + ret = clk_prepare_enable(pc->clk);
> + if (ret)
> + return ret;
> +
> + return devm_add_action_or_reset(dev, lgm_clk_disable, pc);
> +}

My first reflex here was to point out that lgm_clk_disable() isn't the
counterpart to lgm_clk_enable() and so lgm_clk_disable() needs adaptation.
On a second look this is correct, and so I think the function names are
wrong. The usual naming would be to use _release instead of _disable.
Having said that, the enable function could be named devm_clk_enable and
live in drivers/clk/clk-devres.c. (Or devm_clk_get_enabled()?)
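
For illustration, such a helper could look roughly like this (sketch only;
the helper and its name are made up here and not an existing API):

#include <linux/clk.h>
#include <linux/device.h>

static void devm_clk_release(void *clk)
{
        clk_disable_unprepare(clk);
}

static int devm_clk_prepare_enable(struct device *dev, struct clk *clk)
{
        int ret;

        ret = clk_prepare_enable(clk);
        if (ret)
                return ret;

        /* undo the enable automatically when the device goes away */
        return devm_add_action_or_reset(dev, devm_clk_release, clk);
}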

> +static void lgm_reset_control_assert(void *data)
> +{
> + struct lgm_pwm_chip *pc = data;
> +
> + reset_control_assert(pc->rst);
> +}
> +
> +static int lgm_reset_control_deassert(struct device *dev, struct 
> lgm_pwm_chip *pc)
> +{
> + int ret;
> +
> + ret = reset_control_deassert(pc->rst);
> + if (ret)
> + return ret;
> +
> + return devm_add_action_or_reset(dev, lgm_reset_control_assert, pc);
> +}

A similar comment applies here.

> +static int lgm_pwm_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct lgm_pwm_chip *pc;
> + void __iomem *io_base;
> + int ret;
> +
> + pc = devm_kzalloc(dev, sizeof(*pc), GFP_KERNEL);
> + if (!pc)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, pc);
> +
> + io_base = devm_platform_ioremap_resource(pdev, 0);
> + if (IS_ERR(io_base))
> + return PTR_ERR(io_base);
> +
> + pc->regmap = devm_regmap_init_mmio(dev, io_base, 
> &lgm_pwm_regmap_config);
> + if (IS_ERR(pc->regmap))
> + return dev_err_probe(dev, PTR_ERR(pc->regmap),
> +  "failed to init register map\n");
> +
> + pc->clk = devm_clk_get(dev, NULL);
> + if (IS_ERR(pc->clk))
> + return dev_err_probe(dev, PTR_ERR(pc->clk), "failed to get 
> clock\n");
> +
> + ret = lgm_clk_enable(dev, pc);
> + if (ret) {
> + dev_err(dev, "failed to enable clock\n");

You used dev_err_probe four times for six error paths. I wonder why you
didn't use it here (and below for a failing pwmchip

[PATCH v4 1/2] Add UFFD_USER_MODE_ONLY

2020-09-23 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.
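
For context, a userspace caller opting into the restriction would look
roughly like this (illustrative snippet, not part of the patch; error
handling trimmed):

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

int main(void)
{
        /* Ask for a userfaultfd that handles user-mode faults only;
         * kernel-mode faults then behave as if SIGBUS was raised. */
        int uffd = syscall(__NR_userfaultfd,
                           O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

        if (uffd < 0)
                return 1;

        /* ... UFFDIO_API handshake and UFFDIO_REGISTER as usual ... */
        return 0;
}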

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 6 +-
 include/uapi/linux/userfaultfd.h | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3191434057f3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY)
+   goto out;
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1978,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 0/2] Control over userfaultfd kernel-fault handling

2020-09-23 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for a similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 12 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-09-23 Thread Lokesh Gidra
With this change, when the knob is set to 0, it still allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy
live migration, which will be unnoticeable to the VM guests. To avoid
this, set 'vm.userfault = 1' in /sys/sysctl.conf. For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c|  6 --
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a 
zone.
 unprivileged_userfaultfd
 
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..3816c11a986a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1972,7 +1972,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE))
return -EPERM;
 
BUG_ON(!current->mm);
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH] clk/qcom: fix spelling typo

2020-09-23 Thread Wang Qing
Fix the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/clk/qcom/clk-alpha-pll.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index 26139ef..5644311
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -609,7 +609,7 @@ static unsigned long
 alpha_huayra_pll_calc_rate(u64 prate, u32 l, u32 a)
 {
/*
-* a contains 16 bit alpha_val in two’s compliment number in the range
+* a contains 16 bit alpha_val in two’s complement number in the range
 * of [-0.5, 0.5).
 */
if (a >= BIT(PLL_HUAYRA_ALPHA_WIDTH - 1))
@@ -641,7 +641,7 @@ alpha_huayra_pll_round_rate(unsigned long rate, unsigned 
long prate,
quotient++;
 
/*
-* alpha_val should be in two’s compliment number in the range
+* alpha_val should be in two’s complement number in the range
 * of [-0.5, 0.5) so if quotient >= 0.5 then increment the l value
 * since alpha value will be subtracted in this case.
 */
@@ -666,7 +666,7 @@ alpha_pll_huayra_recalc_rate(struct clk_hw *hw, unsigned 
long parent_rate)
regmap_read(pll->clkr.regmap, PLL_ALPHA_VAL(pll), &alpha);
/*
 * Depending upon alpha_mode, it can be treated as M/N value or
-* as a two’s compliment number. When alpha_mode=1,
+* as a two’s complement number. When alpha_mode=1,
 * pll_alpha_val<15:8>=M and pll_apla_val<7:0>=N
 *
 *  Fout=FIN*(L+(M/N))
@@ -674,12 +674,12 @@ alpha_pll_huayra_recalc_rate(struct clk_hw *hw, unsigned 
long parent_rate)
 * M is a signed number (-128 to 127) and N is unsigned
 * (0 to 255). M/N has to be within +/-0.5.
 *
-* When alpha_mode=0, it is a two’s compliment number in the
+* When alpha_mode=0, it is a two’s complement number in the
 * range [-0.5, 0.5).
 *
 *  Fout=FIN*(L+(alpha_val)/2^16)
 *
-* where alpha_val is two’s compliment number.
+* where alpha_val is two’s complement number.
 */
if (!(ctl & PLL_ALPHA_MODE))
return alpha_huayra_pll_calc_rate(rate, l, alpha);
-- 
2.7.4



Re: [PATCH 00/10] rpmsg: Make RPMSG name service modular

2020-09-23 Thread Guennadi Liakhovetski
Hi Mathieu,

Sorry for the delayed response. After I sent my message I subscribed to
remoteproc, and it seems that during the transition some messages were
only delivered via the list and not directly to me, or something similar
happened.

On Tue, Sep 22, 2020 at 01:12:41PM -0600, Mathieu Poirier wrote:
> Good day Guennadi,
> 
> On Tue, 22 Sep 2020 at 02:09, Guennadi Liakhovetski
>  wrote:
> >
> > Hi Mathieu,
> >
> > Thanks for the patches. I'm trying to understand the concept of
> > this approach and I'm probably failing at that. It seems to me
> > that this patch set is making the NS announcement service to a
> > separate RPMsg device and I don't understand the reasoning for
> > doing this. As far as I understand namespace announcements
> > belong to RPMsg devices / channels, they create a dedicated
> > endpoint on them with a fixed pre-defined address. But they
> > don't form a separate RPMsg device. I think the current
> > virtio_rpmsg_bus.c has that correctly: for each rpmsg device /
> > channel multiple endpoints can be created, where the NS
> > service is one of them. It's just an endpoing of an rpmsg
> > device, not a complete separate device. Have I misunderstood
> > anything?
> 
> This patchset does not introduce any new features - the end result in
> terms of functionality is exactly the same.  It is also a carbon copy
> of the work introduced by Arnaud (hence reusing his patches), with the
> exception that the code is presented in a slightly different order to
> allow for a complete dissociation of RPMSG name service from the
> virtIO transport mechanic.
> 
> To make that happen rpmsg device specific byte conversion operations
> had to be introduced in struct rpmsg_device_ops and the explicit
> creation of an rpmsg_device associated with the name service (that
> wasn't needed when name service was welded to virtIO).  But
> associating a rpmsg_device to the name service doesn't change anything
> - RPMSG devices are created the same way when name service messages
> are received from the host or the remote processor.

Yes, the current rpmsg-virtio code does create *one* rpmsg device when 
an NS announcement arrives. Whereas with this patch set the first rpmsg 
device would be created to probe the NS service driver, and the next one 
would still be created, following the code borrowed from rpmsg-virtio, 
when an NS announcement arrives. And I don't see how those two devices 
make sense, sorry. I understand one device per channel, but two devices, 
of which one exists for a single endpoint only while other endpoints 
don't create devices of their own, don't seem very logical to me.
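
To illustrate what I mean by "just an endpoint": the announcements arrive
on a fixed, pre-defined address of an already existing rpmsg device,
roughly like the sketch below (rpmsg_ns_cb and RPMSG_NS_ADDR stand in for
the name-service callback and address; this is not code from either patch
set):

        struct rpmsg_channel_info ns_chinfo = {
                .src = RPMSG_NS_ADDR,
                .dst = RPMSG_ADDR_ANY,
        };
        struct rpmsg_endpoint *ns_ept;

        /* one more endpoint on an existing device, not a new device */
        ns_ept = rpmsg_create_ept(rpdev, rpmsg_ns_cb, NULL, ns_chinfo);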

Thanks
Guennadi

> To prove my theory I ran the rpmsg_client_sample.c and it just worked,
> no changes to client code needed.
> 
> Let's keep talking, it's the only way we'll get through this.
> 
> Mathieu
> 
> >
> > Thanks
> > Guennadi
> >
> > On Mon, Sep 21, 2020 at 06:09:50PM -0600, Mathieu Poirier wrote:
> > > Hi all,
> > >
> > > After looking at Guennadi[1] and Arnaud's patchsets[2] it became
> > > clear that we need to go back to a generic rpmsg_ns_msg structure
> > > if we wanted to make progress.  To do that some of the work from
> > > Arnaud had to be modified in a way that common name service
> > > functionality was transport agnostic.
> > >
> > > This patchset is based on Arnaud's work but also include a patch
> > > from Guennadi and some input from me.  It should serve as a
> > > foundation for the next revision of [1].
> > >
> > > Applies on rpmsg-next (4e3dda0bc603) and tested on stm32mp157. I
> > > did not test the modularisation.
> > >
> > > Comments and feedback would be greatly appreciated.
> > >
> > > Thanks,
> > > Mathieu
> > >
> > > [1]. 
> > > https://patchwork.kernel.org/project/linux-remoteproc/list/?series=346593
> > > [2]. 
> > > https://patchwork.kernel.org/project/linux-remoteproc/list/?series=338335
> > >
> > > Arnaud Pouliquen (5):
> > >   rpmsg: virtio: rename rpmsg_create_channel
> > >   rpmsg: core: Add channel creation internal API
> > >   rpmsg: virtio: Add rpmsg channel device ops
> > >   rpmsg: Turn name service into a stand alone driver
> > >   rpmsg: virtio: use rpmsg ns device for the ns announcement
> > >
> > > Guennadi Liakhovetski (1):
> > >   rpmsg: Move common structures and defines to headers
> > >
> > > Mathieu Poirier (4):
> > >   rpmsg: virtio: Move virtio RPMSG structures to private header
> > >   rpmsg: core: Add RPMSG byte conversion operations
> > >   rpmsg: virtio: Make endianness conversion virtIO specific
> > >   rpmsg: ns: Make Name service module transport agnostic
> > >
> > >  drivers/rpmsg/Kconfig|   9 +
> > >  drivers/rpmsg/Makefile   |   1 +
> > >  drivers/rpmsg/rpmsg_core.c   |  96 +++
> > >  drivers/rpmsg/rpmsg_internal.h   | 102 +++
> > >  drivers/rpmsg/rpmsg_ns.c | 108 
> > >  drivers/rpmsg/virtio_rpmsg_bus.c | 284 +--
> > >  include/linux/rpmsg_ns.h |  83 +

Re: [PATCH 1/2 v2] iio: event: use short-hand variable in iio_device_{un}register_eventset functions

2020-09-23 Thread Alexandru Ardelean
On Wed, Sep 23, 2020 at 11:13 PM Jonathan Cameron  wrote:
>
> On Mon, 21 Sep 2020 13:31:55 +0300
> Alexandru Ardelean  wrote:
>
> > With the recent 'iio_dev_opaque' variable name, these two functions are
> > looking a bit ugly.
> >
> > This change uses an 'ev_int' variable for the
> > iio_device_{un}register_eventset functions to make the code a little easier
> > to read.
> >
> > Signed-off-by: Alexandru Ardelean 
>
> Seems sensible.  Series applied to the togreg branch of iio.git and pushed 
> out as
> testing.  Not sure if this will make it into a final pull request for this
> cycle or not. Kind of depends what Linus says on Sunday about whether we are
> going to see an rc8.
>

No hurry from my side on when this goes in.
This is part of a longer series of things to do with the whole
multiple-IIO-buffers-per-IIO-device work.
I might need to take care [again] so that I don't block myself
with too many small/parallel series.

> Thanks,
>
> Jonathan
>
> > ---
> >
> > Changelog v1 -> v2:
> > * move 'iio_dev_opaque->event_interface = ev_int;' assigment right after
> >   allocation to avoid crash; 'iio_dev_opaque->event_interface' is accessed
> >   after init
> >
> >  drivers/iio/industrialio-event.c | 50 +++-
> >  1 file changed, 24 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/iio/industrialio-event.c 
> > b/drivers/iio/industrialio-event.c
> > index 2ab4d4c44427..a85919eb7c4a 100644
> > --- a/drivers/iio/industrialio-event.c
> > +++ b/drivers/iio/industrialio-event.c
> > @@ -477,6 +477,7 @@ static const char *iio_event_group_name = "events";
> >  int iio_device_register_eventset(struct iio_dev *indio_dev)
> >  {
> >   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
> > + struct iio_event_interface *ev_int;
> >   struct iio_dev_attr *p;
> >   int ret = 0, attrcount_orig = 0, attrcount, attrn;
> >   struct attribute **attr;
> > @@ -485,14 +486,15 @@ int iio_device_register_eventset(struct iio_dev 
> > *indio_dev)
> > iio_check_for_dynamic_events(indio_dev)))
> >   return 0;
> >
> > - iio_dev_opaque->event_interface =
> > - kzalloc(sizeof(struct iio_event_interface), GFP_KERNEL);
> > - if (iio_dev_opaque->event_interface == NULL)
> > + ev_int = kzalloc(sizeof(struct iio_event_interface), GFP_KERNEL);
> > + if (ev_int == NULL)
> >   return -ENOMEM;
> >
> > - INIT_LIST_HEAD(&iio_dev_opaque->event_interface->dev_attr_list);
> > + iio_dev_opaque->event_interface = ev_int;
> > +
> > + INIT_LIST_HEAD(&ev_int->dev_attr_list);
> >
> > - iio_setup_ev_int(iio_dev_opaque->event_interface);
> > + iio_setup_ev_int(ev_int);
> >   if (indio_dev->info->event_attrs != NULL) {
> >   attr = indio_dev->info->event_attrs->attrs;
> >   while (*attr++ != NULL)
> > @@ -506,34 +508,29 @@ int iio_device_register_eventset(struct iio_dev 
> > *indio_dev)
> >   attrcount += ret;
> >   }
> >
> > - iio_dev_opaque->event_interface->group.name = iio_event_group_name;
> > - iio_dev_opaque->event_interface->group.attrs = kcalloc(attrcount + 1,
> > -   
> > sizeof(iio_dev_opaque->event_interface->group.attrs[0]),
> > -   GFP_KERNEL);
> > - if (iio_dev_opaque->event_interface->group.attrs == NULL) {
> > + ev_int->group.name = iio_event_group_name;
> > + ev_int->group.attrs = kcalloc(attrcount + 1,
> > +   sizeof(ev_int->group.attrs[0]),
> > +   GFP_KERNEL);
> > + if (ev_int->group.attrs == NULL) {
> >   ret = -ENOMEM;
> >   goto error_free_setup_event_lines;
> >   }
> >   if (indio_dev->info->event_attrs)
> > - memcpy(iio_dev_opaque->event_interface->group.attrs,
> > + memcpy(ev_int->group.attrs,
> >  indio_dev->info->event_attrs->attrs,
> > -sizeof(iio_dev_opaque->event_interface->group.attrs[0])
> > -*attrcount_orig);
> > +sizeof(ev_int->group.attrs[0]) * attrcount_orig);
> >   attrn = attrcount_orig;
> >   /* Add all elements from the list. */
> > - list_for_each_entry(p,
> > - &iio_dev_opaque->event_interface->dev_attr_list,
> > - l)
> > - iio_dev_opaque->event_interface->group.attrs[attrn++] =
> > - &p->dev_attr.attr;
> > - indio_dev->groups[indio_dev->groupcounter++] =
> > - &iio_dev_opaque->event_interface->group;
> > + list_for_each_entry(p, &ev_int->dev_attr_list, l)
> > + ev_int->group.attrs[attrn++] = &p->dev_attr.attr;
> > + indio_dev->groups[indio_dev->groupcounter++] = &ev_int->group;
> >
> >   return 0;
> >
> >  error_free_setup_event_lines:
> > - 
> > iio_free_chan_devattr_

[PATCH 11/13] bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag

2020-09-23 Thread Christoph Hellwig
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we can't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
is also writable for easier testing.
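
With this in place a block driver asks for stable pages on its queue,
while a file system that needs them for its own checksumming marks its
superblock, roughly (sketch; see the individual hunks for the exact
changes and flag names):

        /* block driver side */
        blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);

        /* file system side, using the new superblock flag */
        sb->s_iflags |= SB_I_STABLE_WRITES;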

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-integrity.c |  4 ++--
 block/blk-mq-debugfs.c|  1 +
 block/blk-sysfs.c |  3 +++
 drivers/block/rbd.c   |  2 +-
 drivers/block/zram/zram_drv.c |  2 +-
 drivers/md/dm-table.c |  6 +++---
 drivers/md/raid5.c|  8 
 drivers/mmc/core/queue.c  |  3 +--
 drivers/nvme/host/core.c  |  3 +--
 drivers/nvme/host/multipath.c | 10 +++---
 drivers/scsi/iscsi_tcp.c  |  4 ++--
 fs/super.c|  2 ++
 include/linux/backing-dev.h   |  6 --
 include/linux/blkdev.h|  3 +++
 include/linux/fs.h|  1 +
 mm/backing-dev.c  |  7 +++
 mm/page-writeback.c   |  2 +-
 mm/swapfile.c |  2 +-
 18 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index c03705cbb9c9f2..2b36a8f9b81390 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -408,7 +408,7 @@ void blk_integrity_register(struct gendisk *disk, struct 
blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
 
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
@@ -428,7 +428,7 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-   disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, disk->queue);
memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 645b7f800cb827..3094542e12ae0f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -116,6 +116,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
+   QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 869ed21a9edcab..76b54c7750b07e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -287,6 +287,7 @@ queue_##name##_store(struct request_queue *q, const char 
*page, size_t count) \
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -613,6 +614,7 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry 
= {
 QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
+QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
 
 static struct attribute *queue_attrs[] = {
&queue_requests_entry.attr,
@@ -645,6 +647,7 @@ static struct attribute *queue_attrs[] = {
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
+   &queue_stable_writes_entry.attr,
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5d3923c0997ce0..cf5b016358cdab 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5022,7 +5022,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
}
 
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
-   q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 
/*
 * disk_release() expects a queue ref from add_disk() and will
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e21ca844d7c291..bff3d4021c18e1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1955,7 +1955,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX

bdi cleanups v7

2020-09-23 Thread Christoph Hellwig
Hi Jens,

this series contains a bunch of different BDI cleanups.  The biggest item
is to isolate block drivers from the BDI in preparation for changing the
lifetime of the block device BDI in a follow-up series.

Changes since v6:
 - add a new blk_queue_update_readahead helper and use it in stacking
   drivers
 - improve another commit log

Changes since v5:
 - improve a commit message
 - improve the stable_writes deprecation printk
 - drop "drbd: remove RB_CONGESTED_REMOTE"
 - drop a few hunks that add a local variable in a otherwise unchanged
   file due to changes in the previous revisions
 - keep updating ->io_pages in queue_max_sectors_store
 - set an optimal I/O size in aoe
 - inherit the optimal I/O size in bcache

Changes since v4:
 - add a back a prematurely removed assignment in dm-table.c
 - pick up a few reviews from Johannes that got lost

Changes since v3:
 - rebased on the lasted block tree, which has some of the prep
   changes merged
 - extend the ->ra_pages changes to ->io_pages
 - move initializing ->ra_pages and ->io_pages for block devices to
   blk_register_queue

Changes since v2:
 - fix a rw_page return value check
 - fix up various changelogs

Changes since v1:
 - rebased to the for-5.9/block-merge branch
 - explicitly set the readahead to 0 for ubifs, vboxsf and mtd
 - split the zram block_device operations
 - let rw_page users fall back to bios in swap_readpage


Diffstat:


[PATCH 12/13] bdi: invert BDI_CAP_NO_ACCT_WB

2020-09-23 Thread Christoph Hellwig
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obfuscates the check.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/fuse/inode.c |  3 ++-
 include/linux/backing-dev.h | 13 +++--
 mm/backing-dev.c|  1 +
 mm/page-writeback.c |  4 ++--
 4 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 17b00670fb539e..581329203d6860 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1050,7 +1050,8 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct 
super_block *sb)
return err;
 
/* fuse does it's own writeback accounting */
-   sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
+   sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
+   sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
 
/*
 * For a single fuse filesystem use max 1% of dirty +
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5da4ea3dd0cc5c..b217344a2c63be 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -120,17 +120,17 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
  *
  * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
  * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  */
 #define BDI_CAP_NO_ACCT_DIRTY  0x0001
 #define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_NO_ACCT_WB 0x0004
+#define BDI_CAP_WRITEBACK_ACCT 0x0004
 #define BDI_CAP_STRICTLIMIT0x0010
 #define BDI_CAP_CGROUP_WRITEBACK 0x0020
 
 #define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
+   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -179,13 +179,6 @@ static inline bool bdi_cap_account_dirty(struct 
backing_dev_info *bdi)
return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
 }
 
-static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi)
-{
-   /* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */
-   return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB |
- BDI_CAP_NO_WRITEBACK));
-}
-
 static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
 {
return bdi_cap_writeback_dirty(inode_to_bdi(mapping->host));
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8e3802bf03a968..df18f0088dd3f5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -745,6 +745,7 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+   bdi->capabilities = BDI_CAP_WRITEBACK_ACCT;
bdi->ra_pages = VM_READAHEAD_PAGES;
bdi->io_pages = VM_READAHEAD_PAGES;
return bdi;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e9c36521461aaa..0139f9622a92da 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2738,7 +2738,7 @@ int test_clear_page_writeback(struct page *page)
if (ret) {
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi)) {
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
struct bdi_writeback *wb = inode_to_wb(inode);
 
dec_wb_stat(wb, WB_WRITEBACK);
@@ -2791,7 +2791,7 @@ int __test_set_page_writeback(struct page *page, bool 
keep_write)
   PAGECACHE_TAG_WRITEBACK);
 
xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi))
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT)
inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
 
/*
-- 
2.28.0



[PATCH 10/13] mm: use SWP_SYNCHRONOUS_IO more intelligently

2020-09-23 Thread Christoph Hellwig
There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO
is not set, as the device won't support it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 mm/page_io.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index e485a6e8a6cddb..b199b87e0aa92b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -403,15 +403,17 @@ int swap_readpage(struct page *page, bool synchronous)
goto out;
}
 
-   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
-   if (!ret) {
-   if (trylock_page(page)) {
-   swap_slot_free_notify(page);
-   unlock_page(page);
-   }
+   if (sis->flags & SWP_SYNCHRONOUS_IO) {
+   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
+   if (!ret) {
+   if (trylock_page(page)) {
+   swap_slot_free_notify(page);
+   unlock_page(page);
+   }
 
-   count_vm_event(PSWPIN);
-   goto out;
+   count_vm_event(PSWPIN);
+   goto out;
+   }
}
 
ret = 0;
-- 
2.28.0



[PATCH 13/13] bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag

2020-09-23 Thread Christoph Hellwig
Replace the two negative flags that are always used together with a
single positive flag that indicates the writeback capability instead
of two related non-capabilities.  Also remove the pointless wrappers
that just check the flag.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/9p/vfs_file.c|  2 +-
 fs/fs-writeback.c   |  7 +++---
 include/linux/backing-dev.h | 48 -
 mm/backing-dev.c|  6 ++---
 mm/filemap.c|  4 ++--
 mm/memcontrol.c |  2 +-
 mm/memory-failure.c |  2 +-
 mm/migrate.c|  2 +-
 mm/mmap.c   |  2 +-
 mm/page-writeback.c | 12 +-
 10 files changed, 29 insertions(+), 58 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3576123d82990e..6ecf863bfa2f4b 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -625,7 +625,7 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma)
 
inode = file_inode(vma->vm_file);
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 149227160ff0b0..d4f84a2fe0878e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2321,7 +2321,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
wb = locked_inode_to_wb_and_lock_list(inode);
 
-   WARN(bdi_cap_writeback_dirty(wb->bdi) &&
+   WARN((wb->bdi->capabilities & BDI_CAP_WRITEBACK) &&
 !test_bit(WB_registered, &wb->state),
 "bdi-%s not registered\n", bdi_dev_name(wb->bdi));
 
@@ -2346,7 +2346,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 * to make sure background write-back happens
 * later.
 */
-   if (bdi_cap_writeback_dirty(wb->bdi) && wakeup_bdi)
+   if (wakeup_bdi &&
+   (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
wb_wakeup_delayed(wb);
return;
}
@@ -2581,7 +2582,7 @@ int write_inode_now(struct inode *inode, int sync)
.range_end = LLONG_MAX,
};
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index b217344a2c63be..44df4fcef65c1e 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -110,27 +110,14 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
 /*
  * Flags in backing_dev_info::capability
  *
- * The first three flags control whether dirty pages will contribute to the
- * VM's accounting and whether writepages() should be called for dirty pages
- * (something that would not, for example, be appropriate for ramfs)
- *
- * WARNING: these flags are closely related and should not normally be
- * used separately.  The BDI_CAP_NO_ACCT_AND_WRITEBACK combines these
- * three flags into a single convenience macro.
- *
- * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
- * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
- * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
+ * BDI_CAP_WRITEBACK:  Supports dirty page writeback, and dirty pages
+ * should contribute to accounting
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
+ * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi 
threshold
  */
-#define BDI_CAP_NO_ACCT_DIRTY  0x0001
-#define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_WRITEBACK_ACCT 0x0004
-#define BDI_CAP_STRICTLIMIT0x0010
-#define BDI_CAP_CGROUP_WRITEBACK 0x0020
-
-#define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
+#define BDI_CAP_WRITEBACK  (1 << 0)
+#define BDI_CAP_WRITEBACK_ACCT (1 << 1)
+#define BDI_CAP_STRICTLIMIT(1 << 2)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -169,24 +156,9 @@ static inline int wb_congested(struct bdi_writeback *wb, 
int cong_bits)
 long congestion_wait(int sync, long timeout);
 long wait_iff_congested(int sync, long timeout);
 
-static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
-}
-
-static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
-}
-
-static inline bool mapping_cap_writ

[PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

2020-09-23 Thread Christoph Hellwig
Set up a readahead size by default, as very few users have a good
reason to change it.  This means coda, ecryptfs, and orangefs now
set up the values they were previously missing, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Acked-by: David Sterba  [btrfs]
Acked-by: Richard Weinberger  [ubifs, mtd]
---
 block/blk-core.c  | 2 --
 drivers/mtd/mtdcore.c | 2 ++
 fs/9p/vfs_super.c | 6 --
 fs/afs/super.c| 1 -
 fs/btrfs/disk-io.c| 1 -
 fs/fuse/inode.c   | 1 -
 fs/nfs/super.c| 9 +
 fs/ubifs/super.c  | 2 ++
 fs/vboxsf/super.c | 2 ++
 mm/backing-dev.c  | 2 ++
 10 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ca3f0f00c9435f..865d39e5be2b28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,8 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
-   q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7d930569a7dfb7..b5e5d3140f578e 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -2196,6 +2196,8 @@ static struct backing_dev_info * __init mtd_bdi_init(char 
*name)
bdi = bdi_alloc(NUMA_NO_NODE);
if (!bdi)
return ERR_PTR(-ENOMEM);
+   bdi->ra_pages = 0;
+   bdi->io_pages = 0;
 
/*
 * We put '-0' suffix to the name to get the same name format as we
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a52..e34fa20acf612e 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -80,8 +80,10 @@ v9fs_fill_super(struct super_block *sb, struct 
v9fs_session_info *v9ses,
if (ret)
return ret;
 
-   if (v9ses->cache)
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
+   if (!v9ses->cache) {
+   sb->s_bdi->ra_pages = 0;
+   sb->s_bdi->io_pages = 0;
+   }
 
sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d1379..3a40ee752c1e3f 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct 
afs_fs_context *ctx)
ret = super_setup_bdi(sb);
if (ret)
return ret;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
 
/* allocate the root inode and dentry */
if (as->dyn_root) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa171..047934cea25efa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3092,7 +3092,6 @@ int __cold open_ctree(struct super_block *sb, struct 
btrfs_fs_devices *fs_device
}
 
sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bba747520e9b08..17b00670fb539e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct 
super_block *sb)
if (err)
return err;
 
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
/* fuse does it's own writeback accounting */
sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2c1..f943e37853fa25 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
 }
 #endif
 
-static void nfs_set_readahead(struct backing_dev_info *bdi,
- unsigned long iomax_pages)
-{
-   bdi->ra_pages = VM_READAHEAD_PAGES;
-   bdi->io_pages = iomax_pages;
-}
-
 int nfs_get_tree_common(struct fs_context *fc)
 {
struct nfs_fs_context *ctx = nfs_fc2context(fc);
@@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
 MINOR(server->s_dev));
if (error)
goto error_splat_super;
-   nfs_set_readahead(s->s_bdi, server->rpages);
+   s->s_bdi->io_pages = server->rpages;
server->super = s;
}
 
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index a2420c900275a8..fbddb2a1c03f5e 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2177,6 +2177,8 @@ static int ubifs_fill_super(struct super_block *sb, void 
*data, int silent)
   c->vi.vol_id);
if (err)
goto out_close;
+   sb->s_bdi->ra_pages = 0;
+

[PATCH 04/13] aoe: set an optimal I/O size

2020-09-23 Thread Christoph Hellwig
aoe forces a larger readahead size, but any reason to do larger I/O
is not limited to readahead.  Also set the optimal I/O size, and
remove the local constants in favor of just using SZ_2M.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 drivers/block/aoe/aoeblk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..d8cfc233e64b93 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,8 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
+   q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
+   blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
-- 
2.28.0



[PATCH 09/13] bdi: remove BDI_CAP_SYNCHRONOUS_IO

2020-09-23 Thread Christoph Hellwig
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
decide if ->rw_page can be used on a block device.  Just check for
the method instead.  The only complication is that zram needs a second
set of block_device_operations as it can switch between modes that
actually support ->rw_page and those that don't.
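
The swap side then boils down to a check along these lines (sketch of the
idea; see the mm/swapfile.c hunk for the exact change):

        /* the device counts as synchronous iff the driver has ->rw_page */
        if (p->bdev && p->bdev->bd_disk->fops->rw_page)
                p->flags |= SWP_SYNCHRONOUS_IO;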

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/brd.c   |  1 -
 drivers/block/zram/zram_drv.c | 19 +--
 drivers/nvdimm/btt.c  |  2 --
 drivers/nvdimm/pmem.c |  1 -
 include/linux/backing-dev.h   |  9 -
 mm/swapfile.c |  2 +-
 6 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2723a70eb85593..cc49a921339f77 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -403,7 +403,6 @@ static struct brd_device *brd_alloc(int i)
disk->flags = GENHD_FL_EXT_DEVT;
sprintf(disk->disk_name, "ram%d", i);
set_capacity(disk, rd_size * 2);
-   brd->brd_queue->backing_dev_info->capabilities |= 
BDI_CAP_SYNCHRONOUS_IO;
 
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, brd->brd_queue);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 91ccfe444525b4..e21ca844d7c291 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,9 @@ static unsigned int num_devices = 1;
  */
 static size_t huge_class_size;
 
+static const struct block_device_operations zram_devops;
+static const struct block_device_operations zram_wb_devops;
+
 static void zram_free_page(struct zram *zram, size_t index);
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio);
@@ -408,8 +411,7 @@ static void reset_bdev(struct zram *zram)
zram->backing_dev = NULL;
zram->old_block_size = 0;
zram->bdev = NULL;
-   zram->disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
+   zram->disk->fops = &zram_devops;
kvfree(zram->bitmap);
zram->bitmap = NULL;
 }
@@ -529,8 +531,7 @@ static ssize_t backing_dev_store(struct device *dev,
 * freely but in fact, IO is going on so finally could cause
 * use-after-free when the IO is really done.
 */
-   zram->disk->queue->backing_dev_info->capabilities &=
-   ~BDI_CAP_SYNCHRONOUS_IO;
+   zram->disk->fops = &zram_wb_devops;
up_write(&zram->init_lock);
 
pr_info("setup backing device %s\n", file_name);
@@ -1820,6 +1821,13 @@ static const struct block_device_operations zram_devops 
= {
.owner = THIS_MODULE
 };
 
+static const struct block_device_operations zram_wb_devops = {
+   .open = zram_open,
+   .submit_bio = zram_submit_bio,
+   .swap_slot_free_notify = zram_slot_free_notify,
+   .owner = THIS_MODULE
+};
+
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
@@ -1947,8 +1955,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
-   zram->disk->queue->backing_dev_info->capabilities |=
-   (BDI_CAP_STABLE_WRITES | BDI_CAP_SYNCHRONOUS_IO);
+   zram->disk->queue->backing_dev_info->capabilities |= 
BDI_CAP_STABLE_WRITES;
device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 0d710140bf93be..12ff6f8784ac11 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1537,8 +1537,6 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->private_data = btt;
btt->btt_disk->queue = btt->btt_queue;
btt->btt_disk->flags = GENHD_FL_EXT_DEVT;
-   btt->btt_disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
 
blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
blk_queue_max_hw_sectors(btt->btt_queue, UINT_MAX);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 140cf3b9000c60..1711fdfd8d2816 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -475,7 +475,6 @@ static int pmem_attach_disk(struct device *dev,
disk->queue = q;
disk->flags = GENHD_FL_EXT_DEVT;
disk->private_data  = pmem;
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
/ 512);
diff --git a/include/linux/backing-dev.h b/include/li

[PATCH 03/13] bcache: inherit the optimal I/O size

2020-09-23 Thread Christoph Hellwig
Inherit the optimal I/O size setting just like the readahead window,
as any reason to do larger I/O does not apply to just readahead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Acked-by: Coly Li 
---
 drivers/md/bcache/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1bbdc410ee3c51..48113005ed86ad 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, 
unsigned int block_size)
dc->disk.disk->queue->backing_dev_info->ra_pages =
max(dc->disk.disk->queue->backing_dev_info->ra_pages,
q->backing_dev_info->ra_pages);
+   blk_queue_io_opt(dc->disk.disk->queue,
+   max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
 
atomic_set(&dc->io_errors, 0);
dc->io_disable = false;
-- 
2.28.0



[PATCH 08/13] bdi: remove BDI_CAP_CGROUP_WRITEBACK

2020-09-23 Thread Christoph Hellwig
Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-core.c| 1 -
 fs/btrfs/disk-io.c  | 1 -
 include/linux/backing-dev.h | 8 +++-
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 865d39e5be2b28..1cc4fa6bc7fe1f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,7 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 047934cea25efa..e24927bddd5829 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3091,7 +3091,6 @@ int __cold open_ctree(struct super_block *sb, struct 
btrfs_fs_devices *fs_device
goto fail_sb_buffer;
}
 
-   sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 0b06b2d26c9aa3..52583b6f2ea05d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -123,7 +123,6 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
  * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  *
- * BDI_CAP_CGROUP_WRITEBACK: Supports cgroup-aware writeback.
  * BDI_CAP_SYNCHRONOUS_IO: Device is so fast that asynchronous IO would be
  *inefficient.
  */
@@ -233,9 +232,9 @@ int inode_congested(struct inode *inode, int cong_bits);
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
  * @inode: inode of interest
  *
- * cgroup writeback requires support from both the bdi and filesystem.
- * Also, both memcg and iocg have to be on the default hierarchy.  Test
- * whether all conditions are met.
+ * Cgroup writeback requires support from the filesystem.  Also, both memcg and
+ * iocg have to be on the default hierarchy.  Test whether all conditions are
+ * met.
  *
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
@@ -247,7 +246,6 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
cgroup_subsys_on_dfl(io_cgrp_subsys) &&
bdi_cap_account_dirty(bdi) &&
-   (bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
(inode->i_sb->s_iflags & SB_I_CGROUPWB);
 }
 
-- 
2.28.0



[PATCH 07/13] block: lift setting the readahead size into the block layer

2020-09-23 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.  To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig 
Acked-by: Coly Li 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-settings.c | 18 --
 block/blk-sysfs.c|  2 ++
 drivers/block/aoe/aoeblk.c   |  1 -
 drivers/block/drbd/drbd_nl.c | 10 +-
 drivers/md/bcache/super.c|  3 ---
 drivers/md/dm-table.c|  3 +--
 drivers/md/raid0.c   | 16 
 drivers/md/raid10.c  | 24 +---
 drivers/md/raid5.c   | 13 +
 drivers/nvme/host/core.c |  1 +
 include/linux/blkdev.h   |  1 +
 11 files changed, 24 insertions(+), 68 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5ea3de48afba22..4f6eb4bb17236a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -372,6 +372,19 @@ void blk_queue_alignment_offset(struct request_queue *q, 
unsigned int offset)
 }
 EXPORT_SYMBOL(blk_queue_alignment_offset);
 
+void blk_queue_update_readahead(struct request_queue *q)
+{
+   /*
+* For read-ahead of large files to be effective, we need to read ahead
+* at least twice the optimal I/O size.
+*/
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
+   q->backing_dev_info->io_pages =
+   queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+}
+EXPORT_SYMBOL_GPL(blk_queue_update_readahead);
+
 /**
  * blk_limits_io_min - set minimum request size for a device
  * @limits: the queue limits
@@ -450,6 +463,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -631,8 +646,7 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
   top, bottom);
}
 
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
+   blk_queue_update_readahead(disk->queue);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 81722cdcf0cb21..869ed21a9edcab 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -854,6 +854,8 @@ int blk_register_queue(struct gendisk *disk)
percpu_ref_switch_to_percpu(&q->q_usage_counter);
}
 
+   blk_queue_update_readahead(q);
+
ret = blk_trace_init_sysfs(dev);
if (ret)
return ret;
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d8cfc233e64b93..c34e71b0c4a98c 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -406,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index aaff5bde391506..54a4930c04fe07 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1362,15 +1362,7 @@ static void drbd_setup_queue_param(struct drbd_device 
*device, struct drbd_backi
 
if (b) {
blk_stack_limits(&q->limits, &b->limits, 0);
-
-   if (q->backing_dev_info->ra_pages !=
-   b->backing_dev_info->ra_pages) {
-   drbd_info(device, "Adjusting my ra_pages to backing device's (%lu -> %lu)\n",
-q->backing_dev_info->ra_pages,
-b->backing_dev_info->ra_pages);
-   q->backing_dev_info->ra_pages =
-   b->backing_dev_info->ra_pages;
-   }
+   blk_queue_update_readahead(q);
}
fixup_discard_if_not_supported(q);
fixup_write_zeroes(device, q);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 48113005ed86ad..6bfa771673623e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1427,9 +1427,6 @@ static int cached_dev_init(struct cached_dev *dc, 
unsigned int block_size)
if (ret)
return ret;
 
-   dc->disk.disk->queue->backing_dev_info->ra_pages =
-   max(dc->disk.disk->queue->backing_dev_info->ra_pages,
-   
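
For a rough feel of the numbers behind blk_queue_update_readahead(): the
standalone sketch below simply redoes its arithmetic for a hypothetical
device with a 1 MiB optimal I/O size and a 1024-sector max_sectors limit on
4 KiB pages; the constants are illustrative, not taken from any real device,
and this is plain userspace code rather than the kernel helper itself.

/* Standalone sketch (not kernel code) of the readahead arithmetic above. */
#include <stdio.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define VM_READAHEAD_PAGES	(128 * 1024 / PAGE_SIZE)	/* 32 pages */

int main(void)
{
	unsigned long io_opt = 1024 * 1024;	/* hypothetical 1 MiB optimal I/O size */
	unsigned long max_sectors = 1024;	/* hypothetical limit, in 512-byte sectors */
	unsigned long ra_pages, io_pages;

	/* read ahead at least twice the optimal I/O size, but never less
	 * than the default readahead window */
	ra_pages = io_opt * 2 / PAGE_SIZE;
	if (ra_pages < VM_READAHEAD_PAGES)
		ra_pages = VM_READAHEAD_PAGES;

	/* io_pages mirrors max_sectors, converted from sectors to pages */
	io_pages = max_sectors >> (PAGE_SHIFT - 9);

	/* prints: ra_pages=512 io_pages=128 (i.e. 2 MiB readahead, 512 KiB I/O) */
	printf("ra_pages=%lu io_pages=%lu\n", ra_pages, io_pages);
	return 0;
}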

[PATCH 01/13] fs: remove the unused SB_I_MULTIROOT flag

2020-09-23 Thread Christoph Hellwig
The last user of SB_I_MULTIROOT disappeared with commit f2aedb713c28
("NFS: Add fs_context support.").

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/namei.c | 4 ++--
 include/linux/fs.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e99e2a9da0f7de..f1eb8ccd2be958 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -568,8 +568,8 @@ static bool path_connected(struct vfsmount *mnt, struct 
dentry *dentry)
 {
struct super_block *sb = mnt->mnt_sb;
 
-   /* Bind mounts and multi-root filesystems can have disconnected paths */
-   if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
+   /* Bind mounts can have disconnected paths */
+   if (mnt->mnt_root == sb->s_root)
return true;
 
return is_subdir(dentry, mnt->mnt_root);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..fbd74df5ce5f34 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1385,7 +1385,6 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_CGROUPWB  0x0001  /* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC0x0002  /* Ignore executables on this fs */
 #define SB_I_NODEV 0x0004  /* Ignore devices on this fs */
-#define SB_I_MULTIROOT 0x0008  /* Multiple roots to the dentry tree */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE0x0010 /* fstype already mounted */
-- 
2.28.0



[PATCH 02/13] drbd: remove dead code in device_to_statistics

2020-09-23 Thread Christoph Hellwig
Ever since the switch to blk-mq, a lower device not used for VM
writeback will not be marked congested, so the check will never
trigger.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/drbd/drbd_nl.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 43c8ae4d9fca81..aaff5bde391506 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3370,7 +3370,6 @@ static void device_to_statistics(struct device_statistics 
*s,
if (get_ldev(device)) {
struct drbd_md *md = &device->ldev->md;
u64 *history_uuids = (u64 *)s->history_uuids;
-   struct request_queue *q;
int n;
 
spin_lock_irq(&md->uuid_lock);
@@ -3384,11 +3383,6 @@ static void device_to_statistics(struct 
device_statistics *s,
spin_unlock_irq(&md->uuid_lock);
 
s->dev_disk_flags = md->flags;
-   q = bdev_get_queue(device->ldev->backing_bdev);
-   s->dev_lower_blocked =
-   bdi_congested(q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
put_ldev(device);
}
s->dev_size = drbd_get_capacity(device->this_bdev);
-- 
2.28.0



Re: [PATCH] KVM: SVM: Add a dedicated INVD intercept routine

2020-09-23 Thread Paolo Bonzini
On 23/09/20 22:40, Tom Lendacky wrote:
>>> +static int invd_interception(struct vcpu_svm *svm)
>>> +{
>>> +   /*
>>> +* Can't do emulation on an SEV guest and INVD is emulated
>>> +* as a NOP, so just skip the instruction.
>>> +*/
>>> +   return (sev_guest(svm->vcpu.kvm))
>>> +   ? kvm_skip_emulated_instruction(&svm->vcpu)
>>> +   : kvm_emulate_instruction(&svm->vcpu, 0);
>>
>> Is there any reason not to do kvm_skip_emulated_instruction() for both SEV
>> and legacy?  VMX has the same odd kvm_emulate_instruction() call, but AFAICT
>> that's completely unecessary, i.e. VMX can also convert to a straight skip.
> 
> You could, I just figured I'd leave the legacy behavior just in case. Not
> that I can think of a reason that behavior would ever change.

Yeah, let's do skip for both SVM and VMX.

Paolo
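
For reference, the straight-skip handler Paolo is suggesting would reduce to
something like the sketch below (illustrative only, not the final patch; the
VMX handler would be simplified the same way):

static int invd_interception(struct vcpu_svm *svm)
{
	/* INVD is emulated as a NOP for every guest, so just skip it. */
	return kvm_skip_emulated_instruction(&svm->vcpu);
}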



[PATCH v3 4/4] venus: put dummy vote on video-mem path after last session release

2020-09-23 Thread Mansur Alisha Shaik
As per the current implementation, the video driver unvotes the "video-mem"
path for the last video session during vdec_session_release().
When we try to suspend the device during video playback, we see video clock
warnings since the votes were already removed in vdec_session_release().

Corrected this by putting a dummy vote on the "video-mem" path after the last
video session release and unvoting it during suspend.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Added fixes tag

 drivers/media/platform/qcom/venus/pm_helpers.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c 
b/drivers/media/platform/qcom/venus/pm_helpers.c
index 57877ea..ca09ea8 100644
--- a/drivers/media/platform/qcom/venus/pm_helpers.c
+++ b/drivers/media/platform/qcom/venus/pm_helpers.c
@@ -212,6 +212,16 @@ static int load_scale_bw(struct venus_core *core)
}
mutex_unlock(&core->lock);
 
+   /*
+* keep minimum bandwidth vote for "video-mem" path,
+* so that clks can be disabled during vdec_session_release().
+* Actual bandwidth drop will be done during device suspend
+* so that device can power down without any warnings.
+*/
+
+   if (!total_avg && !total_peak)
+   total_avg = kbps_to_icc(1000);
+
dev_dbg(core->dev, VDBGL "total: avg_bw: %u, peak_bw: %u\n",
total_avg, total_peak);
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 06/13] md: update the optimal I/O size on reshape

2020-09-23 Thread Christoph Hellwig
The raid5 and raid10 drivers currently update the read-ahead size,
but not the optimal I/O size on reshape.  To prepare for deriving the
read-ahead size from the optimal I/O size make sure it is updated
as well.

Signed-off-by: Christoph Hellwig 
Acked-by: Song Liu 
Reviewed-by: Johannes Thumshirn 
---
 drivers/md/raid10.c | 22 ++
 drivers/md/raid5.c  | 10 --
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e8fa327339171c..9956a04ac13bd6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3703,10 +3703,20 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return ERR_PTR(err);
 }
 
+static void raid10_set_io_opt(struct r10conf *conf)
+{
+   int raid_disks = conf->geo.raid_disks;
+
+   if (!(conf->geo.raid_disks % conf->geo.near_copies))
+   raid_disks /= conf->geo.near_copies;
+   blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
+raid_disks);
+}
+
 static int raid10_run(struct mddev *mddev)
 {
struct r10conf *conf;
-   int i, disk_idx, chunk_size;
+   int i, disk_idx;
struct raid10_info *disk;
struct md_rdev *rdev;
sector_t size;
@@ -3742,18 +3752,13 @@ static int raid10_run(struct mddev *mddev)
mddev->thread = conf->thread;
conf->thread = NULL;
 
-   chunk_size = mddev->chunk_sectors << 9;
if (mddev->queue) {
blk_queue_max_discard_sectors(mddev->queue,
  mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, 0);
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-   blk_queue_io_min(mddev->queue, chunk_size);
-   if (conf->geo.raid_disks % conf->geo.near_copies)
-   blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
-   else
-   blk_queue_io_opt(mddev->queue, chunk_size *
-(conf->geo.raid_disks / conf->geo.near_copies));
+   blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+   raid10_set_io_opt(conf);
}
 
rdev_for_each(rdev, mddev) {
@@ -4727,6 +4732,7 @@ static void end_reshape(struct r10conf *conf)
stripe /= conf->geo.near_copies;
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid10_set_io_opt(conf);
}
conf->fullsync = 0;
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 225380efd1e24f..9a7d1250894ef1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7232,6 +7232,12 @@ static int only_parity(int raid_disk, int algo, int 
raid_disks, int max_degraded
return 0;
 }
 
+static void raid5_set_io_opt(struct r5conf *conf)
+{
+   blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
+(conf->raid_disks - conf->max_degraded));
+}
+
 static int raid5_run(struct mddev *mddev)
 {
struct r5conf *conf;
@@ -7521,8 +7527,7 @@ static int raid5_run(struct mddev *mddev)
 
chunk_size = mddev->chunk_sectors << 9;
blk_queue_io_min(mddev->queue, chunk_size);
-   blk_queue_io_opt(mddev->queue, chunk_size *
-(conf->raid_disks - conf->max_degraded));
+   raid5_set_io_opt(conf);
mddev->queue->limits.raid_partial_stripes_expensive = 1;
/*
 * We can only discard a whole stripe. It doesn't make sense to
@@ -8115,6 +8120,7 @@ static void end_reshape(struct r5conf *conf)
   / PAGE_SIZE);
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid5_set_io_opt(conf);
}
}
 }
-- 
2.28.0
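
As a worked example of the two helpers above (numbers are illustrative only):
with a 512 KiB chunk, a 6-disk RAID5 array (max_degraded = 1) ends up with
io_opt = 512 KiB * (6 - 1) = 2.5 MiB, while a 4-disk RAID10 array with
near_copies = 2 gets io_opt = 512 KiB * (4 / 2) = 1 MiB. The standalone
sketch below just mirrors that arithmetic outside the kernel.

/* Standalone sketch of the io_opt arithmetic used by raid5/raid10 above. */
#include <stdio.h>

int main(void)
{
	unsigned long chunk = 512 * 1024;	/* hypothetical 512 KiB chunk */
	int raid_disks, near_copies;
	unsigned long raid5_io_opt, raid10_io_opt;

	/* raid5: 6 disks, one of which is parity (max_degraded = 1) */
	raid5_io_opt = chunk * (6 - 1);

	/* raid10: 4 disks, near_copies = 2 divides evenly, so scale down */
	raid_disks = 4;
	near_copies = 2;
	if (!(raid_disks % near_copies))
		raid_disks /= near_copies;
	raid10_io_opt = chunk * raid_disks;

	printf("raid5 io_opt  = %lu KiB\n", raid5_io_opt / 1024);	/* 2560 */
	printf("raid10 io_opt = %lu KiB\n", raid10_io_opt / 1024);	/* 1024 */
	return 0;
}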



[PATCH v3 2/4] venus: core: vote for video-mem path

2020-09-23 Thread Mansur Alisha Shaik
Currently the video driver votes for the venus0-ebi path during buffer
processing, with the average bandwidth of all the instances, and unvotes
during session release.

While video is streaming, when we try to do XO-SD using the command
"echo mem > /sys/power/state", the device does not enter the suspend state
and the interconnect summary still shows votes for venus0-ebi.

Corrected this by voting for the venus0-ebi path in venus_runtime_resume()
and unvoting during venus_runtime_suspend().

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Addressed review comments by Stephen Boyd

 drivers/media/platform/qcom/venus/core.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 52a3886..fa363b8 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -363,7 +363,18 @@ static __maybe_unused int venus_runtime_suspend(struct 
device *dev)
 
ret = icc_set_bw(core->cpucfg_path, 0, 0);
if (ret)
-   return ret;
+   goto err_cpucfg_path;
+
+   ret = icc_set_bw(core->video_path, 0, 0);
+   if (ret)
+   goto err_video_path;
+
+   return ret;
+
+err_video_path:
+   icc_set_bw(core->cpucfg_path, kbps_to_icc(1000), 0);
+err_cpucfg_path:
+   pm_ops->core_power(dev, POWER_ON);
 
return ret;
 }
@@ -374,6 +385,10 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
+   ret = icc_set_bw(core->video_path, 0, kbps_to_icc(1000));
+   if (ret)
+   return ret;
+
ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
if (ret)
return ret;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 3/4] venus: core: vote with average bandwidth and peak bandwidth as zero

2020-09-23 Thread Mansur Alisha Shaik
As per the bandwidth table, the video driver should vote with the average
bandwidth for the "video-mem" and "cpu-cfg" paths, as the peak bandwidth is
zero in the bandwidth table.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Added fixes tag

 drivers/media/platform/qcom/venus/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index fa363b8..d5bfd6f 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -385,11 +385,11 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
-   ret = icc_set_bw(core->video_path, 0, kbps_to_icc(1000));
+   ret = icc_set_bw(core->video_path, kbps_to_icc(2), 0);
if (ret)
return ret;
 
-   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
+   ret = icc_set_bw(core->cpucfg_path, kbps_to_icc(1000), 0);
if (ret)
return ret;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 1/4] venus: core: change clk enable and disable order in resume and suspend

2020-09-23 Thread Mansur Alisha Shaik
Currently the video driver votes after clk enable and unvotes before
clk disable. This is incorrect; the video driver should vote before
clk enable and unvote after clk disable.

Corrected this by changing the order of clk enable and clk disable.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
Reviewed-by: Stephen Boyd 
---
 drivers/media/platform/qcom/venus/core.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 6103aaf..52a3886 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -355,13 +355,16 @@ static __maybe_unused int venus_runtime_suspend(struct 
device *dev)
if (ret)
return ret;
 
+   if (pm_ops->core_power) {
+   ret = pm_ops->core_power(dev, POWER_OFF);
+   if (ret)
+   return ret;
+   }
+
ret = icc_set_bw(core->cpucfg_path, 0, 0);
if (ret)
return ret;
 
-   if (pm_ops->core_power)
-   ret = pm_ops->core_power(dev, POWER_OFF);
-
return ret;
 }
 
@@ -371,16 +374,16 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
+   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
+   if (ret)
+   return ret;
+
if (pm_ops->core_power) {
ret = pm_ops->core_power(dev, POWER_ON);
if (ret)
return ret;
}
 
-   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
-   if (ret)
-   return ret;
-
return hfi_core_resume(core, false);
 }
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 3/6] clk: axi-clkgen: add support for ZynqMP (UltraScale)

2020-09-23 Thread Alexandru Ardelean
From: Dragos Bogdan 

This IP core also works and is supported on the Xilinx ZynqMP (UltraScale)
FPGA boards.
This patch enables the driver to be available on these platforms as well.

Signed-off-by: Dragos Bogdan 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 4026fac9fac3..44353f257fe2 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -239,7 +239,7 @@ config CLK_TWL6040
 
 config COMMON_CLK_AXI_CLKGEN
tristate "AXI clkgen driver"
-   depends on ARCH_ZYNQ || MICROBLAZE || COMPILE_TEST
+   depends on ARCH_ZYNQ || ARCH_ZYNQMP || MICROBLAZE || COMPILE_TEST
help
  Support for the Analog Devices axi-clkgen pcore clock generator for Xilinx
  FPGAs. It is commonly used in Analog Devices' reference designs.
-- 
2.25.1



[PATCH v3 6/6] clk: axi-clkgen: Add support for FPGA info

2020-09-23 Thread Alexandru Ardelean
From: Mircea Caprioru 

This patch adds support for VCO maximum and minimum ranges in accordance
with the FPGA speed grade, voltage, device package, technology and family.
This new information is extracted from two new registers implemented in the
IP core: ADI_REG_FPGA_INFO and ADI_REG_FPGA_VOLTAGE, whose definitions live
in the 'include/linux/fpga/adi-axi-common.h' file as they are common to all
ADI FPGA cores.

Signed-off-by: Mircea Caprioru 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 67 +++-
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 6ffc19e9d850..b03ea28270cb 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@
 struct axi_clkgen {
void __iomem *base;
struct clk_hw clk_hw;
+   unsigned int pcore_version;
 };
 
 static uint32_t axi_clkgen_lookup_filter(unsigned int m)
@@ -101,15 +103,15 @@ static uint32_t axi_clkgen_lookup_lock(unsigned int m)
 }
 
 #ifdef ARCH_ZYNQMP
-static const unsigned int fpfd_min = 1;
-static const unsigned int fpfd_max = 45;
-static const unsigned int fvco_min = 80;
-static const unsigned int fvco_max = 160;
+static unsigned int fpfd_min = 1;
+static unsigned int fpfd_max = 45;
+static unsigned int fvco_min = 80;
+static unsigned int fvco_max = 160;
 #else
-static const unsigned int fpfd_min = 1;
-static const unsigned int fpfd_max = 30;
-static const unsigned int fvco_min = 60;
-static const unsigned int fvco_max = 120;
+static unsigned int fpfd_min = 1;
+static unsigned int fpfd_max = 30;
+static unsigned int fvco_min = 60;
+static unsigned int fvco_max = 120;
 #endif
 
 static void axi_clkgen_calc_params(unsigned long fin, unsigned long fout,
@@ -229,6 +231,49 @@ static void axi_clkgen_read(struct axi_clkgen *axi_clkgen,
*val = readl(axi_clkgen->base + reg);
 }
 
+static void axi_clkgen_setup_ranges(struct axi_clkgen *axi_clkgen)
+{
+   unsigned int reg_value;
+   unsigned int tech, family, speed_grade, voltage;
+
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_FPGA_INFO, ®_value);
+   tech = ADI_AXI_INFO_FPGA_TECH(reg_value);
+   family = ADI_AXI_INFO_FPGA_FAMILY(reg_value);
+   speed_grade = ADI_AXI_INFO_FPGA_SPEED_GRADE(reg_value);
+
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_FPGA_VOLTAGE, ®_value);
+   voltage = ADI_AXI_INFO_FPGA_VOLTAGE(reg_value);
+
+   switch (speed_grade) {
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_1 ... ADI_AXI_FPGA_SPEED_GRADE_XILINX_1LV:
+   fvco_max = 120;
+   fpfd_max = 45;
+   break;
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_2 ... ADI_AXI_FPGA_SPEED_GRADE_XILINX_2LV:
+   fvco_max = 144;
+   fpfd_max = 50;
+   if ((family == ADI_AXI_FPGA_FAMILY_XILINX_KINTEX) |
+   (family == ADI_AXI_FPGA_FAMILY_XILINX_ARTIX)) {
+   if (voltage < 950) {
+   fvco_max = 120;
+   fpfd_max = 45;
+   }
+   }
+   break;
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_3:
+   fvco_max = 160;
+   fpfd_max = 55;
+   break;
+   default:
+   break;
+   };
+
+   if (tech == ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE_PLUS) {
+   fvco_max = 160;
+   fvco_min = 80;
+   }
+}
+
 static int axi_clkgen_wait_non_busy(struct axi_clkgen *axi_clkgen)
 {
unsigned int timeout = 1;
@@ -524,6 +569,12 @@ static int axi_clkgen_probe(struct platform_device *pdev)
if (IS_ERR(axi_clkgen->base))
return PTR_ERR(axi_clkgen->base);
 
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_VERSION,
+   &axi_clkgen->pcore_version);
+
+   if (ADI_AXI_PCORE_VER_MAJOR(axi_clkgen->pcore_version) > 0x04)
+   axi_clkgen_setup_ranges(axi_clkgen);
+
init.num_parents = of_clk_get_parent_count(pdev->dev.of_node);
if (init.num_parents < 1 || init.num_parents > 2)
return -EINVAL;
-- 
2.25.1
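
To make the register layout concrete, the standalone sketch below decodes a
made-up ADI_REG_FPGA_INFO read-back with the field macros introduced in patch
5/6 of this series (technology in bits 31:24, family in 23:16, speed grade in
15:8, device package in 7:0); the value 0x02020a03 is purely illustrative.

/* Standalone sketch: decoding a hypothetical FPGA info register value. */
#include <stdio.h>

#define ADI_AXI_INFO_FPGA_TECH(info)		(((info) >> 24) & 0xff)
#define ADI_AXI_INFO_FPGA_FAMILY(info)		(((info) >> 16) & 0xff)
#define ADI_AXI_INFO_FPGA_SPEED_GRADE(info)	(((info) >> 8) & 0xff)
#define ADI_AXI_INFO_FPGA_DEV_PACKAGE(info)	((info) & 0xff)

int main(void)
{
	unsigned int info = 0x02020a03;	/* made-up read-back value */

	/* here: tech 2 (UltraScale), family 2 (Kintex), speed grade 10 (-1),
	 * device package 3 (FF), per the definitions in adi-axi-common.h */
	printf("tech=%u family=%u speed_grade=%u package=%u\n",
	       ADI_AXI_INFO_FPGA_TECH(info),
	       ADI_AXI_INFO_FPGA_FAMILY(info),
	       ADI_AXI_INFO_FPGA_SPEED_GRADE(info),
	       ADI_AXI_INFO_FPGA_DEV_PACKAGE(info));
	return 0;
}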



[PATCH v3 0/4] Venus - change clk enable, disable order and change bw values

2020-09-23 Thread Mansur Alisha Shaik
The intention of this patchset is to correct the clock enable and disable
order and to vote for the venus-ebi and cpucfg paths with average bandwidth
instead of peak bandwidth, since with the current implementation we are
seeing clock-related warnings during XO-SD and device suspend while video
is playing.

Mansur Alisha Shaik (4):
  venus: core: change clk enable and disable order in resume and suspend
  venus: core: vote for video-mem path
  venus: core: vote with average bandwidth and peak bandwidth as zero
  venus: put dummy vote on video-mem path after last session release

 drivers/media/platform/qcom/venus/core.c   | 32 --
 drivers/media/platform/qcom/venus/pm_helpers.c | 10 
 2 files changed, 35 insertions(+), 7 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 2/6] clk: axi-clkgen: Set power bits for fractional mode

2020-09-23 Thread Alexandru Ardelean
From: Lars-Peter Clausen 

Using the fractional dividers requires some additional power bits to be
set.

The fractional power bits are not documented and the current heuristic
for setting them seems to be insufficient for some cases. Just always set all
the fractional power bits when in fractional mode.

Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 1df03cc6d089..14d803e6af62 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -37,6 +37,7 @@
 #define MMCM_REG_LOCK1 0x18
 #define MMCM_REG_LOCK2 0x19
 #define MMCM_REG_LOCK3 0x1a
+#define MMCM_REG_POWER 0x28
 #define MMCM_REG_FILTER1   0x4e
 #define MMCM_REG_FILTER2   0x4f
 
@@ -320,6 +321,7 @@ static int axi_clkgen_set_rate(struct clk_hw *clk_hw,
struct axi_clkgen *axi_clkgen = clk_hw_to_axi_clkgen(clk_hw);
unsigned int d, m, dout;
struct axi_clkgen_div_params params;
+   uint32_t power = 0;
uint32_t filter;
uint32_t lock;
 
@@ -331,6 +333,11 @@ static int axi_clkgen_set_rate(struct clk_hw *clk_hw,
if (d == 0 || dout == 0 || m == 0)
return -EINVAL;
 
+   if ((dout & 0x7) != 0 || (m & 0x7) != 0)
+   power |= 0x9800;
+
+   axi_clkgen_mmcm_write(axi_clkgen, MMCM_REG_POWER, power, 0x9800);
+
filter = axi_clkgen_lookup_filter(m - 1);
lock = axi_clkgen_lookup_lock(m - 1);
 
-- 
2.25.1



Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Paolo Bonzini
On 24/09/20 08:31, Huacai Chen wrote:
> Hi, Sean,
> 
> On Thu, Sep 24, 2020 at 3:00 AM Sean Christopherson
>  wrote:
>>
>> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
>> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
>> fully enabled during VM init in order to make SEAMCALLs.
>>
>> This also provides consistent ordering between kvm_create_vm() and
>> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
>> hardware_disable_all().
> Do you means that hardware_enable_all() enable VMX, kvm_arch_init_vm()
> enable TDX, and TDX depends on VMX enabled at first? If so, can TDX be
> also enabled at hardware_enable_all()?

kvm_arch_init_vm() enables TDX *for the VM*, and to do that it needs VMX
instructions (specifically SEAMCALL, which is a hypervisor->"ultravisor"
call).  Because that action is VM-specific it cannot be done in
hardware_enable_all().

Paolo

> The swapping seems not affect MIPS, but I observed a fact:
> kvm_arch_hardware_enable() not only be called at
> hardware_enable_all(), but also be called at kvm_starting_cpu(). Even
> if you swap the order, new starting CPUs are not enabled VMX before
> kvm_arch_init_vm(). (Maybe I am wrong because I'm not familiar with
> VMX/TDX).
> 
> Huacai
>>
>> Cc: Marc Zyngier 
>> Cc: James Morse 
>> Cc: Julien Thierry 
>> Cc: Suzuki K Poulose 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: Huacai Chen 
>> Cc: Aleksandar Markovic 
>> Cc: linux-m...@vger.kernel.org
>> Cc: Paul Mackerras 
>> Cc: kvm-...@vger.kernel.org
>> Cc: Christian Borntraeger 
>> Cc: Janosch Frank 
>> Cc: David Hildenbrand 
>> Cc: Cornelia Huck 
>> Cc: Claudio Imbrenda 
>> Cc: Vitaly Kuznetsov 
>> Cc: Wanpeng Li 
>> Cc: Jim Mattson 
>> Cc: Joerg Roedel 
>> Signed-off-by: Sean Christopherson 
>> ---
>>
>> Obviously not required until the TDX series comes along, but IMO KVM
>> should be consistent with respect to enabling and disabling virt support
>> in hardware.
>>
>> Tested only on Intel hardware.  Unless I missed something, this only
>> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.
>> Arm looks safe (based on my mostly clueless reading of the code), but I
>> have no idea if this will cause problem for MIPS, which is doing all kinds
>> of things in hardware_enable() that I don't pretend to fully understand.
>>
>>  virt/kvm/kvm_main.c | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index cf88233b819a..58fa19bcfc90 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -766,7 +766,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> struct kvm_memslots *slots = kvm_alloc_memslots();
>>
>> if (!slots)
>> -   goto out_err_no_arch_destroy_vm;
>> +   goto out_err_no_disable;
>> /* Generations must be different for each address space. */
>> slots->generation = i;
>> rcu_assign_pointer(kvm->memslots[i], slots);
>> @@ -776,19 +776,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> rcu_assign_pointer(kvm->buses[i],
>> kzalloc(sizeof(struct kvm_io_bus), 
>> GFP_KERNEL_ACCOUNT));
>> if (!kvm->buses[i])
>> -   goto out_err_no_arch_destroy_vm;
>> +   goto out_err_no_disable;
>> }
>>
>> kvm->max_halt_poll_ns = halt_poll_ns;
>>
>> -   r = kvm_arch_init_vm(kvm, type);
>> -   if (r)
>> -   goto out_err_no_arch_destroy_vm;
>> -
>> r = hardware_enable_all();
>> if (r)
>> goto out_err_no_disable;
>>
>> +   r = kvm_arch_init_vm(kvm, type);
>> +   if (r)
>> +   goto out_err_no_arch_destroy_vm;
>> +
>>  #ifdef CONFIG_HAVE_KVM_IRQFD
>> INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
>>  #endif
>> @@ -815,10 +815,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
>>  #endif
>>  out_err_no_mmu_notifier:
>> -   hardware_disable_all();
>> -out_err_no_disable:
>> kvm_arch_destroy_vm(kvm);
>>  out_err_no_arch_destroy_vm:
>> +   hardware_disable_all();
>> +out_err_no_disable:
>> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
>> for (i = 0; i < KVM_NR_BUSES; i++)
>> kfree(kvm_get_bus(kvm, i));
>> --
>> 2.28.0
>>
> 



[PATCH v3 4/6] clk: axi-clkgen: Respect ZYNQMP PFD/VCO frequency limits

2020-09-23 Thread Alexandru Ardelean
From: Mathias Tausen 

Since axi-clkgen is now supported on ZYNQMP, make sure the max/min
frequencies of the PFD and VCO are respected.

Signed-off-by: Mathias Tausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 14d803e6af62..6ffc19e9d850 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -100,10 +100,17 @@ static uint32_t axi_clkgen_lookup_lock(unsigned int m)
return 0x1f1f00fa;
 }
 
+#ifdef ARCH_ZYNQMP
+static const unsigned int fpfd_min = 1;
+static const unsigned int fpfd_max = 45;
+static const unsigned int fvco_min = 80;
+static const unsigned int fvco_max = 160;
+#else
 static const unsigned int fpfd_min = 1;
 static const unsigned int fpfd_max = 30;
 static const unsigned int fvco_min = 60;
 static const unsigned int fvco_max = 120;
+#endif
 
 static void axi_clkgen_calc_params(unsigned long fin, unsigned long fout,
unsigned int *best_d, unsigned int *best_m, unsigned int *best_dout)
-- 
2.25.1



[PATCH v3 1/6] clk: axi-clkgen: Add support for fractional dividers

2020-09-23 Thread Alexandru Ardelean
From: Lars-Peter Clausen 

The axi-clkgen has (optional) fractional dividers on the output clock
divider and feedback clock divider path. Utilizing the fractional dividers
allows for a better resolution of the output clock, being able to
synthesize more frequencies.

Rework the driver to support the fractional register fields, both
for setting a new rate as well as reading back the current rate from the
hardware.

For setting the rate if no perfect divider settings were found in
non-fractional mode try again in fractional mode and see if better settings
can be found. This appears to be the recommended mode of operation.

Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 180 +--
 1 file changed, 129 insertions(+), 51 deletions(-)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 96f351785b41..1df03cc6d089 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -27,8 +27,10 @@
 
 #define AXI_CLKGEN_V2_DRP_STATUS_BUSY  BIT(16)
 
+#define MMCM_REG_CLKOUT5_2 0x07
 #define MMCM_REG_CLKOUT0_1 0x08
 #define MMCM_REG_CLKOUT0_2 0x09
+#define MMCM_REG_CLKOUT6_2 0x13
 #define MMCM_REG_CLK_FB1   0x14
 #define MMCM_REG_CLK_FB2   0x15
 #define MMCM_REG_CLK_DIV   0x16
@@ -40,6 +42,7 @@
 
 #define MMCM_CLKOUT_NOCOUNTBIT(6)
 
+#define MMCM_CLK_DIV_DIVIDEBIT(11)
 #define MMCM_CLK_DIV_NOCOUNT   BIT(12)
 
 struct axi_clkgen {
@@ -107,6 +110,8 @@ static void axi_clkgen_calc_params(unsigned long fin, 
unsigned long fout,
unsigned long d, d_min, d_max, _d_min, _d_max;
unsigned long m, m_min, m_max;
unsigned long f, dout, best_f, fvco;
+   unsigned long fract_shift = 0;
+   unsigned long fvco_min_fract, fvco_max_fract;
 
fin /= 1000;
fout /= 1000;
@@ -119,42 +124,89 @@ static void axi_clkgen_calc_params(unsigned long fin, 
unsigned long fout,
d_min = max_t(unsigned long, DIV_ROUND_UP(fin, fpfd_max), 1);
d_max = min_t(unsigned long, fin / fpfd_min, 80);
 
-   m_min = max_t(unsigned long, DIV_ROUND_UP(fvco_min, fin) * d_min, 1);
-   m_max = min_t(unsigned long, fvco_max * d_max / fin, 64);
+again:
+   fvco_min_fract = fvco_min << fract_shift;
+   fvco_max_fract = fvco_max << fract_shift;
+
+   m_min = max_t(unsigned long, DIV_ROUND_UP(fvco_min_fract, fin) * d_min, 1);
+   m_max = min_t(unsigned long, fvco_max_fract * d_max / fin, 64 << fract_shift);
 
for (m = m_min; m <= m_max; m++) {
-   _d_min = max(d_min, DIV_ROUND_UP(fin * m, fvco_max));
-   _d_max = min(d_max, fin * m / fvco_min);
+   _d_min = max(d_min, DIV_ROUND_UP(fin * m, fvco_max_fract));
+   _d_max = min(d_max, fin * m / fvco_min_fract);
 
for (d = _d_min; d <= _d_max; d++) {
fvco = fin * m / d;
 
dout = DIV_ROUND_CLOSEST(fvco, fout);
-   dout = clamp_t(unsigned long, dout, 1, 128);
+   dout = clamp_t(unsigned long, dout, 1, 128 << fract_shift);
f = fvco / dout;
if (abs(f - fout) < abs(best_f - fout)) {
best_f = f;
*best_d = d;
-   *best_m = m;
-   *best_dout = dout;
+   *best_m = m << (3 - fract_shift);
+   *best_dout = dout << (3 - fract_shift);
if (best_f == fout)
return;
}
}
}
+
+   /* Lets see if we find a better setting in fractional mode */
+   if (fract_shift == 0) {
+   fract_shift = 3;
+   goto again;
+   }
 }
 
-static void axi_clkgen_calc_clk_params(unsigned int divider, unsigned int *low,
-   unsigned int *high, unsigned int *edge, unsigned int *nocount)
+struct axi_clkgen_div_params {
+   unsigned int low;
+   unsigned int high;
+   unsigned int edge;
+   unsigned int nocount;
+   unsigned int frac_en;
+   unsigned int frac;
+   unsigned int frac_wf_f;
+   unsigned int frac_wf_r;
+   unsigned int frac_phase;
+};
+
+static void axi_clkgen_calc_clk_params(unsigned int divider,
+   unsigned int frac_divider, struct axi_clkgen_div_params *params)
 {
-   if (divider == 1)
-   *nocount = 1;
-   else
-   *nocount = 0;
 
-   *high = divider / 2;
-   *edge = divider % 2;
-   *low = divider - *high;
+   memset(params, 0x0, sizeof(*params));
+
+   if (divider == 1) {
+   params->nocount = 1;
+   return;
+   }
+
+   if (frac_divider == 0) {
+   params->high = divider / 2;
+   params->edge = divider % 2;
+  

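To get a rough feel for why the 1/8-step dividers help (all numbers here are
illustrative; the real search in axi_clkgen_calc_params() also honours the
PFD/VCO limits and varies D and M as well): with fin = 125 MHz, D = 2 and
M = 10 the VCO sits at 625 MHz, and integer output dividers can only produce
125.000 or 104.167 MHz, while eighth steps of the output divider fill in
that gap.

/* Standalone sketch of the extra resolution from 1/8-step output dividers. */
#include <stdio.h>

int main(void)
{
	double fvco = 125.0 / 2 * 10;	/* 625 MHz VCO from fin = 125 MHz, D = 2, M = 10 */
	int eighths;

	for (eighths = 40; eighths <= 48; eighths++)	/* O from 5.000 to 6.000 */
		printf("O = %.3f -> fout = %.3f MHz\n",
		       eighths / 8.0, fvco / (eighths / 8.0));
	return 0;
}
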
[PATCH v3 5/6] include: fpga: adi-axi-common.h: add definitions for supported FPGAs

2020-09-23 Thread Alexandru Ardelean
From: Mircea Caprioru 

All (newer) FPGA IP cores supported by Analog Devices store information in
the synthesized designs. This information describes various parameters,
including the family of boards on which this is deployed, speed-grade, and
so on.

Currently, some of these definitions are deployed mostly on Xilinx boards,
but they have been considered also for FPGA boards from other vendors.

The register definitions are described at this link:
  https://wiki.analog.com/resources/fpga/docs/hdl/regmap
(the 'Base (common to all cores)' section).

Acked-by: Moritz Fischer 
Signed-off-by: Mircea Caprioru 
Signed-off-by: Alexandru Ardelean 
---
 include/linux/fpga/adi-axi-common.h | 103 
 1 file changed, 103 insertions(+)

diff --git a/include/linux/fpga/adi-axi-common.h 
b/include/linux/fpga/adi-axi-common.h
index 141ac3f251e6..1a7f18e3a384 100644
--- a/include/linux/fpga/adi-axi-common.h
+++ b/include/linux/fpga/adi-axi-common.h
@@ -13,6 +13,9 @@
 
 #define ADI_AXI_REG_VERSION0x
 
+#define ADI_AXI_REG_FPGA_INFO  0x001C
+#define ADI_AXI_REG_FPGA_VOLTAGE   0x0140
+
 #define ADI_AXI_PCORE_VER(major, minor, patch) \
(((major) << 16) | ((minor) << 8) | (patch))
 
@@ -20,4 +23,104 @@
 #define ADI_AXI_PCORE_VER_MINOR(version)   (((version) >> 8) & 0xff)
 #define ADI_AXI_PCORE_VER_PATCH(version)   ((version) & 0xff)
 
+#define ADI_AXI_INFO_FPGA_VOLTAGE(val) ((val) & 0x)
+
+#define ADI_AXI_INFO_FPGA_TECH(info)   (((info) >> 24) & 0xff)
+#define ADI_AXI_INFO_FPGA_FAMILY(info) (((info) >> 16) & 0xff)
+#define ADI_AXI_INFO_FPGA_SPEED_GRADE(info)(((info) >> 8) & 0xff)
+#define ADI_AXI_INFO_FPGA_DEV_PACKAGE(info)((info) & 0xff)
+
+/**
+ * FPGA Technology definitions
+ */
+#define ADI_AXI_FPGA_TECH_XILINX_UNKNOWN   0
+#define ADI_AXI_FPGA_TECH_XILINS_SERIES7   1
+#define ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE2
+#define ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE_PLUS   3
+
+#define ADI_AXI_FPGA_TECH_INTEL_UNKNOWN100
+#define ADI_AXI_FPGA_TECH_INTEL_CYCLONE_5  101
+#define ADI_AXI_FPGA_TECH_INTEL_CYCLONE_10 102
+#define ADI_AXI_FPGA_TECH_INTEL_ARRIA_10   103
+#define ADI_AXI_FPGA_TECH_INTEL_STRATIX_10 104
+
+/**
+ * FPGA Family definitions
+ */
+#define ADI_AXI_FPGA_FAMILY_UNKNOWN0
+
+#define ADI_AXI_FPGA_FAMILY_XILINX_ARTIX   1
+#define ADI_AXI_FPGA_FAMILY_XILINX_KINTEX  2
+#define ADI_AXI_FPGA_FAMILY_XILINX_VIRTEX  3
+#define ADI_AXI_FPGA_FAMILY_XILINX_ZYNQ4
+
+#define ADI_AXI_FPGA_FAMILY_INTEL_SX   1
+#define ADI_AXI_FPGA_FAMILY_INTEL_GX   2
+#define ADI_AXI_FPGA_FAMILY_INTEL_GT   3
+#define ADI_AXI_FPGA_FAMILY_INTEL_GZ   4
+
+/**
+ * FPGA Speed-grade definitions
+ */
+#define ADI_AXI_FPGA_SPEED_GRADE_UNKNOWN   0
+
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1  10
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1L 11
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1H 12
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1HV13
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1LV14
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2  20
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2L 21
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2LV22
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_3  30
+
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_1   1
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_2   2
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_3   3
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_4   4
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_5   5
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_6   6
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_7   7
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_8   8
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_9   9
+
+/**
+ * FPGA Device Package definitions
+ */
+#define ADI_AXI_FPGA_DEV_PACKAGE_UNKNOWN   0
+
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RF 1
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FL 2
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FF 3
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FB 4
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_HC 5
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FH 6
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_CS 7
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_CP 8
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FT 9
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FG 10
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_SB 11
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RB 12
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RS 13
+#define ADI_AXI_FPGA_DEV_PACKAGE_XI

[PATCH] net/ethernet/broadcom: fix spelling typo

2020-09-23 Thread Wang Qing
Fix the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index bfc0e45..5caa75b
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -284,12 +284,12 @@
 #define CCM_REG_GR_ARB_TYPE 0xd015c
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed; that the Store channel priority is
-   the compliment to 4 of the rest priorities - Aggregation channel; Load
+   the complement to 4 of the rest priorities - Aggregation channel; Load
(FIC0) channel and Load (FIC1). */
 #define CCM_REG_GR_LD0_PR   0xd0164
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed; that the Store channel priority is
-   the compliment to 4 of the rest priorities - Aggregation channel; Load
+   the complement to 4 of the rest priorities - Aggregation channel; Load
(FIC0) channel and Load (FIC1). */
 #define CCM_REG_GR_LD1_PR   0xd0168
 /* [RW 2] General flags index. */
@@ -4489,11 +4489,11 @@
 #define TCM_REG_GR_ARB_TYPE 0x50114
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define TCM_REG_GR_LD0_PR   0x5011c
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define TCM_REG_GR_LD1_PR   0x50120
 /* [RW 4] The number of double REG-pairs; loaded from the STORM context and
sent to STORM; for a specific connection type. The double REG-pairs are
@@ -5020,11 +5020,11 @@
 #define UCM_REG_GR_ARB_TYPE 0xe0144
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel group is
-   compliment to the others. */
+   complement to the others. */
 #define UCM_REG_GR_LD0_PR   0xe014c
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel group is
-   compliment to the others. */
+   complement to the others. */
 #define UCM_REG_GR_LD1_PR   0xe0150
 /* [RW 2] The queue index for invalidate counter flag decision. */
 #define UCM_REG_INV_CFLG_Q  0xe00e4
@@ -5523,11 +5523,11 @@
 #define XCM_REG_GR_ARB_TYPE 0x2020c
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Channel group is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define XCM_REG_GR_LD0_PR   0x20214
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Channel group is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define XCM_REG_GR_LD1_PR   0x20218
 /* [RW 1] Input nig0 Interface enable. If 0 - the valid input is
disregarded; acknowledge output is deasserted; all other signals are
-- 
2.7.4



[PATCH v3 0/6] clk: axi-clk-gen: misc updates to the driver

2020-09-23 Thread Alexandru Ardelean
These patches synchronize the driver with the current state in the
Analog Devices Linux tree:
  https://github.com/analogdevicesinc/linux/

They have been in the tree for about 2-3, so they did receive some
testing.

Highlights are:
* Add support for fractional dividers (Lars-Peter Clausen)
* Enable support for ZynqMP (UltraScale) (Dragos Bogdan)
* Support frequency limits for ZynqMP (Mathias Tausen)
  - And continued by Mircea Caprioru, to read them from the IP cores

Changelog v2 -> v3:
* for patch 'include: fpga: adi-axi-common.h: add definitions for supported 
FPGAs'
  - fix whitespace found by checkpatch
  - add 'Acked-by: Moritz Fischer '

Changelog v1 -> v2:
- in patch 'include: fpga: adi-axi-common.h: add definitions for supported 
FPGAs'
  * converted enums to #define
  * added Intel FPGA definitions
  * added Device-Package definitions
  * added INTEL / XILINX in the define names
 definitions according to:
 
https://github.com/analogdevicesinc/hdl/blob/4e438261aa319b1dda4c593c155218a93b1d869b/library/scripts/adi_intel_device_info_enc.tcl
 
https://github.com/analogdevicesinc/hdl/blob/4e438261aa319b1dda4c593c155218a93b1d869b/library/scripts/adi_xilinx_device_info_enc.tcl

Dragos Bogdan (1):
  clk: axi-clkgen: add support for ZynqMP (UltraScale)

Lars-Peter Clausen (2):
  clk: axi-clkgen: Add support for fractional dividers
  clk: axi-clkgen: Set power bits for fractional mode

Mathias Tausen (1):
  clk: axi-clkgen: Respect ZYNQMP PFD/VCO frequency limits

Mircea Caprioru (2):
  include: fpga: adi-axi-common.h: add definitions for supported FPGAs
  clk: axi-clkgen: Add support for FPGA info

 drivers/clk/Kconfig |   2 +-
 drivers/clk/clk-axi-clkgen.c| 253 ++--
 include/linux/fpga/adi-axi-common.h | 103 +++
 3 files changed, 302 insertions(+), 56 deletions(-)

-- 
2.25.1



Re: [PATCH] leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'

2020-09-23 Thread Dan Carpenter
On Wed, Sep 23, 2020 at 08:49:56PM +0200, Christophe JAILLET wrote:
> Le 23/09/2020 à 15:35, Dan Carpenter a écrit :
> > I've added Heikki Krogerus to the CC list because my question is mostly
> > about commit 59abd83672f7 ("drivers: base: Introducing software nodes to
> > the firmware node framework").
> > 
> > I have been trying to teach Smatch to understand reference counting so
> > it can discover these kinds of bugs automatically.
> > 
> > I don't know how software_node_get_next_child() can work when it doesn't
> > call kobject_get().  This sort of bug would have been caught in testing
> > because it affects the success path so I must be reading the code wrong.
> > 
> 
> I had the same reading of the code and thought that I was missing something
> somewhere.
> 
> There is the same question about 'acpi_get_next_subnode' which is also a
> '.get_next_child_node' function, without any ref counting, if I'm correct.
> 

Yeah, but there aren't any ->get/put() ops for the acpi_get_next_subnode()
stuff so it's not a problem.  (Presumably there is some other sort of
refcounting policy there).

regards,
dan carpenter
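
For context, the refcounting pattern the thread is asking about usually looks
like the fragment below: the child-node iterator is expected to take a
reference that each iteration (or an early exit) must drop. The error
condition and the device pointer are placeholders, and whether
software_node_get_next_child() actually takes that reference is exactly the
open question here.

struct fwnode_handle *child;

device_for_each_child_node(dev, child) {
	if (some_error_condition) {
		/* drop the reference held by the iterator before bailing out */
		fwnode_handle_put(child);
		return -EINVAL;
	}
}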



Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 10:34:23PM +0200, Maximilian Luz wrote:
> In short: Concurrent execution of the counter functions works, as far as
> I can tell at least, and, as you see by the long answer, I have to spend
> some time and think about the duplicate-value problem (again). If you've
> managed to read through this wall of text (sorry about that) and you
> have any ideas/preferences, please let me know.

No, this all answers my question really well, thanks, what you have now
is fine, no need to change it.

thanks,

greg k-h


[PATCH RESEND] sched/fair: Fix wrong cpu selecting from isolated domain

2020-09-23 Thread Xunlei Pang
We've met problems in our production environment where tasks with a full
cpumask (e.g. from being put into a cpuset or having full affinity set)
were occasionally migrated to our isolated cpus.

After some analysis, we found that it is due to the current
select_idle_smt() not considering the sched_domain mask.

Steps to reproduce on my 31-CPU hyperthreads machine:
1. with boot parameter: "isolcpus=domain,2-31"
   (thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.

Fix it by checking the valid domain mask in select_idle_smt().

Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")
Reported-by: Wetp Zhang 
Reviewed-by: Jiang Biao 
Signed-off-by: Xunlei Pang 
---
 kernel/sched/fair.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a05..fa942c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6075,7 +6075,7 @@ static int select_idle_core(struct task_struct *p, struct 
sched_domain *sd, int
 /*
  * Scan the local SMT mask for idle CPUs.
  */
-static int select_idle_smt(struct task_struct *p, int target)
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
int cpu;
 
@@ -6083,7 +6083,8 @@ static int select_idle_smt(struct task_struct *p, int 
target)
return -1;
 
for_each_cpu(cpu, cpu_smt_mask(target)) {
-   if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
continue;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
return cpu;
@@ -6099,7 +6100,7 @@ static inline int select_idle_core(struct task_struct *p, 
struct sched_domain *s
return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, int target)
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
return -1;
 }
@@ -6274,7 +6275,7 @@ static int select_idle_sibling(struct task_struct *p, int 
prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;
 
-   i = select_idle_smt(p, target);
+   i = select_idle_smt(p, sd, target);
if ((unsigned)i < nr_cpumask_bits)
return i;
 
-- 
1.8.3.1



[PATCH] iwlwifi: mvm: Increase session protection duration for association

2020-09-23 Thread Kai-Heng Feng
Sometimes the Intel AX201 fails to associate with the AP:
[  839.290042] wlp0s20f3: authenticate with xx:xx:xx:xx:xx:xx
[  839.291737] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[  839.350010] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 2/3)
[  839.360826] wlp0s20f3: authenticated
[  839.363205] wlp0s20f3: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[  839.370342] wlp0s20f3: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x431 
status=0 aid=12)
[  839.378925] wlp0s20f3: associated
[  839.431788] wlp0s20f3: deauthenticated from xx:xx:xx:xx:xx:xx (Reason: 
2=PREV_AUTH_NOT_VALID)

It fails because the EAPOL exchange hasn't finished. Increasing the session
protection duration to 1200 TU eliminates the problem.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209237
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 9374c85c5caf..54acd9a68955 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -3297,13 +3297,13 @@ static void iwl_mvm_mac_mgd_prepare_tx(struct 
ieee80211_hw *hw,
 * session for a much longer time since the firmware will internally
 * create two events: a 300TU one with a very high priority that
 * won't be fragmented which should be enough for 99% of the cases,
-* and another one (which we configure here to be 900TU long) which
+* and another one (which we configure here to be 1200TU long) which
 * will have a slightly lower priority, but more importantly, can be
 * fragmented so that it'll allow other activities to run.
 */
if (fw_has_capa(&mvm->fw->ucode_capa,
IWL_UCODE_TLV_CAPA_SESSION_PROT_CMD))
-   iwl_mvm_schedule_session_protection(mvm, vif, 900,
+   iwl_mvm_schedule_session_protection(mvm, vif, 1200,
min_duration, false);
else
iwl_mvm_protect_session(mvm, vif, duration,
-- 
2.17.1



Re: [RFC PATCH 8/9] surface_aggregator: Add DebugFS interface

2020-09-23 Thread Greg Kroah-Hartman
On Thu, Sep 24, 2020 at 12:06:54AM +0200, Maximilian Luz wrote:
> On 9/23/20 8:29 PM, Greg Kroah-Hartman wrote:
> > On Wed, Sep 23, 2020 at 08:03:38PM +0200, Maximilian Luz wrote:
> > > On 9/23/20 6:14 PM, Greg Kroah-Hartman wrote:
> 
> [...]
> 
> > > So the -EFAULT returned by put_user should have precedence? I was aiming
> > > for "in case it fails, return with the first error".
> > 
> > -EFAULT trumps everything :)
> 
> Perfect, thanks!
> 
> > > > Listen, I'm all for doing whatever you want in debugfs, but why are you
> > > > doing random ioctls here?  Why not just read/write a file to do what you
> > > > need/want to do here instead?
> > > 
> > > Two reasons, mostly: First, the IOCTL allows me to execute requests in
> > > parallel with just one open file descriptor and not having to maintain
> > > some sort of back-buffer to wait around until the reader gets to reading
> > > the thing. I've used that for stress-testing the EC communication in the
> > > past, which had some issues (dropping bytes, invalid CRCs, ...) under
> > > heavy(-ish) load. Second, I'm considering adding support for events to
> > > this device in the future by having user-space receive events by reading
> > > from the device. Events would also be enabled or disabled via an IOCTL.
> > > That could be implemented in a second device though. Events were also my
> > > main reason for adding a version to this interface: Discerning between
> > > one that has event support and one that has not.
> > 
> > A misc device can also do this, much simpler, right?  Why not use that?
> 
> Sorry to ask so many questions, just want to make sure I understand you
> correctly:
> 
>  - So you suggest I go with a misc device instead of putting this into
>debugfs?

Yes.

>  - And I keep the IOCTL?

If you need it, although the interface Arnd says might be much simpler
(read/write)

>  - Can I still tell people to not use it and that it's not my fault if a
>change in the interface breaks their tools if it's not in debugfs?

Yes :)

>  - Also load it via a separate module (module_misc_device, I assume)?

That works.

> One reason why the platform_device approach is practical in this
> scenario is that I can leverage the driver core to defer probing and
> thus defer creating the device if the controller isn't there yet.

That's fine, and is a nice abuse of the platform driver interface.  I
say "abuse" because we really don't have a simpler way to do this at the
moment, but this really isn't a platform device...

> Similarly, the driver is automatically unbound if the controller goes
> away and the device should be destroyed. All of this should currently be
> handled via the device link created by ssam_client_bind() (unless I
> really misunderstood those).

That all is fine, just create the misc device when your driver binds to
the device, just like you create the debugfs file entries today.
There's no difference except you get a "real" char device node instead
of a debugfs file.

> I should be able to handle that by having the device refuse to open the
> file if the controller isn't there. Holding the state-lock during the
> request execution should ensure that the controller doesn't get shut
> down.

Nah, no need for that, again, keep the platform driver/device and then
create the misc device (and remove it) where you are creating/removing
the debugfs files.

> > A simple misc device would make it very simple and easy to do instead,
> > why not do that?
> 
> Again, I considered the probe deferring of the platform driver fairly
> handy (in addition to having the implicit debugfs warning of "don't rely
> on this"), but if you prefer me implementing this as misc device, I'll
> do that.

The "joy" of creating a user api is that no matter how much you tell
people "do not depend on this", they will, so no matter the file being
in debugfs, or a misc device, you might be stuck with it for forever,
sorry.

thanks,

greg k-h
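
As a rough illustration of what is being suggested (all names are made up,
and the real driver would wire in its own open/read/ioctl handlers and its
controller lookup), registering the character device from the existing
platform driver probe could look something like this:

#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>

static const struct file_operations ssam_dbg_fops = {
	.owner	= THIS_MODULE,
	/* .open / .read / .unlocked_ioctl handlers would go here */
};

static struct miscdevice ssam_dbg_misc = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "surface_aggregator_dbg",	/* made-up node name */
	.fops	= &ssam_dbg_fops,
};

static int ssam_dbg_probe(struct platform_device *pdev)
{
	/* controller lookup / ssam_client_bind() happens here, as today */
	return misc_register(&ssam_dbg_misc);
}

static int ssam_dbg_remove(struct platform_device *pdev)
{
	misc_deregister(&ssam_dbg_misc);
	return 0;
}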


linux-next: manual merge of the nvdimm tree with the vfs tree

2020-09-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the nvdimm tree got a conflict in:

  lib/iov_iter.c

between commit:

  e33ea6e5ba6a ("x86/uaccess: Use pointer masking to limit uaccess speculation")

from the vfs tree and commit:

  0a78de3d4b7b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, 
kernel}()")

from the nvdimm tree.

I fixed it up (I just used the latter, but I suspect that more work is
needed) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




[PATCH] sound/soc/codecs: fix spelling typo in comments

2020-09-23 Thread Wang Qing
Modify the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 sound/soc/codecs/ak4458.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/ak4458.h b/sound/soc/codecs/ak4458.h
index f906215..e43144c
--- a/sound/soc/codecs/ak4458.h
+++ b/sound/soc/codecs/ak4458.h
@@ -49,7 +49,7 @@
 
 /* DIF21 0
  *  x  1 0 MSB justified  Figure 3 (default)
- *  x  1 1 I2S Compliment  Figure 4
+ *  x  1 1 I2S Complement  Figure 4
  */
 #define AK4458_DIF_SHIFT   1
 #define AK4458_DIF_MASKGENMASK(3, 1)
-- 
2.7.4



Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-23 Thread Oliver O'Halloran
On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
 wrote:
>
> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> (descriptions taken from Kconfig file)
>
> Signed-off-by: Mamatha Inamdar 
> ---
>  drivers/pci/hotplug/rpadlpar_core.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index f979b70..bac65ed 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
>  module_init(rpadlpar_io_init);
>  module_exit(rpadlpar_io_exit);
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");

RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
this already?

The only potential problem I can see is scripts doing: modprobe
rpadlpar_io or similar

However, we should be able to fix that with a module alias.
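A minimal sketch of the module-alias approach (the new file name is hypothetical):

#include <linux/module.h>

/* in the renamed module's source, e.g. papr_dlpar_io.c (hypothetical name) */
MODULE_ALIAS("rpadlpar_io");	/* keeps "modprobe rpadlpar_io" working */
MODULE_DESCRIPTION("PAPR Dynamic Logical Partitioning driver for I/O slots");
MODULE_LICENSE("GPL");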

Oliver


[PATCH -next] MIPS: OCTEON: fix error - use 'ret' after remove it

2020-09-23 Thread Qinglang Miao
Variable 'ret' was removed in commit 0ee69c589ec ("MIPS: OCTEON:
use devm_platform_ioremap_resource") but is still being used in a
devm_release_mem_region() call, which is unneeded anyway. So remove
this line to fix the error.

Fixes: 0ee69c589ec ("MIPS: OCTEON: use devm_platform_ioremap_resource")
Reported-by: kernel test robot 
Signed-off-by: Qinglang Miao 
---
 arch/mips/cavium-octeon/octeon-usb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/mips/cavium-octeon/octeon-usb.c 
b/arch/mips/cavium-octeon/octeon-usb.c
index 97f6dc31e1b4..987a94cbf3d0 100644
--- a/arch/mips/cavium-octeon/octeon-usb.c
+++ b/arch/mips/cavium-octeon/octeon-usb.c
@@ -534,8 +534,6 @@ static int __init dwc3_octeon_device_init(void)
dev_info(&pdev->dev, "clocks initialized.\n");
mutex_unlock(&dwc3_octeon_clocks_mutex);
devm_iounmap(&pdev->dev, base);
-   devm_release_mem_region(&pdev->dev, res->start,
-   resource_size(res));
}
} while (node != NULL);
 
-- 
2.23.0



Re: [PATCH] clk: rockchip: Initialize hw to error to avoid undefined behavior

2020-09-23 Thread Heiko Stübner
On Thursday, 24 September 2020 at 02:44:41 CEST, Stephen Boyd wrote:
> We can get down to this return value from ERR_CAST() without
> initializing hw. Set it to -ENOMEM so that we always return something
> sane.
> 
> Fixes the following smatch warning:
> 
> drivers/clk/rockchip/clk-half-divider.c:228 rockchip_clk_register_halfdiv() 
> error: uninitialized symbol 'hw'.
> drivers/clk/rockchip/clk-half-divider.c:228 rockchip_clk_register_halfdiv() 
> warn: passing zero to 'ERR_CAST'
> 
> Cc: Elaine Zhang 
> Cc: Heiko Stuebner 
> Fixes: 956060a52795 ("clk: rockchip: add support for half divider")
> Signed-off-by: Stephen Boyd 

Reviewed-by: Heiko Stuebner 


> ---
>  drivers/clk/rockchip/clk-half-divider.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/rockchip/clk-half-divider.c 
> b/drivers/clk/rockchip/clk-half-divider.c
> index e97fd3dfbae7..ccd5c270c213 100644
> --- a/drivers/clk/rockchip/clk-half-divider.c
> +++ b/drivers/clk/rockchip/clk-half-divider.c
> @@ -166,7 +166,7 @@ struct clk *rockchip_clk_register_halfdiv(const char 
> *name,
> unsigned long flags,
> spinlock_t *lock)
>  {
> - struct clk_hw *hw;
> + struct clk_hw *hw = ERR_PTR(-ENOMEM);
>   struct clk_mux *mux = NULL;
>   struct clk_gate *gate = NULL;
>   struct clk_divider *div = NULL;
> 
> base-commit: ca52a47af60f791b08a540a8e14d8f5751ee63e9
> 






Re: [PATCH v20 00/15] Introduce Data Access MONitor (DAMON)

2020-09-23 Thread SeongJae Park
On Wed, 23 Sep 2020 10:04:57 -0700 Shakeel Butt  wrote:

> On Mon, Aug 17, 2020 at 3:52 AM SeongJae Park  wrote:
> >
> > From: SeongJae Park 
> >
> > Changes from Previous Version
> > =
> >
> > - Place 'CREATE_TRACE_POINTS' after '#include' statements (Steven Rostedt)
> > - Support large record file (Alkaid)
> > - Place 'put_pid()' of virtual monitoring targets in 'cleanup' callback
> > - Avoid conflict between concurrent DAMON users
> > - Update evaluation result document
> >
> > Introduction
> > 
> >
> > DAMON is a data access monitoring framework subsystem for the Linux kernel.
> > The core mechanisms of DAMON called 'region based sampling' and 'adaptive
> > regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
> > patchset for the detail) make it
> >
> >  - accurate (The monitored information is useful for DRAM level memory
> >    management. It might not be appropriate for Cache-level accuracy, though.),
> >  - light-weight (The monitoring overhead is low enough to be applied online
> >while making no impact on the performance of the target workloads.), and
> >  - scalable (the upper-bound of the instrumentation overhead is controllable
> >regardless of the size of target workloads.).
> >
> > Using this framework, therefore, the kernel's core memory management 
> > mechanisms
> > such as reclamation and THP can be optimized for better memory management.  
> > The
> > experimental memory management optimization works that incur high
> > instrumentation overhead will be able to have another try.  In user space,
> > meanwhile, users who have some special workloads will be able to write
> > personalized tools or applications for deeper understanding and specialized
> > optimizations of their systems.
> >
> > Evaluations
> > ===
> >
> > We evaluated DAMON's overhead, monitoring quality and usefulness using 25
> > realistic workloads on my QEMU/KVM based virtual machine running a kernel 
> > that
> > v20 DAMON patchset is applied.
> >
> > DAMON is lightweight.  It increases system memory usage by 0.12% and slows
> > target workloads down by 1.39%.
> >
> > DAMON is accurate and useful for memory management optimizations.  An
> > experimental DAMON-based operation scheme for THP, 'ethp', removes 88.16% of
> > THP memory overheads while preserving 88.73% of THP speedup.  Another
> > experimental DAMON-based 'proactive reclamation' implementation, 'prcl',
> > reduces 91.34% of resident sets and 25.59% of system memory footprint 
> > while
> > incurring only 1.58% runtime overhead in the best case (parsec3/freqmine).
> >
> > NOTE that the experimental THP optimization and proactive reclamation are 
> > not
> > for production but just only for proof of concepts.
> >
> > Please refer to the official document[1] or "Documentation/admin-guide/mm: 
> > Add
> > a document for DAMON" patch in this patchset for detailed evaluation setup 
> > and
> > results.
> >
> > [1] 
> > https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
> >
> 
> 
> Hi SeongJae,
> 
> Sorry for the late response. I will start looking at this series in
> more detail in the next couple of weeks.

Thank you so much!

> I have a couple of high level comments for now.
> 
> 1) Please explain in the cover letter why someone should prefer to use
> DAMON instead of Page Idle Tracking.

In short, because DAMON provides an overhead-quality tradeoff and allows use of
various monitoring primitives other than only the PG_Idle and PTE Accessed bits.
I will explain this in detail in the cover letter of the next version of this
patchset.

> 
> 2) Also add what features Page Idle Tracking provides which the first
> version of DAMON does not provide (like page level tracking, physical
> or unmapped memory tracking e.t.c) and tell if you plan to add such
> features to DAMON in future. Basically giving reasons to not block the
> current version of DAMON until it is feature-rich.

In short, DAMON will provide only virtual address space monitoring by default,
but I believe the lack of those features is not a blocker because DAMON is
expandable to support them.  Also, I will make DAMON co-exist with Idle Page
Tracking again.  I will post
another RFC patchset for this soon.  Again, I will describe this in detail in
the next version of the cover letter.

> 
> 3) I think in the first mergeable version of DAMON, I would prefer to
> have support to control (create/delete/account) the DAMON context. You
> already have a RFC series on it. I would like to have that series part
> of this one.

Ok, I will apply it here.


Thanks,
SeongJae Park


linux-next: manual merge of the nvdimm tree with the vfs tree

2020-09-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the nvdimm tree got conflicts in:

  arch/x86/include/asm/uaccess_64.h

between commit:

  e33ea6e5ba6a ("x86/uaccess: Use pointer masking to limit uaccess speculation")

from the vfs tree and commit:

  0a78de3d4b7b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, 
kernel}()")

from the nvdimm tree.

I fixed it up (the latter just removed copy_to_user_mcsafe from this file,
so I did that) and can carry the fix as necessary. This is now fixed as
far as linux-next is concerned, but any non trivial conflicts should be
mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path

2020-09-23 Thread Sergey Senozhatsky
On (20/09/23 17:08), Prasad Sodagudi wrote:
> From: Mohammed Khajapasha 
> 
> The thread which initiates the hot plug can get scheduled
> out while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows selectively disabling the console flush and
> in turn reducing the hot plug latency.

It can schedule out or get preempted pretty much anywhere at any
time. printk->console_lock is not special in this regard. What am
I missing?

-ss


Re: [RFC PATCH 0/3] KVM: Introduce "VM bugged" concept

2020-09-23 Thread Christian Borntraeger



On 24.09.20 00:45, Sean Christopherson wrote:
> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.
> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.

I like the idea. Especially when we add a common "understanding" in QEMU
across all platforms. That would then even allow an error to be propagated.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
> bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
> VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 
>  arch/x86/kvm/x86.c   |  4 
>  include/linux/kvm_host.h | 45 
>  virt/kvm/kvm_main.c  | 11 +-
>  5 files changed, 61 insertions(+), 24 deletions(-)
> 
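As a rough illustration of the "VM bugged" concept described in the cover letter
above, the helper could look something like the sketch below (hypothetical names,
not the actual patch contents):

/*
 * Sketch only: the vm_bugged field and KVM_REQ_VM_BUGGED request are
 * hypothetical names, not existing KVM APIs.
 */
static inline void kvm_vm_bugged(struct kvm *kvm)
{
	kvm->vm_bugged = true;
	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
}

#define KVM_BUG_ON(cond, kvm)					\
({								\
	bool __ret = !!(cond);					\
								\
	if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))		\
		kvm_vm_bugged(kvm);				\
	unlikely(__ret);					\
})

/* Each ioctl entry point would then bail out early, e.g.:
 *	if (kvm->vm_bugged)
 *		return -EIO;
 */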


[PATCH] power: fix spelling typo

2020-09-23 Thread Wang Qing
Modify the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/power/supply/ab8500_fg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/power/supply/ab8500_fg.c b/drivers/power/supply/ab8500_fg.c
index 7eec415..592a73d
--- a/drivers/power/supply/ab8500_fg.c
+++ b/drivers/power/supply/ab8500_fg.c
@@ -653,7 +653,7 @@ int ab8500_fg_inst_curr_finalize(struct ab8500_fg *di, int 
*res)
 
/*
 * negative value for Discharging
-* convert 2's compliment into decimal
+* convert 2's complement into decimal
 */
if (high & 0x10)
val = (low | (high << 8) | 0xE000);
@@ -781,7 +781,7 @@ static void ab8500_fg_acc_cur_work(struct work_struct *work)
if (ret < 0)
goto exit;
 
-   /* Check for sign bit in case of negative value, 2's compliment */
+   /* Check for sign bit in case of negative value, 2's complement */
if (high & 0x10)
val = (low | (med << 8) | (high << 16) | 0xFFE0);
else
-- 
2.7.4



RE: [EXT] Re: [PATCH] net: fec: Keep device numbering consistent with datasheet

2020-09-23 Thread Andy Duan
From: David Miller  Sent: Thursday, September 24, 2020 
4:32 AM
> From: Stefan Riedmueller 
> Date: Wed, 23 Sep 2020 16:25:28 +0200
> 
> > From: Christian Hemp 
> >
> > Make use of device tree alias for device enumeration to keep the
> > device order consistent with the naming in the datasheet.
> >
> > Otherwise for the i.MX 6UL/ULL the ENET1 interface is enumerated as
> > eth1 and ENET2 as eth0.
> >
> > Signed-off-by: Christian Hemp 
> > Signed-off-by: Stefan Riedmueller 
> 
> Device naming and ordering for networking devices was never, ever,
> guaranteed.
> 
> Use udev or similar.
> 
> > @@ -3691,6 +3692,10 @@ fec_probe(struct platform_device *pdev)
> >
> >   ndev->max_mtu = PKT_MAXBUF_SIZE - ETH_HLEN - ETH_FCS_LEN;
> >
> > + eth_id = of_alias_get_id(pdev->dev.of_node, "ethernet");
> > + if (eth_id >= 0)
> > + sprintf(ndev->name, "eth%d", eth_id);
> 
> You can't ever just write into ndev->name, what if another networking device 
> is
> already using that name?
> 
> This change is incorrect on many levels.

David is correct.

For example, on imx8DXL ethernet0 is EQOS TSN and ethernet1 is FEC.
EQOS TSN is another driver and is registered early, so its dev->name is eth0.
So the patch would bring a conflict in such a case.

Andy


Re: [PATCH v2 2/2] USB: misc: Add onboard_usb_hub driver

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 03:25:45PM -0700, Matthias Kaehlcke wrote:
> On Mon, Sep 21, 2020 at 06:18:37PM -0700, Matthias Kaehlcke wrote:
> > On Sun, Sep 20, 2020 at 04:17:20PM +0200, Greg Kroah-Hartman wrote:
> > > On Thu, Sep 17, 2020 at 11:46:22AM -0700, Matthias Kaehlcke wrote:
> > > >
> > > > ...
> > > >
> > > > +static int __init onboard_hub_init(void)
> > > > +{
> > > > +   int rc;
> > > > +
> > > > +   rc = platform_driver_register(&onboard_hub_driver);
> > > > +   if (rc)
> > > > +   return rc;
> > > > +
> > > > +   return usb_register_device_driver(&onboard_hub_usbdev_driver, 
> > > > THIS_MODULE);
> > > 
> > > No unwinding of the platform driver register if this fails?
> > 
> > Right, will add unwinding.
> > 
> > > And THIS_MODULE should not be needed, did we get the api wrong here?
> > 
> > It seems you suggest to use usb_register() instead, SGTM
> 
> Actually usb_register() is for registering a struct usb_driver, however
> this is a struct usb_device_driver, there doesn't seem to be a
> registration function/macro that doesn't require THIS_MODULE. Please
> provide a pointer if I'm wrong.

You are correct, I was just making a meta-comment that we got this api
wrong when adding it to the kernel and need to fix it up so that you do
not have to manually pass in the module owner.  i.e. make it much like
usb_register() does.
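For reference, usb_register() hides the owner behind a macro in <linux/usb.h>;
an analogous wrapper for device drivers might look like the sketch below (the
second macro is hypothetical, not an existing API):

/* existing helper in include/linux/usb.h */
#define usb_register(driver) \
	usb_register_driver(driver, THIS_MODULE, KBUILD_MODNAME)

/* hypothetical analogue for struct usb_device_driver */
#define usb_register_device(driver) \
	usb_register_device_driver(driver, THIS_MODULE)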

thanks,

greg k-h


[PATCH] null_blk: synchronization fix for zoned device

2020-09-23 Thread Kanchan Joshi
Parallel write, read, and zone-mgmt operations accessing/altering zone
state and the write pointer may race with each other. Avoid the situation
by using a new spinlock for the zoned device.
Concurrent zone appends (on a zone) returning the same write pointer is
another issue avoided by this lock.

Signed-off-by: Kanchan Joshi 
---
 drivers/block/null_blk.h   |  1 +
 drivers/block/null_blk_zoned.c | 84 +++---
 2 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index daed4a9c3436..b3f4d62e7c38 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -44,6 +44,7 @@ struct nullb_device {
unsigned int nr_zones;
struct blk_zone *zones;
sector_t zone_size_sects;
+   spinlock_t zlock;
 
unsigned long size; /* device size in MB */
unsigned long completion_nsec; /* time in ns to complete a request */
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 3d25c9ad2383..04fbf267703a 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -45,6 +45,7 @@ int null_init_zoned_dev(struct nullb_device *dev, struct 
request_queue *q)
if (!dev->zones)
return -ENOMEM;
 
+   spin_lock_init(&dev->zlock);
if (dev->zone_nr_conv >= dev->nr_zones) {
dev->zone_nr_conv = dev->nr_zones - 1;
pr_info("changed the number of conventional zones to %u",
@@ -124,6 +125,7 @@ int null_report_zones(struct gendisk *disk, sector_t sector,
nr_zones = min(nr_zones, dev->nr_zones - first_zone);
trace_nullb_report_zones(nullb, nr_zones);
 
+   spin_lock_irq(&dev->zlock);
for (i = 0; i < nr_zones; i++) {
/*
 * Stacked DM target drivers will remap the zone information by
@@ -134,10 +136,13 @@ int null_report_zones(struct gendisk *disk, sector_t 
sector,
memcpy(&zone, &dev->zones[first_zone + i],
   sizeof(struct blk_zone));
error = cb(&zone, i, data);
-   if (error)
+   if (error) {
+   spin_unlock_irq(&dev->zlock);
return error;
+   }
}
 
+   spin_unlock_irq(&dev->zlock);
return nr_zones;
 }
 
@@ -147,16 +152,24 @@ size_t null_zone_valid_read_len(struct nullb *nullb,
struct nullb_device *dev = nullb->dev;
struct blk_zone *zone = &dev->zones[null_zone_no(dev, sector)];
unsigned int nr_sectors = len >> SECTOR_SHIFT;
+   size_t ret = 0;
 
+   spin_lock_irq(&dev->zlock);
/* Read must be below the write pointer position */
if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL ||
-   sector + nr_sectors <= zone->wp)
-   return len;
+   sector + nr_sectors <= zone->wp) {
+   ret = len;
+   goto out_unlock;
+   }
 
if (sector > zone->wp)
-   return 0;
+   goto out_unlock;
+
+   ret = (zone->wp - sector) << SECTOR_SHIFT;
 
-   return (zone->wp - sector) << SECTOR_SHIFT;
+out_unlock:
+   spin_unlock_irq(&dev->zlock);
+   return ret;
 }
 
 static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
@@ -165,17 +178,19 @@ static blk_status_t null_zone_write(struct nullb_cmd 
*cmd, sector_t sector,
struct nullb_device *dev = cmd->nq->dev;
unsigned int zno = null_zone_no(dev, sector);
struct blk_zone *zone = &dev->zones[zno];
-   blk_status_t ret;
+   blk_status_t ret = BLK_STS_OK;
 
trace_nullb_zone_op(cmd, zno, zone->cond);
 
if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
return null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
 
+   spin_lock_irq(&dev->zlock);
switch (zone->cond) {
case BLK_ZONE_COND_FULL:
/* Cannot write to a full zone */
-   return BLK_STS_IOERR;
+   ret = BLK_STS_IOERR;
+   break;
case BLK_ZONE_COND_EMPTY:
case BLK_ZONE_COND_IMP_OPEN:
case BLK_ZONE_COND_EXP_OPEN:
@@ -193,27 +208,33 @@ static blk_status_t null_zone_write(struct nullb_cmd 
*cmd, sector_t sector,
else
cmd->rq->__sector = sector;
} else if (sector != zone->wp) {
-   return BLK_STS_IOERR;
+   ret = BLK_STS_IOERR;
+   break;
}
 
-   if (zone->wp + nr_sectors > zone->start + zone->capacity)
-   return BLK_STS_IOERR;
+   if (zone->wp + nr_sectors > zone->start + zone->capacity) {
+   ret = BLK_STS_IOERR;
+   break;
+   }
 
if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
zone->cond = BLK_ZONE_COND_IMP_OPEN;
 
ret = null

Re: [PATCH 5/5] perf test: Add expand cgroup event test

2020-09-23 Thread Namhyung Kim
On Thu, Sep 24, 2020 at 7:36 AM Ian Rogers  wrote:
>
> On Tue, Sep 22, 2020 at 7:00 PM Namhyung Kim  wrote:
> >
> > It'll expand given events for cgroups A, B and C.
> >
> >   $ ./perf test -v expansion
> >   69: Event expansion for cgroups  :
> >   --- start ---
> >   test child forked, pid 983140
> >   metric expr 1 / IPC for CPI
> >   metric expr instructions / cycles for IPC
> >   found event instructions
> >   found event cycles
> >   adding {instructions,cycles}:W
> >   copying metric event for cgroup 'A': instructions (idx=0)
> >   copying metric event for cgroup 'B': instructions (idx=0)
> >   copying metric event for cgroup 'C': instructions (idx=0)
> >   test child finished with 0
> >    end 
> >   Event expansion for cgroups: Ok
> >
> > Cc: John Garry 
> > Signed-off-by: Namhyung Kim 
> > ---
[SNIP]
> Should this be #ifdef HAVE_LIBPFM ?

Do you mean the below function?
Actually I thought about it and ended up not using it.
Please see below..

>
> > +static int expand_libpfm_events(void)
> > +{
> > +   int ret;
> > +   struct evlist *evlist;
> > +   struct rblist metric_events;
> > +   const char event_str[] = "UNHALTED_CORE_CYCLES";
> > +   struct option opt = {
> > +   .value = &evlist,
> > +   };
> > +
> > +   symbol_conf.event_group = true;
> > +
> > +   evlist = evlist__new();
> > +   TEST_ASSERT_VAL("failed to get evlist", evlist);
> > +
> > +   ret = parse_libpfm_events_option(&opt, event_str, 0);
> > +   if (ret < 0) {
> > +   pr_debug("failed to parse libpfm event '%s', err %d\n",
> > +event_str, ret);
> > +   goto out;
> > +   }
> > +   if (perf_evlist__empty(evlist)) {
> > +   pr_debug("libpfm was not enabled\n");
> > +   goto out;
> > +   }

That's handled here.  The parse_libpfm_events_option()
will return 0 if HAVE_LIBPFM is not defined so evlist will be empty.
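(Roughly, the HAVE_LIBPFM fallback stub looks like the sketch below; quoted from
memory, so check tools/perf/util/pfm.h for the exact definition.)

#ifdef HAVE_LIBPFM
int parse_libpfm_events_option(const struct option *opt, const char *str,
			       int unset);
#else
static inline int parse_libpfm_events_option(const struct option *opt __maybe_unused,
					     const char *str __maybe_unused,
					     int unset __maybe_unused)
{
	return 0;	/* no events are added, so the evlist stays empty */
}
#endif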

Thanks
Namhyung

> > +
> > +   rblist__init(&metric_events);
> > +   ret = test_expand_events(evlist, &metric_events);
> > +out:
> > +   evlist__delete(evlist);
> > +   return ret;
> > +}
> > +


[PATCH] arm64: dts: rockchip: disable USB type-c DisplayPort

2020-09-23 Thread Jian-Hong Pan
The cdn-dp sub-driver fails to probe the device on the PINEBOOK Pro.

kernel: cdn-dp fec0.dp: [drm:cdn_dp_probe [rockchipdrm]] *ERROR* missing 
extcon or phy
kernel: cdn-dp: probe of fec0.dp failed with error -22

Then, the device halts all of the DRM related device jobs. For example,
the operations: vop_component_ops, vop_component_ops and
rockchip_dp_component_ops cannot be bound to corresponding devices. So,
Xorg cannot find the correct DRM device.

The USB type-C DisplayPort does not work for now. So, disable the
DisplayPort node until the type-C phy work has been done.

Link: https://patchwork.kernel.org/patch/11794141/#23639877
Signed-off-by: Jian-Hong Pan 
---
 arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
index 06d48338c836..d624c595c533 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -380,7 +380,7 @@ mains_charger: dc-charger {
 };
 
 &cdn_dp {
-   status = "okay";
+   status = "disabled";
 };
 
 &cpu_b0 {
-- 
2.28.0



[PATCH] USB: serial: pl2303: add device-id for HP GC device

2020-09-23 Thread Scott Chen
This adds a device ID for the HP LD381, which is a PL2303GC-based device.

Signed-off-by: Scott Chen 
---
 drivers/usb/serial/pl2303.c | 1 +
 drivers/usb/serial/pl2303.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index 048452d8a4a4..be8067017eaa 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -100,6 +100,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD381_PRODUCT_ID) },
+   { USB_DEVICE(HP_VENDOR_ID, HP_LD381GC_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM220_PRODUCT_ID) },
diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
index 7d3090ee7e0c..b0f399a8c628 100644
--- a/drivers/usb/serial/pl2303.h
+++ b/drivers/usb/serial/pl2303.h
@@ -127,6 +127,7 @@
 
 /* Hewlett-Packard POS Pole Displays */
 #define HP_VENDOR_ID   0x03f0
+#define HP_LD381GC_PRODUCT_ID   0x0183
 #define HP_LM920_PRODUCT_ID0x026b
 #define HP_TD620_PRODUCT_ID0x0956
 #define HP_LD960_PRODUCT_ID0x0b39
-- 
2.17.1



[PATCH] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-09-23 Thread Neeraj Upadhyay
Clarify the "x" in rcuox/N naming in RCU_NOCB_CPU config
description.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/Kconfig | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..5b22747 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -227,11 +227,12 @@ config RCU_NOCB_CPU
  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
 
  Say Y here if you want to help to debug reduced OS jitter.
  Say N here if you are unsure.
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path

2020-09-23 Thread Greg KH
On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> From: Mohammed Khajapasha 
> 
> The thread which initiates the hot plug can get scheduled
> out while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows selectively disabling the console flush and
> in turn reducing the hot plug latency.
> 
> Signed-off-by: Mohammed Khajapasha 
> Signed-off-by: Prasad Sodagudi 
> ---
>  init/Kconfig   | 10 ++
>  kernel/printk/printk.c | 10 --
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index d6a0b31..9ce39ba 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -699,6 +699,16 @@ config LOG_BUF_SHIFT
>13 =>  8 KB
>12 =>  4 KB
>  
> +config CONSOLE_FLUSH_ON_HOTPLUG
> + bool "Enable console flush configurable in hot plug code path"
> + depends on HOTPLUG_CPU
> + def_bool n

n is the default, no need to list it.

> + help
> + In cpu hot plug path console lock acquire and release causes the
> + console to flush. If console lock is not free hot plug latency
> + increases. So make console flush configurable in hot plug path
> + and default disabled to help in cpu hot plug latencies.

Why would you not want this option?

Why isn't this just a bugfix?

> +
>  config LOG_CPU_MAX_BUF_SHIFT
>   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
>   depends on SMP
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9b75f6b..f02d3ef 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2283,6 +2283,8 @@ void resume_console(void)
>   console_unlock();
>  }
>  
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
> +
>  /**
>   * console_cpu_notify - print deferred console messages after CPU hotplug
>   * @cpu: unused
> @@ -2302,6 +2304,8 @@ static int console_cpu_notify(unsigned int cpu)
>   return 0;
>  }
>  
> +#endif
> +
>  /**
>   * console_lock - lock the console system for exclusive use.
>   *
> @@ -2974,7 +2978,7 @@ void __init console_init(void)
>  static int __init printk_late_init(void)
>  {
>   struct console *con;
> - int ret;
> + int ret = 0;
>  
>   for_each_console(con) {
>   if (!(con->flags & CON_BOOT))
> @@ -2996,13 +3000,15 @@ static int __init printk_late_init(void)
>   unregister_console(con);
>   }
>   }
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG

#ifdef in .c code is a mess to maintain.

>   ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
>   console_cpu_notify);
>   WARN_ON(ret < 0);
>   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
>   console_cpu_notify, NULL);
>   WARN_ON(ret < 0);
> - return 0;
> +#endif

What happens if we don't make these calls entirely?  Why not just remove
them as who wants extra latency for their system?

thanks,

greg k-h


Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints

2020-09-23 Thread Greg KH
On Wed, Sep 23, 2020 at 05:08:31PM -0700, Prasad Sodagudi wrote:
> During cpu hot plug stress testing, a couple of messages
> continuously flooding the console cause timer
> migration delays. Delayed timer migration from the hot-plugged
> core causes device instability with the watchdog. So reduce
> the log level for a couple of prints in the cpu hot plug flow.
> 
> Signed-off-by: Prasad Sodagudi 
> ---
>  arch/arm64/kernel/smp.c | 2 +-
>  kernel/irq/cpuhotplug.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 355ee9e..08da6e3 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
>   pr_crit("CPU%u: cpu didn't die\n", cpu);
>   return;
>   }
> - pr_notice("CPU%u: shutdown\n", cpu);
> + pr_info("CPU%u: shutdown\n", cpu);
>  
>   /*
>* Now that the dying CPU is beyond the point of no return w.r.t.
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 02236b1..82802e0 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
>* If this happens then there was a missed IRQ fixup at some
>* point. Warn about it and enforce fixup.
>*/
> - pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline 
> CPUs after offlining CPU %u\n",
> + pr_info("Eff. affinity %*pbl of IRQ %u contains only offline 
> CPUs after offlining CPU %u\n",
>   cpumask_pr_args(m), d->irq, cpu);
>   return true;
>   }
> @@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
>   raw_spin_unlock(&desc->lock);
>  
>   if (affinity_broken) {
> - pr_warn_ratelimited("IRQ %u: no longer affine to 
> CPU%u\n",
> + pr_info_ratelimited("IRQ %u: no longer affine to 
> CPU%u\n",
>   irq, smp_processor_id());
>   }
>   }
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Huacai Chen
Hi, Sean,

On Thu, Sep 24, 2020 at 3:00 AM Sean Christopherson
 wrote:
>
> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
> fully enabled during VM init in order to make SEAMCALLs.
>
> This also provides consistent ordering between kvm_create_vm() and
> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
> hardware_disable_all().
Do you mean that hardware_enable_all() enables VMX, kvm_arch_init_vm()
enables TDX, and TDX depends on VMX being enabled first? If so, can TDX
also be enabled in hardware_enable_all()?

The swap seems not to affect MIPS, but I observed a fact:
kvm_arch_hardware_enable() is not only called from
hardware_enable_all(), but is also called from kvm_starting_cpu(). Even
if you swap the order, newly started CPUs do not have VMX enabled before
kvm_arch_init_vm(). (Maybe I am wrong because I'm not familiar with
VMX/TDX.)

Huacai
>
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Julien Thierry 
> Cc: Suzuki K Poulose 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Huacai Chen 
> Cc: Aleksandar Markovic 
> Cc: linux-m...@vger.kernel.org
> Cc: Paul Mackerras 
> Cc: kvm-...@vger.kernel.org
> Cc: Christian Borntraeger 
> Cc: Janosch Frank 
> Cc: David Hildenbrand 
> Cc: Cornelia Huck 
> Cc: Claudio Imbrenda 
> Cc: Vitaly Kuznetsov 
> Cc: Wanpeng Li 
> Cc: Jim Mattson 
> Cc: Joerg Roedel 
> Signed-off-by: Sean Christopherson 
> ---
>
> Obviously not required until the TDX series comes along, but IMO KVM
> should be consistent with respect to enabling and disabling virt support
> in hardware.
>
> Tested only on Intel hardware.  Unless I missed something, this only
> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.
> Arm looks safe (based on my mostly clueless reading of the code), but I
> have no idea if this will cause problem for MIPS, which is doing all kinds
> of things in hardware_enable() that I don't pretend to fully understand.
>
>  virt/kvm/kvm_main.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cf88233b819a..58fa19bcfc90 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -766,7 +766,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> struct kvm_memslots *slots = kvm_alloc_memslots();
>
> if (!slots)
> -   goto out_err_no_arch_destroy_vm;
> +   goto out_err_no_disable;
> /* Generations must be different for each address space. */
> slots->generation = i;
> rcu_assign_pointer(kvm->memslots[i], slots);
> @@ -776,19 +776,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
> rcu_assign_pointer(kvm->buses[i],
> kzalloc(sizeof(struct kvm_io_bus), 
> GFP_KERNEL_ACCOUNT));
> if (!kvm->buses[i])
> -   goto out_err_no_arch_destroy_vm;
> +   goto out_err_no_disable;
> }
>
> kvm->max_halt_poll_ns = halt_poll_ns;
>
> -   r = kvm_arch_init_vm(kvm, type);
> -   if (r)
> -   goto out_err_no_arch_destroy_vm;
> -
> r = hardware_enable_all();
> if (r)
> goto out_err_no_disable;
>
> +   r = kvm_arch_init_vm(kvm, type);
> +   if (r)
> +   goto out_err_no_arch_destroy_vm;
> +
>  #ifdef CONFIG_HAVE_KVM_IRQFD
> INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
>  #endif
> @@ -815,10 +815,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
> mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
>  #endif
>  out_err_no_mmu_notifier:
> -   hardware_disable_all();
> -out_err_no_disable:
> kvm_arch_destroy_vm(kvm);
>  out_err_no_arch_destroy_vm:
> +   hardware_disable_all();
> +out_err_no_disable:
> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> for (i = 0; i < KVM_NR_BUSES; i++)
> kfree(kvm_get_bus(kvm, i));
> --
> 2.28.0
>


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Trent Piepho
On Wed, Sep 23, 2020 at 11:06 PM Tony Lindgren  wrote:
>
> * Trent Piepho  [200924 05:49]:
> > On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
> > >
> > > * Trent Piepho  [200924 01:34]:
> > > > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > > > >
> > > > > Also FYI, folks have also complained for a long time that the 
> > > > > pinctrl-single
> > > > > binding mixes mux and conf values while they should be handled 
> > > > > separately.
> > > > >
> > > >
> > > > Instead of combining two fields when the dts is generated they are now
> > > > combined when the pinctrl-single driver reads the dts.  Other than
> > > > this detail, the result is the same.  The board dts source is the
> > > > same.  The value programmed into the pinctrl register is the same.
> > > > There is no mechanism currently that can alter that value in any way.
> > > >
> > > > What does combining them later allow that is not possible now?
> > >
> > > It now allows further driver changes to manage conf and mux separately :)
> >
> > The pinctrl-single driver?  How will that work with boards that are
> > not am335x and don't use conf and mux fields in the same manner as
> > am335x?
>
> For those cases we still have #pinctrl-cells = <1>.

If pinctrl-single is going to be am335x specific, then shouldn't it
be a different compatible string?

Are the driver changes something that cannot be done with the
pinconf-single properties?  They all include a mask.


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-23 Thread Rafael Aquini
On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini  writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change 
> > cluster_info->flags.
> 
> I don't think so.  We shouldn't run into this situation firstly.  So the
> "fix" hides the real bug instead of fixing it.  Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>

Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned:

split_huge_page_to_list() asserts (in debug builds) *page is locked, 
and later checks if *head bears the SwapCache flag. 
deferred_split_scan(), OTOH, doesn't hand down the compound head locked, 
but the 2nd page in the group instead. 
This doesn't necessarily mean it's a problem, though, but it might help
in hitting the issue. 

 
> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split 
> > scan
> > can transiently race with swapcache insertion, and the fact that when you 
> > run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL 
> > because
> > that's one possible case, and it clearly fails to do so.
> 
> If there's a race, we should fix the race.  But the code path for
> swapcache insertion is,
> 
> add_to_swap()
>   get_swap_page() /* Return if fails to allocate */
>   add_to_swap_cache()
> SetPageSwapCache()
> 
> While the code path to split THP is,
> 
> split_huge_page_to_list()
>   if PageSwapCache()
> split_swap_cluster()
> 
> Both code paths are protected by the page lock.  So there should be some
> other reasons to trigger the bug.

As mentioned above, no they seem to not be protected (at least, not the
same page, depending on the case). While add_to_swap() will assure a 
page_lock on the compound head, split_huge_page_to_list() does not.


> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place.  If so, the bug is that the flag is set and we should fix
> the setting.
> 

I fail to follow your claim here. Where is the guarantee, in the code, that 
you'll never have a compound head in the swapcache? 

> > Run a workload that cause multiple THP COW, and add a memory hogger to 
> > create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
> 
> Do you have a script to reproduce the bug?
> 

Nope, a convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANNA are seeing it,
occasionally.

> > Regardless you find further bugs, or not, this patch is needed to correct a
> > blunt coding mistake.
> 
> As above.  I don't agree with that.
> 

It's OK to disagree; split_swap_cluster() still misses the cluster_info NULL
check, though.
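The check being argued about amounts to roughly the sketch below (reconstructed
from the description in this thread, not copied from the actual patch):

int split_swap_cluster(swp_entry_t entry)
{
	struct swap_info_struct *si;
	struct swap_cluster_info *ci;
	unsigned long offset = swp_offset(entry);

	si = _swap_info_get(entry);
	if (!si)
		return -EBUSY;
	ci = lock_cluster(si, offset);
	if (!ci)		/* cluster_info is NULL, e.g. rotational swap */
		return 0;
	cluster_clear_huge(ci);
	unlock_cluster(ci);
	return 0;
}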



[PATCH V2] doc: zh_CN: add translation for btrfs

2020-09-23 Thread Wang Qing
Translate Documentation/filesystems/btrfs.rst into Chinese.

Signed-off-by: Wang Qing 
---
 .../translations/zh_CN/filesystems/btrfs.rst   | 37 ++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/translations/zh_CN/filesystems/btrfs.rst 
b/Documentation/translations/zh_CN/filesystems/btrfs.rst
index 000..8b8cca2
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/btrfs.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/ext3.rst `
+
+translated by 王擎 Wang Qing
+
+=
+BTRFS
+=
+
+Btrfs是一个写时复制更新的文件系统,它注重容错、修复和易于管理。
+Btrfs由多家公司联合开发,并获得GPL许可,免费开放给所有人。
+
+Btrfs的主要功能包括:
+
+*扩展大小的文件存储(文件最大支持2^64)
+*填充方式使小文件更节省空间
+*索引目录的方式更节省空间
+*动态的索引节点分配方式
+*可写快照的特性
+*支持子卷(独立的内部根文件系统)
+*对象级别的镜像克隆
+*基于数据和元数据的校验和(支持多种算法)
+*支持压缩
+*內建多种磁盘阵列算法,支持多种设备
+*支持离线的文件系统检查
+*高效的增量备份和文件系统镜像
+*在线文件系统碎片整理
+
+更多有关信息,请参阅Wiki
+
+  https://btrfs.wiki.kernel.org
+
+维护信息包含管理任务、常见问题、用例、挂载选项、变更日志、
+特性、手册、源码仓、联系人等。
-- 
2.7.4



Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance

2020-09-23 Thread Wanpeng Li
Any comments? Paolo! :)
On Wed, 9 Sep 2020 at 11:04, Wanpeng Li  wrote:
>
> Any comments? guys!
> On Tue, 1 Sep 2020 at 19:52,  wrote:
> >
> > From: Yulei Zhang 
> >
> > > Currently in KVM memory virtualization we rely on mmu_lock to
> > > synchronize memory mapping updates, which makes vCPUs work
> > > in serialized mode and slows down execution; especially after
> > > migration, doing substantial memory mapping will cause a visible
> > > performance drop, and it can get worse if the guest has more vCPUs
> > > and memory.
> >
> > The idea we present in this patch set is to mitigate the issue
> > with pre-constructed memory mapping table. We will fast pin the
> > guest memory to build up a global memory mapping table according
> > to the guest memslots changes and apply it to cr3, so that after
> > guest starts up all the vCPUs would be able to update the memory
> > simultaneously without page fault exception, thus the performance
> > improvement is expected.
> >
> > We use memory dirty pattern workload to test the initial patch
> > set and get positive result even with huge page enabled. For example,
> > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > dirty the entire memory region concurrently, as the initial patch
> > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > get the job done in about 50% faster.
> >
> > We only validate this feature on Intel x86 platform. And as Ben
> > pointed out in RFC V1, so far we disable the SMM for resource
> > consideration, drop the mmu notification as in this case the
> > memory is pinned.
> >
> > V1->V2:
> > * Rebase the code to kernel version 5.9.0-rc1.
> >
> > Yulei Zhang (9):
> >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > support
> >   Introduce page table population function for direct build EPT feature
> >   Introduce page table remove function for direct build EPT feature
> >   Add release function for direct build ept when guest VM exit
> >   Modify the page fault path to meet the direct build EPT requirement
> >   Apply the direct build EPT according to the memory slots change
> >   Add migration support when using direct build EPT
> >   Introduce kvm module parameter global_tdp to turn on the direct build
> > EPT mode
> >   Handle certain mmu exposed functions properly while turn on direct
> > build EPT mode
> >
> >  arch/mips/kvm/mips.c|  13 +
> >  arch/powerpc/kvm/powerpc.c  |  13 +
> >  arch/s390/kvm/kvm-s390.c|  13 +
> >  arch/x86/include/asm/kvm_host.h |  13 +-
> >  arch/x86/kvm/mmu/mmu.c  | 533 ++--
> >  arch/x86/kvm/svm/svm.c  |   2 +-
> >  arch/x86/kvm/vmx/vmx.c  |   7 +-
> >  arch/x86/kvm/x86.c  |  55 ++--
> >  include/linux/kvm_host.h|   7 +-
> >  virt/kvm/kvm_main.c |  43 ++-
> >  10 files changed, 639 insertions(+), 60 deletions(-)
> >
> > --
> > 2.17.1
> >


[PATCH] net: usb: ax88179_178a: add Toshiba usb 3.0 adapter

2020-09-23 Thread Wilken Gottwalt
Reposted and added netdev as suggested by Jakub Kicinski.

---
Adds the driver_info and USB IDs of the AX88179-based Toshiba USB 3.0
ethernet adapter.

Signed-off-by: Wilken Gottwalt 
---
 drivers/net/usb/ax88179_178a.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index ac7bc436da33..ed078e5a3629 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1829,6 +1829,19 @@ static const struct driver_info belkin_info = {
.tx_fixup = ax88179_tx_fixup,
 };
 
+static const struct driver_info toshiba_info = {
+   .description = "Toshiba USB Ethernet Adapter",
+   .bind   = ax88179_bind,
+   .unbind = ax88179_unbind,
+   .status = ax88179_status,
+   .link_reset = ax88179_link_reset,
+   .reset  = ax88179_reset,
+   .stop = ax88179_stop,
+   .flags  = FLAG_ETHER | FLAG_FRAMING_AX,
+   .rx_fixup = ax88179_rx_fixup,
+   .tx_fixup = ax88179_tx_fixup,
+};
+
 static const struct usb_device_id products[] = {
 {
/* ASIX AX88179 10/100/1000 */
@@ -1862,6 +1875,10 @@ static const struct usb_device_id products[] = {
/* Belkin B2B128 USB 3.0 Hub + Gigabit Ethernet Adapter */
USB_DEVICE(0x050d, 0x0128),
.driver_info = (unsigned long)&belkin_info,
+}, {
+   /* Toshiba USB 3.0 GBit Ethernet Adapter */
+   USB_DEVICE(0x0930, 0x0a13),
+   .driver_info = (unsigned long)&toshiba_info,
 },
{ },
 };
-- 
2.28.0



Re: [PATCH] xen-blkback: add a parameter for disabling of persistent grants

2020-09-23 Thread SeongJae Park
On Wed, 23 Sep 2020 16:09:30 -0400 Konrad Rzeszutek Wilk 
 wrote:

> On Tue, Sep 22, 2020 at 09:01:25AM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > The persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overhead[1] and thus needs
> > to be disabled.  But there is no option to disable it.  For that
> > reason, this commit adds a module parameter for disabling the
> > feature.
> 
> Would it be better suited to have it per guest?

The latest version of this patchset[1] supports blkfront side disablement.
Could that partially solve your concern?

[1] https://lore.kernel.org/xen-devel/20200923061841.20531-1-sjp...@amazon.com/


Thanks,
SeongJae Park
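For context, the knob being discussed is a plain module parameter; disabling a
feature that way generally looks like the sketch below (the parameter name is an
assumption, check the actual patch for the real one):

#include <linux/module.h>
#include <linux/moduleparam.h>

/* hypothetical name; set to false (e.g. feature_persistent=0) to disable */
static bool feature_persistent = true;
module_param(feature_persistent, bool, 0644);
MODULE_PARM_DESC(feature_persistent,
		 "Enables the persistent grants feature");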


[PATCH V2] doc: zh_CN: add translation for tmpfs.rst

2020-09-23 Thread Wang Qing
Translate Documentation/filesystems/tmpfs.rst into Chinese.

Signed-off-by: Wang Qing 
---
 .../translations/zh_CN/filesystems/index.rst   |   3 +-
 .../translations/zh_CN/filesystems/tmpfs.rst   | 146 +
 2 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/Documentation/translations/zh_CN/filesystems/index.rst 
b/Documentation/translations/zh_CN/filesystems/index.rst
index 186501d..c45b550
--- a/Documentation/translations/zh_CN/filesystems/index.rst
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -21,8 +21,9 @@ Linux Kernel中的文件系统
 文件系统实现文档。
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3
 
virtiofs
debugfs
+   tmpfs
 
diff --git a/Documentation/translations/zh_CN/filesystems/tmpfs.rst 
b/Documentation/translations/zh_CN/filesystems/tmpfs.rst
index 000..700d870
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/tmpfs.rst
@@ -0,0 +1,146 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/tmpfs.rst `
+
+translated by 王擎 Wang Qing
+
+=
+Tmpfs
+=
+
+Tmpfs是一个将所有文件都保存在虚拟内存中的文件系统。
+
+tmpfs中的所有内容都是临时的,也就是说没有任何文件会在硬盘上创建。
+如果卸载tmpfs实例,所有保存在其中的文件都会丢失。
+
+tmpfs将所有文件保存在内核缓存中,随着文件内容增长或缩小可以将不需要的
+页面swap出去。它具有最大限制,可以通过“mount -o remount ...”调整。
+
+和ramfs(创建tmpfs的模板)相比,tmpfs包含交换和限制检查。和tmpfs相似的另
+一个东西是RAM磁盘(/dev/ram*),可以在物理RAM中模拟固定大小的硬盘,并在
+此之上创建一个普通的文件系统。Ramdisks无法swap,因此无法调整它们的大小。
+
+由于tmpfs完全保存于页面缓存和swap中,因此所有tmpfs页面将在/proc/meminfo
+中显示为“Shmem”,而在free(1)中显示为“Shared”。请注意,这些计数还包括
+共享内存(shmem,请参阅ipcs(1))。获得计数的最可靠方法是使用df(1)和du(1)。
+
+tmpfs具有以下用途:
+
+1) 内核总有一个无法看到的内部挂载,用于共享匿名映射和SYSV共享内存。
+
+   挂载不依赖于CONFIG_TMPFS。如果CONFIG_TMPFS未设置,tmpfs对用户不可见。
+   但是内部机制始终存在。
+
+2) glibc 2.2及更高版本期望将tmpfs挂载在/dev/shm上以用于POSIX共享内存
+   (shm_open,shm_unlink)。添加内容到/etc/fstab应注意如下:
+
+   tmpfs   /dev/shmtmpfs   defaults0 0
+
+   使用时需要记住创建挂载tmpfs的目录。
+   
+   SYSV共享内存无需挂载,内部已默认支持。(在2.3内核版本中,必须挂载
+   tmpfs的前身(shm fs)才能使用SYSV共享内存)
+
+3) 很多人(包括我)都觉的在/tmp和/var/tmp上挂载非常方便,并具有较大的
+   swap分区。目前循环挂载tmpfs可以正常工作,所以大多数发布都应当可以
+   使用mkinitrd通过/tmp访问/tmp。
+
+4) 也许还有更多我不知道的地方:-)
+
+
+tmpfs有三个用于调整大小的挂载选项:
+
+=  
+size   tmpfs实例分配的字节数限制。默认值是不swap时物理RAM的一半。
+   如果tmpfs实例过大,机器将死锁,因为OOM处理将无法释放该内存。
+nr_blocks  与size相同,但以PAGE_SIZE为单位。
+nr_inodes  tmpfs实例的最大inode个数。默认值是物理内存页数的一半,或者
+   (有高端内存的机器)低端内存RAM的页数,二者以较低者为准。
+=  
+
+这些参数接受后缀k,m或g表示千,兆和千兆字节,可以在remount时更改。
+size参数也接受后缀%用来限制tmpfs实例占用物理RAM的百分比:
+未指定size或nr_blocks时,默认值为size=50%
+
+如果nr_blocks=0(或size=0),block个数将不受限制;如果nr_inodes=0,
+inode个数将不受限制。这样挂载通常是不明智的,因为它允许任何具有写权限的
+用户通过访问tmpfs耗尽机器上的所有内存;但同时这样做也会增强在多个CPU的
+场景下的访问。
+
+tmpfs具有为所有文件设置NUMA内存分配策略挂载选项(如果启用了CONFIG_NUMA),
+可以通过“mount -o remount ...”调整
+
+ ==
+mpol=default 采用进程分配策略
+ (请参阅 set_mempolicy(2))
+mpol=prefer:Node 倾向从给定的节点分配
+mpol=bind:NodeList   只允许从指定的链表分配
+mpol=interleave  倾向于依次从每个节点分配
+mpol=interleave:NodeList 依次从每个节点分配
+mpol=local  prefers 从本地节点分配内存
+ ==
+
+NodeList格式是以逗号分隔的十进制数字表示大小和范围,最大和最小范围是用-
+分隔符的十进制数来表示。例如,mpol=bind0-3,5,7,9-15
+
+带有有效NodeList的内存策略将按指定格式保存,在创建文件时使用。当任务在该
+文件系统上创建文件时,会使用到挂载时的内存策略NodeList选项,如果设置的话,
+由调用任务的cpuset[请参见Documentation/admin-guide/cgroup-v1/cpusets.rst]
+以及下面列出的可选标志约束。如果NodeLists为设置为空集,则文件的内存策略将
+恢复为“默认”策略。
+
+NUMA内存分配策略有可选标志,可以用于模式结合。在挂载tmpfs时指定这些可选
+标志可以在NodeList之前生效。
+Documentation/admin-guide/mm/numa_memory_policy.rst列出所有可用的内存
+分配策略模式标志及其对内存策略。
+
+::
+
+   =static 相当于 MPOL_F_STATIC_NODES
+   =relative   相当于 MPOL_F_RELATIVE_NODES
+
+例如,mpol=bind=staticNodeList相当于MPOL_BIND|MPOL_F_STATIC_NODES的分配策略
+
+请注意,如果内核不支持NUMA,那么使用mpol选项挂载tmpfs将会失败;nodelist指定不
+在线的节点也会失败。如果您的系统依赖于此,但内核会运行不带NUMA功能(也许是安全
+revocery内核),或者具有较少的节点在线,建议从自动模式中省略mpol选项挂载选项。
+可以在以后通过“mount -o remount,mpol=Policy:NodeList MountPoint”添加到挂载点。
+
+要指定初始根目录,可以使用如下挂载选项:
+
+   ==
+模式 权限用八进制数字表示
+uid应用ID
+gid组ID
+   ==
+
+这些选项对remount没有任何影响。您可以通过chmod(1),chown(1)和chgrp(1)的更改
+已经挂载的参数。
+
+tmpfs具有选择32位还是64位inode的挂载选项:
+
+===   
+inode64   Use 64-bit inode numbers
+inode32   Use 32-bit inode numbers
+===   
+
+在32位内核上,默认是inode32,挂载时指定inode64会被拒绝。
+在64位内核上,默认配置是CONFIG_TMPFS_INODE64。inode64避免了单个设备上可能有多个
+具有相同inode编号的文件;比如32位应用程序使用glibc如果长期访问tmpfs,一旦达到33
+位inode编号,就有EOVERFLOW失败的危险,无法打开大于2GiB的文件,并返回EINVAL。
+
+所以'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'将在
+/mytmpfs上挂载tmpfs实例,分配只能由root用户访问的10GB RAM/SWAP,可以有10240个
+inode的实例。
+
+
+:作者:
+   Christoph Rohland , 1.12.01
+:更新:
+   Hugh Dickins, 4 June 2007
+:更新:
+   KOSAKI Motohiro, 16 Mar 2010
+:更

Re: [PATCH 1/7] perf bench: Add build-id injection benchmark

2020-09-23 Thread Namhyung Kim
Hi Ian,

On Thu, Sep 24, 2020 at 7:13 AM Ian Rogers  wrote:
>
> On Wed, Sep 23, 2020 at 1:05 AM Namhyung Kim  wrote:
> >
> > Sometimes I can see perf record piped with perf inject take a long time
> > processing build-ids.  So add an inject-build-id benchmark to the
> > internals benchmark suite to measure its overhead regularly.
> >
> > It runs perf inject command internally and feeds the given number of
> > synthesized events (MMAP2 + SAMPLE basically).
> >
> >   Usage: perf bench internals inject-build-id 
> >
> > -i, --iterations   Number of iterations used to compute average 
> > (default: 100)
> > -m, --nr-mmaps Number of mmap events for each iteration 
> > (default: 100)
> > -n, --nr-samples   Number of sample events per mmap event (default: 
> > 100)
> > -v, --verbose be more verbose (show iteration count, DSO name, 
> > etc)
> >
> > By default, it measures average processing time of 100 MMAP2 events
> > and 1 SAMPLE events.  Below is a result on my laptop.
> >
> >   $ perf bench internals inject-build-id
> >   # Running 'internals/inject-build-id' benchmark:
> > Average build-id injection took: 22.997 msec (+- 0.067 msec)
> > Average time per event: 2.255 usec (+- 0.007 usec)
>
> This is great! Some suggestions below.

Thanks!

>
> > Signed-off-by: Namhyung Kim 
> > ---
[SNIP]
> > +
> > +static const char *const bench_usage[] = {
> > +   "perf bench internals inject-build-id ",
> > +   NULL
> > +};
> > +
>
> Perhaps a comment:
> /* Helper for collect_dso that adds the given file as a dso to
> dso_list if it contains a buildid. Stops after 4 such dsos.*/

Will add.. please see below.

>
> > +static int add_dso(const char *fpath, const struct stat *sb __maybe_unused,
> > +  int typeflag, struct FTW *ftwbuf __maybe_unused)
> > +{
> > +   struct bench_dso *dso;
> > +   unsigned char build_id[BUILD_ID_SIZE];
> > +
> > +   if (typeflag == FTW_D || typeflag == FTW_SL) {
> > +   return 0;
> > +   }
> > +
> > +   if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0)
> > +   return 0;
> > +
> > +   dso = malloc(sizeof(*dso));
> > +   if (dso == NULL)
> > +   return -1;
> > +
> > +   dso->name = realpath(fpath, NULL);
> > +   if (dso->name == NULL) {
> > +   free(dso);
> > +   return -1;
> > +   }
> > +
> > +   dso->ino = nr_dsos++;
> > +   list_add(&dso->list, &dso_list);
> > +   pr_debug2("  Adding DSO: %s\n", fpath);
> > +
> > +   /* stop if we collected 4x DSOs than needed */
> > +   if ((unsigned)nr_dsos > 4 * nr_mmaps)
> > +   return 1;
> > +
> > +   return 0;
> > +}
> > +
> > +static void collect_dso(void)
> > +{
> > +   if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0)
> > +   return;
> > +
> > +   pr_debug("  Collected %d DSOs\n", nr_dsos);
>
> Should this fail if the count isn't 4?

The add_dso would stop if it collected enough DSOs.
I chose it as 4 x nr_mmaps (default: 100).

It's gonna pick a DSO in the list randomly during benchmark
and I want to reduce the chance it selects the same one in the
same iteration. So instead of having nr_mmaps DSOs, it keeps
4 times more DSOs than needed.

>
> > +}
> > +
> > +static void release_dso(void)
> > +{
> > +   struct bench_dso *dso;
> > +
> > +   while (!list_empty(&dso_list)) {
> > +   dso = list_first_entry(&dso_list, struct bench_dso, list);
> > +   list_del(&dso->list);
> > +   free(dso->name);
> > +   free(dso);
> > +   }
> > +}
> > +
>
> Perhaps a comment and move next to synthesize_mmap.
> /* Fake address used by mmap events. */

OK, will do.  (and it's used by sample events too)

>
> > +static u64 dso_map_addr(struct bench_dso *dso)
> > +{
> > +   return 0x40ULL + dso->ino * 8192ULL;
> > +}
[SNIP]

> > +static int setup_injection(struct bench_data *data)
> > +{
> > +   int ready_pipe[2];
> > +   int dev_null_fd;
> > +   char buf;
> > +
> > +   if (pipe(ready_pipe) < 0)
> > +   return -1;
> > +
> > +   if (pipe(data->input_pipe) < 0)
> > +   return -1;
> > +
> > +   if (pipe(data->output_pipe) < 0)
> > +   return -1;
> > +
> > +   data->pid = fork();
> > +   if (data->pid < 0)
> > +   return -1;
> > +
> > +   if (data->pid == 0) {
> > +   const char **inject_argv;
> > +
> > +   close(data->input_pipe[1]);
> > +   close(data->output_pipe[0]);
> > +   close(ready_pipe[0]);
> > +
> > +   dup2(data->input_pipe[0], STDIN_FILENO);
> > +   close(data->input_pipe[0]);
> > +   dup2(data->output_pipe[1], STDOUT_FILENO);
> > +   close(data->output_pipe[1]);
> > +
> > +   dev_null_fd = open("/dev/null", O_WRONLY);
> > +   if (dev_null_fd < 0)
> > +   exi

RE: [PATCH v6 5/8] clk: clock-wizard: Add support for fractional support

2020-09-23 Thread Shubhrajyoti Datta
Hi ,
Thanks for the review.

> -Original Message-
> From: Stephen Boyd 
> Sent: Tuesday, September 22, 2020 2:48 AM
> To: Shubhrajyoti Datta ; linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> de...@driverdev.osuosl.org; robh...@kernel.org;
> gre...@linuxfoundation.org; mturque...@baylibre.com; Shubhrajyoti
> Datta 
> Subject: Re: [PATCH v6 5/8] clk: clock-wizard: Add support for fractional
> support
> 
> Quoting Shubhrajyoti Datta (2020-08-28 06:39:53)
> > Currently the set-rate granularity is limited to integral divisors.
> > Add support for fractional divisors.
> > Only the first output (output0) is fractional in the hardware.
> >
> > Signed-off-by: Shubhrajyoti Datta 
> 
> Getting closer.
> 
> > diff --git a/drivers/clk/clk-xlnx-clock-wizard.c
> > b/drivers/clk/clk-xlnx-clock-wizard.c
> > index 8dfcec8..1af59a4 100644
> > --- a/drivers/clk/clk-xlnx-clock-wizard.c
> > +++ b/drivers/clk/clk-xlnx-clock-wizard.c
> > @@ -185,6 +191,134 @@ static const struct clk_ops
> clk_wzrd_clk_divider_ops = {
> > .recalc_rate = clk_wzrd_recalc_rate,  };
> >
> > +static unsigned long clk_wzrd_recalc_ratef(struct clk_hw *hw,
> > +  unsigned long parent_rate)
> > +{
> > +   unsigned int val;
> > +   u32 div, frac;
> > +   struct clk_wzrd_divider *divider = to_clk_wzrd_divider(hw);
> > +   void __iomem *div_addr = divider->base + divider->offset;
> > +
> > +   val = readl(div_addr);
> > +   div = val & div_mask(divider->width);
> > +   frac = (val >> WZRD_CLKOUT_FRAC_SHIFT) &
> > + WZRD_CLKOUT_FRAC_MASK;
> > +
> > +   return ((parent_rate * 1000) / ((div * 1000) + frac));
> 
> Please remove extra parenthesis. And is this mult_frac()?
> 
Will fix
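For reference, the mult_frac() form would be roughly this (sketch only,
assuming the value ranges keep the intermediate products safe):

        return mult_frac(parent_rate, 1000, (div * 1000) + frac);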
> > +}
> > +
> > +static int clk_wzrd_dynamic_reconfig_f(struct clk_hw *hw, unsigned long
> rate,
> > +  unsigned long parent_rate) {
> > +   int err;
> > +   u32 value, pre;
> > +   unsigned long rate_div, f, clockout0_div;
> > +   struct clk_wzrd_divider *divider = to_clk_wzrd_divider(hw);
> > +   void __iomem *div_addr = divider->base + divider->offset;
> > +
> > +   rate_div = ((parent_rate * 1000) / rate);
> > +   clockout0_div = rate_div / 1000;
> > +
> > +   pre = DIV_ROUND_CLOSEST((parent_rate * 1000), rate);
> > +   f = (u32)(pre - (clockout0_div * 1000));
> > +   f = f & WZRD_CLKOUT_FRAC_MASK;
> > +
> > +   value = ((f << WZRD_CLKOUT_DIVIDE_WIDTH) | (clockout0_div &
> > +   WZRD_CLKOUT_DIVIDE_MASK));
> 
> Please split this to multiple lines.
Will fix
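Something like this untested sketch of the split:

        value = clockout0_div & WZRD_CLKOUT_DIVIDE_MASK;
        value |= f << WZRD_CLKOUT_DIVIDE_WIDTH;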
> 
> > +
> > +   /* Set divisor and clear phase offset */
> > +   writel(value, div_addr);
> > +   writel(0x0, div_addr + WZRD_DR_DIV_TO_PHASE_OFFSET);
> > +
> > +   /* Check status register */
> > +   err= readl_poll_timeout(divider->base +
> WZRD_DR_STATUS_REG_OFFSET, value,
> > +   value & WZRD_DR_LOCK_BIT_MASK,
> > +   WZRD_USEC_POLL, WZRD_TIMEOUT_POLL);
> > +   if (err)
> > +   return err;
> > +
> > +   /* Initiate reconfiguration */
> > +   writel(WZRD_DR_BEGIN_DYNA_RECONF,
> > +  divider->base + WZRD_DR_INIT_REG_OFFSET);
> > +
> > +   /* Check status register */
> > +   err= readl_poll_timeout(divider->base +
> WZRD_DR_STATUS_REG_OFFSET, value,
> > +   value & WZRD_DR_LOCK_BIT_MASK,
> > +   WZRD_USEC_POLL, WZRD_TIMEOUT_POLL);
> > +
> > +   return err;
> 
> Just return readl_poll_timeout() please.
Will fix
> 
> > +}
> > +
> > +static long clk_wzrd_round_rate_f(struct clk_hw *hw, unsigned long
> rate,
> > + unsigned long *prate) {
> > +   return rate;
> 
> Can every rate be supported? This function is supposed to tell the clk
> framework what rate will be achieved if we call clk_set_rate() with 'rate'
> passed to this function. Almost always returning 'rate' is not the case.
> 

We can support rates up to 3 decimal places; to prevent truncation here we are
returning rate.
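A rough sketch of a round_rate that instead reports the quantized rate the
fractional divider can really produce (illustrative only, using div64_u64()
from linux/math64.h; not the actual driver change):

static long clk_wzrd_round_rate_f_sketch(struct clk_hw *hw, unsigned long rate,
                                         unsigned long *prate)
{
        u64 div_milli;

        if (!rate)
                return -EINVAL;

        /* divider scaled by 1000, i.e. 3 decimal places of resolution */
        div_milli = div64_u64((u64)*prate * 1000, rate);
        if (!div_milli)
                return -EINVAL;

        return div64_u64((u64)*prate * 1000, div_milli);
}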
> >
> > +
> > +static const struct clk_ops clk_wzrd_clk_divider_ops_f = {
> > +   .round_rate = clk_wzrd_round_rate_f,
> > +   .set_rate = clk_wzrd_dynamic_reconfig_f,
> > +   .recalc_rate = clk_wzrd_recalc_ratef, };
> > +
> > +static struct clk *clk_wzrd_register_divf(struct device *dev,
> > + const char *name,
> > + const char *parent_name,
> > + unsigned long flags,
> > + void __iomem *base, u16 offset,
> > + u8 shift, u8 width,
> > + u8 clk_divider_flags,
> > + const struct clk_div_table *table,
> > + 

Re: [PATCH v4 2/2] leds: mt6360: Add LED driver for MT6360

2020-09-23 Thread Gene Chen
Jacek Anaszewski  於 2020年9月24日 週四 上午5:49寫道:

>
> Hi Gene,
>
> Thank you for the update. I have some more comments below.
>
> On 9/23/20 2:50 PM, Gene Chen wrote:
> > From: Gene Chen 
> >
> > Add MT6360 LED driver include 2-channel Flash LED with torch/strobe mode,
> > and 4-channel RGB LED support Register/Flash/Breath Mode
> >
> > Signed-off-by: Gene Chen 
> > ---
> >   drivers/leds/Kconfig   |  11 +
> >   drivers/leds/Makefile  |   1 +
> >   drivers/leds/leds-mt6360.c | 705 
> > +
> >   3 files changed, 717 insertions(+)
> >   create mode 100644 drivers/leds/leds-mt6360.c
> >
> > diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
> > index 1c181df..5561b08 100644
> > --- a/drivers/leds/Kconfig
> > +++ b/drivers/leds/Kconfig
> > @@ -271,6 +271,17 @@ config LEDS_MT6323
> > This option enables support for on-chip LED drivers found on
> > Mediatek MT6323 PMIC.
> >
> > +config LEDS_MT6360
> > + tristate "LED Support for Mediatek MT6360 PMIC"
> > + depends on LEDS_CLASS_FLASH && OF
> > + depends on V4L2_FLASH_LED_CLASS || !V4L2_FLASH_LED_CLASS
> > + depends on MFD_MT6360
> > + help
> > +   This option enables support for dual Flash LED drivers found on
> > +   Mediatek MT6360 PMIC.
> > +   Independent current sources supply for each flash LED support torch
> > +   and strobe mode.
> > +
> >   config LEDS_S3C24XX
> >   tristate "LED Support for Samsung S3C24XX GPIO LEDs"
> >   depends on LEDS_CLASS
> > diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
> > index c2c7d7a..5596427 100644
> > --- a/drivers/leds/Makefile
> > +++ b/drivers/leds/Makefile
> > @@ -66,6 +66,7 @@ obj-$(CONFIG_LEDS_MIKROTIK_RB532)   += leds-rb532.o
> >   obj-$(CONFIG_LEDS_MLXCPLD)  += leds-mlxcpld.o
> >   obj-$(CONFIG_LEDS_MLXREG)   += leds-mlxreg.o
> >   obj-$(CONFIG_LEDS_MT6323)   += leds-mt6323.o
> > +obj-$(CONFIG_LEDS_MT6360)+= leds-mt6360.o
> >   obj-$(CONFIG_LEDS_NET48XX)  += leds-net48xx.o
> >   obj-$(CONFIG_LEDS_NETXBIG)  += leds-netxbig.o
> >   obj-$(CONFIG_LEDS_NIC78BX)  += leds-nic78bx.o
> > diff --git a/drivers/leds/leds-mt6360.c b/drivers/leds/leds-mt6360.c
> > new file mode 100644
> > index 000..1c3486e
> > --- /dev/null
> > +++ b/drivers/leds/leds-mt6360.c
> > @@ -0,0 +1,705 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +enum {
> > + MT6360_LED_ISNK1 = 0,
> > + MT6360_LED_ISNK2,
> > + MT6360_LED_ISNK3,
> > + MT6360_LED_ISNK4,
>
> One question about these ISINKs - how are they exploited in your device?
> Are these LEDs used to indicate camera activity or it is one RGB LED
> for status? And what functionality has the remaining amber one (sticking
> to the naming from your DT bindings)?
>
> Can you share how the documenation for this device describes the purpose
> of these sinks, if it does it at all?
>
> I got probably mislead by your naming in the driver and got fixed on
> their function as camera activity indicators, for which V4L2 has
> support. If that is not the case, then you'd better switch to using
> multicolor framework for all four "indicator" LEDs.
>

It's one RGB LED for status, not for camera.

The MT6360 integrates a three-channel RGB LED driver, designed to
provide a variety of lighting effects for mobile device applications.
The RGB LED driver includes a smart LED string controller, and it can
drive 3 channels of LEDs with a sink current of up to 24mA. The
default setting of RGB_ISINK1 is auto mode for TA charging indicator,
and RGB_ISINK1 also supports software mode. It provides three
operation modes for the RGB LEDs: flash mode, breath mode, and
register mode. The device can increase or decrease the brightness of
the RGB LEDs upon command via the I2C interface. RGB_ISINK4 provides
a higher sink current of up to 150mA, which we can use for moonlight mode.

Do you mean we should drop the V4L2 device registration for the isinks,
and only register them as LED class devices?

> > + MT6360_LED_FLASH1,
> > + MT6360_LED_FLASH2,
> > + MT6360_MAX_LEDS
> > +};
> > +
> > +#define MT6360_REG_RGBEN 0x380
> > +#define MT6360_REG_ISNK(_led_no) (0x381 + (_led_no))
> > +#define MT6360_ISNK_ENMASK(_led_no)  BIT(7 - (_led_no))
> > +#define MT6360_ISNK_MASK GENMASK(4, 0)
> > +#define MT6360_CHRINDSEL_MASKBIT(3)
> > +
> > +#define MT6360_REG_FLEDEN0x37E
> > +#define MT6360_REG_STRBTO0x373
> > +#define MT6360_REG_FLEDBASE(_id) (0x372 + 4 * (_id - 
> > MT6360_LED_FLASH1))
> > +#define MT6360_REG_FLEDISTRB(_id)(MT6360_REG_FLEDBASE(_id) + 2)
> > +#define MT6360_REG_FLEDITOR(_id) (MT6360_REG_FLEDBASE(_id) + 3)
> > +#define MT6360_REG_CHGSTAT2  0x3E1
> > +#define MT6360_REG_FLEDSTAT1 0x3E9
> > +

Re: [PATCH v2 5/5] clk: qcom: add video clock controller driver for SM8250

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 17:54:59)
> On 9/23/20 7:30 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-23 09:07:16)
> >> On 9/22/20 2:46 PM, Stephen Boyd wrote:
> >>> Quoting Jonathan Marek (2020-09-03 20:09:54)
> >>>
>  +   .ops = &clk_branch2_ops,
>  +   },
>  +   },
>  +};
>  +
>  +static struct clk_branch video_cc_mvs0_clk = {
>  +   .halt_reg = 0xd34,
>  +   .halt_check = BRANCH_HALT_SKIP, /* TODO: hw gated ? */
> >>>
> >>> Is this resolved?
> >>>
> >>
> >> Downstream has this clock as BRANCH_HALT_VOTED, but with the upstream
> >> venus driver (with patches to enable sm8250), that results in a
> >> "video_cc_mvs0_clk status stuck at 'off" error. AFAIK venus
> >> enables/disables this clock on its own (venus still works without
> >> touching this clock), but I didn't want to remove this in case it might
> >> be needed. I removed these clocks in the v3 I just sent.
> >>
> > 
> > Hmm. Does downstream use these clks? There have been some clk stuck
> > problems with venus recently that were attributed to improperly enabling
> > clks before enabling interconnects and power domains. Maybe it's the
> > same problem.
> > 
> 
> Yes, downstream uses these clks.
> 
> The "stuck" problem still happens if GSDCS/interconnects are always on, 
> and like I mentioned, venus works even with these clocks completely 
> removed.
> 
> I think venus controls these clocks (and downstream just happens to try 
> enabling it at a point where venus has already enabled it?). I'm not too 
> sure about this, it might have something to do with the GDSC having the 
> HW_CTRL flag too..

Ok. Maybe Taniya has an idea.


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-23 Thread Sergey Senozhatsky
On (20/09/22 17:44), John Ogness wrote:
> +/*
> + * The sprint buffers are used with interrupts disabled, so each CPU
> + * only requires 2 buffers: for non-NMI and NMI contexts. Recursive
> + * printk() calls are handled by the safe buffers.
> + */
> +#define SPRINT_CTX_DEPTH 2
> +
> +/* Static sprint buffers for early boot (only 1 CPU). */
> +static DECLARE_BITMAP(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
> +static char sprint_static_textbuf[SPRINT_CTX_DEPTH * LOG_LINE_MAX];
> +
> +/* Dynamically allocated sprint buffers. */
> +static unsigned int sprint_dynamic_textbuf_count;
> +static unsigned long *sprint_dynamic_textbuf_map;
> +static char *sprint_dynamic_textbuf;

Just a question:

Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
(for normal printk, nmi, hard irq, soft irq and one extra buffer for
recursive printk calls)?

So then we'd

vprintk(...)
{
preempt_disable();
buf = this_cpu_ptr(... preempt_count_to_ctx());
...
preempt_enable();
}

preempt_disable()/preempt_enable() is already in printk().
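A slightly more concrete sketch of the idea (illustrative only, not a
patch; the recursion case would use the fifth buffer and is omitted here):

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/hardirq.h>

enum sprint_ctx { SPRINT_CTX_TASK, SPRINT_CTX_SOFTIRQ, SPRINT_CTX_HARDIRQ,
                  SPRINT_CTX_NMI, SPRINT_CTX_RECURSE, SPRINT_CTX_MAX };

struct sprint_ctx_bufs {
        char buf[SPRINT_CTX_MAX][1024];
};

static DEFINE_PER_CPU(struct sprint_ctx_bufs, sprint_ctx_bufs);

/* caller must have preemption disabled */
static char *sprint_ctx_buffer(void)
{
        enum sprint_ctx ctx;

        if (in_nmi())
                ctx = SPRINT_CTX_NMI;
        else if (in_irq())
                ctx = SPRINT_CTX_HARDIRQ;
        else if (in_serving_softirq())
                ctx = SPRINT_CTX_SOFTIRQ;
        else
                ctx = SPRINT_CTX_TASK;

        return this_cpu_ptr(&sprint_ctx_bufs)->buf[ctx];
}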

-ss


linux-next: build warning after merge of the pwm tree

2020-09-23 Thread Stephen Rothwell
Hi all,

After merging the pwm tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

WARNING: modpost: missing MODULE_LICENSE() in drivers/pwm/pwm-intel-lgm.o

Introduced by commit

  9fba318f0f7f ("Add PWM fan controller driver for LGM SoC")
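The usual fix for this class of warning is to add the module metadata
macros near the end of the driver; a sketch (the description and license
strings below are assumed, not taken from the actual pwm-intel-lgm.c):

MODULE_DESCRIPTION("Intel LGM SoC PWM fan controller driver");
MODULE_LICENSE("GPL v2");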

-- 
Cheers,
Stephen Rothwell


pgpQkHA4BV7zI.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 7/7] clk: qcom: Add display clock controller driver for SM8250

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 09:10:04)
> On 9/22/20 3:00 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-11 08:34:07)
> >> diff --git a/drivers/clk/qcom/dispcc-sm8250.c 
> >> b/drivers/clk/qcom/dispcc-sm8250.c
> >> new file mode 100644
> >> index ..7c0f384a3a42
> >> --- /dev/null
> >> +++ b/drivers/clk/qcom/dispcc-sm8250.c
> >> @@ -0,0 +1,1100 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Copyright (c) 2018-2020, The Linux Foundation. All rights reserved.
> >> + */
> >> +
> > [...]
> >> +
> >> +static const struct clk_parent_data disp_cc_parent_data_6[] = {
> >> +   { .fw_name = "bi_tcxo" },
> >> +   { .fw_name = "dsi0_phy_pll_out_dsiclk" },
> >> +   { .fw_name = "dsi1_phy_pll_out_dsiclk" },
> > 
> > Can we remove clk postfix on these clk names?
> > 
> 
> This is consistent with the names used in both sdm845 and sc7180 
> drivers. If this should change then those should be changed too?

If DT isn't using it already then it sounds OK to change the other
SoCs. Otherwise fix it just for this one.


Re: [PATCH v3 6/7] clk: qcom: Add display clock controller driver for SM8150

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 09:24:04)
> On 9/22/20 3:04 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-11 08:34:06)
> >> Add support for the display clock controller found on SM8150
> >> based devices. This would allow display drivers to probe and
> >> control their clocks.
> >>
> >> Signed-off-by: Jonathan Marek 
> >> ---
> >>   drivers/clk/qcom/Kconfig |9 +
> >>   drivers/clk/qcom/Makefile|1 +
> >>   drivers/clk/qcom/dispcc-sm8150.c | 1152 ++
> >>   3 files changed, 1162 insertions(+)
> >>   create mode 100644 drivers/clk/qcom/dispcc-sm8150.c
> > 
> > If the bindings are the same for these two drivers I wonder if there is
> > anything different between the two. Maybe the two drivers can be one
> > driver?
> > 
> 
> Possibly, the biggest difference seems to be the plls (trion vs lucid, 
> different config), which could be resolved in the probe() function. If 
> you think combining the drivers is the right thing to do then I can do that.

If that's the main difference then it sounds OK to merge the two.


Re: [PATCH v1 1/6] dt_bindings: mfd: Add ROHM BD9576MUF and BD9573MUF PMICs

2020-09-23 Thread Vaittinen, Matti

On Wed, 2020-09-23 at 08:27 -0600, Rob Herring wrote:
> On Sat, Sep 19, 2020 at 5:46 AM Vaittinen, Matti
>  wrote:
> > Thanks Rob for taking a look at this!
> > 
> > On Fri, 2020-09-18 at 11:28 -0600, Rob Herring wrote:
> > > On Thu, Sep 17, 2020 at 11:01:52AM +0300, Matti Vaittinen wrote:
> > > > Add bindings for ROHM BD9576MUF and BD9573MUF PMICs. These
> > > > PMICs are primarily intended to be used to power the R-Car
> > > > series
> > > > processors. They provide 6 power outputs, safety features and a
> > > > watchdog with two functional modes.
> > > > 
> > > > Signed-off-by: Matti Vaittinen <
> > > > matti.vaitti...@fi.rohmeurope.com>
> > > > ---
> > > >  .../bindings/mfd/rohm,bd9576-pmic.yaml| 129
> > > > ++
> > > >  1 file changed, 129 insertions(+)
> > > >  create mode 100644
> > > > Documentation/devicetree/bindings/mfd/rohm,bd9576-pmic.yaml
> > > > 
> > > > diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml b/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml
> > > > new file mode 100644
> > > > index ..f17d4d621585
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml
> > > > @@ -0,0 +1,129 @@
> > > > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/mfd/rohm,bd9576-pmic.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: ROHM BD9576MUF and BD9573MUF Power Management
> > > > Integrated
> > > > Circuit bindings
> > > > +
> > > > +maintainers:
> > > > +  - Matti Vaittinen 
> > > > +
> > > > +description: |
> > > > +  BD9576MUF and BD9573MUF are power management ICs primarily
> > > > intended for
> > > > +  powering the R-Car series processors.
> > > > +  The IC provides 6 power outputs with configurable sequencing
> > > > and
> > > > safety
> > > > +  monitoring. A watchdog logic with slow ping/windowed modes
> > > > is
> > > > also included.
> > > > +
> > > > +properties:
> > > > +  compatible:
> > > > +enum:
> > > > +  - rohm,bd9576
> > > > +  - rohm,bd9573
> > > > +
> > > > +  reg:
> > > > +description:
> > > > +  I2C slave address.
> > > > +maxItems: 1
> > > > +
> > > > +  interrupts:
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,vout1-en-low:
> > > > +description:
> > > > +  BD9576 and BD9573 VOUT1 regulator enable state can be
> > > > individually
> > > > +  controlled by a GPIO. This is dictated by state of
> > > > vout1-en
> > > > pin during
> > > > +  the PMIC startup. If vout1-en is LOW during PMIC startup
> > > > then the VOUT1
> > > > +  enable sate is controlled via this pin. Set this
> > > > property if
> > > > vout1-en
> > > > +  is wired to be down at PMIC start-up.
> > > > +type: boolean
> > > > +
> > > > +  rohm,vout1-en-gpios:
> > > > +description:
> > > > +  GPIO specifier to specify the GPIO connected to vout1-en 
> > > > for
> > > > vout1 ON/OFF
> > > > +  state control.
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,ddr-sel-low:
> > > > +description:
> > > > +  The BD9576 and BD9573 output voltage for DDR can be
> > > > selected
> > > > by setting
> > > > +  the ddr-sel pin low or high. Set this property if ddr-
> > > > sel is
> > > > grounded.
> > > > +type: boolean
> > > > +
> > > > +  rohm,watchdog-enable-gpios:
> > > > +description: The GPIO line used to enable the watchdog.
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,watchdog-ping-gpios:
> > > > +description: The GPIO line used to ping the watchdog.
> > > > +maxItems: 1
> > > > +
> > > > +  hw_margin_ms:
> > > 
> > > Needs a vendor prefix.
> > > 
> > > s/_/-/
> > > 
> > > > +minimum: 4
> > > > +maximum: 4416
> > > > +description: Watchog timeout in milliseconds
> > > 
> > > Maybe the words in the description should be in the property name
> > > as
> > > I don't see how 'h/w margin' relates to 'watchdog timeout'.
> > 
> > The hw_margin_ms is an existing property. As I wrote to Guenter:
> > "hw_margin_ms" is an existing binding for specifying the maximum
> > TMO in
> > HW (if I understood it correctly). (It is used at least by the
> > generig
> > GPIO watchdog) I thought it's better to not invent a new vendor
> > specific binding when we have a generic one.
> > 
> > https://elixir.bootlin.com/linux/v5.9-rc2/source/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt
> 
> That one is odd and I haven't found an actual user of it. It would
> make more sense as a collection of properties devices could use
> rather
> than a virtual device.
> 
> I think I'd do something like 'watchdog-ping-time-msec' that can be
> either ' ' or ''.

Your suggestion looks good to me. If we introduce such a property then it
would make sense to add handling for it in the GPIO watchdog too.

What I do wonder is how "hw_margin_ms" is unused? I see it is a
required property for GPIO

[PATCH v2] perf annotate mips: Add perf arch instructions annotate handlers

2020-09-23 Thread Peng Fan
From: Dengcheng Zhu 

Support the MIPS architecture using the ins_ops association
method. With this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:
$./perf annotate -i perf.data

 :   Disassembly of section .text:
 :
 :   000be6a0 :
 :   get_next_seq():
0.00 :   be6a0:   lw  v0,0(a0)
0.00 :   be6a4:   daddiu  sp,sp,-128
0.00 :   be6a8:   ld  a7,72(a0)
 0.00 :   be6ac:   gssq    s5,s4,80(sp)
 0.00 :   be6b0:   gssq    s1,s0,48(sp)
 0.00 :   be6b4:   gssq    s8,gp,112(sp)
 0.00 :   be6b8:   gssq    s7,s6,96(sp)
 0.00 :   be6bc:   gssq    s3,s2,64(sp)
 0.00 :   be6c0:   sd  a3,0(sp)
 0.00 :   be6c4:   move    s0,a0
0.00 :   be6c8:   sd  v0,32(sp)
0.00 :   be6cc:   sd  a5,8(sp)
0.00 :   be6d0:   sd  zero,8(a0)
0.00 :   be6d4:   sd  a6,16(sp)
0.00 :   be6d8:   ld  s2,48(a0)
8.53 :   be6dc:   ld  s1,40(a0)
9.42 :   be6e0:   ld  v1,32(a0)
0.00 :   be6e4:   nop
0.00 :   be6e8:   ld  s4,24(a0)
0.00 :   be6ec:   ld  s5,16(a0)
0.00 :   be6f0:   sd  a7,40(sp)
   10.11 :   be6f4:   ld  s6,64(a0)

...

The original patch link: 
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu 
Signed-off-by: Peng Fan 

[fanp...@loongson.cn: Add missing "bgtzl", "bltzl",
"bgezl", "blezl", "beql" and "bnel" for pre-R6processors]
---
 tools/perf/arch/mips/Build   |  2 +-
 tools/perf/arch/mips/annotate/instructions.c | 46 
 tools/perf/util/annotate.c   |  8 +
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/mips/annotate/instructions.c

diff --git a/tools/perf/arch/mips/Build b/tools/perf/arch/mips/Build
index 1bb8bf6..e4e5f33 100644
--- a/tools/perf/arch/mips/Build
+++ b/tools/perf/arch/mips/Build
@@ -1 +1 @@
-# empty
+perf-y += util/
diff --git a/tools/perf/arch/mips/annotate/instructions.c 
b/tools/perf/arch/mips/annotate/instructions.c
new file mode 100644
index 000..340993f
--- /dev/null
+++ b/tools/perf/arch/mips/annotate/instructions.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+
+static
+struct ins_ops *mips__associate_ins_ops(struct arch *arch, const char *name)
+{
+   struct ins_ops *ops = NULL;
+
+   if (!strncmp(name, "bal", 3) ||
+   !strncmp(name, "bgezal", 6) ||
+   !strncmp(name, "bltzal", 6) ||
+   !strncmp(name, "bgtzal", 6) ||
+   !strncmp(name, "blezal", 6) ||
+   !strncmp(name, "beqzal", 6) ||
+   !strncmp(name, "bnezal", 6) ||
+   !strncmp(name, "bgtzl", 5) ||
+   !strncmp(name, "bltzl", 5) ||
+   !strncmp(name, "bgezl", 5) ||
+   !strncmp(name, "blezl", 5) ||
+   !strncmp(name, "jialc", 5) ||
+   !strncmp(name, "beql", 4) ||
+   !strncmp(name, "bnel", 4) ||
+   !strncmp(name, "jal", 3))
+   ops = &call_ops;
+   else if (!strncmp(name, "jr", 2))
+   ops = &ret_ops;
+   else if (name[0] == 'j' || name[0] == 'b')
+   ops = &jump_ops;
+   else
+   return NULL;
+
+   arch__associate_ins_ops(arch, name, ops);
+
+   return ops;
+}
+
+static
+int mips__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
+{
+   if (!arch->initialized) {
+   arch->associate_instruction_ops = mips__associate_ins_ops;
+   arch->initialized = true;
+   arch->objdump.comment_char = '#';
+   }
+
+   return 0;
+}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0a1fcf7..80a4a3d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -152,6 +152,7 @@ static int arch__associate_ins_ops(struct arch* arch, const 
char *name, struct i
 #include "arch/arm/annotate/instructions.c"
 #include "arch/arm64/annotate/instructions.c"
 #include "arch/csky/annotate/instructions.c"
+#include "arch/mips/annotate/instructions.c"
 #include "arch/x86/annotate/instructions.c"
 #include "arch/powerpc/annotate/instructions.c"
 #include "arch/s390/annotate/instructions.c"
@@ -175,6 +176,13 @@ static struct arch architectures[] = {
.init = csky__annotate_init,
},
{
+   .name = "mips",
+   .init = mips__annotate_init,
+   .objdump = {
+   .comment_char = '#',
+   },
+   },
+   {
.name = "x86",
.init = x86__annotate_init,
.instructions = x86__instructions,
-- 
2.1.0



Re: [PATCH] drm/rockchip: skip probed failed device

2020-09-23 Thread Jian-Hong Pan
Heiko Stübner  於 2020年9月23日 週三 下午7:16寫道:
>
> Am Mittwoch, 23. September 2020, 13:05:26 CEST schrieb Robin Murphy:
> > On 2020-09-23 07:59, Jian-Hong Pan wrote:
> > > The cdn-dp sub driver probes the device failed on PINEBOOK Pro.
> > >
> > > kernel: cdn-dp fec0.dp: [drm:cdn_dp_probe [rockchipdrm]] *ERROR* 
> > > missing extcon or phy
> > > kernel: cdn-dp: probe of fec0.dp failed with error -22
> >
> > Wouldn't it make more sense to simply not enable the DisplayPort node in
> > the upstream DT, until the type-C phy work has been done to make it
> > usable at all?
>
> Or alternatively just disable the cdn-dp Rockchip driver in the kernel config,
> which results in it also not getting probed.

This may be the simplest way.
However, considering that generic distro kernels have a policy of enabling
all drivers, disabling the DisplayPort node in the upstream DT until
the type-C phy work has been done may be the better solution for now.
I can prepare a patch for this.

Jian-Hong Pan

> > AIUI the "official" Manjaro kernel is carrying a bunch of
> > hacks to make type-C work via extcon, but they know that isn't an
> > upstreamable solution.
> >
> > Robin.
> >
> > > Then, the device halts all of the DRM related device jobs. For example,
> > > the operations: vop_component_ops, vop_component_ops and
> > > rockchip_dp_component_ops cannot be bound to corresponding devices. So,
> > > Xorg cannot find the correct DRM device.
> > >
> > > This patch skips the probing failed devices to fix this issue.
> > >
> > > Link: 
> > > http://lists.infradead.org/pipermail/linux-rockchip/2020-September/022352.html
> > > Signed-off-by: Jian-Hong Pan 
> > > ---
> > >   drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c 
> > > b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > index 0f3eb392fe39..de13588602b4 100644
> > > --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > @@ -331,6 +331,12 @@ static struct component_match 
> > > *rockchip_drm_match_add(struct device *dev)
> > >
> > > if (!d)
> > > break;
> > > +   if (!d->driver) {
> > > +   DRM_DEV_ERROR(d,
> > > + "%s did not probe successfully",
> > > + drv->driver.name);
> > > +   continue;
> > > +   }
> > >
> > > device_link_add(dev, d, DL_FLAG_STATELESS);
> > > component_match_add(dev, &match, compare_dev, d);


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Tony Lindgren
* Trent Piepho  [200924 05:49]:
> On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
> >
> > * Trent Piepho  [200924 01:34]:
> > > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > > >
> > > > Also FYI, folks have also complained for a long time that the 
> > > > pinctrl-single
> > > > binding mixes mux and conf values while they should be handled 
> > > > separately.
> > > >
> > >
> > > Instead of combining two fields when the dts is generated they are now
> > > combined when the pinctrl-single driver reads the dts.  Other than
> > > this detail, the result is the same.  The board dts source is the
> > > same.  The value programmed into the pinctrl register is the same.
> > > There is no mechanism currently that can alter that value in any way.
> > >
> > > What does combining them later allow that is not possible now?
> >
> > It now allows further driver changes to manage conf and mux separately :)
> 
> The pinctrl-single driver?  How will that work with boards that are
> not am335x and don't use conf and mux fields in the same manner as
> am335x?

For those cases we still have #pinctrl-cells = <1>.

Regards,

Tony


[PATCH v2] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

2020-09-23 Thread Hao Li
If DCACHE_REFERENCED is set, fast_dput() will return true, and then
retain_dentry() have no chance to check DCACHE_DONTCACHE. As a result,
the dentry won't be killed and the corresponding inode can't be evicted.
In the following example, the DAX policy can't take effects unless we
do a drop_caches manually.

  # DCACHE_LRU_LIST will be set
  echo abcdefg > test.txt

  # DCACHE_REFERENCED will be set and DCACHE_DONTCACHE can't do anything
  xfs_io -c 'chattr +x' test.txt

  # Drop caches to make DAX changing take effects
  echo 2 > /proc/sys/vm/drop_caches

What this patch does is preventing fast_dput() from returning true if
DCACHE_DONTCACHE is set. Then retain_dentry() will detect the
DCACHE_DONTCACHE and will return false. As a result, the dentry will be
killed and the inode will be evicted. In this way, if we change per-file
DAX policy, it will take effect automatically after this file is closed
by all processes.

I also add some comments to make the code more clear.

Signed-off-by: Hao Li 
---
v1 is split into two standalone patch as discussed in [1], and the first
patch has been reviewed in [2]. This is the second patch.

[1]: 
https://lore.kernel.org/linux-fsdevel/20200831003407.ge12...@dread.disaster.area/
[2]: 
https://lore.kernel.org/linux-fsdevel/20200906214002.gi12...@dread.disaster.area/

 fs/dcache.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0485861d93..97e81a844a96 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,10 +793,17 @@ static inline bool fast_dput(struct dentry *dentry)
 * a reference to the dentry and change that, but
 * our work is done - we can leave the dentry
 * around with a zero refcount.
+*
+* Nevertheless, there are two cases that we should kill
+* the dentry anyway.
+* 1. free disconnected dentries as soon as their refcount
+*reached zero.
+* 2. free dentries if they should not be cached.
 */
smp_rmb();
d_flags = READ_ONCE(dentry->d_flags);
-   d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
+   d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST |
+   DCACHE_DISCONNECTED | DCACHE_DONTCACHE;
 
/* Nothing to do? Dropping the reference was all we needed? */
if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && 
!d_unhashed(dentry))
-- 
2.28.0





Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

2020-09-23 Thread Saeed Mahameed
On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller 
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
> 
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
> 
> Wait, is the sequence:
> 
> ->ndo_stop()
> mark device not present and put into D3hot
> triggers linkwatch event
>   ...
>  ->ndo_get_stats64()
> 
> ???
> 

I assume it is, since normally device drivers do carrier_off() on
ndo_stop()

1) One problematic sequence would be 
(for drivers doing D3hot on ndo_stop())

__dev_close_many()
   ->ndo_stop()
  netif_device_detach() //Mark !present;
  ... D3hot
  carrier_off()->linkwatch_event()
... // !present && IFF_UP 
  
2) Another problematic scenario which I see repeated in many
drivers:

shutdown/suspend()
rtnl_lock()
netif_device_detach()//Mark !present;
stop()->carrier_off()->linkwatch_event()
// at this point device is still IFF_UP and !present
// due to the early detach above..  
rtnl_unlock();
   
For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), I think we need to fix the drivers to detach only after stop :(
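A sketch of that driver-side ordering for 2) (the foo_* names are
illustrative, not a specific driver): detach only after the stop path,
so the linkwatch event triggered by carrier_off() still sees the device
as present.

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

static void foo_stop(struct net_device *ndev);  /* the driver's stop path */

static int foo_suspend(struct device *dev)
{
        struct net_device *ndev = dev_get_drvdata(dev);

        rtnl_lock();
        if (netif_running(ndev))
                foo_stop(ndev);         /* carrier_off() -> linkwatch event */
        netif_device_detach(ndev);      /* mark !present only afterwards */
        rtnl_unlock();

        return 0;
}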
   
> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.




Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Trent Piepho
On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
>
> * Trent Piepho  [200924 01:34]:
> > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > >
> > > Also FYI, folks have also complained for a long time that the 
> > > pinctrl-single
> > > binding mixes mux and conf values while they should be handled separately.
> > >
> >
> > Instead of combining two fields when the dts is generated they are now
> > combined when the pinctrl-single driver reads the dts.  Other than
> > this detail, the result is the same.  The board dts source is the
> > same.  The value programmed into the pinctrl register is the same.
> > There is no mechanism currently that can alter that value in any way.
> >
> > What does combining them later allow that is not possible now?
>
> It now allows further driver changes to manage conf and mux separately :)

The pinctrl-single driver?  How will that work with boards that are
not am335x and don't use conf and mux fields in the same manner as
am335x?


Re: [Linux-stm32] [PATCH 3/3] ARM: dts: stm32: update stm32mp151 for remote proc synchronisation support

2020-09-23 Thread Ahmad Fatoum
Hello Arnaud,

On 8/27/20 9:21 AM, Arnaud Pouliquen wrote:
> Two backup registers are used to store the Cortex-M4 state and the resource
> table address.
> Declare the tamp node and add associated properties in m4_rproc node
> to allow Linux to attach to a firmware loaded by the first boot stages.
> 
> Associated driver implementation is available in commit 9276536f455b3
> ("remoteproc: stm32: Parse syscon that will manage M4 synchronisation").
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
>  arch/arm/boot/dts/stm32mp151.dtsi | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/stm32mp151.dtsi 
> b/arch/arm/boot/dts/stm32mp151.dtsi
> index bfe29023fbd5..842ecffae73a 100644
> --- a/arch/arm/boot/dts/stm32mp151.dtsi
> +++ b/arch/arm/boot/dts/stm32mp151.dtsi
> @@ -1541,6 +1541,11 @@
>   status = "disabled";
>   };
>  
> + tamp: tamp@5c00a000 {
> + compatible = "st,stm32-tamp", "syscon";
> + reg = <0x5c00a000 0x400>;
> + };
> +

Just saw this now. I have a pending patch adding this node as well:
https://lore.kernel.org/patchwork/patch/1306971/

For my use case, I need a "simple-mfd" compatible to allow child
nodes to be probed.

Could you CC me when you send out your v2, so I can rebase?
(Or if you don't mind, just add the "simple-mfd" into the compatible
list yourself :-)

Cheers
Ahmad

>   /*
>* Break node order to solve dependency probe issue between
>* pinctrl and exti.
> @@ -1717,6 +1722,8 @@
>   st,syscfg-holdboot = <&rcc 0x10C 0x1>;
>   st,syscfg-tz = <&rcc 0x000 0x1>;
>   st,syscfg-pdds = <&pwr_mcu 0x0 0x1>;
> + st,syscfg-rsc-tbl = <&tamp 0x144 0x>;
> + st,syscfg-m4-state = <&tamp 0x148 0x>;
>   status = "disabled";
>   };
>   };
> 

-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Tony Lindgren
* Trent Piepho  [200924 01:34]:
> On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> >
> > Also FYI, folks have also complained for a long time that the pinctrl-single
> > binding mixes mux and conf values while they should be handled separately.
> >
> 
> Instead of combining two fields when the dts is generated they are now
> combined when the pinctrl-single driver reads the dts.  Other than
> this detail, the result is the same.  The board dts source is the
> same.  The value programmed into the pinctrl register is the same.
> There is no mechanism currently that can alter that value in any way.
> 
> What does combining them later allow that is not possible now?

It now allows further driver changes to manage conf and mux separately :)

Regards,

Tony


Re: [PATCH v2 0/6] clk: axi-clk-gen: misc updates to the driver

2020-09-23 Thread Alexandru Ardelean
On Thu, Sep 24, 2020 at 7:53 AM Moritz Fischer  wrote:
>
> Hi Stephen,
>
> On Wed, Sep 23, 2020 at 04:58:33PM -0700, Stephen Boyd wrote:
> > Quoting Alexandru Ardelean (2020-09-22 23:22:33)
> > > On Tue, Sep 22, 2020 at 10:42 PM Stephen Boyd  wrote:
> > > >
> > > > Quoting Moritz Fischer (2020-09-14 19:41:38)
> > > > > On Mon, Sep 14, 2020 at 11:11:05AM +0300, Alexandru Ardelean wrote:
> > > > > > On Mon, Aug 10, 2020 at 4:41 PM Alexandru Ardelean
> > > > > >  wrote:
> > > > > > >
> > > > > > > These patches synchronize the driver with the current state in the
> > > > > > > Analog Devices Linux tree:
> > > > > > >   https://github.com/analogdevicesinc/linux/
> > > > > > >
> > > > > > > They have been in the tree for about 2-3, so they did receive some
> > > > > > > testing.
> > > > > >
> > > > > > Ping on this series.
> > > > > > Do I need to do a re-send?
> > > >
> > > > I got this patch series twice. Not sure why.
> > >
> > > My fault here.
> > > Some Ctrl + R usage and not being attentive with the arguments.
> > > I think I added "*.patch" twice on the send-mail command.
> > > I did something similar [by accident] for some DMA patches.
> > > Apologies.
> > >
> > > I can do a re-send for this, if it helps.
> >
> > Sure. Please resend it.
> >
> > >
> > > >
> > > > >
> > > > > I've applied the FPGA one, the other ones should go through the clock
> > > > > tree I think?
> > > >
> > > > Doesn't patch 6 rely on the FPGA patch? How can that driver build
> > > > without the header file?
> > >
> > > Yes it does depend on the FPGA patch.
> > > We can drop patch 6 for now, pending a merge to Linus' tree and then
> > > wait for the trickle-down.
> > > I don't mind waiting for these patches.
> > > I have plenty of backlog that I want to run through, and cleanup and
> > > then upstream.
> > > So, there is no hurry.
> >
> > Can you send me a signed tag with that patch? I can base this patch
> > series on top of that. Or I can just apply it to clk tree and if nobody
> > changes it in the meantime merge should work out in linux-next and
> > linus' tree upstream.
>
> Long story short I messed up my pull-request to Greg and had to back out
> the patch anyways. In retrospect I think the patch should have gone
> through your tree anyways, so here's our chance to get it right.
>
> Feel free to take it with the rest of the changes through your tree.
>
> Note: When I applied the patch I fixed up the whitespace that checkpatch
> complained about so you might want to do that (or ask Alexandru to
> resend the patch).
>

I'll fixup the checkpatch stuff, re-send as a V3, and add your Acked-by.
Thanks & apologies for the mess-up on my part.

> Acked-by: Moritz Fischer 
>
> Sorry for the confusion and let me know if you still prefer a signed
> tag.
>
> - Moritz


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-23 Thread Sergey Senozhatsky
On (20/09/23 17:11), Petr Mladek wrote:
>
> AFAIK, there is one catch. We need to use va_copy() around
> the 1st call because va_format can be proceed only once.
>

Current printk() should be good enough for reporting, say, "Kernel
stack overflow" errors. Is the extra pressure that va_copy() adds something
that we need to consider?
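For reference, the extra copy would look roughly like this (sketch only;
buffer names and the retry condition are placeholders, not the actual
patch variables):

        va_list args2;
        int len;

        va_copy(args2, args);
        len = vsnprintf(textbuf, size, fmt, args);      /* 1st pass eats args */
        if (len >= size)                                /* hypothetical 2nd pass */
                len = vsnprintf(bigbuf, bigsize, fmt, args2);
        va_end(args2);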

-ss


[PATCH] Input: trackpoint - enable Synaptics trackpoints

2020-09-23 Thread Vincent Huang
Add Synaptics IDs in trackpoint_start_protocol() to mark them as valid.

Signed-off-by: Vincent Huang 
---
 drivers/input/mouse/trackpoint.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/input/mouse/trackpoint.c b/drivers/input/mouse/trackpoint.c
index 854d5e758724..ef2fa0905208 100644
--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -282,6 +282,8 @@ static int trackpoint_start_protocol(struct psmouse 
*psmouse,
case TP_VARIANT_ALPS:
case TP_VARIANT_ELAN:
case TP_VARIANT_NXP:
+   case TP_VARIANT_JYT_SYNAPTICS:
+   case TP_VARIANT_SYNAPTICS:
if (variant_id)
*variant_id = param[0];
if (firmware_id)
-- 
2.25.1



Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Christian Borntraeger



On 23.09.20 20:57, Sean Christopherson wrote:
> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
> fully enabled during VM init in order to make SEAMCALLs.
> 
> This also provides consistent ordering between kvm_create_vm() and
> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
> hardware_disable_all().
> 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Julien Thierry 
> Cc: Suzuki K Poulose 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Huacai Chen 
> Cc: Aleksandar Markovic 
> Cc: linux-m...@vger.kernel.org
> Cc: Paul Mackerras 
> Cc: kvm-...@vger.kernel.org
> Cc: Christian Borntraeger 
> Cc: Janosch Frank 
> Cc: David Hildenbrand 
> Cc: Cornelia Huck 
> Cc: Claudio Imbrenda 
> Cc: Vitaly Kuznetsov 
> Cc: Wanpeng Li 
> Cc: Jim Mattson 
> Cc: Joerg Roedel 
> Signed-off-by: Sean Christopherson 
> ---
> 
> Obviously not required until the TDX series comes along, but IMO KVM
> should be consistent with respect to enabling and disabling virt support
> in hardware.
> 
> Tested only on Intel hardware.  Unless I missed something, this only
> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.

Yes, looks fine from an s390 perspective.

Reviewed-by: Christian Borntraeger 



Re: [PATCH] doc: zh_CN: add translatation for btrfs

2020-09-23 Thread Alex Shi
Hi Qing,

It looks like all the patches from the vivo folks have a 'charset=y'
problem and garbled text even after a successful 'git am'. I have to
repeat the same reminder again and again...

Could you double-check your patches before sending them out, and make
sure the docs look good on the webpage, like
https://www.kernel.org/doc/html/v5.9-rc3/translations/zh_CN/filesystems/debugfs.html

Thanks
Alex

在 2020/9/22 下午8:03, Wang Qing 写道:
> Translate Documentation/filesystems/btrfs.rst into Chinese.
> 
> Signed-off-by: Wang Qing 
> ---


[PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy
V1: Introduces a retry loop that attempts a CMA allocation a finite
number of times before giving up:
 
https://lkml.org/lkml/2020/8/5/1097
https://lkml.org/lkml/2020/8/11/893

V2: Introduces an indefinite retry for CMA allocations.  David Hildenbrand
raised a page pinning example which precludes doing this infite-retrying
for all CMA users:

https://lkml.org/lkml/2020/9/17/984

V3: Re-introduce a GFP mask argument for cma_alloc(), that can take in
__GFP_NOFAIL as an argument to indicate that a CMA allocation should be
retried indefinitely. This lets callers of cma_alloc() decide if they want
to perform indefinite retries. Also introduces a config option for
controlling the duration of the sleep between retries.

Chris Goldsworthy (1):
  mm: cma: indefinitely retry allocations in cma_alloc

 arch/powerpc/kvm/book3s_hv_builtin.c   |  2 +-
 drivers/dma-buf/heaps/cma_heap.c   |  2 +-
 drivers/s390/char/vmcp.c   |  2 +-
 drivers/staging/android/ion/ion_cma_heap.c |  2 +-
 include/linux/cma.h|  2 +-
 kernel/dma/contiguous.c|  4 ++--
 mm/Kconfig | 11 ++
 mm/cma.c   | 35 +-
 mm/cma_debug.c |  2 +-
 mm/hugetlb.c   |  4 ++--
 10 files changed, 50 insertions(+), 16 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy
CMA allocations will fail if 'pinned' pages are in a CMA area, since we
cannot migrate pinned pages. The _refcount of a struct page being greater
than _mapcount for that page can cause pinning for anonymous pages.  This
is because try_to_unmap(), which (1) is called in the CMA allocation path,
and (2) decrements both _refcount and _mapcount for a page, will stop
unmapping a page from VMAs once the _mapcount for a page reaches 0.  This
implies that after try_to_unmap() has finished successfully for a page
where _refcount > _mapcount, _refcount will be greater than 0.  Later
in the CMA allocation path in migrate_page_move_mapping(), we will have one
more reference count than intended for anonymous pages, meaning the
allocation will fail for that page.

If a process ends up causing _refcount > _mapcount for a page (by either
incrementing _recount or decrementing _mapcount), such that the process is
context switched out after modifying one refcount but before modifying the
other, the page will be temporarily pinned.

One example of where _refcount can be greater than _mapcount is inside of
zap_pte_range(), which is called for all the entries of a PMD when a
process is exiting, to unmap the process's memory.  Inside of
zap_pte_range(), after unmapping a page with page_remove_rmap(), we have
_refcount > _mapcount.  _refcount can only be decremented after a TLB
flush is performed for the page - this doesn't occur until enough pages
have been batched together for flushing.  The flush can either occur inside
of zap_pte_range() (during the same invocation or a later one), or if there
aren't enough pages collected by the time we unmap all of the pages in a
process, the flush will occur in tlb_finish_mmu() in exit_mmap().  After
the flush has occurred, tlb_batch_pages_flush() will decrement the
references on the flushed pages.

Another such example like the above is inside of copy_one_pte(), which is
called during a fork. For PTEs for which pte_present(pte) == true,
copy_one_pte() will increment the _refcount field followed by the
_mapcount field of a page.

So, inside of cma_alloc(), add the option of letting users pass in
__GFP_NOFAIL to indicate that we should retry CMA allocations indefinitely,
in the event that alloc_contig_range() returns -EBUSY after having scanned
a whole CMA-region bitmap.

Signed-off-by: Chris Goldsworthy 
Co-developed-by: Vinayak Menon 
Signed-off-by: Vinayak Menon 
---
 arch/powerpc/kvm/book3s_hv_builtin.c   |  2 +-
 drivers/dma-buf/heaps/cma_heap.c   |  2 +-
 drivers/s390/char/vmcp.c   |  2 +-
 drivers/staging/android/ion/ion_cma_heap.c |  2 +-
 include/linux/cma.h|  2 +-
 kernel/dma/contiguous.c|  4 ++--
 mm/Kconfig | 11 ++
 mm/cma.c   | 35 +-
 mm/cma_debug.c |  2 +-
 mm/hugetlb.c   |  4 ++--
 10 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617c..21c3f6a 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -74,7 +74,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES),
-false);
+0);
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 626cf7f..7657359 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -66,7 +66,7 @@ static int cma_heap_allocate(struct dma_heap *heap,
helper_buffer->heap = heap;
helper_buffer->size = len;
 
-   cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false);
+   cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, 0);
if (!cma_pages)
goto free_buf;
 
diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c
index 9e06628..11c4e3b 100644
--- a/drivers/s390/char/vmcp.c
+++ b/drivers/s390/char/vmcp.c
@@ -70,7 +70,7 @@ static void vmcp_response_alloc(struct vmcp_session *session)
 * anymore the system won't work anyway.
 */
if (order > 2)
-   page = cma_alloc(vmcp_cma, nr_pages, 0, false);
+   page = cma_alloc(vmcp_cma, nr_pages, 0, 0);
if (page) {
session->response = (char *)page_to_phys(page);
session->cma_alloc = 1;
diff --git a/drivers/staging/android/ion/ion_cma_heap.c 
b/drivers/staging/android/ion/ion_cma_heap.c
index bf65e67..128d3a5 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct 
ion_buf

[PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-23 Thread Mamatha Inamdar
This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
(descriptions taken from Kconfig file)

Signed-off-by: Mamatha Inamdar 
---
 drivers/pci/hotplug/rpadlpar_core.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
b/drivers/pci/hotplug/rpadlpar_core.c
index f979b70..bac65ed 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
 module_init(rpadlpar_io_init);
 module_exit(rpadlpar_io_exit);
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");



Re: [PATCH v2] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy

On 2020-09-17 10:54, Chris Goldsworthy wrote:

On 2020-09-15 00:53, David Hildenbrand wrote:

On 14.09.20 20:33, Chris Goldsworthy wrote:

On 2020-09-14 02:31, David Hildenbrand wrote:

On 11.09.20 21:17, Chris Goldsworthy wrote:


So, inside of cma_alloc(), instead of giving up when alloc_contig_range()
returns -EBUSY after having scanned a whole CMA-region bitmap, perform
retries indefinitely, with sleeps, to give the system an opportunity to
unpin any pinned pages.

Signed-off-by: Chris Goldsworthy 
Co-developed-by: Vinayak Menon 
Signed-off-by: Vinayak Menon 
---
 mm/cma.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7..90bb505 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -442,8 +443,28 @@ struct page *cma_alloc(struct cma *cma, size_t
count, unsigned int align,
bitmap_maxno, start, bitmap_count, mask,
offset);
if (bitmap_no >= bitmap_maxno) {
-   mutex_unlock(&cma->lock);
-   break;
+   if (ret == -EBUSY) {
+   mutex_unlock(&cma->lock);
+
+   /*
+* Page may be momentarily pinned by some other
+* process which has been scheduled out, e.g.
+* in exit path, during unmap call, or process
+* fork and so cannot be freed there. Sleep
+* for 100ms and retry the allocation.
+*/
+   start = 0;
+   ret = -ENOMEM;
+   msleep(100);
+   continue;
+   } else {
+   /*
+* ret == -ENOMEM - all bits in cma->bitmap are
+* set, so we break accordingly.
+*/
+   mutex_unlock(&cma->lock);
+   break;
+   }
}
bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
/*



What about long-term pinnings? IIRC, that can happen easily e.g., with
vfio (and I remember there is a way via vmsplice).

Not convinced trying forever is a sane approach in the general case ...


V1:
[1] https://lkml.org/lkml/2020/8/5/1097
[2] https://lkml.org/lkml/2020/8/6/1040
[3] https://lkml.org/lkml/2020/8/11/893
[4] https://lkml.org/lkml/2020/8/21/1490
[5] https://lkml.org/lkml/2020/9/11/1072

We're fine with doing indefinite retries, on the grounds that if there
is some long-term pinning that occurs when alloc_contig_range returns
-EBUSY, it should be debugged and fixed.  Would it be possible to
make this infinite-retrying something that could be enabled or disabled
by a defconfig option?


Two thoughts:

This means I strongly prefer something like [3] if feasible.


_Resending so that this ends up on LKML_

I can give [3] some further thought then.  Also, I realized [3] will not
completely solve the problem, it just reduces the window in which
_refcount > _mapcount (as mentioned in earlier threads, we encountered
the pinning when a task in copy_one_pte() or in the exit_mmap() path
gets context switched out).  If we were to try a sleeping-lock based
solution, do you think it would be permissible to add another lock to
struct page?


I have not been able to think of a clean way of introducing calls to 
preempt_disable() in exit_mmap(), which is the more problematic case.  
We would need to track state across multiple invocations of 
zap_pte_range() (which is called for each entry in a PMD when a 
process's memory is being unmapped), and would also need to extend this 
to tlb_finish_mmu(), which is called after all the process's memory has 
been unmapped: 
https://elixir.bootlin.com/linux/v5.8.10/source/mm/mmap.c#L3164.  As a 
follow-up to this patch, I'm submitting a patch that re-introduces the 
GFP mask for cma_alloc, which will perform indefinite retries if 
__GFP_NOFAIL is passed to the function.
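A sketch of how a caller that must not fail would then look (the exact
signature is whatever that follow-up patch defines; shown here against the
cma_heap example from the diff above):

        cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, __GFP_NOFAIL);
        if (!cma_pages)         /* still possible, e.g. if the area is exhausted */
                goto free_buf;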


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed

2020-09-23 Thread Alexey Kardashevskiy



On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" 
>>> Cc: Alexey Kardashevskiy 
>>> Signed-off-by: Cédric Le Goater 
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed 
>> to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue. 
> 
> I did try PHB removal under PowerNV, so I would like to understand 
> how we managed to remove twice the PCI bus and possibly reproduce. 
> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 



My guess would be it is doing this in parallel to provoke races.



-- 
Alexey


Re: [PATCH v3 2/3] misc: bcm-vk: add Broadcom VK driver

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 09:43:55PM -0700, Scott Branden wrote:
> >> +struct bcm_vk_tty {
> >> +  struct tty_port port;
> >> +  uint32_t to_offset; /* bar offset to use */
> >> +  uint32_t to_size;   /* to VK buffer size */
> >> +  uint32_t wr;/* write offset shadow */
> >> +  uint32_t from_offset;   /* bar offset to use */
> >> +  uint32_t from_size; /* from VK buffer size */
> >> +  uint32_t rd;/* read offset shadow */
> > nit, these "unit32_t" stuff really doesn't matter in the kernel, 'u32'
> > is a better choice overall.  Same for u8 and others, for this whole
> > driver.
> Other than personal preference, I don't understand how 'u32' is better.
> uint32_t follows the ANSI stdint.h.  It allows for portable code without
> the need to define custom u32 types.

The ANSI namespace does not work in the kernel, which is why we have our
own types that pre-date those, and work properly everywhere in the
kernel.

> stdint types are used in many drivers in the linux kernel already.
> We would prefer to keep our code as portable as possible and use
> stdint types in the driver.

You aren't porting this code to other operating systems easily, please
use the kernel types :)

And yes, these types are used in other parts, but when you have 25
million lines of code, some crud does slip in at times...

> >> +  pid_t pid;
> >> +  bool irq_enabled;
> >> +  bool is_opened; /* tracks tty open/close */
> > Why do you need to track this?  Doesn't the tty core handle this for
> > you?
> I have tried using tty_port_kopened() and it doesn't seem to work.
> Will need to debug some more unless you have another suggested function to 
> use.

You didn't answer _why_ you need to track this.  A tty driver shouldn't
care about this type of thing.

> >> +  struct workqueue_struct *tty_wq_thread;
> >> +  struct work_struct tty_wq_work;
> >> +
> >> +  /* Reference-counting to handle file operations */
> >> +  struct kref kref;
> > And a kref?
> >
> > What is controlling the lifetime rules of your structure?
> >
> > Why a kref?
> >
> > Why the tty ports?
> >
> > Why the misc device?
> >
> > This feels really crazy to me...
> Comments mostly from Desmond here:
> 
> Yes, we have created a PCIe centric driver that combines with both a misc 
> devices on top (for the read/write/ioctrl), and also ttys.
> The device sits on PCIe but we are using the misc device for accessing it.
> tty is just another on top.  I don't think this is that uncommon to have a 
> hybrid driver.

Ugh, yes, it is uncommon because those are two different things.  Why do
you need/want a misc driver to control a tty device?  Why do you need a
tty device?  What really is this beast?

We got rid of the old "control path" device nodes for tty devices a long
time ago, this feels like a return to that old model, is that why you
are doing this?

But again, I really don't understand what this driver is trying to
control/manage, so it's hard to review it without that knowledge.

> Since we have a hybrid of PCIe + misc + tty, it means that we could 
> simultaneously have opening dev/node to read/write (multiple) + tty o going.

That's almost always a bad idea.

> Since the struct is embedded inside the primary PCIe structure, we need a way 
> to know when all the references are done, and then at that point we could 
> free the primary structure.
> That is the reason for the kref.  On PCIe device removal, we signal the user 
> space process to stop first, but the data structure cannot be freed until 
> the ref goes to 0.

Again, you cannot have multiple reference-count objects controlling a
single object.  That way is madness; it is buggy and will never work
properly.

You can have different objects with different lifespans, which, if you
really really want to do this, is the correct way.  Otherwise, stick
with one object and one reference count please.
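
As a rough sketch of that single-refcount pattern (the names here are made up
for illustration, not taken from the driver), the usual shape is one kref
embedded in the object whose lifetime it controls, with every user taking and
dropping a reference:

struct bcm_vk_dev {			/* hypothetical container */
	struct kref kref;
	/* ... PCIe, misc and tty state ... */
};

static void bcm_vk_dev_release(struct kref *kref)
{
	struct bcm_vk_dev *vk = container_of(kref, struct bcm_vk_dev, kref);

	kfree(vk);	/* freed only when the last reference is dropped */
}

	/* probe */
	kref_init(&vk->kref);

	/* every open / new user */
	kref_get(&vk->kref);

	/* every close / teardown path, including PCIe remove */
	kref_put(&vk->kref, bcm_vk_dev_release);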

thanks,

greg k-h


Re: [PATCH -next] crypto: qat - remove unnecessary mutex_init()

2020-09-23 Thread Giovanni Cabiddu
On Wed, Sep 16, 2020 at 07:21:21AM +0100, Qinglang Miao wrote:
> The mutex adf_ctl_lock is initialized statically, so it is
> unnecessary to initialize it again with mutex_init().
> 
> Signed-off-by: Qinglang Miao 

Acked-by: Giovanni Cabiddu 

> ---
>  drivers/crypto/qat/qat_common/adf_ctl_drv.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_common/adf_ctl_drv.c 
> b/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> index 71d0c44aa..eb9b3be9d 100644
> --- a/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> +++ b/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> @@ -416,8 +416,6 @@ static long adf_ctl_ioctl(struct file *fp, unsigned int 
> cmd, unsigned long arg)
>  
>  static int __init adf_register_ctl_device_driver(void)
>  {
> - mutex_init(&adf_ctl_lock);
> -
>   if (adf_chr_drv_create())
>   goto err_chr_dev;
>  
> -- 
> 2.23.0
> 
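
For context, a mutex defined with DEFINE_MUTEX() is fully initialized at
definition time, so a runtime mutex_init() on it is redundant; a minimal
sketch of the pattern (assuming the lock is declared this way in
adf_ctl_drv.c, and with a made-up caller for illustration):

#include <linux/mutex.h>

/* statically defined and initialized; no mutex_init() needed */
static DEFINE_MUTEX(adf_ctl_lock);

static void adf_ctl_example_user(void)
{
	mutex_lock(&adf_ctl_lock);
	/* ... critical section ... */
	mutex_unlock(&adf_ctl_lock);
}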


Re: [PATCH -next] crypto: qat - convert to use DEFINE_SEQ_ATTRIBUTE macro

2020-09-23 Thread Giovanni Cabiddu
On Wed, Sep 16, 2020 at 03:50:17AM +0100, Liu Shixin wrote:
> Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Liu Shixin 

Acked-by: Giovanni Cabiddu 

> ---
>  drivers/crypto/qat/qat_common/adf_cfg.c   | 19 +
>  .../qat/qat_common/adf_transport_debug.c  | 42 ++-
>  2 files changed, 5 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_common/adf_cfg.c 
> b/drivers/crypto/qat/qat_common/adf_cfg.c
> index ac462796cefc..22ae32838113 100644
> --- a/drivers/crypto/qat/qat_common/adf_cfg.c
> +++ b/drivers/crypto/qat/qat_common/adf_cfg.c
> @@ -52,24 +52,7 @@ static const struct seq_operations qat_dev_cfg_sops = {
>   .show = qat_dev_cfg_show
>  };
>  
> -static int qat_dev_cfg_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &qat_dev_cfg_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations qat_dev_cfg_fops = {
> - .open = qat_dev_cfg_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(qat_dev_cfg);
>  
>  /**
>   * adf_cfg_dev_add() - Create an acceleration device configuration table.
> diff --git a/drivers/crypto/qat/qat_common/adf_transport_debug.c 
> b/drivers/crypto/qat/qat_common/adf_transport_debug.c
> index 2a2eccbf56ec..dac25ba47260 100644
> --- a/drivers/crypto/qat/qat_common/adf_transport_debug.c
> +++ b/drivers/crypto/qat/qat_common/adf_transport_debug.c
> @@ -77,31 +77,14 @@ static void adf_ring_stop(struct seq_file *sfile, void *v)
>   mutex_unlock(&ring_read_lock);
>  }
>  
> -static const struct seq_operations adf_ring_sops = {
> +static const struct seq_operations adf_ring_debug_sops = {
>   .start = adf_ring_start,
>   .next = adf_ring_next,
>   .stop = adf_ring_stop,
>   .show = adf_ring_show
>  };
>  
> -static int adf_ring_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &adf_ring_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations adf_ring_debug_fops = {
> - .open = adf_ring_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(adf_ring_debug);
>  
>  int adf_ring_debugfs_add(struct adf_etr_ring_data *ring, const char *name)
>  {
> @@ -188,31 +171,14 @@ static void adf_bank_stop(struct seq_file *sfile, void 
> *v)
>   mutex_unlock(&bank_read_lock);
>  }
>  
> -static const struct seq_operations adf_bank_sops = {
> +static const struct seq_operations adf_bank_debug_sops = {
>   .start = adf_bank_start,
>   .next = adf_bank_next,
>   .stop = adf_bank_stop,
>   .show = adf_bank_show
>  };
>  
> -static int adf_bank_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &adf_bank_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations adf_bank_debug_fops = {
> - .open = adf_bank_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(adf_bank_debug);
>  
>  int adf_bank_debugfs_add(struct adf_etr_bank_data *bank)
>  {
> -- 
> 2.25.1
> 
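
For reference, DEFINE_SEQ_ATTRIBUTE(name) in include/linux/seq_file.h derives
all identifiers from its argument (name##_sops, name##_open, name##_fops),
which is why the seq_operations tables are renamed above; it expands to
roughly the open helper and file_operations that the patch deletes by hand:

/* approximate expansion of DEFINE_SEQ_ATTRIBUTE(adf_ring_debug) */
static int adf_ring_debug_open(struct inode *inode, struct file *file)
{
	int ret = seq_open(file, &adf_ring_debug_sops);

	if (!ret && inode->i_private) {
		struct seq_file *seq_f = file->private_data;

		seq_f->private = inode->i_private;
	}
	return ret;
}

static const struct file_operations adf_ring_debug_fops = {
	.owner = THIS_MODULE,
	.open = adf_ring_debug_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = seq_release,
};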

