date:20210408

Re: [PATCH -next] nilfs2: Fix typos in comments

2021-04-08 Thread lujialin (A)


Thanks for reviewing. I will exclude them in the v2 patch and send it out later.

On 2021/4/8 23:35, Ryusuke Konishi wrote:

Hi,

This patch partially overlaps the following fix that I previously sent to
Andrew:

   https://lkml.org/lkml/2021/4/8/114

Can you exclude the two typo fixes of "retured -> returned" from yours?

Thanks,
Ryusuke Konishi

On Thu, Apr 8, 2021 at 11:08 PM Lu Jialin wrote:

numer -> number in fs/nilfs2/cpfile.c and fs/nilfs2/segment.c
retured -> returned and Decription -> Description in fs/nilfs2/ioctl.c
isntance -> instance in fs/nilfs2/the_nilfs.c
No functionality changed.

Signed-off-by: Lu Jialin 
---
  fs/nilfs2/cpfile.c| 2 +-
  fs/nilfs2/ioctl.c | 6 +++---
  fs/nilfs2/segment.c   | 4 ++--
  fs/nilfs2/the_nilfs.c | 2 +-
  4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
index 025fb082575a..ce144776b4ef 100644
--- a/fs/nilfs2/cpfile.c
+++ b/fs/nilfs2/cpfile.c
@@ -293,7 +293,7 @@ void nilfs_cpfile_put_checkpoint(struct inode *cpfile, __u64 cno,
   * nilfs_cpfile_delete_checkpoints - delete checkpoints
   * @cpfile: inode of checkpoint file
   * @start: start checkpoint number
- * @end: end checkpoint numer
+ * @end: end checkpoint number
   *
   * Description: nilfs_cpfile_delete_checkpoints() deletes the checkpoints in
   * the period from @start to @end, excluding @end itself. The checkpoints
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index b053b40315bf..cbb59a6c4b81 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -979,7 +979,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
   * and metadata are written out to the device when it successfully
   * returned.
   *
- * Return Value: On success, 0 is retured. On errors, one of the following
+ * Return Value: On success, 0 is returned. On errors, one of the following
   * negative error code is returned.
   *
   * %-EROFS - Read only filesystem.
@@ -1058,7 +1058,7 @@ static int nilfs_ioctl_resize(struct inode *inode, struct file *filp,
   * @inode: inode object
   * @argp: pointer on argument from userspace
   *
- * Decription: nilfs_ioctl_trim_fs is the FITRIM ioctl handle function. It
+ * Description: nilfs_ioctl_trim_fs is the FITRIM ioctl handle function. It
   * checks the arguments from userspace and calls nilfs_sufile_trim_fs, which
   * performs the actual trim operation.
   *
@@ -1100,7 +1100,7 @@ static int nilfs_ioctl_trim_fs(struct inode *inode, void __user *argp)
   * @inode: inode object
   * @argp: pointer on argument from userspace
   *
- * Decription: nilfs_ioctl_set_alloc_range() function defines lower limit
+ * Description: nilfs_ioctl_set_alloc_range() function defines lower limit
   * of segments in bytes and upper limit of segments in bytes.
   * The NILFS_IOCTL_SET_ALLOC_RANGE is used by nilfs_resize utility.
   *
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index cd4da9535aed..686c8ee7b29c 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2214,7 +2214,7 @@ static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err)
   * nilfs_construct_segment - construct a logical segment
   * @sb: super block
   *
- * Return Value: On success, 0 is retured. On errors, one of the following
+ * Return Value: On success, 0 is returned. On errors, one of the following
   * negative error code is returned.
   *
   * %-EROFS - Read only filesystem.
@@ -2251,7 +2251,7 @@ int nilfs_construct_segment(struct super_block *sb)
   * @start: start byte offset
   * @end: end byte offset (inclusive)
   *
- * Return Value: On success, 0 is retured. On errors, one of the following
+ * Return Value: On success, 0 is returned. On errors, one of the following
   * negative error code is returned.
   *
   * %-EROFS - Read only filesystem.
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 221a1cc597f0..8b7b01a380ce 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -195,7 +195,7 @@ static int nilfs_store_log_cursor(struct the_nilfs *nilfs,
  /**
   * load_nilfs - load and recover the nilfs
   * @nilfs: the_nilfs structure to be released
- * @sb: super block isntance used to recover past segment
+ * @sb: super block instance used to recover past segment
   *
   * load_nilfs() searches and load the latest super root,
   * attaches the last segment, and does recovery if needed.
--
2.17.1



Re: [PATCH v33 00/12] Landlock LSM

2021-04-08 Thread James Morris

I've added this to my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
landlock_lsm_v33

and merged that into the next-testing branch which is pulled into Linux 
next.


-- 
James Morris

[PATCH -next] ASoC: sti: sti_uniperif: add missing MODULE_DEVICE_TABLE

2021-04-08 Thread Chen Lifu

This patch adds the missing MODULE_DEVICE_TABLE definition, which generates
the correct modalias for automatic loading of this driver when it is built
as an external module.

Reported-by: Hulk Robot 
Signed-off-by: Chen Lifu 
---
 sound/soc/sti/sti_uniperif.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/sti/sti_uniperif.c b/sound/soc/sti/sti_uniperif.c
index 67315d9b352d..e3561f00ed40 100644
--- a/sound/soc/sti/sti_uniperif.c
+++ b/sound/soc/sti/sti_uniperif.c
@@ -97,6 +97,7 @@ static const struct of_device_id snd_soc_sti_match[] = {
},
{},
 };
+MODULE_DEVICE_TABLE(of, snd_soc_sti_match);
 
 int  sti_uniperiph_reset(struct uniperif *uni)
 {

Re: [PATCH] tty: n_gsm: check error while registering tty devices

2021-04-08 Thread Hao Sun

> Can you share the info you know about the syzbot report?

Sorry for the late reply, I don't know the REPORT information of
syzbot because I haven't deployed it.
The attached reproduction program was generated by syz-repro.
As you can see from the repro.cprog, the bug occurred in the case of
fault injection.

In repro.cprog, line 108-109:
inject_fault(81);
syscall(__NR_ioctl, r[0], 0x5423, 0x2080ul);


Hillf Danton wrote on Wed, Apr 7, 2021 at 4:21 PM:
>
> On Wed, 7 Apr 2021 07:37:53 Jiri Slaby wrote:
> >
> >Yes, the fix makes sense.
>
> Thanks for taking a look.
>
> >But could you elaborate in the commit log when this happens?
> >I only wonder how real this is. I assume you inject faults to allocations?
>
> After looking at Hao's report [1] again, I think you are right as it was
> reported by syzbot too. Can you share the info you know about the syzbot
> report, Hao, something like the line below with the Reported-by prefix?
>
> (This is just an example Reported-by: 
> syzbot+b804f902bbb6bcf29...@syzkaller.appspotmail.com)
>
>
> [1] 
> https://lore.kernel.org/lkml/cackbjsyehouqud2qjobumbyftaxyyogqxgm8gxyzhqsnv8d...@mail.gmail.com/

[PATCH -next] mmc: owl-mmc: Remove unnecessary error log

2021-04-08 Thread Laibin Qiu

devm_ioremap_resource() already logs an error on failure, so it's
unnecessary to log it again.

Reported-by: Hulk Robot 
Signed-off-by: Laibin Qiu 
---
 drivers/mmc/host/owl-mmc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/mmc/host/owl-mmc.c b/drivers/mmc/host/owl-mmc.c
index 5490962dc8e5..3dc143b03939 100644
--- a/drivers/mmc/host/owl-mmc.c
+++ b/drivers/mmc/host/owl-mmc.c
@@ -581,7 +581,6 @@ static int owl_mmc_probe(struct platform_device *pdev)
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	owl_host->base = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(owl_host->base)) {
-		dev_err(&pdev->dev, "Failed to remap registers\n");
 		ret = PTR_ERR(owl_host->base);
 		goto err_free_host;
 	}
-- 
2.25.1

Re: [PATCH v5 0/6] Add SiFive FU740 PCIe host controller driver support

2021-04-08 Thread Greentime Hu

Lorenzo Pieralisi wrote on Fri, Apr 9, 2021 at 12:25 AM:
>
> On Tue, Apr 06, 2021 at 05:26:28PM +0800, Greentime Hu wrote:
> > This patchset includes SiFive FU740 PCIe host controller driver. We also
> > add pcie_aux clock and pcie_power_on_reset controller to prci driver for
> > PCIe driver to use it.
> >
> > This is tested with e1000e: Intel(R) PRO/1000 Network Card, AMD Radeon R5
> > 230 graphics card and SP M.2 PCIe Gen 3 SSD in SiFive Unmatched based on
> > v5.11 Linux kernel.
> >
> > Changes in v5:
> >  - Fix typo in comments
> >  - Keep comments style consistent
> >  - Refine some error handling codes
> >  - Remove unneeded header file including
> >  - Merge fu740_pcie_ltssm_enable implementation to fu740_pcie_start_link
> >
> > Changes in v4:
> >  - Fix Wunused-but-set-variable warning in prci driver
> >
> > Changes in v3:
> >  - Remove items that has been defined
> >  - Refine format of sifive,fu740-pcie.yaml
> >  - Replace perstn-gpios with the common one
> >  - Change DBI mapping space to 2GB from 4GB
> >  - Refine drivers/reset/Kconfig
> >
> > Changes in v2:
> >  - Refine codes based on reviewers' feedback
> >  - Remove define and use the common one
> >  - Replace __raw_writel with writel_relaxed
> >  - Split fu740_phyregreadwrite to write function
> >  - Use readl_poll_timeout instead of while loop checking
> >  - Use dwc common codes
> >  - Use gpio descriptors and the gpiod_* api.
> >  - Replace devm_ioremap_resource with devm_platform_ioremap_resource_byname
> >  - Replace devm_reset_control_get with devm_reset_control_get_exclusive
> >  - Add more comments for delay and sleep
> >  - Remove "phy ? x : y" expressions
> >  - Refine code logic to remove possible infinite loop
> >  - Replace magic number with meaningful define
> >  - Remove fu740_pcie_pm_ops
> >  - Use builtin_platform_driver
> >
> > Greentime Hu (5):
> >   clk: sifive: Add pcie_aux clock in prci driver for PCIe driver
> >   clk: sifive: Use reset-simple in prci driver for PCIe driver
> >   MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver
> >   dt-bindings: PCI: Add SiFive FU740 PCIe host controller
> >   riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC
> >
> > Paul Walmsley (1):
> >   PCI: fu740: Add SiFive FU740 PCIe host controller driver
>
> I can pull the patches above into the PCI tree (but will drop patch 6 -
> dts changes), is it OK for you ? Please let me know how you would like
> to upstream it.
>

Hi Lorenzo,

Thank you.
I am ok with it. So I should ask Palmer to pick patch 6 dts changes to
RISC-V tree?

Re: [PATCH v4 1/3] dt-bindings: Add Hycon Technology vendor prefix

2021-04-08 Thread Rob Herring

On Wed, 07 Apr 2021 19:49:07 +0200, Giulio Benetti wrote:
> Update Documentation/devicetree/bindings/vendor-prefixes.yaml to
> include "hycon" as a vendor prefix for "Hycon Technology".
> Company website: https://www.hycontek.com/
> 
> Signed-off-by: Giulio Benetti 
> Reviewed-by: Jonathan Neuschäfer 
> ---
>  Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
>  1 file changed, 2 insertions(+)
> 

Acked-by: Rob Herring

Re: [PATCH v3 1/1] of: unittest: overlay: ensure proper alignment of copied FDT

2021-04-08 Thread Guenter Roeck

On 4/8/21 1:06 PM, Frank Rowand wrote:

>>> +#define FDT_ALIGN_SIZE 8
>>> +
>>
>> Use existing define ? Or was that local in libfdt ?
> 
> I don't see a define in libfdt.  If anyone finds one,
> I'll switch to it.
> 

Turns out that was hardcoded in scripts/dtc/libfdt/fdt.c

+   /* The device tree must be at an 8-byte aligned address */
+   if ((uintptr_t)fdt & 7)
+   return -FDT_ERR_ALIGNMENT;
+

Guenter
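
The check quoted above tests the low three bits of the pointer. A minimal
user-space sketch of the same idea follows; FDT_ALIGN_SIZE comes from the
patch under discussion, while is_fdt_aligned() and fdt_align_up() are
hypothetical helper names used only for illustration and are not part of
libfdt or the unittest code.

```c
#include <stdbool.h>
#include <stdint.h>

#define FDT_ALIGN_SIZE 8 /* value taken from the patch under review */

/* Hypothetical helper: true when ptr sits on an 8-byte boundary. */
static bool is_fdt_aligned(const void *ptr)
{
	return ((uintptr_t)ptr & (FDT_ALIGN_SIZE - 1)) == 0;
}

/* Hypothetical helper: round ptr up to the next 8-byte boundary. */
static void *fdt_align_up(void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;

	p = (p + FDT_ALIGN_SIZE - 1) & ~(uintptr_t)(FDT_ALIGN_SIZE - 1);
	return (void *)p;
}
```

Masking with FDT_ALIGN_SIZE - 1 is equivalent to the hardcoded "& 7" only
because the alignment is a power of two.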

Re: [PATCH v4 1/2] dt-bindings: iio: temperature: Add DT bindings for TMP117

2021-04-08 Thread Rob Herring

On Wed, 07 Apr 2021 23:51:46 +0530, Puranjay Mohan wrote:
> Add devicetree binding document for TMP117, a digital temperature sensor.
> 
> Signed-off-by: Puranjay Mohan 
> ---
>  .../bindings/iio/temperature/ti,tmp117.yaml   | 41 +++
>  1 file changed, 41 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iio/temperature/ti,tmp117.yaml
> 

Reviewed-by: Rob Herring

Re: [PATCH v4 2/3] dt-bindings: touchscreen: Add HY46XX bindings

2021-04-08 Thread Rob Herring

On Wed, Apr 07, 2021 at 07:49:08PM +0200, Giulio Benetti wrote:
> This adds device tree bindings for the Hycon HY46XX touchscreen series.
> 
> Signed-off-by: Giulio Benetti 
> ---
> V1->V2:
> As suggested by Rob Herring:
> * fixed $id: address
> * added "hycon," in front of every custom property
> * changed all possible property to boolean type
> * removed proximity-sensor-switch property since it's not handled in driver
> V2->V3:
> As suggested by Jonathan Neuschäfer:
> * fixed some typo
> * fixed description indentation
> * improved boolean properties descriptions
> * improved hycon,report-speed description
> V3->V4:
> * fixed binding compatible string in example as suggested by Jonathan Neuschäfer
> ---
>  .../input/touchscreen/hycon,hy46xx.yaml   | 120 ++
>  MAINTAINERS   |   6 +
>  2 files changed, 126 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/input/touchscreen/hycon,hy46xx.yaml
> 
> diff --git a/Documentation/devicetree/bindings/input/touchscreen/hycon,hy46xx.yaml b/Documentation/devicetree/bindings/input/touchscreen/hycon,hy46xx.yaml
> new file mode 100644
> index ..8860613a12ad
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/input/touchscreen/hycon,hy46xx.yaml
> @@ -0,0 +1,120 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/input/touchscreen/hycon,hy46xx.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Hycon HY46XX series touchscreen controller bindings
> +
> +description: |
> +  There are 6 variants of the chip for various touch panel sizes and cover lens material
> +   Glass: 0.3mm--4.0mm
> +PET/PMMA: 0.2mm--2.0mm
> +HY4613(B)-N048  < 6"
> +HY4614(B)-N068  7" .. 10.1"
> +HY4621-NS32  < 5"
> +HY4623-NS48  5.1" .. 7"
> +   Glass: 0.3mm--8.0mm
> +PET/PMMA: 0.2mm--4.0mm
> +HY4633(B)-N048  < 6"
> +HY4635(B)-N048  < 7" .. 10.1"
> +
> +maintainers:
> +  - Giulio Benetti 
> +
> +allOf:
> +  - $ref: touchscreen.yaml#
> +
> +properties:
> +  compatible:
> +enum:
> +  - hycon,hycon-hy4613
> +  - hycon,hycon-hy4614
> +  - hycon,hycon-hy4621
> +  - hycon,hycon-hy4623
> +  - hycon,hycon-hy4633
> +  - hycon,hycon-hy4635

As suggested earlier, drop the 2nd 'hycon'.

> +
> +  reg:
> +maxItems: 1
> +
> +  interrupts:
> +maxItems: 1
> +
> +  reset-gpios:
> +maxItems: 1
> +
> +  vcc-supply: true
> +
> +  hycon,threshold:
> +description: Allows setting the sensitivity in the range from 0 to 255.
> +$ref: /schemas/types.yaml#/definitions/uint32
> +minimum: 0
> +maximum: 255
> +
> +  hycon,glove-enable:
> +type: boolean
> +description: Allows enabling glove setting.
> +
> +  hycon,report-speed:
> +description: Allows setting the report speed in Hertz.

If in Hertz, use standard unit suffix.

> +$ref: /schemas/types.yaml#/definitions/uint32

And then you can drop this.

> +minimum: 0

0Hz doesn't seem too useful?

> +maximum: 255
> +
> +  hycon,power-noise-enable:

hycon,noise-filter-enable

No one wants to enable power noise. :)

> +type: boolean
> +description: Allows enabling power noise filter.
> +
> +  hycon,filter-data:
> +description: Allows setting the filtering data before reporting touch
> + in the range from 0 to 5.

This is averaging samples? Sounds like something common perhaps.

> +$ref: /schemas/types.yaml#/definitions/uint32
> +minimum: 0
> +maximum: 5
> +
> +  hycon,gain:
> +description: Allows setting the sensitivity distance in the range from 0 to 5.
> +$ref: /schemas/types.yaml#/definitions/uint32
> +minimum: 0
> +maximum: 5
> +
> +  hycon,edge-offset:
> +description: Allows setting the edge compensation in the range from 0 to 16.
> +$ref: /schemas/types.yaml#/definitions/uint32
> +minimum: 0
> +maximum: 16
> +
> +  touchscreen-size-x: true
> +  touchscreen-size-y: true
> +  touchscreen-fuzz-x: true
> +  touchscreen-fuzz-y: true
> +  touchscreen-inverted-x: true
> +  touchscreen-inverted-y: true
> +  touchscreen-swapped-x-y: true
> +  interrupt-controller: true
> +
> +additionalProperties: false
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +
> +examples:
> +  - |
> +#include 
> +#include 
> +i2c {
> +  #address-cells = <1>;
> +  #size-cells = <0>;
> +  hycon-hy4633@1c {

touchscreen@1c

> +compatible = "hycon,hycon-hy4633";
> +reg = <0x1c>;
> +interrupt-parent = <>;
> +interrupts = <5 IRQ_TYPE_EDGE_FALLING>;
> +reset-gpios = < 6 GPIO_ACTIVE_LOW>;
> +  };
> +};
> +
> +...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c80ad735b384..d022ff09e609 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8243,6 +8243,12 @@ S: Maintained
>  F:   mm/hwpoison-inject.c
>  F:   mm/memory-failure.c
>  
> +HYCON

Re: [PATCH] MIPS: uaccess: Reduce number of nested macros

2021-04-08 Thread Thomas Bogendoerfer

On Thu, Apr 08, 2021 at 09:46:11PM +0200, Christoph Hellwig wrote:
> > +#define put_user(x, ptr)   \
> > +({ \
> > +   __typeof__(*(ptr)) __user *__p = (ptr); \
> > +   \
> > +   might_fault();  \
> > +   access_ok(__p, sizeof(*__p)) ?  \
> > +   __put_user((x), __p) :  \
> > +   -EFAULT;\
> 
> Why not merge this into a single line, which seems a little more
> readable:
> 
>   access_ok(__p, sizeof(*__p)) ? __put_user((x), __p) : -EFAULT;  \

I just copied the riscv version ;-) I'll make it one line before applying.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [PATCH v1] ata: ahci_tegra: call tegra_powergate_power_off only when PM domain is not present

2021-04-08 Thread Sowjanya Komatineni




On 4/8/21 12:58 PM, Dmitry Osipenko wrote:

On 08.04.2021 19:40, Sowjanya Komatineni wrote:

This patch adds a check for the presence of a PM domain and calls the legacy
power domain API tegra_powergate_power_off() only when no PM domain is present.

This is a follow-up patch to Tegra186 AHCI support patch series
https://lore.kernel.org/patchwork/cover/1408752/

Signed-off-by: Sowjanya Komatineni 

---
  drivers/ata/ahci_tegra.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_tegra.c b/drivers/ata/ahci_tegra.c
index 56612af..bd484dd 100644
--- a/drivers/ata/ahci_tegra.c
+++ b/drivers/ata/ahci_tegra.c
@@ -287,7 +287,8 @@ static void tegra_ahci_power_off(struct ahci_host_priv *hpriv)
reset_control_assert(tegra->sata_cold_rst);
  
  	clk_disable_unprepare(tegra->sata_clk);

-   tegra_powergate_power_off(TEGRA_POWERGATE_SATA);
+   if (!tegra->pdev->dev.pm_domain)
+   tegra_powergate_power_off(TEGRA_POWERGATE_SATA);
  
  	regulator_bulk_disable(tegra->soc->num_supplies, tegra->supplies);

  }


There are two instances of tegra_powergate_power_off() in the driver.

Thanks Dmitry. Sorry missed it. Will fix

Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache

2021-04-08 Thread Al Viro

On Thu, Apr 08, 2021 at 01:49:35PM -0700, Daniel Xu wrote:

> Ah right, sorry. Nobody will clean up the super_block.
> 
> > IOW, NAK.  The objects you are playing with have non-trivial lifecycle
> > and poking into the guts of data structures without bothering to
> > understand it is not a good idea.
> > 
> > Rule of the thumb: if your code ends up using fields that are otherwise
> > handled by a small part of codebase, the odds are that you need to be
> > bloody careful.  In particular, ->ns_lock has 3 users - all in
> > fs/namespace.c.  ->list/->mnt_list: all users in fs/namespace.c and
> > fs/pnode.c.  ->s_active: majority in fs/super.c, with several outliers
> > in filesystems and safety of those is not trivial.
> > 
> > Any time you see that kind of pattern, you are risking to reprise
> > a scene from The Modern Times - the one with Charlie taking a trip
> > through the guts of machinery.
> 
> I'll take a closer look at the lifetime semantics.
> 
> Hopefully the overall goal of the patch is ok. Happy to iterate on the
> implementation details until it's correct.

That depends.  Note that bumping ->s_active means that umount of that
sucker will *NOT* shut it down - that would happen only on the thread
doing the final deactivation.  What's more, having e.g. a USB stick
mounted, doing umount(1), having it complete successfully, pulling the
damn thing out and getting writes lost would make for a nasty surprise
for users.

With your approach it seems to be inevitable.  Holding namespace_sem
through the entire thing would prevent that, but it's a non-starter
for other reasons (starting with "it's a system-wide lock, so that'd
be highly antisocial").  Are there any limits on what could be done
to the pages, anyway?  Because if it's "anything user wanted to do",
it's *really* not feasible.

Re: [PATCH v2] firmware: qcom_scm: Only compile legacy calls on ARM

2021-04-08 Thread Stephen Boyd

Quoting Stephan Gerhold (2021-04-08 00:19:44)
> Personally, I think it would be best to introduce a new, SMC64 only
> compatible (e.g. "qcom,scm-64" like I mentioned). Then you can skip the
> detection check for the boards that opt-in by adding the compatible.
> You can then use it on all newer boards/SoCs/firmwares where you know
> exactly that there is SMC64.
> 
> I would just like to avoid breaking any existing boards where we don't
> know exactly if they have SMC32 or SMC64.

Ok that's fair.

> > 
> > Heh, it tried to ensure we use the right calling convention but broke
> > things in the process, because the way of detecting the convention isn't
> > always there. I wouldn't be surprised if this comes up again for other
> > boards that use TF-A.
> 
> Ah okay, this sounds like a better reason than just trying to avoid the
> "overhead" of the detection step. :) I still think it should work if you
> just start marking all newer boards/SoCs/... as "qcom,scm-64" or
> something like that, right?

Sure. I can cook up a set of patches for this.

[PATCH] MIPS: octeon: Add __raw_copy_[from|to|in]_user symbols

2021-04-08 Thread Thomas Bogendoerfer

Cavium Octeon has its own memcpy implementation and also needs the change
done in commit 04324f44cb69 ("MIPS: Remove get_fs/set_fs").

Fixes: 04324f44cb69 ("MIPS: Remove get_fs/set_fs")
Reported-by: kernel test robot 
Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/cavium-octeon/octeon-memcpy.S | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/mips/cavium-octeon/octeon-memcpy.S b/arch/mips/cavium-octeon/octeon-memcpy.S
index 0a7c9834b81c..600d018cf354 100644
--- a/arch/mips/cavium-octeon/octeon-memcpy.S
+++ b/arch/mips/cavium-octeon/octeon-memcpy.S
@@ -150,8 +150,12 @@ LEAF(memcpy)   /* a0=dst a1=src a2=len */
 EXPORT_SYMBOL(memcpy)
 	move	v0, dst	/* return value */
 __memcpy:
-FEXPORT(__copy_user)
-EXPORT_SYMBOL(__copy_user)
+FEXPORT(__raw_copy_from_user)
+EXPORT_SYMBOL(__raw_copy_from_user)
+FEXPORT(__raw_copy_to_user)
+EXPORT_SYMBOL(__raw_copy_to_user)
+FEXPORT(__raw_copy_in_user)
+EXPORT_SYMBOL(__raw_copy_in_user)
/*
 * Note: dst & src may be unaligned, len may be 0
 * Temps
-- 
2.29.2

Re: [PATCH] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-08 Thread Nathan Tempelman

On Thu, Apr 1, 2021 at 6:45 PM Steve Rutherford wrote:
>
> After completion of SEND_START, but before SEND_FINISH, the source VMM can
> issue the SEND_CANCEL command to stop a migration. This is necessary so
> that a cancelled migration can restart with a new target later.
>
> Signed-off-by: Steve Rutherford 
> ---
>  .../virt/kvm/amd-memory-encryption.rst|  9 +++
>  arch/x86/kvm/svm/sev.c| 24 +++
>  include/linux/psp-sev.h   | 10 
>  include/uapi/linux/kvm.h  |  2 ++
>  4 files changed, 45 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 469a6308765b1..9e018a3eec03b 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -284,6 +284,15 @@ Returns: 0 on success, -negative on error
>  __u32 len;
>  };
>
> +16. KVM_SEV_SEND_CANCEL
> +
> +
> +After completion of SEND_START, but before SEND_FINISH, the source VMM can issue the
> +SEND_CANCEL command to stop a migration. This is necessary so that a cancelled
> +migration can restart with a new target later.
> +
> +Returns: 0 on success, -negative on error
> +
>  References
>  ==
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 83e00e5245136..88e72102cb900 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1110,6 +1110,27 @@ static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
>  }
>
> +static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +   struct sev_data_send_cancel *data;
> +   int ret;
> +
> +   if (!sev_guest(kvm))
> +   return -ENOTTY;
> +
> +   data = kzalloc(sizeof(*data), GFP_KERNEL);
> +   if (!data)
> +   return -ENOMEM;
> +
> +   data->handle = sev->handle;
> +   ret = sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, data, &argp->error);
> +
> +   kfree(data);
> +   return ret;
> +}
> +
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1163,6 +1184,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_GET_ATTESTATION_REPORT:
> r = sev_get_attestation_report(kvm, &sev_cmd);
> break;
> +   case KVM_SEV_SEND_CANCEL:
> +   r = sev_send_cancel(kvm, &sev_cmd);
> +   break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index b801ead1e2bb5..74f2babffc574 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -73,6 +73,7 @@ enum sev_cmd {
> SEV_CMD_SEND_UPDATE_DATA= 0x041,
> SEV_CMD_SEND_UPDATE_VMSA= 0x042,
> SEV_CMD_SEND_FINISH = 0x043,
> +   SEV_CMD_SEND_CANCEL = 0x044,
>
> /* Guest migration commands (incoming) */
> SEV_CMD_RECEIVE_START   = 0x050,
> @@ -392,6 +393,15 @@ struct sev_data_send_finish {
> u32 handle; /* In */
>  } __packed;
>
> +/**
> + * struct sev_data_send_cancel - SEND_CANCEL command parameters
> + *
> + * @handle: handle of the VM to process
> + */
> +struct sev_data_send_cancel {
> +   u32 handle; /* In */
> +} __packed;
> +
>  /**
>   * struct sev_data_receive_start - RECEIVE_START command parameters
>   *
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f6afee209620d..707469b6b7072 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1671,6 +1671,8 @@ enum sev_cmd_id {
> KVM_SEV_CERT_EXPORT,
> /* Attestation report */
> KVM_SEV_GET_ATTESTATION_REPORT,
> +   /* Guest Migration Extension */
> +   KVM_SEV_SEND_CANCEL,
>
> KVM_SEV_NR_MAX,
>  };
> --
> 2.31.0.208.g409f899ff0-goog
>

Reviewed-by: Nathan Tempelman
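
The ordering rule in the commit message (cancel is valid only after
SEND_START and before SEND_FINISH, and a cancelled migration may be started
again) can be sketched as a small user-space state model. This is a toy
illustration only: the mig_* names and the three states are assumptions made
for the example, not the SEV firmware's actual state machine or any KVM API.

```c
#include <errno.h>

/* Toy model of the send side; these states are hypothetical. */
enum mig_state { MIG_IDLE, MIG_SENDING, MIG_SENT };

struct mig {
	enum mig_state state;
};

/* Begin a migration; only valid when none is in flight. */
static int mig_send_start(struct mig *m)
{
	if (m->state != MIG_IDLE)
		return -EINVAL;
	m->state = MIG_SENDING;
	return 0;
}

/* Abort an in-flight migration so a new one can be started later. */
static int mig_send_cancel(struct mig *m)
{
	if (m->state != MIG_SENDING)
		return -EINVAL;
	m->state = MIG_IDLE;
	return 0;
}

/* Complete an in-flight migration; cancelling is no longer possible. */
static int mig_send_finish(struct mig *m)
{
	if (m->state != MIG_SENDING)
		return -EINVAL;
	m->state = MIG_SENT;
	return 0;
}
```

In this model a start followed by a cancel returns to the idle state, so a
second start toward a new target succeeds.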

Re: [PATCH 09/12] i2c: icy: Constify the software node

2021-04-08 Thread Max Staudt

On Thu, 8 Apr 2021 23:22:51 +0200
Wolfram Sang wrote:

> I read this as "Reviewed-by" ;)

Sure, why not :)


Reviewed-by: Max Staudt

Re: [PATCH v6 1/5] dt-bindings: usb: Add binding for Realtek RTS5411 hub controller

2021-04-08 Thread Matthias Kaehlcke

On Tue, Apr 06, 2021 at 11:30:01AM -0500, Rob Herring wrote:
> On Mon, Apr 05, 2021 at 01:18:13PM -0700, Matthias Kaehlcke wrote:
> > The Realtek RTS5411 is a USB 3.0 hub controller with 4 ports.
> > 
> > This initial version of the binding only describes USB related
> > aspects of the RTS5411, it does not cover the option of
> > connecting the controller as an i2c slave.
> > 
> > Signed-off-by: Matthias Kaehlcke 
> > ---
> > 
> > Changes in v7:
> > - added type ref for 'companion-hub' property
> > 
> > Changes in v6:
> > - Realtek binding instead of generic onboard_usb_hub
> > - added 'companion-hub' property
> > - added reference to 'usb-device.yaml'
> > - 'fixed' indentation of compatible entries to keep yamllint happy
> > - added 'additionalProperties' entry
> > - updated commit message
> > 
> > Changes in v5:
> > - updated 'title'
> > - only use standard USB compatible strings
> > - deleted 'usb_hub' node
> > - renamed 'usb_controller' node to 'usb-controller'
> > - removed labels from USB nodes
> > - added 'vdd-supply' to USB nodes
> > 
> > Changes in v4:
> > - none
> > 
> > Changes in v3:
> > - updated commit message
> > - removed recursive reference to $self
> > - adjusted 'compatible' definition to support multiple entries
> > - changed USB controller phandle to be a node
> > 
> > Changes in v2:
> > - removed 'wakeup-source' and 'power-off-in-suspend' properties
> > - consistently use spaces for indentation in example
> > 
> >  .../bindings/usb/realtek,rts5411.yaml | 59 +++
> >  1 file changed, 59 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/usb/realtek,rts5411.yaml
> > 
> > diff --git a/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml b/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml
> > new file mode 100644
> > index ..b59001972749
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml
> > @@ -0,0 +1,59 @@
> > +# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/usb/realtek,rts5411.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Binding for the Realtek RTS5411 USB 3.0 hub controller
> > +
> > +maintainers:
> > +  - Matthias Kaehlcke 
> > +
> > +allOf:
> > +  - $ref: usb-device.yaml#
> > +
> > +properties:
> > +  compatible:
> > +items:
> > +  - enum:
> > +  - usbbda,5411
> > +  - usbbda,411
> > +
> 
> reg: true
> 
> to fix the error.

Will fix in v8 (this is v7, even though the subject says otherwise,
I forgot to increment the version number when sending).

> > +  vdd-supply:
> > +description:
> > +  phandle to the regulator that provides power to the hub.
> > +
> > +  companion-hub:
> > +$ref: '/schemas/types.yaml#/definitions/phandle'
> > +description:
> > +  phandle to the companion hub on the controller.
> 
> This should be required I think. I suppose you could only hook up 2.0
> ports, but why. And 3.0 only wouldn't be USB compliant, would it?

Agreed, that makes sense now that this is a specific binding for the
RTS5411. It seems unlikely that a system would use a USB 3.0 capable
hub on a USB 2.0 controller, and as you said 3.0 only wouldn't be USB
compliant.

I made the attribute initially optional because the binding was
intended to be generic (bad idea), and for certain hubs a required
'companion-hub' wouldn't make sense (e.g. USB 2.0 only).

> > +
> > +required:
> > +  - compatible
> > +  - reg
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +usb-controller {
> 
> 'usb' is the standard name.

ack

Thanks for your comments!

m.

Re: [RFC PATCH v2 1/4] arm64: Implement infrastructure for stack trace reliability checks

2021-04-08 Thread Madhavan T. Venkataraman




On 4/8/21 2:30 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 4/8/21 12:17 PM, Mark Brown wrote:
>> On Mon, Apr 05, 2021 at 03:43:10PM -0500, madve...@linux.microsoft.com wrote:
>>
>>> These checks will involve checking the return PC to see if it falls inside
>>> any special functions where the stack trace is considered unreliable.
>>> Implement the infrastructure needed for this.
>>
>> Following up again based on an off-list discussion with Mark Rutland:
>> while I think this is a reasonable implementation for specifically
>> listing functions that cause problems we could make life easier for
>> ourselves by instead using annotations at the call sites to put things
>> into sections which indicate that they're unsafe for unwinding, we can
>> then check for any address in one of those sections (or possibly do the
>> reverse and check for any address in a section we specifically know is
>> safe) rather than having to enumerate problematic functions in the
>> unwinder.  This also has the advantage of not having a list that's
>> separate to the functions themselves so it's less likely that the
>> unwinder will get out of sync with the rest of the code as things evolve.
>>
>> We already have SYM_CODE_START() annotations in the code for assembly
>> functions that aren't using the standard calling convention which should
>> help a lot here, we could add a variant of that for things that we know
>> are safe on stacks (like those we expect to find at the bottom of
>> stacks).
>>
> 
> As I already mentioned before, I like the idea of sections. The only reason
> that I did not try it was that I have to address FTRACE trampolines and the
> kretprobe_trampoline (and optprobes in the future).
> 
> I have the following options:
> 
> 1. Create a common section (I will have to come up with an appropriate name)
>    and put all such functions in that one section.
> 
> 2. Create one section for each logical type (exception section, ftrace
>    section and kprobe section) or some such.
> 

For now, I will start with idea 2. I will create a special section for each
class of functions (EL1 exception handlers, FTRACE trampolines, KPROBE
trampolines). Instead of a special functions array, I will implement a
special_sections array. The rest of the code should just fall into place.
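
A minimal sketch of what a special_sections lookup could look like, written as
plain userspace C. The struct layout, the section names, the address ranges and
the helper names are all assumptions made for illustration; the real kernel
code would take its bounds from linker symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one "unreliable for unwinding" text section. */
struct special_section {
	const char *name;
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/* Illustrative address ranges only; a kernel would use linker symbols. */
static const struct special_section special_sections[] = {
	{ "el1_exception",     0x1000, 0x2000 },
	{ "ftrace_trampoline", 0x4000, 0x4800 },
	{ "kprobe_trampoline", 0x8000, 0x8100 },
};

/* Return the section containing pc, or NULL if pc is in ordinary code. */
static const struct special_section *find_special_section(uintptr_t pc)
{
	size_t n = sizeof(special_sections) / sizeof(special_sections[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (pc >= special_sections[i].start &&
		    pc < special_sections[i].end)
			return &special_sections[i];
	}
	return NULL;
}

/* A return PC is treated as reliable only if it is in no special section. */
static bool pc_is_reliable(uintptr_t pc)
{
	return find_special_section(pc) == NULL;
}
```

With this shape the unwinder only needs one range check per section class,
rather than one entry per problematic function.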

Let me know if you prefer something different.

Thanks.

Madhavan

> 3. Use the section idea only for the el1 exceptions. For the others use the
>    current special_functions[] approach.
> 
> Which one do you and Mark Rutland prefer? Or, is there another choice?
> 
> Madhavan
>

[PATCH 9/9] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes

2021-04-08 Thread Axel Rasmussen

In a previous commit, we added the mcopy_atomic_install_ptes() helper.
This helper does the job of setting up PTEs for an existing page, to map
it into a given VMA. It deals with both the anon and shmem cases, as
well as the shared and private cases.

In other words, shmem_mcopy_atomic_pte() duplicates a case it already
handles. So, expose it, and let shmem_mcopy_atomic_pte() use it
directly, to reduce code duplication.

This requires that we refactor shmem_mcopy_atomic_pte() a bit:

Instead of doing accounting (shmem_recalc_inode() et al) part-way
through the PTE setup, do it beforehand. This frees up
mcopy_atomic_install_ptes() from having to care about this accounting,
but it does mean we need to clean it up if we get a failure afterwards
(shmem_uncharge()).

We can *almost* use shmem_charge() to do this, reducing code
duplication. But, it does `inode->i_mapping->nrpages++`, which would
double-count since shmem_add_to_page_cache() also does this.
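
The "account first, undo on failure" ordering can be sketched in isolation as
follows. This is a userspace illustration only: struct fake_inode, charge(),
uncharge(), install() and the BLOCKS_PER_PAGE value are hypothetical stand-ins,
not the kernel's types or helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode accounting shmem keeps. */
struct fake_inode {
	long alloced;
	long blocks;
};

#define BLOCKS_PER_PAGE 8	/* illustrative value only */

static void charge(struct fake_inode *inode)
{
	inode->alloced++;
	inode->blocks += BLOCKS_PER_PAGE;
}

static void uncharge(struct fake_inode *inode)
{
	inode->alloced--;
	inode->blocks -= BLOCKS_PER_PAGE;
}

/* Stand-in for the PTE install step; fails with -EEXIST when told to. */
static int install(bool should_fail)
{
	return should_fail ? -EEXIST : 0;
}

static int charge_then_install(struct fake_inode *inode, bool install_fails)
{
	int ret;

	charge(inode);			/* accounting happens up front */
	ret = install(install_fails);
	if (ret)
		uncharge(inode);	/* undo it on the failure path */
	return ret;
}
```

The install step no longer needs to know about the accounting at all; the cost
is one extra cleanup on the error path.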

Signed-off-by: Axel Rasmussen 
---
 include/linux/userfaultfd_k.h |  5 
 mm/shmem.c| 52 +++
 mm/userfaultfd.c  | 25 -
 3 files changed, 27 insertions(+), 55 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 794d1538b8ba..3e20bfa9ef80 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -53,6 +53,11 @@ enum mcopy_atomic_mode {
MCOPY_ATOMIC_CONTINUE,
 };
 
+extern int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+struct vm_area_struct *dst_vma,
+unsigned long dst_addr, struct page *page,
+bool newly_allocated, bool wp_copy);
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
bool *mmap_changing, __u64 mode);
diff --git a/mm/shmem.c b/mm/shmem.c
index 99c54b165c16..5d4b82e9bcb2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2380,10 +2380,8 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
struct address_space *mapping = inode->i_mapping;
gfp_t gfp = mapping_gfp_mask(mapping);
pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
-   spinlock_t *ptl;
void *page_kaddr;
struct page *page;
-   pte_t _dst_pte, *dst_pte;
int ret;
pgoff_t max_off;
 
@@ -2393,8 +2391,10 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 
if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
-   if (!page)
-   goto out_unacct_blocks;
+   if (!page) {
+   shmem_inode_unacct_blocks(inode, 1);
+   goto out;
+   }
 
if (!zeropage) {/* COPY */
page_kaddr = kmap_atomic(page);
@@ -2434,59 +2434,27 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (ret)
goto out_release;
 
-   _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-   if (dst_vma->vm_flags & VM_WRITE)
-   _dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
-   else {
-   /*
-* We don't set the pte dirty if the vma has no
-* VM_WRITE permission, so mark the page dirty or it
-* could be freed from under us. We could do it
-* unconditionally before unlock_page(), but doing it
-* only if VM_WRITE is not set is faster.
-*/
-   set_page_dirty(page);
-   }
-
-   dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-
-   ret = -EFAULT;
-   max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-   if (unlikely(pgoff >= max_off))
-   goto out_release_unlock;
-
-   ret = -EEXIST;
-   if (!pte_none(*dst_pte))
-   goto out_release_unlock;
-
-   lru_cache_add(page);
-
spin_lock_irq(&info->lock);
info->alloced++;
inode->i_blocks += BLOCKS_PER_PAGE;
shmem_recalc_inode(inode);
spin_unlock_irq(&info->lock);
 
-   inc_mm_counter(dst_mm, mm_counter_file(page));
-   page_add_file_rmap(page, false);
-   set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+   ret = mcopy_atomic_install_ptes(dst_mm, dst_pmd, dst_vma, dst_addr,
+   page, true, false);
+   if (ret)
+   goto out_release_uncharge;
 
-   /* No need to invalidate - it was non-present before */
-   update_mmu_cache(dst_vma, dst_addr, dst_pte);
-   pte_unmap_unlock(dst_pte, ptl);
unlock_page(page);
ret = 0;
 out:
return ret;
-out_release_unlock:
-   pte_unmap_unlock(dst_pte, ptl);
-   ClearPageDirty(page);
+out_release_uncharge:
delete_from_page_cache(page);
+

[PATCH 3/9] userfaultfd/shmem: support minor fault registration for shmem

2021-04-08 Thread Axel Rasmussen

This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path,
for VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve
such faults.
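
As a rough userspace illustration of the two checks this patch changes: which
VMA types may register for minor faults, and which feature bits are hidden when
the architecture lacks minor fault support. The bit positions match the uapi
header in this patch; the function names and boolean parameters are made up
for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in include/uapi/linux/userfaultfd.h by this patch. */
#define UFFD_FEATURE_MINOR_HUGETLBFS	(1ULL << 9)
#define UFFD_FEATURE_MINOR_SHMEM	(1ULL << 10)

/* Minor fault registration is accepted for hugetlb or shmem VMAs only. */
static bool minor_mode_allowed(bool is_hugetlb, bool is_shmem)
{
	return is_hugetlb || is_shmem;
}

/* Without arch support, both minor fault feature bits are masked out. */
static uint64_t advertised_features(uint64_t all, bool arch_has_minor)
{
	if (!arch_has_minor)
		all &= ~(UFFD_FEATURE_MINOR_HUGETLBFS |
			 UFFD_FEATURE_MINOR_SHMEM);
	return all;
}
```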

Signed-off-by: Axel Rasmussen 
---
 fs/userfaultfd.c |  6 +++---
 include/uapi/linux/userfaultfd.h |  7 ++-
 mm/memory.c  |  8 +---
 mm/shmem.c   | 20 
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f92285d04f..9f3b8684cf3c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct 
vm_area_struct *vma,
}
 
if (vm_flags & VM_UFFD_MINOR) {
-   /* FIXME: Add minor fault interception for shmem. */
-   if (!is_vm_hugetlb_page(vma))
+   if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
return false;
}
 
@@ -1941,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
/* report all available features and ioctls to userland */
uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-   uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+   uffdio_api.features &=
+   ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index bafbeb1a2624..159a74e9564f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -31,7 +31,8 @@
   UFFD_FEATURE_MISSING_SHMEM | \
   UFFD_FEATURE_SIGBUS |\
   UFFD_FEATURE_THREAD_ID | \
-  UFFD_FEATURE_MINOR_HUGETLBFS)
+  UFFD_FEATURE_MINOR_HUGETLBFS |   \
+  UFFD_FEATURE_MINOR_SHMEM)
 #define UFFD_API_IOCTLS\
((__u64)1 << _UFFDIO_REGISTER | \
 (__u64)1 << _UFFDIO_UNREGISTER |   \
@@ -185,6 +186,9 @@ struct uffdio_api {
 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
 * can be intercepted (via REGISTER_MODE_MINOR) for
 * hugetlbfs-backed pages.
+*
+* UFFD_FEATURE_MINOR_SHMEM indicates the same support as
+* UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
 #define UFFD_FEATURE_EVENT_FORK(1<<1)
@@ -196,6 +200,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_SIGBUS(1<<7)
 #define UFFD_FEATURE_THREAD_ID (1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS   (1<<9)
+#define UFFD_FEATURE_MINOR_SHMEM   (1<<10)
__u64 features;
 
__u64 ioctls;
diff --git a/mm/memory.c b/mm/memory.c
index c8e357627318..a1e5ff55027e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3929,9 +3929,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 * something).
 */
if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-   ret = do_fault_around(vmf);
-   if (ret)
-   return ret;
+   if (likely(!userfaultfd_minor(vmf->vma))) {
+   ret = do_fault_around(vmf);
+   if (ret)
+   return ret;
+   }
}
 
ret = __do_fault(vmf);
diff --git a/mm/shmem.c b/mm/shmem.c
index c21f20cc4204..99c54b165c16 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1785,7 +1785,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t 
index,
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache.
  *
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
  * otherwise they are NULL.
  */
 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
@@ -1802,6 +1802,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
pgoff_t hindex = index;
gfp_t huge_gfp;
int error;
+   bool swapped;
int once = 0;
int alloced = 0;
 
@@ -1820,16 +1821,27 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
 
page = pagecache_get_page(mapping, index,
FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
-   if (xa_is_value(page)) {
+   swapped = xa_is_value(page);
+   if (swapped) {
error = shmem_swapin_page(inode, index, &page,

[PATCH 5/9] userfaultfd/selftests: use memfd_create for shmem test type

2021-04-08 Thread Axel Rasmussen

This is a preparatory commit. In the future, we want to be able to set up
alias mappings for area_src and area_dst in the shmem test, like we do
in the hugetlb_shared test. With a VMA obtained via
mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings. Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since
it's more convenient / simpler to run, and works just as well.
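
For readers unfamiliar with the approach, here is a small standalone sketch of
backing two test areas with a single memfd, one at offset 0 and one right after
it. It assumes Linux with glibc 2.27 or later; make_shmem_fd() and
map_area_at() are hypothetical helper names, not functions from the selftest.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an anonymous tmpfs-backed file of len bytes and return its fd,
 * or -1 on failure. The name is only a debugging label.
 */
static int make_shmem_fd(const char *name, size_t len)
{
	int fd = memfd_create(name, 0);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map len bytes of fd starting at offset; returns NULL on failure. */
static void *map_area_at(int fd, size_t len, off_t offset)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, offset);

	return p == MAP_FAILED ? NULL : p;
}
```

Because the mapping is tied to an fd rather than being MAP_ANONYMOUS, the same
range can later be mapped a second time, which is what the alias mappings need.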

Future commits will:

1. Set up the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
   userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup
fails in main(), pass in the right argv[] so we actually print out the
hugetlb file path.

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 310fc617c383..b0af88b258d7 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -85,6 +85,7 @@ static bool test_uffdio_wp = false;
 static bool test_uffdio_minor = false;
 
 static bool map_shared;
+static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
@@ -278,10 +279,13 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
+   unsigned long offset =
+   alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+  MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
-   err("shared memory mmap failed");
+   err("mmap of memfd failed");
 }
 
 struct uffd_test_ops {
@@ -1446,6 +1450,16 @@ int main(int argc, char **argv)
err("Open of %s failed", argv[4]);
if (ftruncate(huge_fd, 0))
err("ftruncate %s to size 0 failed", argv[4]);
+   } else if (test_type == TEST_SHMEM) {
+   shm_fd = memfd_create(argv[0], 0);
+   if (shm_fd < 0)
+   err("memfd_create");
+   if (ftruncate(shm_fd, nr_pages * page_size * 2))
+   err("ftruncate");
+   if (fallocate(shm_fd,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
+ nr_pages * page_size * 2))
+   err("fallocate");
}
printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
   nr_pages, nr_pages_per_cpu);
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH 7/9] userfaultfd/selftests: reinitialize test context in each test

2021-04-08 Thread Axel Rasmussen

Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas). We run the tests in a particular order, and each test is careful
to make the right assumptions about its starting state.

But, this is fragile. It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.

To that end, clear and reinitialize the test context at the start of
each test case, so whatever prior test cases did doesn't affect future
tests.

This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was
relying on. This wasn't a problem for hugetlb, as we don't mremap in
that case.
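
The clear-then-init pattern can be shown with a much-reduced stand-in for the
test context. struct test_ctx and the three functions below are hypothetical
and only model the count_verify array, not the fds or mmap-ed areas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for the selftest's global state. */
struct test_ctx {
	unsigned long long *count_verify;
	size_t nr_pages;
};

/* Release whatever a previous test left behind; safe to call repeatedly. */
static void test_ctx_clear(struct test_ctx *ctx)
{
	free(ctx->count_verify);	/* free(NULL) is a no-op */
	ctx->count_verify = NULL;
	ctx->nr_pages = 0;
}

/* Build a fresh context with known starting values. Returns 0 on success. */
static int test_ctx_init(struct test_ctx *ctx, size_t nr_pages)
{
	size_t i;

	assert(ctx->count_verify == NULL);	/* caller must clear first */
	ctx->count_verify = malloc(nr_pages * sizeof(*ctx->count_verify));
	if (!ctx->count_verify)
		return -1;
	for (i = 0; i < nr_pages; i++)
		ctx->count_verify[i] = 1;
	ctx->nr_pages = nr_pages;
	return 0;
}

/* A test that clobbers the shared state, as the minor fault test does. */
static void clobbering_test(struct test_ctx *ctx)
{
	size_t i;

	for (i = 0; i < ctx->nr_pages; i++)
		ctx->count_verify[i] = 0xdead;
}
```

Each test then starts with clear followed by init, so the values written by
clobbering_test() never leak into the next test.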

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 221 +--
 1 file changed, 129 insertions(+), 92 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 4b49b2cf9819..9b032cfdc262 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -89,7 +89,8 @@ static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
-static int uffd, uffd_flags, finished, *pipefd;
+static int uffd = -1;
+static int uffd_flags, finished, *pipefd;
 static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
 static char *zeropage;
 pthread_attr_t attr;
@@ -343,6 +344,124 @@ static struct uffd_test_ops hugetlb_uffd_test_ops = {
 
 static struct uffd_test_ops *uffd_test_ops;
 
+static int userfaultfd_open(uint64_t *features)
+{
+   struct uffdio_api uffdio_api;
+
+   uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+   if (uffd < 0)
+   err("userfaultfd syscall not available in this kernel");
+   uffd_flags = fcntl(uffd, F_GETFD, NULL);
+
+   uffdio_api.api = UFFD_API;
+   uffdio_api.features = *features;
+   if (ioctl(uffd, UFFDIO_API, &uffdio_api))
+   err("UFFDIO_API failed.\nPlease make sure to "
+   "run with either root or ptrace capability.");
+   if (uffdio_api.api != UFFD_API)
+   err("UFFDIO_API error: %" PRIu64, (uint64_t)uffdio_api.api);
+
+   *features = uffdio_api.features;
+   return 0;
+}
+
+static int uffd_test_ctx_init_ext(uint64_t *features)
+{
+   unsigned long nr, cpu;
+
+   uffd_test_ops->allocate_area((void **)&area_src);
+   if (!area_src)
+   return 1;
+   uffd_test_ops->allocate_area((void **)&area_dst);
+   if (!area_dst)
+   return 1;
+
+   if (uffd_test_ops->release_pages(area_src))
+   return 1;
+
+   if (uffd_test_ops->release_pages(area_dst))
+   return 1;
+
+   if (userfaultfd_open(features))
+   return 1;
+
+   count_verify = malloc(nr_pages * sizeof(unsigned long long));
+   if (!count_verify)
+   err("count_verify");
+
+   for (nr = 0; nr < nr_pages; nr++) {
+   *area_mutex(area_src, nr) =
+   (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER;
+   count_verify[nr] = *area_count(area_src, nr) = 1;
+   /*
+* In the transition between 255 to 256, powerpc will
+* read out of order in my_bcmp and see both bytes as
+* zero, so leave a placeholder below always non-zero
+* after the count, to avoid my_bcmp to trigger false
+* positives.
+*/
+   *(area_count(area_src, nr) + 1) = 1;
+   }
+
+   pipefd = malloc(sizeof(int) * nr_cpus * 2);
+   if (!pipefd)
+   err("pipefd");
+   for (cpu = 0; cpu < nr_cpus; cpu++)
+   if (pipe2(&pipefd[cpu * 2], O_CLOEXEC | O_NONBLOCK))
+   err("pipe");
+
+   return 0;
+}
+
+static inline int uffd_test_ctx_init(uint64_t features)
+{
+   return uffd_test_ctx_init_ext(&features);
+}
+
+static inline int munmap_area(void **area)
+{
+   if (*area)
+   if (munmap(*area, nr_pages * page_size))
+   err("munmap");
+
+   *area = NULL;
+   return 0;
+}
+
+static int uffd_test_ctx_clear(void)
+{
+   int ret = 0;
+   size_t i;
+
+   if (pipefd) {
+   for (i = 0; i < nr_cpus * 2; ++i) {
+   if (close(pipefd[i]))
+   err("close pipefd");
+   }
+   free(pipefd);
+   pipefd = NULL;
+   }
+
+   if (count_verify) {
+   free(count_verify);
+   count_verify = NULL;
+   }
+
+   if (uffd != -1) {
+   if (close(uffd))
+   err("close uffd");
+   uffd

[PATCH 8/9] userfaultfd/selftests: exercise minor fault handling shmem support

2021-04-08 Thread Axel Rasmussen

Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the
test slightly to pass in / check for the right feature flags.
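
The feature check used in the diff below is a plain subset test on bitmasks.
A standalone sketch, with the bit positions taken from the uapi header in patch
3/9 and a helper name (features_supported) that is an assumption of this
example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in include/uapi/linux/userfaultfd.h (patch 3/9). */
#define UFFD_FEATURE_MINOR_HUGETLBFS	(1ULL << 9)
#define UFFD_FEATURE_MINOR_SHMEM	(1ULL << 10)

/* True only if every bit in required is also set in reported. */
static bool features_supported(uint64_t reported, uint64_t required)
{
	return (reported & required) == required;
}
```

Comparing against the full required mask, rather than testing for any non-zero
overlap, matters once more than one bit is requested.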

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 29 
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 9b032cfdc262..640d0a2d107d 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -488,6 +488,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool 
wp)
 static void continue_range(int ufd, __u64 start, __u64 len)
 {
struct uffdio_continue req;
+   int ret;
 
req.range.start = start;
req.range.len = len;
@@ -496,6 +497,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+   /*
+* Error handling within the kernel for continue is subtly different
+* from copy or zeropage, so it may be a source of bugs. Trigger an
+* error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+*/
+   req.mapped = 0;
+   ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+   if (ret >= 0 || req.mapped != -EEXIST)
+   err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+   ret, req.mapped);
 }
 
 static void *locking_thread(void *arg)
@@ -1198,7 +1210,7 @@ static int userfaultfd_minor_test(void)
void *expected_page;
char c;
struct uffd_stats stats = { 0 };
-   uint64_t features = UFFD_FEATURE_MINOR_HUGETLBFS;
+   uint64_t req_features, features_out;
 
if (!test_uffdio_minor)
return 0;
@@ -1206,10 +1218,18 @@ static int userfaultfd_minor_test(void)
printf("testing minor faults: ");
fflush(stdout);
 
-   if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features))
+   if (test_type == TEST_HUGETLB)
+   req_features = UFFD_FEATURE_MINOR_HUGETLBFS;
+   else if (test_type == TEST_SHMEM)
+   req_features = UFFD_FEATURE_MINOR_SHMEM;
+   else
+   return 1;
+
+   features_out = req_features;
+   if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features_out))
return 1;
-   /* If kernel reports the feature isn't supported, skip the test. */
-   if (!(features & UFFD_FEATURE_MINOR_HUGETLBFS)) {
+   /* If kernel reports required features aren't supported, skip test. */
+   if ((features_out & req_features) != req_features) {
printf("skipping test due to lack of feature support\n");
fflush(stdout);
return 0;
@@ -1444,6 +1464,7 @@ static void set_test_type(const char *type)
map_shared = true;
test_type = TEST_SHMEM;
uffd_test_ops = &shmem_uffd_test_ops;
+   test_uffdio_minor = true;
} else {
err("Unknown test type: %s", type);
}
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH 6/9] userfaultfd/selftests: create alias mappings in the shmem test

2021-04-08 Thread Axel Rasmussen

Previously, we just allocated two shm areas: area_src and area_dst. With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs. In a future commit in this
series, we'll leverage this setup to exercise minor fault handling
support for shmem, just like we do in the hugetlb_shared test.
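
To make the alias idea concrete, the following standalone sketch maps the same
memfd range twice and checks that a write through one mapping is visible
through the other. It assumes Linux with glibc 2.27 or later; map_with_alias()
and alias_demo() are hypothetical names, not part of the selftest.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the same file range twice. Returns 0 and fills *area and *alias on
 * success, -1 on failure. The two pointers are distinct VMAs over the same
 * underlying pages.
 */
static int map_with_alias(int fd, size_t len, off_t offset,
			  char **area, char **alias)
{
	*area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
	if (*area == MAP_FAILED)
		return -1;
	*alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
	if (*alias == MAP_FAILED) {
		munmap(*area, len);
		return -1;
	}
	return 0;
}

/* Returns 1 if a write through area is visible through alias, -1 on error. */
static int alias_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *area, *alias;
	int fd, visible;

	assert(len >= sizeof("hello"));
	fd = memfd_create("alias-demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) ||
	    map_with_alias(fd, len, 0, &area, &alias)) {
		close(fd);
		return -1;
	}
	strcpy(area, "hello");
	visible = area != alias && strcmp(alias, "hello") == 0;
	munmap(area, len);
	munmap(alias, len);
	close(fd);
	return visible;
}
```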

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index b0af88b258d7..4b49b2cf9819 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -279,13 +279,29 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
-   unsigned long offset =
-   alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+   void *area_alias = NULL;
+   bool is_src = alloc_area == (void **)&area_src;
+   unsigned long offset = is_src ? 0 : nr_pages * page_size;
 
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
   MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
err("mmap of memfd failed");
+
+   area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, shm_fd, offset);
+   if (area_alias == MAP_FAILED)
+   err("mmap of memfd alias failed");
+
+   if (is_src)
+   area_src_alias = area_alias;
+   else
+   area_dst_alias = area_alias;
+}
+
+static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+   *start = (unsigned long)area_dst_alias + offset;
 }
 
 struct uffd_test_ops {
@@ -315,7 +331,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
.expected_ioctls = SHMEM_EXPECTED_IOCTLS,
.allocate_area  = shmem_allocate_area,
.release_pages  = shmem_release_pages,
-   .alias_mapping = noop_alias_mapping,
+   .alias_mapping = shmem_alias_mapping,
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

2021-04-08 Thread Axel Rasmussen

With this change, userspace can resolve a minor fault within a
shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
match those for hugetlbfs - we look up the existing page in the page
cache, and install PTEs for it.

This commit introduces a new helper: mcopy_atomic_install_ptes.

Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
shmem.c? The existing userfault implementation only relies on shmem.c
for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
shmem in one place, regardless of shared/private (to reduce code
duplication).

Why add a new mcopy_atomic_install_ptes helper? A problem we have with
continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
*close* to what we want, but not exactly. We do want to set up the PTEs
in a CONTINUE operation, but we don't want to e.g. allocate a new page,
charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
we have the problem stated above: shmem_mcopy_atomic_pte() and
mcopy_atomic_pte() both handle one-half of the problem (shared /
private) continue cares about. So, introduce mcontinue_atomic_pte(), to
handle all of the shmem continue cases. Introduce the helper so it
doesn't duplicate code with mcopy_atomic_pte().

In a future commit, shmem_mcopy_atomic_pte() will also be modified to
use this new helper. However, since this is a bigger refactor, it seems
most clear to do it as a separate change.
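
The flag decisions the new helper makes can be restated as a pure function,
which may be easier to review than the diff. This is a userspace paraphrase of
the logic in the patch below; struct pte_plan and plan_pte() are names invented
for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the helper decides to do for one PTE. */
struct pte_plan {
	bool pte_dirty;		/* pte_mkdirty() applied */
	bool pte_write;		/* pte_mkwrite() applied */
	bool pte_uffd_wp;	/* pte_mkuffd_wp() applied */
	bool page_dirty;	/* set_page_dirty() called on the page */
};

static struct pte_plan plan_pte(bool vm_write, bool vm_shared,
				bool anonymous, bool wp_copy)
{
	struct pte_plan plan = { false, false, false, false };
	bool writable = vm_write;

	/* Private file-backed mappings need CoW, so never map them writable. */
	if (!anonymous && !vm_shared)
		writable = false;

	if (writable || anonymous)
		plan.pte_dirty = true;

	if (writable) {
		if (wp_copy)
			plan.pte_uffd_wp = true;
		else
			plan.pte_write = true;
	} else if (vm_shared) {
		/* The PTE was not dirtied, so dirty the page itself. */
		plan.page_dirty = true;
	}
	return plan;
}
```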

Signed-off-by: Axel Rasmussen 
---
 mm/userfaultfd.c | 176 +++
 1 file changed, 131 insertions(+), 45 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 60ae22207761..a539fe18b9a7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
return dst_vma;
 }
 
+/*
+ * Install PTEs, to map dst_addr (within dst_vma) to page.
+ *
+ * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
+ * whether or not dst_vma is VM_SHARED. It also handles the more general
+ * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
+ * backed, or not).
+ *
+ * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
+ * shmem_mcopy_atomic_pte instead.
+ */
+static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+struct vm_area_struct *dst_vma,
+unsigned long dst_addr, struct page *page,
+bool newly_allocated, bool wp_copy)
+{
+   int ret;
+   pte_t _dst_pte, *dst_pte;
+   int writable;
+   bool vm_shared = dst_vma->vm_flags & VM_SHARED;
+   spinlock_t *ptl;
+   struct inode *inode;
+   pgoff_t offset, max_off;
+
+   _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+   writable = dst_vma->vm_flags & VM_WRITE;
+   /* For private, non-anon we need CoW (don't write to page cache!) */
+   if (!vma_is_anonymous(dst_vma) && !vm_shared)
+   writable = 0;
+
+   if (writable || vma_is_anonymous(dst_vma))
+   _dst_pte = pte_mkdirty(_dst_pte);
+   if (writable) {
+   if (wp_copy)
+   _dst_pte = pte_mkuffd_wp(_dst_pte);
+   else
+   _dst_pte = pte_mkwrite(_dst_pte);
+   } else if (vm_shared) {
+   /*
+* Since we didn't pte_mkdirty(), mark the page dirty or it
+* could be freed from under us. We could do this
+* unconditionally, but doing it only if !writable is faster.
+*/
+   set_page_dirty(page);
+   }
+
+   dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+   if (vma_is_shmem(dst_vma)) {
+   /* The shmem MAP_PRIVATE case requires checking the i_size */
+   inode = dst_vma->vm_file->f_inode;
+   offset = linear_page_index(dst_vma, dst_addr);
+   max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+   ret = -EFAULT;
+   if (unlikely(offset >= max_off))
+   goto out_unlock;
+   }
+
+   ret = -EEXIST;
+   if (!pte_none(*dst_pte))
+   goto out_unlock;
+
+   inc_mm_counter(dst_mm, mm_counter(page));
+   if (vma_is_shmem(dst_vma))
+   page_add_file_rmap(page, false);
+   else
+   page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
+
+   if (newly_allocated)
+   lru_cache_add_inactive_or_unevictable(page, dst_vma);
+
+   set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+
+   /* No need to invalidate - it was non-present before */
+   update_mmu_cache(dst_vma, dst_addr, dst_pte);
+   ret = 0;
+out_unlock:
+   pte_unmap_unlock(dst_pte, ptl);
+   return ret;
+}
+
 static int

[PATCH 0/9] userfaultfd: add minor fault handling for shmem

2021-04-08 Thread Axel Rasmussen

Base
====

Since the original series [1] was merged into Andrew's tree, some issues were
noticed. Up to this point, we had been working on fixing what's in Andrew's
tree [2], but at this point we've changed direction enough that a lot of the
fix's delta is undoing what was done in the original series, thereby making it
hard to review.

As suggested by Hugh Dickins and Peter Xu, this series takes a step back. It can
be considered a v3 of the original series [1] - it combines those patches with
the fixes, reordered / broken up to allow for easier review.

The idea is that it will apply cleanly to akpm's tree, *replacing* the following
patches (i.e., drop these first, and then apply this series):

userfaultfd-support-minor-fault-handling-for-shmem.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-2.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-3.patch
userfaultfd-support-minor-fault-handling-for-shmem-fix-4.patch
userfaultfd-selftests-use-memfd_create-for-shmem-test-type.patch
userfaultfd-selftests-create-alias-mappings-in-the-shmem-test.patch
userfaultfd-selftests-reinitialize-test-context-in-each-test.patch
userfaultfd-selftests-exercise-minor-fault-handling-shmem-support.patch

Changelog
=========

Changes since the most recent fixup patch [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
  easier, as we no longer have to sift through deltas undoing what we had done
  before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
  helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
  for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
  of some parameters, simplify labels/gotos, ...). [Hugh, Peter]

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
  helper we've introduced. We rely on the selftest to show that this change
  doesn't break anything.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.

Additionally, Android folks (Lokesh Gidra ) hope to
optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmus...@google.com/T/#t

Axel Rasmussen (9):
  userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
  userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
  userfaultfd/shmem: support minor fault registration for shmem
  userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  userfaultfd/selftests: use memfd_create for shmem test type
  userfaultfd/selftests: create alias mappings in the shmem test
  userfaultfd/selftests: reinitialize test context in each test
  userfaultfd/selftests: exercise minor fault handling shmem support
  userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes

 fs/userfaultfd.c |   6 +-
 include/linux/hugetlb.h  |   5 +-
 include/linux/shmem_fs.h |  15 +-
 include/linux/userfaultfd_k.h|   5 +
 include/uapi/linux/userfaultfd.h |   7 +-
 mm/hugetlb.c |   1 +
 mm/memory.c  |   8 +-
 mm/shmem.c   | 122 --

[PATCH 1/9] userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h

2021-04-08 Thread Axel Rasmussen

Minimizing header file inclusion is desirable. In this case, we can do
so just by forward declaring the enumeration our signature relies upon.

Signed-off-by: Axel Rasmussen 
---
 include/linux/hugetlb.h | 5 -
 mm/hugetlb.c| 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1d3246b31a41..dfb749eaf348 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct ctl_table;
 struct user_struct;
@@ -136,6 +135,8 @@ unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
 #ifdef CONFIG_USERFAULTFD
+enum mcopy_atomic_mode;
+
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
@@ -315,6 +316,8 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 }
 
 #ifdef CONFIG_USERFAULTFD
+enum mcopy_atomic_mode;
+
 static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
pte_t *dst_pte,
struct vm_area_struct *dst_vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9973dec4976c..3b93bbf8c80f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 int hugetlb_max_hstate __read_mostly;
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH 2/9] userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte

2021-04-08 Thread Axel Rasmussen

Previously, we did a dance where we had one calling path in
userfaultfd.c (mfill_atomic_pte), but then we split it into two in
shmem_fs.h (shmem_{mcopy_atomic,mfill_zeropage}_pte), and then rejoined
into a single shared function in shmem.c (shmem_mfill_atomic_pte).

This is all a bit overly complex. Just call the single combined shmem
function directly, allowing us to clean up various branches,
boilerplate, etc.

While we're touching this function, two other small cleanup changes:
- offset is equivalent to pgoff, so we can get rid of offset entirely.
- Split two VM_BUG_ON cases into two statements. This means the line
  number reported when the BUG is hit specifies exactly which condition
  was true.
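
To illustrate the second cleanup, here is a small userspace sketch. It is
an assumption-laden stand-in, not kernel code: CHECK_NOT() is a
hypothetical replacement for VM_BUG_ON() that records the source line of
the first failing check instead of crashing, and struct fake_page is
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Line number of the first check that fired, 0 if none did. */
static int first_failed_line;

/* Hypothetical stand-in for VM_BUG_ON(): remember where we "crashed". */
#define CHECK_NOT(cond)							\
	do {								\
		if ((cond) && !first_failed_line)			\
			first_failed_line = __LINE__;			\
	} while (0)

struct fake_page {
	bool locked;
	bool swap_backed;
};

/* Combined form: the same line is reported whichever condition is true. */
static int check_combined(const struct fake_page *p)
{
	first_failed_line = 0;
	CHECK_NOT(p->locked || p->swap_backed);
	return first_failed_line;
}

/* Split form: each condition sits on its own line, so the reported
 * line number identifies exactly which one was violated. */
static int check_split(const struct fake_page *p)
{
	first_failed_line = 0;
	CHECK_NOT(p->locked);
	CHECK_NOT(p->swap_backed);
	return first_failed_line;
}
```

With the combined check, a locked page and a swap-backed page produce the
same report; with the split checks, the two cases are distinguishable from
the line number alone.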

Signed-off-by: Axel Rasmussen 
---
 include/linux/shmem_fs.h | 15 +---
 mm/shmem.c   | 52 +---
 mm/userfaultfd.c | 10 +++-
 3 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d82b6f396588..919e36671fe6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -122,21 +122,18 @@ static inline bool shmem_file(struct file *file)
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
+#ifdef CONFIG_USERFAULTFD
 #ifdef CONFIG_SHMEM
 extern int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
  struct vm_area_struct *dst_vma,
  unsigned long dst_addr,
  unsigned long src_addr,
+ bool zeropage,
  struct page **pagep);
-extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
-   pmd_t *dst_pmd,
-   struct vm_area_struct *dst_vma,
-   unsigned long dst_addr);
-#else
+#else /* !CONFIG_SHMEM */
 #define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
-  src_addr, pagep)({ BUG(); 0; })
-#define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \
-dst_addr)  ({ BUG(); 0; })
-#endif
+  src_addr, zeropage, pagep)   ({ BUG(); 0; })
+#endif /* CONFIG_SHMEM */
+#endif /* CONFIG_USERFAULTFD */
 
 #endif
diff --git a/mm/shmem.c b/mm/shmem.c
index b2db4ed0fbc7..c21f20cc4204 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2354,13 +2354,14 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
return inode;
 }
 
-static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- bool zeropage,
- struct page **pagep)
+#ifdef CONFIG_USERFAULTFD
+int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
+  pmd_t *dst_pmd,
+  struct vm_area_struct *dst_vma,
+  unsigned long dst_addr,
+  unsigned long src_addr,
+  bool zeropage,
+  struct page **pagep)
 {
struct inode *inode = file_inode(dst_vma->vm_file);
struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2372,7 +2373,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
struct page *page;
pte_t _dst_pte, *dst_pte;
int ret;
-   pgoff_t offset, max_off;
+   pgoff_t max_off;
 
ret = -ENOMEM;
if (!shmem_inode_acct_block(inode, 1))
@@ -2383,7 +2384,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (!page)
goto out_unacct_blocks;
 
-   if (!zeropage) {/* mcopy_atomic */
+   if (!zeropage) {/* COPY */
page_kaddr = kmap_atomic(page);
ret = copy_from_user(page_kaddr,
 (const void __user *)src_addr,
@@ -2397,7 +2398,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
/* don't free the page */
return -ENOENT;
}
-   } else {/* mfill_zeropage_atomic */
+   } else {/* ZEROPAGE */
clear_highpage(page);
}
} else {
@@ -2405,15 +2406,15 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
*pagep = NULL;
}
 
-   VM_BUG_ON(PageLocked(page) || PageSwapBacked(page));
+   VM_BUG_ON(PageLocked(page));
+

[tip:x86/entry] BUILD SUCCESS 70918779aec9bd01d16f4e6e800ffe423d196021

2021-04-08 Thread kernel test robot

  sh7785lcr_32bit_defconfig
sh   se7206_defconfig
nios2alldefconfig
arcvdk_hs38_defconfig
sh  sdk7786_defconfig
powerpc mpc83xx_defconfig
arm  pxa3xx_defconfig
sh   sh7724_generic_defconfig
sh  rsk7269_defconfig
mipsbcm47xx_defconfig
powerpcmpc7448_hpc2_defconfig
armzeus_defconfig
arm  footbridge_defconfig
powerpcwarp_defconfig
mips   ip22_defconfig
m68k  multi_defconfig
sh  lboxre2_defconfig
arm64alldefconfig
powerpc mpc5200_defconfig
powerpc  ep88xc_defconfig
m68k  amiga_defconfig
arm  colibri_pxa270_defconfig
powerpcmvme5100_defconfig
armtrizeps4_defconfig
armxcep_defconfig
ia64zx1_defconfig
powerpc  pasemi_defconfig
powerpc mpc832x_rdb_defconfig
powerpc   mpc834x_itxgp_defconfig
arm  ep93xx_defconfig
armdove_defconfig
powerpc mpc85xx_cds_defconfig
mips   ip32_defconfig
armrealview_defconfig
armmvebu_v7_defconfig
arm  collie_defconfig
powerpc ps3_defconfig
arm  gemini_defconfig
arm  iop32x_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a005-20210408
x86_64   randconfig-a003-20210408
x86_64   randconfig-a001-20210408
x86_64   randconfig-a004-20210408
x86_64   randconfig-a002-20210408
x86_64   randconfig-a006-20210408
i386 randconfig-a006-20210408
i386 randconfig-a003-20210408
i386 randconfig-a001-20210408
i386 randconfig-a004-20210408
i386 randconfig-a005-20210408
i386 randconfig-a002-20210408
i386 randconfig-a014-20210408
i386 randconfig-a016-20210408
i386 randconfig-a011-20210408
i386 randconfig-a012-20210408
i386 randconfig-a013-20210408
i386 randconfig-a015-20210408
riscvnommu_virt_defconfig
riscv allnoconfig
riscv  rv32_defconfig
um   allmodconfig
um   allyesconfig
um  defconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a014-20210408
x86_64   randconfig-a015-20210408
x86_64   randconfig-a012-20210408
x86_64   randconfig-a011-20210408
x86_64   randconfig-a013-20210408
x86_64   randconfig-a016-20210408

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

[tip:x86/core] BUILD SUCCESS 53375a5a218e7ea0ac18087946b5391f749b764f

2021-04-08 Thread kernel test robot

   x86_64_defconfig
sh   sh7724_generic_defconfig
sh  rsk7269_defconfig
mipsbcm47xx_defconfig
powerpcmpc7448_hpc2_defconfig
armzeus_defconfig
arm  footbridge_defconfig
powerpcwarp_defconfig
mips   ip22_defconfig
m68k  multi_defconfig
sh  lboxre2_defconfig
arm64alldefconfig
powerpc mpc5200_defconfig
powerpc  ep88xc_defconfig
m68k  amiga_defconfig
arm  colibri_pxa270_defconfig
powerpcmvme5100_defconfig
armtrizeps4_defconfig
armxcep_defconfig
ia64zx1_defconfig
sh  sh7785lcr_32bit_defconfig
powerpc  pasemi_defconfig
powerpc mpc832x_rdb_defconfig
powerpc   mpc834x_itxgp_defconfig
arm  ep93xx_defconfig
armdove_defconfig
powerpc mpc85xx_cds_defconfig
arcvdk_hs38_smp_defconfig
mips   ip32_defconfig
armrealview_defconfig
armmvebu_v7_defconfig
arm  collie_defconfig
powerpc ps3_defconfig
arm  gemini_defconfig
arm  iop32x_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20210408
x86_64   randconfig-a005-20210408
x86_64   randconfig-a003-20210408
x86_64   randconfig-a001-20210408
x86_64   randconfig-a002-20210408
x86_64   randconfig-a006-20210408
i386 randconfig-a006-20210408
i386 randconfig-a003-20210408
i386 randconfig-a001-20210408
i386 randconfig-a004-20210408
i386 randconfig-a005-20210408
i386 randconfig-a002-20210408
i386 randconfig-a014-20210408
i386 randconfig-a016-20210408
i386 randconfig-a011-20210408
i386 randconfig-a012-20210408
i386 randconfig-a013-20210408
i386 randconfig-a015-20210408
riscvnommu_virt_defconfig
riscv allnoconfig
riscv  rv32_defconfig
um   allmodconfig
um   allyesconfig
um  defconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a014-20210408
x86_64   randconfig-a015-20210408
x86_64   randconfig-a012-20210408
x86_64   randconfig-a011-20210408
x86_64   randconfig-a013-20210408
x86_64   randconfig-a016-20210408

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

Re: [PATCH v3 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-08 Thread Andrew Lunn

> Linux kernel doesn't do namespaces in the code, so every new driver needs
> to worry about global symbols clashing

This driver is called mana, yet the code uses ana. It would be good to
resolve this inconsistency as well. Ideally, you want to prefix
everything with ana_ or mana_, depending on what you choose, so we
have a clean namespace.

   Andrew

RE: [PATCH v3 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-08 Thread Dexuan Cui

> From: Andrew Lunn 
> Sent: Thursday, April 8, 2021 5:30 PM
> To: Stephen Hemminger 
> ...
> > Linux kernel doesn't do namespaces in the code, so every new driver needs
> > to worry about global symbols clashing
> 
> This driver is called mana, yet the code uses ana. It would be good to
> resolve this inconsistency as well. Ideally, you want to prefix
> everything with ana_ or mana_, depending on what you choose, so we
> have a clean namespace.
> 
>  Andrew

Thanks for the suggestion! Let me think about this and work out a solution.

Re: [PATCH v6 01/30] iov_iter: Add ITER_XARRAY

2021-04-08 Thread Al Viro

On Thu, Apr 08, 2021 at 03:04:07PM +0100, David Howells wrote:
> Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
> an xarray, starting at a given page and offset and walking for the
> specified amount of bytes.  The iterator supports transparent huge pages.
> 
> The iterate_xarray() macro calls the helper function with rcu_access()
> helped.  I think that this is only a problem for iov_iter_for_each_range()
> - and that returns an error for ITER_XARRAY (also, this function does not
> appear to be called).

Unused since lustre had gone away.

> +#define iterate_all_kinds(i, n, v, I, B, K, X) { \

Do you have any users that would pass different B and X?

> @@ -1440,7 +1665,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
>   return v.bv_len;
>   }),({
>   return -EFAULT;
> - })
> + }), 0

Correction - users that might get that flavour.  This one explicitly checks
for xarray and doesn't get to iterate_... in that case.

Re: [PATCH v2] integrity: Add declarations to init_once void arguments.

2021-04-08 Thread Jiele Zhao


Hi Mimi,

And this is another patch that has been modified.

On 2021/4/7 9:44, Jiele Zhao wrote:

init_once is a callback to kmem_cache_create. The parameter
type of this function is void *, so it's better to give an
explicit cast here.
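
For reference, a small userspace sketch of the two forms under discussion.
struct iint_like and both function names are hypothetical stand-ins, not
the integrity code. In C a void * converts implicitly to any object
pointer type, so both constructors below behave identically; the cast is
a style choice rather than a functional change.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical object type standing in for a slab-cache entry. */
struct iint_like {
	int status;
	unsigned long flags;
};

/* Constructor callback without a cast: void * converts implicitly. */
static void init_once_implicit(void *foo)
{
	struct iint_like *obj = foo;

	memset(obj, 0, sizeof(*obj));
	obj->status = -1;
}

/* Same constructor with the explicit cast proposed in the patch. */
static void init_once_cast(void *foo)
{
	struct iint_like *obj = (struct iint_like *)foo;

	memset(obj, 0, sizeof(*obj));
	obj->status = -1;
}
```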

Signed-off-by: Jiele Zhao 
---
  security/integrity/iint.c | 2 +-
  security/integrity/ima/ima_main.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/integrity/iint.c b/security/integrity/iint.c
index 0ba01847e836..fca8a9409e4a 100644
--- a/security/integrity/iint.c
+++ b/security/integrity/iint.c
@@ -160,7 +160,7 @@ void integrity_inode_free(struct inode *inode)
  
  static void init_once(void *foo)

  {
-   struct integrity_iint_cache *iint = foo;
+   struct integrity_iint_cache *iint = (struct integrity_iint_cache *) foo;
  
  	memset(iint, 0, sizeof(*iint));

iint->ima_file_status = INTEGRITY_UNKNOWN;
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 9ef748ea829f..03bef720ab44 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -482,7 +482,7 @@ int ima_bprm_check(struct linux_binprm *bprm)
  }
  
  /**

- * ima_path_check - based on policy, collect/store measurement.
+ * ima_file_check - based on policy, collect/store measurement.
   * @file: pointer to the file to be measured
   * @mask: contains MAY_READ, MAY_WRITE, MAY_EXEC or MAY_APPEND
   *

Re: [PATCH v4 19/20] mips: Convert to GENERIC_CMDLINE

2021-04-08 Thread Daniel Walker

On Thu, Apr 08, 2021 at 02:04:08PM -0500, Rob Herring wrote:
> On Tue, Apr 06, 2021 at 10:38:36AM -0700, Daniel Walker wrote:
> > On Fri, Apr 02, 2021 at 03:18:21PM +, Christophe Leroy wrote:
> > > -config CMDLINE_BOOL
> > > - bool "Built-in kernel command line"
> > > - help
> > > -   For most systems, it is firmware or second stage bootloader that
> > > -   by default specifies the kernel command line options.  However,
> > > -   it might be necessary or advantageous to either override the
> > > -   default kernel command line or add a few extra options to it.
> > > -   For such cases, this option allows you to hardcode your own
> > > -   command line options directly into the kernel.  For that, you
> > > -   should choose 'Y' here, and fill in the extra boot arguments
> > > -   in CONFIG_CMDLINE.
> > > -
> > > -   The built-in options will be concatenated to the default command
> > > -   line if CMDLINE_OVERRIDE is set to 'N'. Otherwise, the default
> > > -   command line will be ignored and replaced by the built-in string.
> > > -
> > > -   Most MIPS systems will normally expect 'N' here and rely upon
> > > -   the command line from the firmware or the second-stage bootloader.
> > > -
> > 
> > 
> > See how you complained that I have CMDLINE_BOOL in my changes, and you
> > think it shouldn't exist.
> > 
> > Yet here mips has it, and you just deleted it with no feature parity in your
> > changes for this.
> 
> AFAICT, CMDLINE_BOOL equates to a non-empty or empty CONFIG_CMDLINE. You 
> seem to need it just because you have CMDLINE_PREPEND and 
> CMDLINE_APPEND. If that's not it, what feature is missing? CMDLINE_BOOL 
> is not a feature, but an implementation detail.

Not true.

It makes it easier to turn it all off inside the Kconfig, so it's for usability,
and multiple architectures have it even with just CMDLINE, as I was commenting
here.

Daniel

[GIT PULL] SMB3 Fixes

2021-04-08 Thread Steve French

Please pull the following changes since commit
e49d033bddf5b565044e2abe4241353959bc9120:

  Linux 5.12-rc6 (2021-04-04 14:15:36 -0700)

are available in the Git repository at:

  git://git.samba.org/sfrench/cifs-2.6.git tags/5.12-rc6-smb3

for you to fetch changes up to 0fc9322ab5e1fe6910c9673e1a7ff29f7dd72611:

  cifs: escape spaces in share names (2021-04-07 21:30:27 -0500)


3 cifs/smb3 fixes, 2 for stable, including a reconnect fix (for the case
when the server address changed) and a fix for proper display of devnames
(when they contain a space or tab).

Test results: 
http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/2/builds/550

Maciek Borzecki (1):
  cifs: escape spaces in share names

Shyam Prasad N (1):
  cifs: On cifs_reconnect, resolve the hostname again.

Wan Jiabing (1):
  fs: cifs: Remove unnecessary struct declaration

 fs/cifs/Kconfig|  3 +--
 fs/cifs/Makefile   |  5 +++--
 fs/cifs/cifsfs.c   |  3 ++-
 fs/cifs/cifsglob.h |  2 --
 fs/cifs/connect.c  | 17 -
 5 files changed, 22 insertions(+), 8 deletions(-)

-- 
Thanks,

Steve

Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed

2021-04-08 Thread Theodore Ts'o

On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote:
> > > If the call to ext4_ext_insert_extent() failed but the new extent was
> > > already inserted, we just update "ex->ee_len = orig_ex.ee_len". This
> > > leads to extent overlap, which then causes a BUG_ON when caching the extent.
> > How did this happen in the first place?  It sounds like if the extent
> > was already inserted, that would be because there was an on-disk file
> > system corruption, no?
> > 
> > In that case, shouldn't we call ext4_error() to declare the file
> > system has an inconsistency, so it can be fixed by fsck?
> We inject an IO fault when running fsstress; JBD detects the IO error and
> triggers a JBD abort.  At the same time, if ext4_ext_insert_extent() has
> already inserted the new extent and then calls ext4_ext_dirty() to dirty the
> metadata, ext4_ext_dirty() will return an error because JBD has already
> aborted.  ext4_ext_dirty() calls ext4_ext_check_inode() to check whether the
> extent is ok; if not, it triggers a BUG_ON and also prints detailed extent
> information.

In this particular case, skipping the "ex->ee_len = orig_ex.ee_len"
may avoid the BUG_ON.  But it's not clear that this is always the
right thing to do.  The fundamental question is what should we do when we
run into an error while we are in the middle of making changes to
on-disk and in-memory data structures?

In the ideal world, we should undo the changes that we were in the
middle of making before we return an error.  That way, the semantics
are very clear; on success, the function has made the requested change
to the file system.  If the function returns an error, then no changes
should be made.
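
A minimal userspace sketch of that convention, under stated assumptions:
struct extent, insert_tail() and split_extent() are hypothetical
simplifications, and inject_err simulates a late failure. This is not the
ext4 code; it only shows the "snapshot, mutate, undo on error" shape.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, much-simplified extent: a block range. */
struct extent {
	unsigned int start;
	unsigned int len;
};

/* Second step of the split; inject_err simulates a late failure. */
static int insert_tail(struct extent *tail, unsigned int start,
		       unsigned int len, int inject_err)
{
	if (inject_err)
		return inject_err;
	tail->start = start;
	tail->len = len;
	return 0;
}

/* Split ex at block 'at'.  On success both halves are updated; on any
 * error ex is restored, so the caller sees no state change at all. */
static int split_extent(struct extent *ex, struct extent *tail,
			unsigned int at, int inject_err)
{
	struct extent orig = *ex;	/* snapshot before mutating */
	int err;

	if (at <= ex->start || at >= ex->start + ex->len)
		return -EINVAL;

	ex->len = at - ex->start;	/* step 1: shrink the first half */
	err = insert_tail(tail, at, orig.start + orig.len - at, inject_err);
	if (err) {
		*ex = orig;		/* undo step 1 */
		return err;
	}
	return 0;
}
```

The difficulty raised in this thread is that the real insert step may have
partially succeeded before failing, in which case restoring only the
caller's half is no longer a correct undo.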

That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len
in the fix_extent_len inside ext4_split_extent_at().  Unfortunately,
ext4_ext_insert_extent() does *not* always follow this convention, and
that's because it would be extremely difficult for it to do so --- the
mutations that it makes can be quite complex, including potentially
increasing the height of the extent tree.

However, I don't think your fix is by any means the ideal one, because
the much more common way that ext4_ext_insert_extent() fails is when it
needs to insert a new leaf node, or needs to increase the height of the
extent tree --- and it returns an ENOSPC failure.  In that case, it
won't have made any changes in the extent tree, and so having
ext4_split_extent_at() undo the change to ex->ee_len is the right
thing to do.

Having blocks get leaked when there is an ENOSPC failure, requiring
fsck to be run --- and without giving the user any warning that this
has happened is *not* a good way to fail.  So I don't think the
proposed patch is the right way to go.

A better way to go would be to teach ext4_ext_insert_extent() so that if
there is a late failure, it unwinds the leaf node back to its
original state (at least from a semantic point of view).  Since the extent
leaf node could have been split, and/or adjacent extent entries may
have been merged, what it would need to do is to remember the starting
block number and length, and make whatever changes are necessary to
the extent entries in that leaf node corresponding to that starting
block number and length.

If you don't want to do that, then a "do no harm" fix would be
something like this:

...
} else if (err == -EROFS) {
return err;
} else if (err)
goto fix_extent_len;

So in the journal abort case, when err is set to EROFS, we don't try
to reset the length, since in theory the file system is read-only
already anyway.  However, in the ENOSPC case, we won't end up silently
leaking blocks that will be lost until the user somehow decides to run
fsck.
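
The control flow of that "do no harm" variant can be sketched in
isolation as follows. This is an illustrative userspace model only:
struct extent and finish_failed_split() are hypothetical names, and the
real code path in ext4_split_extent_at() has more cases than shown.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, much-simplified extent. */
struct extent {
	unsigned int start;
	unsigned int len;
};

/* Called after the insert step of a split has returned err.
 * ex->len has already been shortened; orig_len is the saved length. */
static int finish_failed_split(struct extent *ex, unsigned int orig_len,
			       int err)
{
	if (err == -EROFS)
		return err;		/* journal aborted: do not touch ex */
	if (err)
		ex->len = orig_len;	/* e.g. -ENOSPC: restore the length */
	return err;
}
```

The -EROFS branch deliberately leaves the shortened length in place, on
the assumption that the file system is already read-only at that point.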

There are still times when this doesn't get things completely right
(e.g., what if we get a late ENOMEM error versus an early ENOMEM
failure), where the only real fix is to make ext4_ext_insert_extent()
obey the convention that if it returns an error, it must not result in
any user-visible state change.

Cheers,

- Ted

Re: [PATCH -next] clk: qcom: Add missing MODULE_DEVICE_TABLE

2021-04-08 Thread Chenhui (clare)


On 2021/4/9 4:30, Stephen Boyd wrote:


Quoting Chen Hui (2021-04-08 06:55:09)

Add missing MODULE_DEVICE_TABLE entries to support module autoloading,
as these drivers can be compiled as external modules.

Signed-off-by: Chen Hui 

Any fixes tag?
.


Thanks for reviewing. Fixes tags will be added in the v2 patches, which I
will send out later.

Re: [GIT PULL] fileattr API

2021-04-08 Thread Al Viro

On Fri, Apr 09, 2021 at 01:52:11AM +, Al Viro wrote:
> On Wed, Apr 07, 2021 at 09:22:52PM +0200, Miklos Szeredi wrote:
> > Hi Al,
> > 
> > Please pull from:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git fileattr_v4
> > 
> > Convert all (with the exception of CIFS) filesystems from handling
> > FS_IOC_[GS]ETFLAGS and FS_IOC_FS[GS]ETXATTR themselves to new i_ops and
> > common code moved into the VFS for these ioctls.  This removes boilerplate
> > from filesystems, and allows these operations to be properly stacked in
> > overlayfs.
> 
> Umm...  v4 or v5?

Looks like they only differ in making a couple of fuse helpers static.
Grabbed and merged into #for-next; will push if it passes the smoke
test...

Re: [PATCH v3] hwmon: Add driver for fsp-3y PSUs and PDUs

2021-04-08 Thread Guenter Roeck

On 4/8/21 6:27 PM, Václav Kubernát wrote:
> This patch adds support for these devices:
> - YH-5151E - the PDU
> - YM-2151E - the PSU
> 
> The device datasheet says that the devices support PMBus 1.2, but in my
> testing, a lot of the commands aren't supported and if they are, they
> sometimes behave strangely or inconsistently. For example, writes to the
> PAGE command require using PEC, otherwise the write won't work and the
> page won't switch, even though the standard says that PEC is optional.
> On the other hand, writes to SMBALERT don't require PEC. Because of
> this, the driver is mostly reverse engineered with the help of a tool
> called pmbus_peek written by David Brownell (and later adopted by my
> colleague Jan Kundrát).
> 
> The device also has some sort of a timing issue when switching pages,
> which is explained further in the code.
> 
> Because of this, the driver support is limited. It exposes only the
> values that have been tested to work correctly.
> 
> Signed-off-by: Václav Kubernát 
> ---
>  Documentation/hwmon/fsp-3y.rst |  26 
>  drivers/hwmon/pmbus/Kconfig|  10 ++
>  drivers/hwmon/pmbus/Makefile   |   1 +
>  drivers/hwmon/pmbus/fsp-3y.c   | 236 +
>  4 files changed, 273 insertions(+)
>  create mode 100644 Documentation/hwmon/fsp-3y.rst
>  create mode 100644 drivers/hwmon/pmbus/fsp-3y.c
> 
> diff --git a/Documentation/hwmon/fsp-3y.rst b/Documentation/hwmon/fsp-3y.rst
> new file mode 100644
> index ..68a547021846
> --- /dev/null
> +++ b/Documentation/hwmon/fsp-3y.rst
> @@ -0,0 +1,26 @@
> +Kernel driver fsp3y
> +==
> +Supported devices:
> +  * 3Y POWER YH-5151E
> +  * 3Y POWER YM-2151E
> +
> +Author: Václav Kubernát 
> +
> +Description
> +---
> +This driver implements limited support for two 3Y POWER devices.
> +
> +Sysfs entries
> +-
> +in1_inputinput voltage
> +in2_input12V output voltage
> +in3_input5V output voltage
> +curr1_input  input current
> +curr2_input  12V output current
> +curr3_input  5V output current
> +fan1_input   fan rpm
> +temp1_input  temperature 1
> +temp2_input  temperature 2
> +temp3_input  temperature 3
> +power1_input input power
> +power2_input output power
> diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
> index 03606d4298a4..9d12d446396c 100644
> --- a/drivers/hwmon/pmbus/Kconfig
> +++ b/drivers/hwmon/pmbus/Kconfig
> @@ -56,6 +56,16 @@ config SENSORS_BEL_PFE
> This driver can also be built as a module. If so, the module will
> be called bel-pfe.
>  
> +config SENSORS_FSP_3Y
> + tristate "FSP/3Y-Power power supplies"
> + help
> +   If you say yes here you get hardware monitoring support for
> +   FSP/3Y-Power hot-swap power supplies.
> +   Supported models: YH-5151E, YM-2151E
> +
> +   This driver can also be built as a module. If so, the module will
> +   be called fsp-3y.
> +
>  config SENSORS_IBM_CFFPS
>   tristate "IBM Common Form Factor Power Supply"
>   depends on LEDS_CLASS
> diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
> index 6a4ba0fdc1db..bfe218ad898f 100644
> --- a/drivers/hwmon/pmbus/Makefile
> +++ b/drivers/hwmon/pmbus/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_SENSORS_PMBUS)   += pmbus.o
>  obj-$(CONFIG_SENSORS_ADM1266)+= adm1266.o
>  obj-$(CONFIG_SENSORS_ADM1275)+= adm1275.o
>  obj-$(CONFIG_SENSORS_BEL_PFE)+= bel-pfe.o
> +obj-$(CONFIG_SENSORS_FSP_3Y) += fsp-3y.o
>  obj-$(CONFIG_SENSORS_IBM_CFFPS)  += ibm-cffps.o
>  obj-$(CONFIG_SENSORS_INSPUR_IPSPS) += inspur-ipsps.o
>  obj-$(CONFIG_SENSORS_IR35221)+= ir35221.o
> diff --git a/drivers/hwmon/pmbus/fsp-3y.c b/drivers/hwmon/pmbus/fsp-3y.c
> new file mode 100644
> index ..f03c4e27ec8c
> --- /dev/null
> +++ b/drivers/hwmon/pmbus/fsp-3y.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Hardware monitoring driver for FSP 3Y-Power PSUs
> + *
> + * Copyright (c) 2021 Václav Kubernát, CESNET
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "pmbus.h"
> +
> +#define YM2151_PAGE_12V_LOG  0x00
> +#define YM2151_PAGE_12V_REAL 0x00
> +#define YM2151_PAGE_5VSB_LOG 0x01
> +#define YM2151_PAGE_5VSB_REAL0x20
> +#define YH5151E_PAGE_12V_LOG 0x00
> +#define YH5151E_PAGE_12V_REAL0x00
> +#define YH5151E_PAGE_5V_LOG  0x01
> +#define YH5151E_PAGE_5V_REAL 0x10
> +#define YH5151E_PAGE_3V3_LOG 0x02
> +#define YH5151E_PAGE_3V3_REAL0x11
> +
> +enum chips {
> + ym2151e,
> + yh5151e
> +};
> +
> +struct fsp3y_data {
> + struct pmbus_driver_info info;
> + enum chips chip;
> + int page;
> +};
> +
> +#define to_fsp3y_data(x) container_of(x, struct fsp3y_data, info)
> +
> +static int page_log_to_page_real(int page_log, enum chips chip)
> +{
> + switch (chip) {
> + case

[PATCH v2 1/1] arm64: dts: mediatek: add MT6779 spi master dts node

2021-04-08 Thread Mason Zhang

This patch adds address-cells and size-cells to the spi nodes, based on patch v1.

Signed-off-by: Mason Zhang 
---
 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 112 +++
 1 file changed, 112 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt6779.dtsi b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
index 370f309d32de..c81e76865d1b 100644
--- a/arch/arm64/boot/dts/mediatek/mt6779.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
@@ -219,6 +219,118 @@
status = "disabled";
};
 
+   spi0: spi0@1100a000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x1100a000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+   < CLK_TOP_SPI>,
+   <_ao CLK_INFRA_SPI0>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi1: spi1@1101 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x1101 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+   < CLK_TOP_SPI>,
+   <_ao CLK_INFRA_SPI1>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi2: spi2@11012000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x11012000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+< CLK_TOP_SPI>,
+   <_ao CLK_INFRA_SPI2>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi3: spi3@11013000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x11013000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+< CLK_TOP_SPI>,
+<_ao CLK_INFRA_SPI3>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi4: spi4@11018000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x11018000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+< CLK_TOP_SPI>,
+<_ao CLK_INFRA_SPI4>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi5: spi5@11019000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x11019000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+   < CLK_TOP_SPI>,
+   <_ao CLK_INFRA_SPI5>;
+   clock-names = "parent-clk", "sel-clk", "spi-clk";
+   };
+
+   spi6: spi6@1101d000 {
+   compatible = "mediatek,mt6779-spi",
+"mediatek,mt6765-spi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   mediatek,pad-select = <0>;
+   reg = <0 0x1101d000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_TOP_MAINPLL_D5_D2>,
+< CLK_TOP_SPI>,
+<_ao CLK_INFRA_SPI6>;
+

Re: [PATCH v4 1/1] of: unittest: overlay: ensure proper alignment of copied FDT

2021-04-08 Thread Guenter Roeck

On 4/8/21 3:53 PM, Frank Rowand wrote:
> On 4/8/21 4:54 PM, Guenter Roeck wrote:
>> On 4/8/21 2:28 PM, Rob Herring wrote:
>>>
>>> Applying now so this gets into linux-next this week.
>>>
>> The patch doesn't apply on top of today's -next; it conflicts
>> with "of: properly check for error returned by fdt_get_name()".
>>
>> I reverted that patch and applied this one, and the DT unittests
>> run with it on openrisc. I do get a single test failure, but I think that
>> is a different problem (possibly with the test case itself).
>>
>> ### dt-test ### FAIL of_unittest_dma_ranges_one():923 of_dma_get_range: wrong DMA addr 0x (expecting 1) on node /testcase-data/address-tests/bus@8000/device@1000
> 
> That is a known regression on the target that I use for testing (and
> has been since 5.10-rc1) - the 8074 dragonboard, arm 32.  No
> one else has reported it on the list, so even though I want to debug
> and fix it "promptly", other tasks have had higher priority.  In my
> notes I list two suspect commits:
> 
>   e0d072782c73 dma-mapping: introduce DMA range map, supplanting dma_pfn_offset
>   0a0f0d8be76d dma-mapping: split 
> 
> I think that was purely based on looking at the list of commits that
> may have touched OF dma.  I have not done a bisect.
> 

Here you are:

# bad: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
# good: [bbf5c979011a099af5dc76498918ed7df445635b] Linux 5.9
git bisect start 'v5.10' 'v5.9'
# bad: [4d0e9df5e43dba52d38b251e3b909df8fa1110be] lib, uaccess: add failure injection to usercopy functions
git bisect bad 4d0e9df5e43dba52d38b251e3b909df8fa1110be
# good: [f888bdf9823c85fe945c4eb3ba353f749dec3856] Merge tag 'devicetree-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good f888bdf9823c85fe945c4eb3ba353f749dec3856
# good: [640eee067d9aae0bb98d8706001976ff1affaf00] Merge tag 'drm-misc-next-fixes-2020-10-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect good 640eee067d9aae0bb98d8706001976ff1affaf00
# good: [c6dbef7307629cce855aa6b482b60cbfed88] Merge tag 'usb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good c6dbef7307629cce855aa6b482b60cbfed88
# good: [ce1558c285f9ad04c03b46833a028230771cc0a7] ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
git bisect good ce1558c285f9ad04c03b46833a028230771cc0a7
# good: [c48b75b7271db23c1b2d1204d6e8496d91f27711] Merge tag 'sound-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good c48b75b7271db23c1b2d1204d6e8496d91f27711
# bad: [0cd7d9795fa82226e7516d38b474bddae8b1ff26] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
git bisect bad 0cd7d9795fa82226e7516d38b474bddae8b1ff26
# good: [b1839e7c2a42ccd9a0587c0092e880c7a213ee2a] dmaengine: xilinx: dpdma: convert tasklets to use new tasklet_setup() API
git bisect good b1839e7c2a42ccd9a0587c0092e880c7a213ee2a
# bad: [0de327969b61a245e3a47b60009eae73fe513cef] cma: decrease CMA_ALIGNMENT lower limit to 2
git bisect bad 0de327969b61a245e3a47b60009eae73fe513cef
# good: [6eb0233ec2d0df288fe8515d5b0b2b15562e05bb] usb: don't inherity DMA properties for USB devices
git bisect good 6eb0233ec2d0df288fe8515d5b0b2b15562e05bb
# bad: [48d15814dd0fc429e3205b87f1af6cc472018478] lib82596: move DMA allocation into the callers of i82596_probe
git bisect bad 48d15814dd0fc429e3205b87f1af6cc472018478
# bad: [eba304c6861613a649ba46cfab835b1eddeacd8e] dma-mapping: better document dma_addr_t and DMA_MAPPING_ERROR
git bisect bad eba304c6861613a649ba46cfab835b1eddeacd8e
# bad: [b9bb694b9f62f4b31652223ed3ca38cf98bbb370] iommu/io-pgtable-arm: Clean up faulty sanity check
git bisect bad b9bb694b9f62f4b31652223ed3ca38cf98bbb370
# bad: [a97740f81874c8063c12c24f34d25f10c4f5e9aa] dma-debug: convert comma to semicolon
git bisect bad a97740f81874c8063c12c24f34d25f10c4f5e9aa
# bad: [e0d072782c734d27f5af062c62266f2598f68542] dma-mapping: introduce DMA range map, supplanting dma_pfn_offset
git bisect bad e0d072782c734d27f5af062c62266f2598f68542
# first bad commit: [e0d072782c734d27f5af062c62266f2598f68542] dma-mapping: introduce DMA range map, supplanting dma_pfn_offset

Guenter

[RFC PATCH] usb: core: reduce power-on-good delay time of root hub

2021-04-08 Thread Chunfeng Yun

Return the exact delay time given by the root hub descriptor;
this helps to reduce resume time, etc.

Because the root hub descriptor is usually provided by the host
controller driver, if there is a compatibility problem with a root hub,
we can fix it easily without affecting other root hubs.

Signed-off-by: Chunfeng Yun 
---
 drivers/usb/core/hub.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index 73f4482d833a..22ea1f4f2d66 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct 
usb_hub *hub)
 {
unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;
 
-   /* Wait at least 100 msec for power to become stable */
-   return max(delay, 100U);
+   if (!hub->hdev->parent) /* root hub */
+   return delay;
+   else /* Wait at least 100 msec for power to become stable */
+   return max(delay, 100U);
 }
 
 static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
-- 
2.18.0
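
The behaviour change in this patch can be sketched as a small userspace function. This is only an illustration, not the kernel code: the name demo_power_on_good_delay() and the is_root_hub flag are assumptions standing in for hub_power_on_good_delay() and the !hub->hdev->parent check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the patched helper.
 * The descriptor field is multiplied by 2 to get milliseconds,
 * exactly as in the original function.
 */
static unsigned int demo_power_on_good_delay(uint8_t pwr_on_2_pwr_good,
					     bool is_root_hub)
{
	unsigned int delay = pwr_on_2_pwr_good * 2u;

	if (is_root_hub)
		return delay;			/* use the descriptor value as-is */

	return delay > 100u ? delay : 100u;	/* external hubs keep the 100 ms floor */
}
```

With a descriptor value of 10, an external hub would still wait 100 ms, while a root hub would wait only 20 ms.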

Re: [PATCH v2 01/24] x86/resctrl: Split struct rdt_resource

2021-04-08 Thread Reinette Chatre


Hi James,

On 4/8/2021 10:20 AM, James Morse wrote:

On 07/04/2021 00:42, Reinette Chatre wrote:

On 4/6/2021 10:13 AM, James Morse wrote:

On 31/03/2021 22:35, Reinette Chatre wrote:

On 3/12/2021 9:58 AM, James Morse wrote:

resctrl is the de facto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.

Start by splitting struct rdt_resource, (the name is kept to keep the noise
down), and add some type-trickery to keep the foreach helpers working.



Move everything that is particular to resctrl into a new header
file, keeping the x86 hardware accessors where they are. resctrl code
paths touching a 'hw' struct indicate where an abstraction is needed.


This establishes the significance of this patch. Here the rdt_resource struct is split up
and it is this split that guides the subsequent abstraction. Considering this I find that
this description does not explain the resulting split sufficiently.

Specifically, after reading the above summary I expect fs information in rdt_resource and
hw information in rdt_hw_resource but that does not seem to be the case. For example,
num_rmid is a property obtained from hardware but is found in rdt_resource while other
hardware properties initialized at the same time are found in rdt_hw_resource. It is
interesting to look at when the hardware is discovered (for example, functions like
cache_alloc_hsw_probe(), __get_mem_config_intel(), __rdt_get_mem_config_amd(),
rdt_get_cache_alloc_cfg()). Note how some of the discovered values end up in rdt_resource
and some in rdt_hw_resource.



I was expecting these properties discovered from hardware to
be in rdt_hw_resource.


Not all values discovered from the hardware are private to the architecture. They only
need to be private if there is some further abstraction involved.



ok, but rdt_hw_resource is described as "hw attributes of a resctrl resource" so this can
be very confusing if rdt_hw_resource does _not_ actually contain (all of) the hw
attributes of a resctrl resource.


Aha, right. I'm bad at naming things. This started as untangling the hardware (cough:
arch) specific bits, but some things have migrated back the other way.


It was the description that really tripped me. I'm ok with the current 
naming if the description is clear and usage consistent.




Do you think either of arch_rdt_resource or rdt_priv_resource are clearer?



Could you please expand the kernel doc for rdt_hw_resource to explain that, apart from
@resctrl (that I just noticed is missing a description),


I'll add one for mbm_width too,


it contains attributes needing
abstraction for different architectures as opposed to the actual hardware attributes?


|/**
| * struct rdt_hw_resource - arch private attributes of a resctrl resource
| * @resctrl:   Attributes of the resource used directly by resctrl.
| * @num_closid:Number of CLOSIDs available.
| * @msr_base:  Base MSR address for CBMs
| * @msr_update:Function pointer to update QOS MSRs
| * @mon_scale: cqm counter * mon_scale = occupancy in bytes
| * @mbm_width: Monitor width, to detect and correct for overflow.
| *
| * Members of this structure are either private to the architecture
| * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
| * msr_update and msr_base.
| */



As I commented in patch 7, where num_closid is stored in the schema, I 
think the descriptions can be improved to help understand the 
differences between the two num_closid instances. The two num_closid 
descriptions (found in struct resctrl_schema and struct 
rdt_hw_resource) should be complementary to help somebody have a clear 
understanding of their difference. Currently they are mostly a copy of 
the same description, which does not help to understand what the difference is. 
Perhaps something here like "Number of CLOSIDs supported by 
hardware/architecture"?  Please feel free to improve.


The rest looks good, thank you.


On your specific example: the resctrl filesystem code allocates from num_rmid. Its meaning
doesn't change. num_closid on the other hand changes depending on whether CDP is in use.

Putting num_closid in resctrl's struct rdt_resource would work, but the value is wrong
once CDP is enabled. This would be annoying to debug, hiding the hardware value and
providing it via a helper avoids this, as by the end of the series there is only one
consumer: schemata_list_create().

For MPAM, the helper would return arm64's version of rdt_min_closid as there is only one
'num_closid' for the system, regardless of the resource. The driver has to duplicate the
logic in closid_init() to find the minimum common value of all the resources, as not all
the resources are exposed to resctrl, and an out-of-range closid value triggers an error
interrupt.
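
The idea of hiding the hardware value behind a helper could look roughly like the sketch below. Everything here is hypothetical: the demo_ names are not from the series, and the halving of the CLOSID count under CDP is an assumption about how CDP consumes CLOSIDs, added only to show why the raw value becomes misleading.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical arch-private structure: the raw hardware value lives here. */
struct demo_hw_resource {
	unsigned int num_closid;	/* as discovered from hardware */
};

/* Helper that reports the number of CLOSIDs usable by the filesystem code. */
static unsigned int demo_get_num_closid(const struct demo_hw_resource *hw,
					bool cdp_enabled)
{
	/* Assumption: with CDP enabled each control group consumes two CLOSIDs. */
	return cdp_enabled ? hw->num_closid / 2 : hw->num_closid;
}

/* Minimum common value across resources, as described for MPAM above. */
static unsigned int demo_min_closid(const unsigned int *num_closid, size_t n)
{
	unsigned int min = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (i == 0 || num_closid[i] < min)
			min = num_closid[i];
	return min;
}
```

A caller never reads num_closid directly, so a stale pre-CDP value cannot leak into the filesystem code.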



It is also not clear to me how these structures are intended

Re: [PATCH] x86/kvm: Don't alloc __pv_cpu_mask when !CONFIG_SMP

2021-04-08 Thread Sean Christopherson

On Wed, Apr 07, 2021, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> Enable PV TLB shootdown when !CONFIG_SMP doesn't make sense. Let's move 
> it inside CONFIG_SMP. In addition, we can avoid alloc __pv_cpu_mask when 
> !CONFIG_SMP and get rid of 'alloc' variable in kvm_alloc_cpumask.

...

> +static bool pv_tlb_flush_supported(void) { return false; }
> +static bool pv_ipi_supported(void) { return false; }
> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> + const struct flush_tlb_info *info) { }
> +static void kvm_setup_pv_ipi(void) { }

If you shuffle things around a bit more, you can avoid these stubs, and hide the
definition of __pv_cpu_mask behind CONFIG_SMP, too.


diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5e78e01ca3b4..13c6b1c7c01b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -451,6 +451,8 @@ static void __init sev_map_percpu_data(void)
}
 }

+#ifdef CONFIG_SMP
+
 static bool pv_tlb_flush_supported(void)
 {
return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
@@ -460,8 +462,6 @@ static bool pv_tlb_flush_supported(void)

 static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);

-#ifdef CONFIG_SMP
-
 static bool pv_ipi_supported(void)
 {
return kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI);
@@ -574,45 +574,6 @@ static void kvm_smp_send_call_func_ipi(const struct 
cpumask *mask)
}
 }

-static void __init kvm_smp_prepare_boot_cpu(void)
-{
-   /*
-* Map the per-cpu variables as decrypted before kvm_guest_cpu_init()
-* shares the guest physical address with the hypervisor.
-*/
-   sev_map_percpu_data();
-
-   kvm_guest_cpu_init();
-   native_smp_prepare_boot_cpu();
-   kvm_spinlock_init();
-}
-
-static void kvm_guest_cpu_offline(void)
-{
-   kvm_disable_steal_time();
-   if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
-   wrmsrl(MSR_KVM_PV_EOI_EN, 0);
-   kvm_pv_disable_apf();
-   apf_task_wake_all();
-}
-
-static int kvm_cpu_online(unsigned int cpu)
-{
-   local_irq_disable();
-   kvm_guest_cpu_init();
-   local_irq_enable();
-   return 0;
-}
-
-static int kvm_cpu_down_prepare(unsigned int cpu)
-{
-   local_irq_disable();
-   kvm_guest_cpu_offline();
-   local_irq_enable();
-   return 0;
-}
-#endif
-
 static void kvm_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
 {
@@ -639,6 +600,63 @@ static void kvm_flush_tlb_others(const struct cpumask 
*cpumask,
native_flush_tlb_others(flushmask, info);
 }

+static void __init kvm_smp_prepare_boot_cpu(void)
+{
+   /*
+* Map the per-cpu variables as decrypted before kvm_guest_cpu_init()
+* shares the guest physical address with the hypervisor.
+*/
+   sev_map_percpu_data();
+
+   kvm_guest_cpu_init();
+   native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
+}
+
+static void kvm_guest_cpu_offline(void)
+{
+   kvm_disable_steal_time();
+   if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+   wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+   kvm_pv_disable_apf();
+   apf_task_wake_all();
+}
+
+static int kvm_cpu_online(unsigned int cpu)
+{
+   local_irq_disable();
+   kvm_guest_cpu_init();
+   local_irq_enable();
+   return 0;
+}
+
+static int kvm_cpu_down_prepare(unsigned int cpu)
+{
+   local_irq_disable();
+   kvm_guest_cpu_offline();
+   local_irq_enable();
+   return 0;
+}
+
+static __init int kvm_alloc_cpumask(void)
+{
+   int cpu;
+
+   if (!kvm_para_available() || nopv)
+   return 0;
+
+   if (pv_tlb_flush_supported() || pv_ipi_supported())
+   for_each_possible_cpu(cpu) {
+   zalloc_cpumask_var_node(per_cpu_ptr(&__pv_cpu_mask, 
cpu),
+   GFP_KERNEL, cpu_to_node(cpu));
+   }
+
+   return 0;
+}
+arch_initcall(kvm_alloc_cpumask);
+
+#endif
+
 static void __init kvm_guest_init(void)
 {
int i;
@@ -653,21 +671,21 @@ static void __init kvm_guest_init(void)
pv_ops.time.steal_clock = kvm_steal_clock;
}

+   if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+   apic_set_eoi_write(kvm_guest_apic_eoi_write);
+
+   if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
+   static_branch_enable(_async_pf_enabled);
+   alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, 
asm_sysvec_kvm_asyncpf_interrupt);
+   }
+
+#ifdef CONFIG_SMP
if (pv_tlb_flush_supported()) {
pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
pv_ops.mmu.tlb_remove_table = tlb_remove_table;
pr_info("KVM setup pv remote TLB flush\n");
}

-   if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
-   apic_set_eoi_write(kvm_guest_apic_eoi_write);
-
-   if

Re: [PATCH v2 1/2] USB:ehci:Add a whitelist for EHCI controllers

2021-04-08 Thread kernel test robot

Hi Longfang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on v5.12-rc6 next-20210408]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Longfang-Liu/USB-ehci-fix-the-no-SRBN-register-problem/20210408-215249
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git 
usb-testing
config: riscv-randconfig-r025-20210408 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
56ea2e2fdd691136d5e6631fa0e447173694b82c)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/01b93fbbf8fb6137c7779062232c0fe8c1592940
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Longfang-Liu/USB-ehci-fix-the-no-SRBN-register-problem/20210408-215249
git checkout 01b93fbbf8fb6137c7779062232c0fe8c1592940
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/usb/host/ehci-pci.c:57:3: warning: incompatible pointer to integer 
>> conversion initializing 'u16' (aka 'unsigned short') with an expression of 
>> type 'void *' [-Wint-conversion]
   {NULL, NULL}
^~~~
   include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
   #define NULL ((void *)0)
^~~
   drivers/usb/host/ehci-pci.c:57:9: warning: incompatible pointer to integer 
conversion initializing 'u16' (aka 'unsigned short') with an expression of type 
'void *' [-Wint-conversion]
   {NULL, NULL}
  ^~~~
   include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
   #define NULL ((void *)0)
^~~
   2 warnings generated.


vim +57 drivers/usb/host/ehci-pci.c

49  
50  static const struct usb_nosbrn_whitelist_entry {
51  u16 vendor;
52  u16 device;
53  } usb_nosbrn_whitelist[] = {
54  /* STMICRO ConneXT has no sbrn register */
55  {PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_USB_HOST},
56  /* End of list */
  > 57  {NULL, NULL}
58  };
59  
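
One possible way to avoid the -Wint-conversion warning is to terminate the table with integer zeros instead of NULL, since both members are u16. The following is a standalone sketch, not the driver code: the demo_ names and the ID values are placeholders chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder IDs, not real PCI IDs. */
#define DEMO_VENDOR_ID	0x1234
#define DEMO_DEVICE_ID	0xabcd

struct demo_whitelist_entry {
	uint16_t vendor;
	uint16_t device;
};

static const struct demo_whitelist_entry demo_whitelist[] = {
	{ DEMO_VENDOR_ID, DEMO_DEVICE_ID },
	{ 0, 0 }	/* end of list: integer zeros, not NULL */
};

/* Walk the table until the all-zero sentinel is reached. */
static bool demo_is_whitelisted(uint16_t vendor, uint16_t device)
{
	const struct demo_whitelist_entry *e;

	for (e = demo_whitelist; e->vendor || e->device; e++)
		if (e->vendor == vendor && e->device == device)
			return true;
	return false;
}
```

An empty initializer for the last entry would serve the same purpose; the point is only that a pointer constant should not initialize an integer member.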

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache

2021-04-08 Thread Daniel Xu

On Thu, Apr 08, 2021 at 10:19:35AM +0200, Christian Brauner wrote:
> On Wed, Apr 07, 2021 at 02:46:11PM -0700, Daniel Xu wrote:
> > This commit introduces the bpf page cache iterator. This iterator allows
> > users to run a bpf prog against each page in the "page cache".
> > Internally, the "page cache" is extremely tied to VFS superblock + inode
> > combo. Because of this, iter_pagecache will only examine pages in the
> > caller's mount namespace.
> > 
> > Signed-off-by: Daniel Xu 
> > ---
> >  kernel/bpf/Makefile |   2 +-
> >  kernel/bpf/pagecache_iter.c | 293 
> >  2 files changed, 294 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/bpf/pagecache_iter.c

<...>

> > 
> > +static int init_seq_pagecache(void *priv_data, struct bpf_iter_aux_info 
> > *aux)
> > +{
> > +   struct bpf_iter_seq_pagecache_info *info = priv_data;
> > +   struct radix_tree_iter iter;
> > +   struct super_block *sb;
> > +   struct mount *mnt;
> > +   void **slot;
> > +   int err;
> > +
> > +   info->ns = current->nsproxy->mnt_ns;
> > +   get_mnt_ns(info->ns);
> > +   INIT_RADIX_TREE(&info->superblocks, GFP_KERNEL);
> > +
> > +   spin_lock(&info->ns->ns_lock);
> > +   list_for_each_entry(mnt, &info->ns->list, mnt_list) {
> 
> Not just are there helpers for taking ns_lock
> static inline void lock_ns_list(struct mnt_namespace *ns)
> static inline void unlock_ns_list(struct mnt_namespace *ns)
> they are private to fs/namespace.c because it's the only place that
> should ever walk this list.

Thanks for the hints. Would it be acceptable to add some helpers to
fs/namespace.c to allow walking the list?

IIUC the only way to find a list of mounts is by looking at the mount
namespace. And walking each mount and looking at each `struct
super_node`'s inode's `struct address_space` seemed like the cleanest
way to walk the page cache.

> This seems buggy: why is it ok here to only take ns_lock and not also
> namespace_sem like mnt_already_visible() and __is_local_mountpoint()
> or the relevant proc iterators? I might be missing something.

Thanks for the hints. I'll take a closer look at the locking. Most
probably I didn't get it right.

I should have also mentioned in the cover letter that I'm fairly sure I
messed up the locking somewhere.

> 
> > +   sb = mnt->mnt.mnt_sb;
> > +
> > +   /* The same mount may be mounted in multiple places */
> > +   if (radix_tree_lookup(&info->superblocks, (unsigned long)sb))
> > +   continue;
> > +
> > +   err = radix_tree_insert(&info->superblocks,
> > +   (unsigned long)sb, (void *)1);
> > +   if (err)
> > +   goto out;
> > +   }
> > +
> > +   radix_tree_for_each_slot(slot, &info->superblocks, &iter, 0) {
> > +   sb = (struct super_block *)iter.index;
> > +   atomic_inc(&sb->s_active);
> 
> It also isn't nice that you mess with sb->s_active directly.
> 
> Imho, this is poking around in a lot of fs/ specific stuff that other
> parts of the kernel should not care about or have access to.

Re above: do you think it'd be appropriate to add more helpers to fs/ ?

<...>

Thanks,
Daniel

[PATCH 2/2] pm: allow drivers to drop #ifdef and __maybe_unused from pm callbacks

2021-04-08 Thread Masahiro Yamada

Drivers typically surround suspend and resume callbacks with #ifdef
CONFIG_PM(_SLEEP) or mark them as __maybe_unused in order to avoid
-Wunused-const-variable warnings.

With this commit, drivers will be able to remove #ifdef CONFIG_PM(_SLEEP)
and __maybe_unused because unused functions are dropped by the compiler
instead of the preprocessor.

Signed-off-by: Masahiro Yamada 
---

 include/linux/pm.h | 67 +-
 1 file changed, 24 insertions(+), 43 deletions(-)

diff --git a/include/linux/pm.h b/include/linux/pm.h
index 482313a8ccfc..ca764566692a 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -301,50 +301,37 @@ struct dev_pm_ops {
int (*runtime_idle)(struct device *dev);
 };
 
-#ifdef CONFIG_PM_SLEEP
+#define pm_ptr(_ptr)   PTR_IF(IS_ENABLED(CONFIG_PM), _ptr)
+#define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), _ptr)
+
 #define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
-   .suspend = suspend_fn, \
-   .resume = resume_fn, \
-   .freeze = suspend_fn, \
-   .thaw = resume_fn, \
-   .poweroff = suspend_fn, \
-   .restore = resume_fn,
-#else
-#define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
-#endif
+   .suspend  = pm_sleep_ptr(suspend_fn), \
+   .resume   = pm_sleep_ptr(resume_fn), \
+   .freeze   = pm_sleep_ptr(suspend_fn), \
+   .thaw = pm_sleep_ptr(resume_fn), \
+   .poweroff = pm_sleep_ptr(suspend_fn), \
+   .restore  = pm_sleep_ptr(resume_fn),
 
-#ifdef CONFIG_PM_SLEEP
 #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
-   .suspend_late = suspend_fn, \
-   .resume_early = resume_fn, \
-   .freeze_late = suspend_fn, \
-   .thaw_early = resume_fn, \
-   .poweroff_late = suspend_fn, \
-   .restore_early = resume_fn,
-#else
-#define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
-#endif
+   .suspend_late  = pm_sleep_ptr(suspend_fn), \
+   .resume_early  = pm_sleep_ptr(resume_fn), \
+   .freeze_late   = pm_sleep_ptr(suspend_fn), \
+   .thaw_early= pm_sleep_ptr(resume_fn), \
+   .poweroff_late = pm_sleep_ptr(suspend_fn), \
+   .restore_early = pm_sleep_ptr(resume_fn),
 
-#ifdef CONFIG_PM_SLEEP
 #define SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
-   .suspend_noirq = suspend_fn, \
-   .resume_noirq = resume_fn, \
-   .freeze_noirq = suspend_fn, \
-   .thaw_noirq = resume_fn, \
-   .poweroff_noirq = suspend_fn, \
-   .restore_noirq = resume_fn,
-#else
-#define SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
-#endif
+   .suspend_noirq  = pm_sleep_ptr(suspend_fn), \
+   .resume_noirq   = pm_sleep_ptr(resume_fn), \
+   .freeze_noirq   = pm_sleep_ptr(suspend_fn), \
+   .thaw_noirq = pm_sleep_ptr(resume_fn), \
+   .poweroff_noirq = pm_sleep_ptr(suspend_fn), \
+   .restore_noirq  = pm_sleep_ptr(resume_fn),
 
-#ifdef CONFIG_PM
 #define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
-   .runtime_suspend = suspend_fn, \
-   .runtime_resume = resume_fn, \
-   .runtime_idle = idle_fn,
-#else
-#define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn)
-#endif
+   .runtime_suspend = pm_ptr(suspend_fn), \
+   .runtime_resume  = pm_ptr(resume_fn), \
+   .runtime_idle= pm_ptr(idle_fn),
 
 /*
  * Use this if you want to use the same suspend and resume callbacks for 
suspend
@@ -374,12 +361,6 @@ const struct dev_pm_ops __maybe_unused name = { \
SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
 }
 
-#ifdef CONFIG_PM
-#define pm_ptr(_ptr) (_ptr)
-#else
-#define pm_ptr(_ptr) NULL
-#endif
-
 /*
  * PM_EVENT_ messages
  *
-- 
2.27.0
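
The effect described in this commit message can be modelled outside the kernel. In the sketch below, DEMO_PM_SLEEP is a stand-in for IS_ENABLED(CONFIG_PM_SLEEP), and the demo_ structures are simplified assumptions rather than the real struct dev_pm_ops. The callbacks are always visible to the compiler, so no #ifdef or __maybe_unused is needed, yet the ops table ends up with NULL pointers when the option is off.

```c
#include <stddef.h>

#define PTR_IF(cond, ptr)	((cond) ? (ptr) : NULL)

/* Stand-in for IS_ENABLED(CONFIG_PM_SLEEP); 0 models a build without it. */
#define DEMO_PM_SLEEP		0
#define demo_pm_sleep_ptr(ptr)	PTR_IF(DEMO_PM_SLEEP, ptr)

struct demo_device {
	int suspended;
};

struct demo_pm_ops {
	int (*suspend)(struct demo_device *dev);
	int (*resume)(struct demo_device *dev);
};

/* No #ifdef and no __maybe_unused: both functions are referenced below. */
static int demo_suspend(struct demo_device *dev)
{
	dev->suspended = 1;
	return 0;
}

static int demo_resume(struct demo_device *dev)
{
	dev->suspended = 0;
	return 0;
}

static const struct demo_pm_ops demo_ops = {
	.suspend = demo_pm_sleep_ptr(demo_suspend),
	.resume  = demo_pm_sleep_ptr(demo_resume),
};

/* A core-style caller simply skips callbacks that are NULL. */
static int demo_system_suspend(struct demo_device *dev)
{
	return demo_ops.suspend ? demo_ops.suspend(dev) : 0;
}
```

Because the condition is a compile-time constant, an optimizing compiler is free to discard demo_suspend() and demo_resume() entirely when DEMO_PM_SLEEP is 0.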

[PATCH 0/2] linux/kconfig.h: move IF_ENABLED() out of

2021-04-08 Thread Masahiro Yamada



I insist on <linux/kconfig.h> having only a minimal set of macros
that are needed to evaluate CONFIG options.

Every time somebody added an alien to <linux/kconfig.h>, I needed to
kick it out.

I did not notice 1b399bb04837183cecdc1b32ef1cfc7fcfa75d32 because
I was not addressed by [1].

[1]: https://lore.kernel.org/lkml/?q=kconfig.h%3A+Add+IF_ENABLED%28%29+macro

I like Paul's idea, but if I had noticed the patch in time, I would
have tried my best to persuade him to implement it outside of
<linux/kconfig.h>. (Paul's initial patch was adding it to a new header
instead of <linux/kconfig.h>.)

Before it is widely used, I want to fix it.

In 2/2, I converted pm.h to allow driver cleanups.



Masahiro Yamada (2):
  linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in

  pm: allow drivers to drop #ifdef and __maybe_unused from pm callbacks

 drivers/pinctrl/pinctrl-ingenic.c | 20 -
 include/linux/kconfig.h   |  6 ---
 include/linux/kernel.h|  2 +
 include/linux/pm.h| 67 +++
 4 files changed, 36 insertions(+), 59 deletions(-)

-- 
2.27.0

[PATCH 1/2] linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in

2021-04-08 Thread Masahiro Yamada

<linux/kconfig.h> is included from all the kernel-space source files,
including C, assembly, and linker scripts. It is intended to contain a minimal
set of macros to evaluate CONFIG options.

IF_ENABLED() is an intruder here because (x ? y : z) is C code, which
should not be included from assembly files or linker scripts.

Also, <linux/kconfig.h> is no longer self-contained because NULL is
defined in <linux/stddef.h>.

Move IF_ENABLED() out to <linux/kernel.h> as PTR_IF().

PTR_IF(IS_ENABLED(CONFIG_FOO), ...) is slightly longer than
IF_ENABLED(CONFIG_FOO, ...), but it is not a big deal because
sub-systems often define dedicated macros such as of_match_ptr(),
pm_ptr() etc. for common use-cases.

Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/pinctrl-ingenic.c | 20 ++--
 include/linux/kconfig.h   |  6 --
 include/linux/kernel.h|  2 ++
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-ingenic.c 
b/drivers/pinctrl/pinctrl-ingenic.c
index f2746125b077..b21e2ae4528d 100644
--- a/drivers/pinctrl/pinctrl-ingenic.c
+++ b/drivers/pinctrl/pinctrl-ingenic.c
@@ -2496,43 +2496,43 @@ static int __init ingenic_pinctrl_probe(struct 
platform_device *pdev)
 static const struct of_device_id ingenic_pinctrl_of_match[] = {
{
.compatible = "ingenic,jz4740-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4740, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4740), 
_chip_info)
},
{
.compatible = "ingenic,jz4725b-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4725B, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4725B), 
_chip_info)
},
{
.compatible = "ingenic,jz4760-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4760, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4760), 
_chip_info)
},
{
.compatible = "ingenic,jz4760b-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4760, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4760), 
_chip_info)
},
{
.compatible = "ingenic,jz4770-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4770, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4770), 
_chip_info)
},
{
.compatible = "ingenic,jz4780-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_JZ4780, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_JZ4780), 
_chip_info)
},
{
.compatible = "ingenic,x1000-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_X1000, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_X1000), _chip_info)
},
{
.compatible = "ingenic,x1000e-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_X1000, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_X1000), _chip_info)
},
{
.compatible = "ingenic,x1500-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_X1500, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_X1500), _chip_info)
},
{
.compatible = "ingenic,x1830-pinctrl",
-   .data = IF_ENABLED(CONFIG_MACH_X1830, _chip_info)
+   .data = PTR_IF(IS_ENABLED(CONFIG_MACH_X1830), _chip_info)
},
{ /* sentinel */ },
 };
diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index 24a59cb06963..cc8fa109cfa3 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -70,10 +70,4 @@
  */
 #define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option))
 
-/*
- * IF_ENABLED(CONFIG_FOO, ptr) evaluates to (ptr) if CONFIG_FOO is set to 'y'
- * or 'm', NULL otherwise.
- */
-#define IF_ENABLED(option, ptr) (IS_ENABLED(option) ? (ptr) : NULL)
-
 #endif /* __LINUX_KCONFIG_H */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5b7ed6dc99ac..8685ca4cf287 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -38,6 +38,8 @@
 #define PTR_ALIGN_DOWN(p, a)   ((typeof(p))ALIGN_DOWN((unsigned long)(p), (a)))
 #define IS_ALIGNED(x, a)   (((x) & ((typeof(x))(a) - 1)) == 0)
 
+#define PTR_IF(cond, ptr)  ((cond) ? (ptr) : NULL)
+
 /* generic data direction definitions */
 #define READ   0
 #define WRITE  1
-- 
2.27.0
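
As a rough illustration of how PTR_IF() behaves in a match table like the one converted above, here is a self-contained userspace sketch. The DEMO_MACH_ constants stand in for IS_ENABLED(CONFIG_MACH_...), and the demo_ types, strings and pin counts are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define PTR_IF(cond, ptr)	((cond) ? (ptr) : NULL)

/* Stand-ins for IS_ENABLED(CONFIG_...) results. */
#define DEMO_MACH_A	1
#define DEMO_MACH_B	0

struct demo_chip_info {
	int num_pins;
};

static const struct demo_chip_info chip_a_info = { .num_pins = 32 };
static const struct demo_chip_info chip_b_info = { .num_pins = 64 };

struct demo_match {
	const char *compatible;
	const void *data;
};

static const struct demo_match demo_matches[] = {
	{ .compatible = "demo,chip-a", .data = PTR_IF(DEMO_MACH_A, &chip_a_info) },
	{ .compatible = "demo,chip-b", .data = PTR_IF(DEMO_MACH_B, &chip_b_info) },
	{ NULL, NULL }	/* sentinel */
};

/* Return the chip data for a compatible string, or NULL if absent/disabled. */
static const struct demo_chip_info *demo_lookup(const char *compatible)
{
	const struct demo_match *m;

	for (m = demo_matches; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

The entry for the disabled machine stays in the table, but its data pointer is NULL, so chip_b_info has no remaining users and can be discarded by the compiler.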

Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache

2021-04-08 Thread Matthew Wilcox

On Thu, Apr 08, 2021 at 12:48:49PM -0700, Daniel Xu wrote:
> No reason other than I didn't know about the latter. Thanks for the
> hint. find_get_entries() seems to return a pagevec of entries which
> would complicate the iteration (a 4th layer of things to iterate over).
> 
> But I did find find_get_pages_range() which I think can be used to find
> 1 page at a time. I'll look into it further.

Please don't, that's going to be a pagevec too.

> > I'm not really keen on the idea of random BPF programs being able to poke
> > at pages in the page cache like this.  From your initial description,
> > it sounded like all you needed was a list of which pages are present.
> 
> Could you elaborate on what "list of which pages are present" implies?
> The overall goal with this patch is to detect duplicate content in the
> page cache. So anything that helps achieve that goal I would (in theory)
> be OK with.
> 
> My understanding is the user would need to hash the contents
> of each page in the page cache. And BPF provides the flexibility such
> that this work could be reused for currently unanticipated use cases.

But if you need the contents, then you'll need to kmap() the pages.
I don't see people being keen on exposing kmap() to bpf either.  I think
you're much better off providing an interface that returns a hash of
each page to the BPF program.

> Furthermore, bpf programs could already look at all the pages in the
> page cache by hooking into tracepoint:filemap:mm_filemap_add_to_page_cache,
> albeit at a much slower rate. I figure the downside of adding this
> page cache iterator is we're explicitly condoning the behavior.

That should never have been exposed.  It's only supposed to be for error
injection.  If people have started actually using it for something,
then it's time we delete that tracepoint.

> The idea behind the radix tree was to deduplicate the mounts by
> superblock. Because a single filesystem may be mounted in different
> locations. I didn't find a set data structure I could reuse so I
> figured radix tree / xarray would work too.
> 
> Happy to take any better ideas too.
> 
> > If you don't understand why this is so bad, call xa_dump() on it after
> > constructing it.  I'll wait.
> 
> I did a dump and got the following results: http://ix.io/2VpY .
> 
> I received a hint that you may be referring to how the xarray/radix
> tree would be as large as the largest pointer. To my uneducated eye it
> doesn't look like that's the case in this dump. Could you please
> clarify?

We get seven nodes per 4kB page.

$ grep -c 'value 0' 2VpY 
15
$ grep -c node 2VpY 
43

so we use 6+1/7 pages in order to store 15 values.  That's 387 cache
lines, for the amount of data that could fit in two.

Liam and I are working on a data structure that would support doing
something along these lines in an efficient manner, but it's not
ready yet.
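
The arithmetic above can be checked with a few lines of C. The node size of 576 bytes is an assumption (it is consistent with "seven nodes per 4kB page" and with 387 cache lines for 43 nodes), as are the 64-byte cache line and the 8-byte value size.

```c
#include <stddef.h>

#define DEMO_CACHE_LINE	64u	/* assumed cache line size in bytes */
#define DEMO_NODE_BYTES	576u	/* assumed node size: 9 cache lines */
#define DEMO_PAGE_BYTES	4096u

/* Number of cache lines needed to hold the given number of bytes. */
static size_t demo_cache_lines(size_t bytes)
{
	return (bytes + DEMO_CACHE_LINE - 1) / DEMO_CACHE_LINE;
}

/* Whole nodes that fit in one page. */
static size_t demo_nodes_per_page(void)
{
	return DEMO_PAGE_BYTES / DEMO_NODE_BYTES;
}
```

Under these assumptions, 43 nodes occupy 43 * 9 = 387 cache lines, while 15 values of 8 bytes each need only 120 bytes, which fits in two cache lines.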

Re: [PATCH 2/2] pm: allow drivers to drop #ifdef and __maybe_unused from pm callbacks

2021-04-08 Thread Arnd Bergmann

On Thu, Apr 8, 2021 at 11:00 PM Masahiro Yamada  wrote:
>
> Drivers typically surround suspend and resume callbacks with #ifdef
> CONFIG_PM(_SLEEP) or mark them as __maybe_unused in order to avoid
> -Wunused-const-variable warnings.
>
> With this commit, drivers will be able to remove #ifdef CONFIG_PM(_SLEEP)
> and __maybe_unused because unused functions are dropped by the compiler
> instead of the preprocessor.
>
> Signed-off-by: Masahiro Yamada 

I tried this before and could not get it to work right.

>
> -#ifdef CONFIG_PM_SLEEP
> +#define pm_ptr(_ptr)   PTR_IF(IS_ENABLED(CONFIG_PM), _ptr)
> +#define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), _ptr)
> +
>  #define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
> -   .suspend = suspend_fn, \
> -   .resume = resume_fn, \
> -   .freeze = suspend_fn, \
> -   .thaw = resume_fn, \
> -   .poweroff = suspend_fn, \
> -   .restore = resume_fn,
> -#else
> -#define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
> -#endif
> +   .suspend  = pm_sleep_ptr(suspend_fn), \
> +   .resume   = pm_sleep_ptr(resume_fn), \
> +   .freeze   = pm_sleep_ptr(suspend_fn), \
> +   .thaw = pm_sleep_ptr(resume_fn), \
> +   .poweroff = pm_sleep_ptr(suspend_fn), \
> +   .restore  = pm_sleep_ptr(resume_fn),

The problem that I think you inevitably hit is that you run into a missing
declaration for any driver that still uses an #ifdef around a static
function.

The only way I can see us doing this is to create a new set of
macros that behave like the version you propose here but leave
the old macros in place until the last such #ifdef has been removed.

   Arnd

Re: [PATCH 01/10] mm/numa: node demotion data structure and lookup

2021-04-08 Thread Dave Hansen

On 4/8/21 1:03 AM, Oscar Salvador wrote:
> I think this patch and patch#2 could be squashed
> 
> Reviewed-by: Oscar Salvador 

Yeah, that makes a lot of sense.  I'll do that for the next version.

Re: [Outreachy kernel] [PATCH 2/2] media: zoran: replace bit shifts by BIT() macro

2021-04-08 Thread Mitali Borkar

On Thu, Apr 08, 2021 at 11:15:07PM +0200, Julia Lawall wrote:
> 
> 
> On Fri, 9 Apr 2021, Mitali Borkar wrote:
> 
> > Added #include <linux/bitops.h> and replaced bit shifts by BIT() macro.
> > This BIT() macro from linux/bitops.h is used to define ZR36057_VFESPFR_* 
> > bitmasks.
> > Use of macro is better and neater. It maintains consistency.
> > Reported by checkpatch.
> >
> > Signed-off-by: Mitali Borkar 
> > ---
> >  drivers/staging/media/zoran/zr36057.h | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/staging/media/zoran/zr36057.h 
> > b/drivers/staging/media/zoran/zr36057.h
> > index a2a75fd9f535..93075459f910 100644
> > --- a/drivers/staging/media/zoran/zr36057.h
> > +++ b/drivers/staging/media/zoran/zr36057.h
> > @@ -8,6 +8,8 @@
> >  #ifndef _ZR36057_H_
> >  #define _ZR36057_H_
> >
> > +#include <linux/bitops.h>
> > +
> >  /* Zoran ZR36057 registers */
> >
> >  #define ZR36057_VFEHCR  0x000  /* Video Front End, Horizontal 
> > Configuration Register */
> > @@ -31,12 +33,12 @@
> >  #define ZR36057_VFESPFR_VER_DCM  8
> >  #define ZR36057_VFESPFR_DISP_MODE6
> >  #define ZR36057_VFESPFR_YUV422  (0 << 3)
> > -#define ZR36057_VFESPFR_RGB888  (1 << 3)
> > +#define ZR36057_VFESPFR_RGB888  BIT(3)
> 
> Uniformity is generally considered to be more important than using BIT.
> Having only a few constants defined using BIT is a bit strange.
>
Okay, ma'am. Could you please tell me how I should proceed from here?

> julia
> 
> >  #define ZR36057_VFESPFR_RGB565  (2 << 3)
> >  #define ZR36057_VFESPFR_RGB555  (3 << 3)
> > -#define ZR36057_VFESPFR_ERR_DIF  (1 << 2)
> > -#define ZR36057_VFESPFR_PACK24  (1 << 1)
> > -#define ZR36057_VFESPFR_LITTLE_ENDIAN(1 << 0)
> > +#define ZR36057_VFESPFR_ERR_DIF  BIT(2)
> > +#define ZR36057_VFESPFR_PACK24  BIT(1)
> > +#define ZR36057_VFESPFR_LITTLE_ENDIANBIT(0)
> >
> >  #define ZR36057_VDTR0x00c  /* Video Display "Top" Register 
> > */
> >
> > --
> > 2.30.2
> >
> >

Re: [PATCH v1 2/2] drivers/gpu/drm: don't select DMA_CMA or CMA from aspeed or etnaviv

2021-04-08 Thread Linus Walleij

On Thu, Apr 8, 2021 at 6:44 PM David Hildenbrand  wrote:

> > drivers/gpu/drm/mcde/Kconfig
> > drivers/gpu/drm/pl111/Kconfig
> > drivers/gpu/drm/tve200/Kconfig
>
> I was assuming these are "real" dependencies. Will it also work without
> DMA_CMA?

It will mostly work but that is only because the reservations are
mostly contiguous anyway because they are done early and
are small. The hardware requires contiguous buffers in all
three cases. I'm not sure I always got it right.

> > certainly needs this as well, and pretty much anything that is
> > selecting DRM_KMS_CMA_HELPER or
> > DRM_GEM_CMA_HELPER "wants" DMA_CMA.
>
> "wants" as in "desires to use but can live without" or "wants" as in
> "really needs it". ?

I don't know the exact semantics of using DRM_KMS_CMA*
without actually using DMA_CMA. I suspect small allocations
will be contiguous and big allocations will start to fragment?
but it's just my guess. I guess "really need it"?

Yours,
Linus Walleij

[PATCH 1/5] srcu: Unconditionally embed struct lockdep_map

2021-04-08 Thread Frederic Weisbecker

struct lockdep_map is empty when CONFIG_DEBUG_LOCK_ALLOC=n. We can
remove the ifdeffery while adding the lockdep map in struct srcu_struct
without risking consuming space in the off-case. This will also simplify
further manipulations on this field.

Signed-off-by: Frederic Weisbecker 
Cc: Uladzislau Rezki 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
---
 include/linux/srcutree.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 9cfcc8a756ae..cb1f4351e8ba 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -82,9 +82,7 @@ struct srcu_struct {
/*  callback for the barrier */
/*  operation. */
struct delayed_work work;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
-#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 };
 
 /* Values for state variable (bottom bits of ->srcu_gp_seq). */
-- 
2.25.1
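
A quick way to convince oneself of the "no space in the off-case" claim is
the following sketch. It assumes GNU C, where an empty struct is accepted
as an extension and has size 0 (ISO C does not allow it). The demo_* types
are hypothetical models, not the real srcu_struct or lockdep_map.

```c
#include <stddef.h>

/*
 * Assumption: models the CONFIG_DEBUG_LOCK_ALLOC=n case, where the map
 * has no members. Empty structs are a GNU C extension with size 0.
 */
struct demo_lockdep_map { };

/* Layout with the field compiled out by an #ifdef. */
struct demo_srcu_ifdef {
	long work;
};

/* Layout with the field always present. */
struct demo_srcu_embedded {
	long work;
	struct demo_lockdep_map dep_map;	/* zero bytes in the off-case */
};

static size_t demo_size_ifdef(void)
{
	return sizeof(struct demo_srcu_ifdef);
}

static size_t demo_size_embedded(void)
{
	return sizeof(struct demo_srcu_embedded);
}
```

Both sizes come out equal with GCC and Clang in C mode, which is what lets
the #ifdef go away without growing the structure.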

[PATCH 2/5] srcu: Initialize SRCU after timers

2021-04-08 Thread Frederic Weisbecker

Once srcu_init() is called, the SRCU core is free to queue delayed
workqueues, which rely on timers. However init_timers() is called
several steps after rcu_init(). Any call_srcu() in-between would finish
its course inside a dangerously uninitialized timer core.

Make sure we stay in early SRCU mode until everything is well settled.

Signed-off-by: Frederic Weisbecker 
Cc: Uladzislau Rezki 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
---
 include/linux/srcu.h  | 6 ++
 init/main.c   | 2 ++
 kernel/rcu/rcu.h  | 6 --
 kernel/rcu/srcutree.c | 5 +
 kernel/rcu/tiny.c | 1 -
 kernel/rcu/tree.c | 1 -
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index a0895bbf71ce..e6011a9975af 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -64,6 +64,12 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct 
*ssp);
 unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
 bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
 
+#ifdef CONFIG_SRCU
+void srcu_init(void);
+#else /* #ifdef CONFIG_SRCU */
+static inline void srcu_init(void) { }
+#endif /* #else #ifdef CONFIG_SRCU */
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 /**
diff --git a/init/main.c b/init/main.c
index 53b278845b88..1bc5cc9e52ef 100644
--- a/init/main.c
+++ b/init/main.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include <linux/srcu.h>
 #include 
 #include 
 #include 
@@ -956,6 +957,7 @@ asmlinkage __visible void __init __no_sanitize_address 
start_kernel(void)
tick_init();
rcu_init_nohz();
init_timers();
+   srcu_init();
hrtimers_init();
softirq_init();
timekeeping_init();
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index d64b842f4078..b3af34068051 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -422,12 +422,6 @@ do {   
\
 
 #endif /* #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) */
 
-#ifdef CONFIG_SRCU
-void srcu_init(void);
-#else /* #ifdef CONFIG_SRCU */
-static inline void srcu_init(void) { }
-#endif /* #else #ifdef CONFIG_SRCU */
-
 #ifdef CONFIG_TINY_RCU
 /* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
 static inline bool rcu_gp_is_normal(void) { return true; }
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 10e681ea7051..108f9ca06047 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1384,6 +1384,11 @@ void __init srcu_init(void)
 {
struct srcu_struct *ssp;
 
+   /*
+* Once that is set, call_srcu() can follow the normal path and
+* queue delayed work. This must follow RCU workqueues creation
+* and timers initialization.
+*/
srcu_init_done = true;
while (!list_empty(&srcu_boot_list)) {
ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index c8a029fbb114..340b3f8b090d 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -221,5 +221,4 @@ void __init rcu_init(void)
 {
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
rcu_early_boot_tests();
-   srcu_init();
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5c214705c33f..740f5cd34459 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4714,7 +4714,6 @@ void __init rcu_init(void)
WARN_ON(!rcu_gp_wq);
rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
WARN_ON(!rcu_par_gp_wq);
-   srcu_init();
 
/* Fill in default value for rcutree.qovld boot parameter. */
/* -After- the rcu_node ->lock fields are initialized! */
-- 
2.25.1
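
The "stay in early mode until init" idea can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the SRCU
implementation: requests are plain integers, the early list is a fixed
array, and all demo_* names are hypothetical. demo_init_done plays the
role of srcu_init_done.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_EARLY 8

static bool demo_init_done;		/* plays the role of srcu_init_done */
static int demo_early[DEMO_MAX_EARLY];	/* requests parked during early boot */
static size_t demo_nr_early;
static int demo_processed;		/* requests handled on the normal path */

/* Normal path: in the kernel this would queue delayed work (needs timers). */
static void demo_process(int req)
{
	(void)req;
	demo_processed++;
}

/* Returns false only if the early buffer is full. */
static bool demo_submit(int req)
{
	if (demo_init_done) {
		demo_process(req);
		return true;
	}
	if (demo_nr_early == DEMO_MAX_EARLY)
		return false;
	demo_early[demo_nr_early++] = req;
	return true;
}

/* Call only once the timer core is up; drains everything parked so far. */
static void demo_init(void)
{
	size_t i;

	demo_init_done = true;
	for (i = 0; i < demo_nr_early; i++)
		demo_process(demo_early[i]);
	demo_nr_early = 0;
}
```

Calling demo_init() too early is exactly the bug described above: the
normal path would run before the facilities it depends on exist.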

[PATCH 3/5] srcu: Fix broken node geometry after early ssp init

2021-04-08 Thread Frederic Weisbecker

An ssp initialized before rcu_init_geometry() will have its snp hierarchy
based on CONFIG_NR_CPUS.

Once rcu_init_geometry() is called, the node distribution is shrunk
and optimized toward meeting the actual possible number of CPUs detected
on boot.

Later on, the ssp that was initialized before rcu_init_geometry() is
confused and sometimes refers to its initial CONFIG_NR_CPUS based node
hierarchy, sometimes to the new num_possible_cpus() based one instead.
For example, each of its sdp->mynode pointers remains stale and refers to
the early leaf nodes, which may not exist anymore. On the other hand,
srcu_for_each_node_breadth_first() walks the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an sdp is recorded pending on
  sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
  sdp->mynode pointing to a deep leaf (say 3 levels).

   b) The grace period ends after rcu_init_geometry() which shrinks the
  nodes level to a single one. srcu_gp_end() walks through the new
  snp hierarchy without ever reaching the old leaves so the callback
  is never executed.

   This is easily reproduced on an 8 CPUs machine with
   CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
   srcu_barrier() after early tests verification never completes and
   the boot hangs:

[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 
seconds.
[ 5413.147564]   Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 5413.159753] task:swapper/0   state:D stack:0 pid:1 ppid: 
0 flags:0x4000
[ 5413.168099] Call Trace:
[ 5413.170555]  __schedule+0x36c/0x930
[ 5413.174057]  ? wait_for_completion+0x88/0x110
[ 5413.178423]  schedule+0x46/0xf0
[ 5413.181575]  schedule_timeout+0x284/0x380
[ 5413.185591]  ? wait_for_completion+0x88/0x110
[ 5413.189957]  ? mark_held_locks+0x61/0x80
[ 5413.193882]  ? mark_held_locks+0x61/0x80
[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173]  ? wait_for_completion+0x88/0x110
[ 5413.206535]  wait_for_completion+0xb4/0x110
[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610]  srcu_barrier+0x187/0x200
[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700]  do_one_initcall+0x63/0x310
[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912]  kernel_init_freeable+0x253/0x28f
[ 5413.249273]  ? rest_init+0x250/0x250
[ 5413.252846]  kernel_init+0xa/0x110
[ 5413.256257]  ret_from_fork+0x22/0x30

2) An ssp that gets initialized before rcu_init_geometry() and used
   afterward will always have stale sdp->mynode references, resulting in
   callbacks being missed in srcu_gp_end(), just like in the previous
   scenario.

Solve this by resetting the whole state and node tree layout of each
early-initialized ssp once rcu_init_geometry() is done. Queued callbacks
are saved, then requeued once the ssp reset is done.

Suggested-by: Paul E . McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Uladzislau Rezki 
---
 include/linux/srcutree.h |  1 +
 kernel/rcu/srcutree.c| 77 ++--
 2 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index cb1f4351e8ba..a2422c442470 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -83,6 +83,7 @@ struct srcu_struct {
/*  operation. */
struct delayed_work work;
struct lockdep_map dep_map;
+   struct list_head early_init;
 };
 
 /* Values for state variable (bottom bits of ->srcu_gp_seq). */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 108f9ca06047..7ca1bd0067c4 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -39,7 +39,7 @@ static ulong counter_wrap_check = (ULONG_MAX >> 2);
 module_param(counter_wrap_check, ulong, 0444);
 
 /* Early-boot callback-management, so early that no lock is required! */
-static LIST_HEAD(srcu_boot_list);
+static LIST_HEAD(srcu_early_init_list);
 static bool __read_mostly srcu_init_done;
 
 static void srcu_invoke_callbacks(struct work_struct *work);
@@ -174,10 +174,61 @@ static int init_srcu_struct_fields(struct srcu_struct 
*ssp, bool is_static)
init_srcu_struct_nodes(ssp);
ssp->srcu_gp_seq_needed_exp = 0;
ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
+   if (!srcu_init_done)
+   list_add_tail(&ssp->early_init,
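
To illustrate why the tree depth differs before and after
rcu_init_geometry(), here is a simplified model of the level computation.
It is a sketch, not the kernel's code: it assumes the common defaults of a
leaf fanout of 16 and an inner fanout of 64, and demo_tree_levels() is a
hypothetical helper.

```c
/* Assumed defaults: CONFIG_RCU_FANOUT_LEAF=16 and CONFIG_RCU_FANOUT=64. */
#define DEMO_FANOUT_LEAF 16
#define DEMO_FANOUT 64

/* Number of node levels needed to cover nr_cpus CPUs. */
static int demo_tree_levels(unsigned long nr_cpus)
{
	unsigned long capacity = DEMO_FANOUT_LEAF;
	int levels = 1;

	while (capacity < nr_cpus) {
		capacity *= DEMO_FANOUT;
		levels++;
	}
	return levels;
}
```

Under these assumptions a tree sized for CONFIG_NR_CPUS=32 has two levels,
while one sized for the 8 CPUs actually present has a single level, so a
leaf pointer taken from the first layout has no counterpart in the second.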

Re: [PATCH v4 1/1] of: unittest: overlay: ensure proper alignment of copied FDT

2021-04-08 Thread Frank Rowand

On 4/8/21 4:28 PM, Rob Herring wrote:
> On Thu, Apr 8, 2021 at 3:45 PM  wrote:
>>
>> From: Frank Rowand 
>>
>> The Devicetree standard specifies an 8 byte alignment of the FDT.
>> Code in libfdt expects this alignment for an FDT image in memory.
>> kmemdup() returns 4 byte alignment on openrisc.  Replace kmemdup()
>> with kmalloc(), align pointer, memcpy() to get proper alignment.
>>
>> The 4 byte alignment exposed a related bug which triggered a crash
>> on openrisc with:
>> commit 79edff12060f ("scripts/dtc: Update to upstream version 
>> v1.6.0-51-g183df9e9c2b9")
>> as reported in:
>> https://lore.kernel.org/lkml/20210327224116.69309-1-li...@roeck-us.net/
>>
>> Reported-by: Guenter Roeck 
>> Signed-off-by: Frank Rowand 
>>
>> ---
>>
>> changes since version 1:
>>   - use pointer from kmalloc() for kfree() instead of using pointer that
>> has been modified for FDT alignment
>>
>> changes since version 2:
>>   - version 1 was a work in progress version, I failed to commit the 
>> following
>> final changes
>>   - reorder first two arguments of of_overlay_apply()
>>
>> changes since version 3:
>>   - size of memory allocation and size of copy after pointer alignment
>> differ, use separate variables with correct values for each case
>>   - edit comment to more clearly describe that ovcs->fdt is the allocated
>> memory region, which may be different than where the aligned pointer 
>> points
>>   - remove unused parameter from of_overlay_apply()
>>
>>  drivers/of/of_private.h |  2 ++
>>  drivers/of/overlay.c| 27 +--
>>  drivers/of/unittest.c   | 13 ++---
>>  3 files changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
>> index d9e6a324de0a..d717efbd637d 100644
>> --- a/drivers/of/of_private.h
>> +++ b/drivers/of/of_private.h
>> @@ -8,6 +8,8 @@
>>   * Copyright (C) 1996-2005 Paul Mackerras.
>>   */
>>
>> +#define FDT_ALIGN_SIZE 8
>> +
>>  /**
>>   * struct alias_prop - Alias property in 'aliases' node
>>   * @link:  List node to link the structure in aliases_lookup list
>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
>> index 50bbe0edf538..ecf967c57900 100644
>> --- a/drivers/of/overlay.c
>> +++ b/drivers/of/overlay.c
>> @@ -57,7 +57,7 @@ struct fragment {
>>   * struct overlay_changeset
>>   * @id:changeset identifier
>>   * @ovcs_list: list on which we are located
>> - * @fdt:   FDT that was unflattened to create @overlay_tree
>> + * @fdt:   base of memory allocated to hold aligned FDT that 
>> was unflattened to create @overlay_tree
>>   * @overlay_tree:  expanded device tree that contains the fragment nodes
>>   * @count: count of fragment structures
>>   * @fragments: fragment nodes in the overlay expanded device tree
>> @@ -719,8 +719,8 @@ static struct device_node *find_target(struct 
>> device_node *info_node)
>>  /**
>>   * init_overlay_changeset() - initialize overlay changeset from overlay tree
>>   * @ovcs:  Overlay changeset to build
>> - * @fdt:   the FDT that was unflattened to create @tree
>> - * @tree:  Contains all the overlay fragments and overlay fixup nodes
>> + * @fdt:   base of memory allocated to hold aligned FDT that was 
>> unflattened to create @tree
>> + * @tree:  Contains the overlay fragments and overlay fixup nodes
>>   *
>>   * Initialize @ovcs.  Populate @ovcs->fragments with node information from
>>   * the top level of @tree.  The relevant top level nodes are the fragment
>> @@ -873,7 +873,7 @@ static void free_overlay_changeset(struct 
>> overlay_changeset *ovcs)
>>   * internal documentation
>>   *
>>   * of_overlay_apply() - Create and apply an overlay changeset
>> - * @fdt:   the FDT that was unflattened to create @tree
>> + * @fdt:   base of memory allocated to hold the aligned FDT
>>   * @tree:  Expanded overlay device tree
>>   * @ovcs_id:   Pointer to overlay changeset id
>>   *
>> @@ -913,7 +913,7 @@ static void free_overlay_changeset(struct 
>> overlay_changeset *ovcs)
>>   */
>>
>>  static int of_overlay_apply(const void *fdt, struct device_node *tree,
>> -   int *ovcs_id)
>> +   int *ovcs_id)
>>  {
>> struct overlay_changeset *ovcs;
>> int ret = 0, ret_revert, ret_tmp;
>> @@ -953,7 +953,9 @@ static int of_overlay_apply(const void *fdt, struct 
>> device_node *tree,
>> /*
>>  * after overlay_notify(), ovcs->overlay_tree related pointers may 
>> have
>>  * leaked to drivers, so can not kfree() tree, aka 
>> ovcs->overlay_tree;
>> -* and can not free fdt, aka ovcs->fdt
>> +* and can not free memory containing aligned fdt.  The aligned fdt
>> +* is contained within the memory at ovcs->fdt, possibly at an offset
>> +* from ovcs->fdt.
>>  */
>> ret = overlay_notify(ovcs, OF_OVERLAY_PRE_APPLY);
>> if
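
The allocate, align, copy sequence described in the changelog can be
sketched in user space as below. This is an illustration under stated
assumptions, not the code from the patch: malloc()/free() stand in for
kmalloc()/kfree(), and demo_dup_aligned() is a hypothetical helper. It
shows the two points raised in the version notes: the base pointer is what
must be freed, and the allocation size differs from the copy size.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FDT_ALIGN 8	/* mirrors FDT_ALIGN_SIZE from the patch */

/*
 * Copies src into freshly allocated memory at an 8-byte aligned address.
 * *base receives the pointer that must later be passed to free().
 * Returns the aligned copy, or NULL on allocation failure.
 */
static void *demo_dup_aligned(const void *src, size_t size, void **base)
{
	uintptr_t addr;
	void *aligned;

	/* Over-allocate so that an aligned start always fits. */
	*base = malloc(size + DEMO_FDT_ALIGN);
	if (!*base)
		return NULL;

	/* Round the address up to the next multiple of DEMO_FDT_ALIGN. */
	addr = ((uintptr_t)*base + DEMO_FDT_ALIGN - 1) &
	       ~(uintptr_t)(DEMO_FDT_ALIGN - 1);
	aligned = (void *)addr;

	memcpy(aligned, src, size);	/* copy size, not allocation size */
	return aligned;
}
```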

[PATCH 4/5] srcu: Queue a callback in case of early started poll

2021-04-08 Thread Frederic Weisbecker

If SRCU polling is used before SRCU initialization, it won't benefit
from the callback requeue performed after rcu_init_geometry() in order
to fix the node hierarchy reshuffle. This is because polling starts grace
periods that don't rely on callbacks. Therefore the grace period started
may be lost upon srcu_init().

To fix this, queue an empty callback in case of early use of
start_poll_synchronize_srcu() so that it can later get requeued with
the preserved order against other early calls to either call_srcu()
or start_poll_synchronize_srcu().

Since it can be called early any number of times, have at least two
struct rcu_head per ssp dedicated to this early enqueue. The first two
calls to start_poll_synchronize_srcu() will each start one new grace
period, if no call_srcu() happens before or in-between. Any subsequent
early call to start_poll_synchronize_srcu() will wait for the second
grace period so there is no need to queue empty callbacks beyond the
second call.

Suggested-by: Paul E . McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Uladzislau Rezki 
---
 include/linux/srcutree.h |  1 +
 kernel/rcu/srcutree.c| 37 -
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index a2422c442470..9d4fbfc2c109 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -84,6 +84,7 @@ struct srcu_struct {
struct delayed_work work;
struct lockdep_map dep_map;
struct list_head early_init;
+   struct rcu_head early_poll[2];
 };
 
 /* Values for state variable (bottom bits of ->srcu_gp_seq). */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 7ca1bd0067c4..2fa35e5bfbc9 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -190,6 +190,8 @@ static void reset_srcu_struct(struct srcu_struct *ssp)
 {
int cpu;
struct lockdep_map dep_map;
+   struct rcu_head early_poll[2];
+   int i;
struct rcu_cblist pendcbs;
struct rcu_head *rhp;
struct srcu_data __percpu *sda;
@@ -218,10 +220,16 @@ static void reset_srcu_struct(struct srcu_struct *ssp)
sda = ssp->sda;
/* Save the lockdep map, it may not suffer double-initialization */
dep_map = ssp->dep_map;
+   /* Save the early_poll callback links. They may be queued to pendcbs */
+   for (i = 0; i < ARRAY_SIZE(ssp->early_poll); i++)
+   early_poll[i] = ssp->early_poll[i];
 
memset(ssp, 0, sizeof(*ssp));
ssp->sda = sda;
ssp->dep_map = dep_map;
+   for (i = 0; i < ARRAY_SIZE(ssp->early_poll); i++)
+   ssp->early_poll[i] = early_poll[i];
+
spin_lock_init(&ACCESS_PRIVATE(ssp, lock));
init_srcu_struct_fields(ssp, true);
 
@@ -1079,6 +1087,10 @@ unsigned long get_state_synchronize_srcu(struct 
srcu_struct *ssp)
 }
 EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
 
+static void early_poll_func(struct rcu_head *rhp)
+{
+}
+
 /**
  * start_poll_synchronize_srcu - Provide cookie and start grace period
  * @ssp: srcu_struct to provide cookie for.
@@ -1091,7 +1103,30 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
  */
 unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
 {
-   return srcu_gp_start_if_needed(ssp, NULL, true);
+   struct rcu_head *rhp = NULL;
+
+   /*
+* After rcu_init_geometry(), we need to reset the ssp and restart
+* the early started grace periods. Callbacks can be requeued but
+* callback-less grace periods are harder to track, especially if
+* we want to preserve the order among all the early calls to
+* call_srcu() and start_poll_synchronize_srcu(). So queue empty
+* callbacks to solve this. We may initialize at most two grace periods
+* that early, no need to queue more than two callbacks per ssp, any
+* further early call to start_poll_synchronize_srcu() will wait for
+* the second grace period.
+*/
+   if (!srcu_init_done) {
+   int i;
+   for (i = 0; i < ARRAY_SIZE(ssp->early_poll); i++) {
+   if (!ssp->early_poll[i].func) {
+   rhp = &ssp->early_poll[i];
+   rhp->func = early_poll_func;
+   break;
+   }
+   }
+   }
+   return srcu_gp_start_if_needed(ssp, rhp, true);
 }
 EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
 
-- 
2.25.1
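
The slot-claiming logic in start_poll_synchronize_srcu() can be tried out
in isolation with the sketch below. The demo_* types are hypothetical
stand-ins for struct rcu_head and struct srcu_struct; only the loop shape
follows the patch. The first two early calls each get a dedicated head,
and later calls get NULL, meaning no extra callback is queued.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head: only the callback pointer. */
struct demo_head {
	void (*func)(struct demo_head *);
};

/* Hypothetical stand-in for struct srcu_struct. */
struct demo_ssp {
	struct demo_head early_poll[2];
};

/* Empty callback, as in the patch: it only marks a grace period. */
static void demo_early_poll_func(struct demo_head *head)
{
	(void)head;
}

/* Claims the first unused slot, or returns NULL once both are taken. */
static struct demo_head *demo_claim_early_poll(struct demo_ssp *ssp)
{
	size_t nr = sizeof(ssp->early_poll) / sizeof(ssp->early_poll[0]);
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!ssp->early_poll[i].func) {
			ssp->early_poll[i].func = demo_early_poll_func;
			return &ssp->early_poll[i];
		}
	}
	return NULL;
}
```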

[PATCH 5/5] srcu: Early test SRCU polling start

2021-04-08 Thread Frederic Weisbecker

Test early calls to start_poll_synchronize_srcu(), mixed within the
early test of call_srcu(), and make sure that
poll_state_synchronize_srcu() correctly sees the expired grace periods
after the srcu_barrier() on late initcall. Normally srcu_barrier()
doesn't wait for callback-less grace periods but early calls to
start_poll_synchronize_srcu() involve empty callbacks.

Signed-off-by: Frederic Weisbecker 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Uladzislau Rezki 
---
 kernel/rcu/update.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index dd94a602a6d2..7ee57d66a327 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -528,6 +528,7 @@ DEFINE_STATIC_SRCU(early_srcu);
 struct early_boot_kfree_rcu {
struct rcu_head rh;
 };
+static unsigned long early_cookie[3];
 
 static void early_boot_test_call_rcu(void)
 {
@@ -536,8 +537,14 @@ static void early_boot_test_call_rcu(void)
struct early_boot_kfree_rcu *rhp;
 
call_rcu(&head, test_callback);
-   if (IS_ENABLED(CONFIG_SRCU))
+   if (IS_ENABLED(CONFIG_SRCU)) {
+   int i;
+   early_cookie[0] = start_poll_synchronize_srcu(&early_srcu);
call_srcu(&early_srcu, &shead, test_callback);
+
+   for (i = 1; i < ARRAY_SIZE(early_cookie); i++)
+   early_cookie[i] = start_poll_synchronize_srcu(&early_srcu);
+   }
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (!WARN_ON_ONCE(!rhp))
kfree_rcu(rhp, rh);
@@ -561,8 +568,11 @@ static int rcu_verify_early_boot_tests(void)
early_boot_test_counter++;
rcu_barrier();
if (IS_ENABLED(CONFIG_SRCU)) {
+   int i;
early_boot_test_counter++;
srcu_barrier(&early_srcu);
+   for (i = 0; i < ARRAY_SIZE(early_cookie); i++)
+   WARN_ON_ONCE(!poll_state_synchronize_srcu(&early_srcu, early_cookie[i]));
}
}
if (rcu_self_test_counter != early_boot_test_counter) {
-- 
2.25.1

Re: [PATCH v4 6/7] fs/xfs: Handle CoW for fsdax write() path

2021-04-08 Thread Darrick J. Wong

On Thu, Apr 08, 2021 at 08:04:31PM +0800, Shiyang Ruan wrote:
> In fsdax mode, WRITE and ZERO on a shared extent need CoW performed. After
> CoW, newly allocated extents need to be remapped to the file.  So, add an
> iomap_end for dax write ops to do the remapping work.
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  fs/xfs/xfs_bmap_util.c |  3 +--
>  fs/xfs/xfs_file.c  |  9 +++
>  fs/xfs/xfs_iomap.c | 58 +-
>  fs/xfs/xfs_iomap.h |  4 +++
>  fs/xfs/xfs_iops.c  |  7 +++--
>  fs/xfs/xfs_reflink.c   |  3 +--
>  6 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index e7d68318e6a5..9fcea33dd2c9 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -954,8 +954,7 @@ xfs_free_file_space(
>   return 0;
>   if (offset + len > XFS_ISIZE(ip))
>   len = XFS_ISIZE(ip) - offset;
> - error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> - &xfs_buffered_write_iomap_ops);
> + error = xfs_iomap_zero_range(VFS_I(ip), offset, len, NULL);
>   if (error)
>   return error;
>  
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a007ca0711d9..5795d5d6f869 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -684,11 +684,8 @@ xfs_file_dax_write(
>   pos = iocb->ki_pos;
>  
>   trace_xfs_file_dax_write(iocb, from);
> - ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
> - if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
> - i_size_write(inode, iocb->ki_pos);
> - error = xfs_setfilesize(ip, pos, ret);
> - }
> + ret = dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops);
> +
>  out:
>   if (iolock)
>   xfs_iunlock(ip, iolock);
> @@ -1309,7 +1306,7 @@ __xfs_filemap_fault(
>  
>   ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
>   (write_fault && !vmf->cow_page) ?
> -  &xfs_direct_write_iomap_ops :
> +  &xfs_dax_write_iomap_ops :
>    &xfs_read_iomap_ops);
>   if (ret & VM_FAULT_NEEDDSYNC)
>   ret = dax_finish_sync_fault(vmf, pe_size, pfn);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e17ab7f42928..f818f989687b 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -760,7 +760,8 @@ xfs_direct_write_iomap_begin(
>  
>   /* may drop and re-acquire the ilock */
>   error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> - &lockmode, flags & IOMAP_DIRECT);
> + &lockmode,
> + flags & IOMAP_DIRECT || IS_DAX(inode));

Parentheses, please:
(flags & IOMAP_DIRECT) || IS_DAX(inode));

>   if (error)
>   goto out_unlock;
>   if (shared)
> @@ -853,6 +854,38 @@ const struct iomap_ops xfs_direct_write_iomap_ops = {
>   .iomap_begin= xfs_direct_write_iomap_begin,
>  };
>  
> +static int
> +xfs_dax_write_iomap_end(
> + struct inode*inode,
> + loff_t  pos,
> + loff_t  length,
> + ssize_t written,
> + unsigned intflags,
> + struct iomap*iomap)
> +{
> + int error = 0;
> + xfs_inode_t *ip = XFS_I(inode);

Please don't use typedefs:

struct xfs_inode*ip = XFS_I(inode);

> + boolcow = xfs_is_cow_inode(ip);
> +
> + if (pos + written > i_size_read(inode)) {

What if we wrote zero bytes?  Usually that means error, right?

> + i_size_write(inode, pos + written);
> + error = xfs_setfilesize(ip, pos, written);
> + if (error && cow) {
> + xfs_reflink_cancel_cow_range(ip, pos, written, true);
> + return error;
> + }
> + }
> + if (cow)
> + error = xfs_reflink_end_cow(ip, pos, written);
> +
> + return error;
> +}
> +
> +const struct iomap_ops xfs_dax_write_iomap_ops = {
> + .iomap_begin= xfs_direct_write_iomap_begin,
> + .iomap_end  = xfs_dax_write_iomap_end,
> +};
> +
>  static int
>  xfs_buffered_write_iomap_begin(
>   struct inode*inode,
> @@ -1314,3 +1347,26 @@ xfs_xattr_iomap_begin(
>  const struct iomap_ops xfs_xattr_iomap_ops = {
>   .iomap_begin= xfs_xattr_iomap_begin,
>  };
> +
> +int
> +xfs_iomap_zero_range(
> + struct inode*inode,

Might as well pass the xfs_inode pointers directly into these two functions.

--D

> + loff_t  offset,
> + loff_t  len,
> + bool*did_zero)
> +{
> + return iomap_zero_range(inode, offset, len, did_zero,
> + IS_DAX(inode) ?
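
On the parenthesization request earlier in this review: the two spellings
evaluate identically, because '&' binds tighter than '||' in C, so the
change is about readability rather than behavior. A small sketch, with a
hypothetical flag value and demo_* names that are not part of XFS:

```c
#include <stdbool.h>

#define DEMO_IOMAP_DIRECT 0x10	/* hypothetical flag value for this demo */

/* As written in the patch: relies on '&' binding tighter than '||'. */
static bool demo_unparenthesized(unsigned int flags, bool is_dax)
{
	return flags & DEMO_IOMAP_DIRECT || is_dax;
}

/* As requested in review: same result, intent visible at a glance. */
static bool demo_parenthesized(unsigned int flags, bool is_dax)
{
	return (flags & DEMO_IOMAP_DIRECT) || is_dax;
}
```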

Re: [PATCH bpf-next v2 1/6] bpf: Factorize bpf_trace_printk and bpf_seq_printf

2021-04-08 Thread Florent Revest

On Wed, Apr 7, 2021 at 11:54 PM Andrii Nakryiko
 wrote:
> On Tue, Apr 6, 2021 at 8:35 AM Florent Revest  wrote:
> > On Fri, Mar 26, 2021 at 11:51 PM Andrii Nakryiko
> >  wrote:
> > > On Fri, Mar 26, 2021 at 2:53 PM Andrii Nakryiko
> > >  wrote:
> > > > On Tue, Mar 23, 2021 at 7:23 PM Florent Revest  
> > > > wrote:
> > > > > +/* Horrid workaround for getting va_list handling working with 
> > > > > different
> > > > > + * argument type combinations generically for 32 and 64 bit archs.
> > > > > + */
> > > > > +#define BPF_CAST_FMT_ARG(arg_nb, args, mod)  
> > > > >   \
> > > > > +   ((mod[arg_nb] == BPF_PRINTF_LONG_LONG ||  
> > > > >   \
> > > > > +(mod[arg_nb] == BPF_PRINTF_LONG && __BITS_PER_LONG == 64))   
> > > > >   \
> > > > > + ? args[arg_nb]  
> > > > >   \
> > > > > + : ((mod[arg_nb] == BPF_PRINTF_LONG ||   
> > > > >   \
> > > > > +(mod[arg_nb] == BPF_PRINTF_INT && __BITS_PER_LONG == 
> > > > > 32))  \
> > > >
> > > > is this right? INT is always 32-bit, it's only LONG that differs.
> > > > Shouldn't the rule be
> > > >
> > > > (LONG_LONG || LONG && __BITS_PER_LONG == 64) -> (__u64)args[args_nb]
> > > > (INT || LONG && __BITS_PER_LONG == 32) -> (__u32)args[args_nb]
> > > >
> > > > Does (long) cast do anything fancy when casting from u64? Sorry, maybe
> > > > I'm confused.
> >
> > To be honest, I am also confused by that logic... :p My patch tries to
> > conserve exactly the same logic as "88a5c690b6 bpf: fix
> > bpf_trace_printk on 32 bit archs" because I was also afraid of missing
> > something and could not test it on 32 bit arches. From that commit
> > description, it is unclear to me what "u32 and long are passed
> > differently to u64, since the result of C conditional operators
> > follows the "usual arithmetic conversions" rules" means. Maybe Daniel
> > can comment on this ?
>
> Yeah, no idea. Seems like the code above should work fine for 32 and
> 64 bitness and both little- and big-endianness.

Yeah, looks good to me as well. I'll use it in v3.
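
For what it's worth, the "usual arithmetic conversions" remark from that
commit description can be checked with a few lines of user-space C. This
is only a sketch of the language rule, not of the macro itself, and
demo_conditional_width() is a hypothetical helper: both arms of ?: are
converted to a common type, so the expression has the wider type no matter
which arm is selected at run time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Both arms are converted to a common type by the usual arithmetic
 * conversions, so the expression is uint64_t whichever arm is selected.
 * sizeof does not evaluate its operand; it only reports the type's size.
 */
static size_t demo_conditional_width(int pick_wide)
{
	uint64_t wide = 1;
	uint32_t narrow = 1;

	return sizeof(pick_wide ? wide : narrow);
}
```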

> > > > > +int bpf_printf_preamble(char *fmt, u32 fmt_size, const u64 *raw_args,
> > > > > +   u64 *final_args, enum bpf_printf_mod_type 
> > > > > *mod,
> > > > > +   u32 num_args)
> > > > > +{
> > > > > +   struct bpf_printf_buf *bufs = this_cpu_ptr(&bpf_printf_buf);
> > > > > +   int err, i, fmt_cnt = 0, copy_size, used;
> > > > > +   char *unsafe_ptr = NULL, *tmp_buf = NULL;
> > > > > +   bool prepare_args = final_args && mod;
> > > >
> > > > probably better to enforce that both or none are specified, otherwise
> > > > return error
> >
> > Fair :)
> >
> > > it's actually three of them: raw_args, mod, and num_args, right? All
> > > three are either NULL or non-NULL.
> >
> > It is a bit tricky to see from that patch but in "3/6 bpf: Add a
> > bpf_snprintf helper" the verifier code calls this function with
> > num_args != 0 to check whether the number of arguments is correct
> > without actually converting anything.
> >
> > Also when the helper gets called, raw_args can come from the BPF
> > program and be NULL but in that case we will also have num_args = 0
> > guaranteed by the helper so the loop will bail out if it encounters a
> > format specifier.
>
> ok, but at least final_args and mod are locked together, so should be
> enforced to be either null or not, right?

Yes :) will do.

> > > > > +   enum bpf_printf_mod_type current_mod;
> > > > > +   size_t tmp_buf_len;
> > > > > +   u64 current_arg;
> > > > > +   char fmt_ptype;
> > > > > +
> > > > > +   for (i = 0; i < fmt_size && fmt[i] != '\0'; i++) {
> > > >
> > > > Can we say that if the last character is not '\0' then it's a bad
> > > > format string and return -EINVAL? And if \0 is inside the format
> > > > string, then it's also a bad format string? I wonder what others think
> > > > about this?... I think sanity should prevail.
> >
> > Overall, there are two situations:
> > - bpf_seq_printf, bpf_trace_printk: we have a pointer and size but we
> > are not guaranteed zero-termination
> > - bpf_snprintf: we have a pointer, no size but it's guaranteed to be
> > zero-terminated (by ARG_PTR_TO_CONST_STR)
> >
> > Currently, in the bpf_snprintf helper, I set fmt_size to UINT_MAX and
> > the terminating condition will be fmt[i] == '\0'.
> > As you pointed out a bit further, I got a bit carried away with the
> > refactoring and dropped the zero-termination checks for the existing
> > helpers !
> >
> > So I see two possibilities:
> > - either we check fmt[last] == '\0', add a bail out condition in the
> > loop if we encounter another `\0` and set fmt_size to sprintf(fmt) in
> > the bpf_snprintf verifier and helper code.
> > - or we unconditionally call strnlen(fmt, fmt_size) in
> > bpf_printf_preamble. If no 0 is found, we return an error, if there is
> > one we treat it as the NULL
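
The second option above (scan at most fmt_size bytes for a terminator and fail if none is found) can be sketched in user-space C as follows. This is a minimal illustration, not code from the series: check_fmt_terminated() is a hypothetical name, and memchr() stands in for the strnlen(fmt, fmt_size) call mentioned in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the length of fmt if a
 * '\0' occurs within the first fmt_size bytes, or -EINVAL otherwise.
 * Bytes after the first '\0' are ignored, i.e. the first terminator found
 * is treated as the end of the format string.
 */
static int check_fmt_terminated(const char *fmt, size_t fmt_size)
{
	const char *nul;

	assert(fmt != NULL);

	/* Same outcome as comparing strnlen(fmt, fmt_size) with fmt_size. */
	nul = memchr(fmt, '\0', fmt_size);
	if (!nul)
		return -EINVAL;

	return (int)(nul - fmt);
}
```

With this shape a helper that receives a pointer and a size can reject an unterminated buffer up front, and the parsing loop only needs to stop at the returned length.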

Re: [PATCH v4 1/1] of: unittest: overlay: ensure proper alignment of copied FDT

2021-04-08 Thread Frank Rowand

On 4/8/21 4:54 PM, Guenter Roeck wrote:
> On 4/8/21 2:28 PM, Rob Herring wrote:
>>
>> Applying now so this gets into linux-next this week.
>>
> The patch doesn't apply on top of today's -next; it conflicts
> with "of: properly check for error returned by fdt_get_name()".
> 
> I reverted that patch and applied this one, and the DT unittests
> run with it on openrisc. I do get a single test failure, but I think that
> is a different problem (possibly with the test case itself).
> 
> ### dt-test ### FAIL of_unittest_dma_ranges_one():923 of_dma_get_range: wrong 
> DMA addr 0x
>   (expecting 1) on node 
> /testcase-data/address-tests/bus@8000/device@1000

That is a known regression on the target that I use for testing (and
has been since 5.10-rc1) - the 8074 dragonboard, arm 32.  No
one else has reported it on the list, so even though I want to debug
and fix it "promptly", other tasks have had higher priority.  In my
notes I list two suspect commits:

  e0d072782c73 dma-mapping: introduce DMA range map, supplanting dma_pfn_offset
  0a0f0d8be76d dma-mapping: split 

I think that was purely based on looking at the list of commits that
may have touched OF dma.  I have not done a bisect.

One specific report of not seeing the FAIL was Vireshk on 5.11-rc6 with
a Hikey board.

> 
> Tested-by: Guenter Roeck 

Thanks for testing!

> 
> Guenter
>

[PATCH] staging: media: zoran: add '*' in subsequent line

2021-04-08 Thread Mitali Borkar

Added '*' in subsequent lines for block comments to meet Linux kernel
coding style.

Signed-off-by: Mitali Borkar 
---
 drivers/staging/media/zoran/zr36050.c | 34 +--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/staging/media/zoran/zr36050.c b/drivers/staging/media/zoran/zr36050.c
index 2826f4e5d37b..663ac2b3434e 100644
--- a/drivers/staging/media/zoran/zr36050.c
+++ b/drivers/staging/media/zoran/zr36050.c
@@ -25,7 +25,7 @@
 #include "videocodec.h"
 
 /* it doesn't make sense to have more than 20 or so,
-  just to prevent some unwanted loops */
+ * just to prevent some unwanted loops */
 #define MAX_CODECS 20
 
 /* amount of chips attached via this driver */
@@ -43,7 +43,7 @@ MODULE_PARM_DESC(debug, "Debug level (0-4)");
} while (0)
 
 /* =
-   Local hardware I/O functions:
+ *  Local hardware I/O functions:
 
read/write via codec layer (registers are located in the master device)
= */
@@ -80,7 +80,7 @@ static void zr36050_write(struct zr36050 *ptr, u16 reg, u8 
value)
 }
 
 /* =
-   Local helper function:
+ *  Local helper function:
 
status read
= */
@@ -95,7 +95,7 @@ static u8 zr36050_read_status1(struct zr36050 *ptr)
 }
 
 /* =
-   Local helper function:
+ *  Local helper function:
 
scale factor read
= */
@@ -112,7 +112,7 @@ static u16 zr36050_read_scalefactor(struct zr36050 *ptr)
 }
 
 /* =
-   Local helper function:
+ *  Local helper function:
 
wait if codec is ready to proceed (end of processing) or time is over
= */
@@ -133,7 +133,7 @@ static void zr36050_wait_end(struct zr36050 *ptr)
 }
 
 /* =
-   Local helper function:
+ *  Local helper function:
 
basic test of "connectivity", writes/reads to/from memory the SOF marker
= */
@@ -174,7 +174,7 @@ static int zr36050_basic_test(struct zr36050 *ptr)
 }
 
 /* =
-   Local helper function:
+ *  Local helper function:
 
simple loop for pushing the init datasets
= */
@@ -192,7 +192,7 @@ static int zr36050_pushit(struct zr36050 *ptr, u16 
startreg, u16 len, const char
 }
 
 /* =
-   Basic datasets:
+ *  Basic datasets:
 
jpeg baseline setup data (you find it on lots places in internet, or just
extract it from any regular .jpg image...)
@@ -294,7 +294,7 @@ static const char zr36050_decimation_h[8] = { 2, 1, 1, 0, 
0, 0, 0, 0 };
 static const char zr36050_decimation_v[8] = { 1, 1, 1, 0, 0, 0, 0, 0 };
 
 /* =
-   Local helper functions:
+ *  Local helper functions:
 
calculation and setup of parameter-dependent JPEG baseline segments
(needed for compression only)
@@ -303,7 +303,7 @@ static const char zr36050_decimation_v[8] = { 1, 1, 1, 0, 
0, 0, 0, 0 };
 /* - */
 
 /* SOF (start of frame) segment depends on width, height and sampling ratio
-of each color component */
+ *  of each color component */
 
 static int zr36050_set_sof(struct zr36050 *ptr)
 {
@@ -334,7 +334,7 @@ static int zr36050_set_sof(struct zr36050 *ptr)
 /* - */
 
 /* SOS (start of scan) segment depends on the used scan components
-   of each color component */
+ * of each color component */
 
 static int zr36050_set_sos(struct zr36050 *ptr)
 {
@@ -378,7 +378,7 @@ static int zr36050_set_dri(struct zr36050 *ptr)
 }
 
 /* =
-   Setup function:
+ *  Setup function:
 
Setup compression/decompression of Zoran's JPEG processor
( see also zoran 36050 manual )
@@ -531,13 +531,13 @@ static void zr36050_init(struct zr36050 *ptr)
 }
 
 /* =
-   CODEC API FUNCTIONS
+ *  CODEC API FUNCTIONS
 
this functions are accessed by the master via the API structure

Linux Kernel build bug with AMD_IOMMU_V2=M and HSA_AMD=Y

2021-04-08 Thread David Niklas

Hello,
(There are so many maintainers for kfd_iommu.c I feel like I'm spamming.)

When compiling for Linux version 5.11.12 using the AMDGPU GPU driver
with HSA_AMD enabled, I get the below set of errors. To work around this,
I need to change AMD_IOMMU_V2 from M to Y. This bug doesn't affect linux
kernel version 5.6 as it requires AMD_IOMMU_V2 to be Y when HSA_AMD is
enabled.
I'd bisect and request the removal of the relevant patch, but it's
possible that building the linux kernel should work this way and so a fix,
not a patch removal, is what should be issued.
I'm attaching my kernel config for 5.11.

Thanks,
David

PS: I made an official bug report in case you'd prefer that:
https://bugzilla.kernel.org/show_bug.cgi?id=212619

drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: In function
`kfd_iommu_bind_process_to_device': 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:120:
undefined reference to `amd_iommu_bind_pasid'
drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: In function
`kfd_iommu_unbind_process': 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:138:
undefined reference to `amd_iommu_unbind_pasid'
drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: In function
`kfd_iommu_suspend': 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:292:
undefined reference to
`amd_iommu_set_invalidate_ctx_cb' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:293:
undefined reference to `amd_iommu_set_invalid_ppr_cb'
drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: In function
`kfd_iommu_resume': 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:312:
undefined reference to
`amd_iommu_init_device' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:316:
undefined reference to
`amd_iommu_set_invalidate_ctx_cb' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:318:
undefined reference to
`amd_iommu_set_invalid_ppr_cb' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:323:
undefined reference to
`amd_iommu_set_invalidate_ctx_cb' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:324:
undefined reference to
`amd_iommu_set_invalid_ppr_cb' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:325:
undefined reference to
`amd_iommu_free_device' 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:232:
undefined reference to `amd_iommu_bind_pasid'
drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: In function
`kfd_iommu_suspend': 
/root/working/linux-5.11.12/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_iommu.c:294:
undefined reference to `amd_iommu_free_device'
Makefile:1166: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1


kernel-build-amdgpu-bug.conf.xz
Description: application/xz

Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ

2021-04-08 Thread Tim Chen





On 4/8/21 7:51 AM, Vincent Guittot wrote:

>> I was surprised to find the overall cpu% consumption of
>> update_blocked_averages and throughput of the benchmark still didn't
>> change much.  So I took a peek into the profile and found the
>> update_blocked_averages calls shifted to the idle load balancer.
>> The call to update_blocked_averages was reduced in newidle_balance so
>> the patch did what we intended.  But the overall rate of calls to
> 
> At least, we have removed the useless call to update_blocked_averages
> in newidle_balance when we will not perform any newly idle load
> balance
> 
>> update_blocked_averages remain roughly the same, shifting from
>> newidle_balance to run_rebalance_domains.
>>
>>100.00%  (810cf070)
>> |
>> ---update_blocked_averages
>>|
>>|--95.47%--run_rebalance_domains
>>|  __do_softirq
>>|  |
>>|  |--94.27%--asm_call_irq_on_stack
>>|  |  do_softirq_own_stack
> 
> The call of  update_blocked_averages mainly comes from SCHED_SOFTIRQ.
> And as a result, not from the new path
> do_idle()->nohz_run_idle_balance() which has been added by this patch
> to defer the call to update_nohz_stats() after newlyidle_balance and
> before entering idle.
> 
>>|  |  |
>>|  |  |--93.74%--irq_exit_rcu
>>|  |  |  |
>>|  |  |  |--88.20%--sysvec_apic_timer_interrupt
>>|  |  |  |  asm_sysvec_apic_timer_interrupt
>>|  |  |  |  |
>>...
>>|
>>|
>> --4.53%--newidle_balance
>>   pick_next_task_fair
>>
>> I was expecting idle load balancer to be rate limited to 60 Hz, which
> 
> Why 60Hz ?
> 

My thinking is we will trigger load balance only after rq->next_balance.

void trigger_load_balance(struct rq *rq)
{
/* Don't need to rebalance while attached to NULL domain */
if (unlikely(on_null_domain(rq)))
return;

if (time_after_eq(jiffies, rq->next_balance))
raise_softirq(SCHED_SOFTIRQ);

nohz_balancer_kick(rq);
}

And it seems like next_balance is set to be 60 Hz

static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
{
int continue_balancing = 1;
int cpu = rq->cpu;
int busy = idle != CPU_IDLE && !sched_idle_cpu(cpu);
unsigned long interval;
struct sched_domain *sd;
/* Earliest time when we have to do rebalance again */
unsigned long next_balance = jiffies + 60*HZ;
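
For reference when reading the numbers in this thread: HZ is the number of jiffies per second, so `jiffies + 60*HZ` is a deadline 60 seconds ahead rather than a 60 Hz rate. The sketch below shows the unit conversions involved. The helper names are made up for illustration, and HZ=250 is taken from the CONFIG_HZ_250 setting mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative user-space helpers; in the kernel HZ is a build-time constant. */
static uint64_t secs_to_jiffies_sketch(uint64_t secs, unsigned int hz)
{
	assert(hz > 0);
	return secs * hz;
}

/* Round up, so that a non-zero delay never becomes zero jiffies. */
static uint64_t msecs_to_jiffies_sketch(uint64_t msecs, unsigned int hz)
{
	assert(hz > 0);
	return (msecs * hz + 999) / 1000;
}

/* Length of one jiffy in milliseconds, truncated. */
static unsigned int jiffy_msecs_sketch(unsigned int hz)
{
	assert(hz > 0);
	return 1000 / hz;
}
```

With HZ=250 one jiffy is 4 ms, 60*HZ is 15000 jiffies (60 s), a 32 ms period is 8 jiffies, and 15 jiffies correspond to 60 ms.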


>> should be 15 jiffies apart on the test system with CONFIG_HZ_250.
>> When I did a trace on a single CPU, I see that update_blocked_averages
>> are often called between 1 to 4 jiffies apart, which is at a much higher
>> rate than I expected.  I haven't taken a closer look yet.  But you may
> 
> 2 things can trigger a SCHED_SOFTIRQ/run_rebalance_domains:
> - the need for an update of blocked load which should not happen more
> than once every 32ms which means a rate of around 30Hz
> - the need for a load balance of a sched_domain. The min interval for
> a sched_domain is its weight when the CPU is idle which is usually few
> jiffies
> 
> The only idea that I have for now is that we spend less time in
> newidle_balance which changes the dynamic of your system.
> 
> In your trace, could you check if update_blocked_averages is called
> during the tick? And is the current task the idle task?

Here's a snapshot of the trace. However, I didn't have the current task in
my trace. You can tell how frequently update_blocked_averages is called on
cpu 2 by the jiffies value.  The calls are quite close together (1 to 3
jiffies apart). When I have a chance to get on the machine, I'll take
another look at the current task and whether we got to
trigger_load_balance() from scheduler_tick().


 3.505 ( ): probe:update_blocked_averages:(810cf070) cpu=2 jiffies=0x1004fb731
 4.505 ( ): probe:update_blocked_averages:(810cf070) cpu=2 jiffies=0x1004fb732
 6.484 ( ): probe:newidle_balance:(810d2470) this_rq=0x88fe7f8aae00 next_balance=0x1004fb731 jiffies=0x1004fb733
 6.506 ( ): probe:update_blocked_averages:(810cf070) cpu=2 jiffies=0x1004fb734
 9.503 ( ): probe:update_blocked_averages:(810cf070) cpu=2 jiffies=0x1004fb737
11.504 ( ): probe:update_blocked_averages:(810cf070) cpu=2 jiffies=0x1004fb739
11.602 ( ): probe:newidle_balance:(810d2470) this_rq=0x88fe7f8aae00 next_balance=0x1004fb76c jiffies=0x1004fb739
11.624 ( ): probe:newidle_balance:(810d2470)

Re: [PATCH v4 7/7] fs/xfs: Add dedupe support for fsdax

2021-04-08 Thread Darrick J. Wong

On Thu, Apr 08, 2021 at 08:04:32PM +0800, Shiyang Ruan wrote:
> Add xfs_break_two_dax_layouts() to break layout for two dax files.  Then
> call compare range function only when files are both DAX or not.
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  fs/xfs/xfs_file.c| 20 
>  fs/xfs/xfs_inode.c   |  8 +++-
>  fs/xfs/xfs_inode.h   |  1 +
>  fs/xfs/xfs_reflink.c |  5 +++--
>  4 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5795d5d6f869..1fd457167c12 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -842,6 +842,26 @@ xfs_break_dax_layouts(
>   0, 0, xfs_wait_dax_page(inode));
>  }
>  
> +int
> +xfs_break_two_dax_layouts(
> + struct inode*src,
> + struct inode*dest)
> +{
> + int error;
> + bool retry = false;
> +
> +retry:
> + error = xfs_break_dax_layouts(src, &retry);
> + if (error || retry)
> + goto retry;
> +
> + error = xfs_break_dax_layouts(dest, &retry);
> + if (error || retry)
> + goto retry;
> +
> + return error;
> +}
> +
>  int
>  xfs_break_layouts(
>   struct inode*inode,
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f93370bd7b1e..c01786917eef 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3713,8 +3713,10 @@ xfs_ilock2_io_mmap(
>   struct xfs_inode*ip2)
>  {
>   int ret;
> + struct inode*inode1 = VFS_I(ip1);
> + struct inode*inode2 = VFS_I(ip2);
>  
> - ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2));
> + ret = xfs_iolock_two_inodes_and_break_layout(inode1, inode2);
>   if (ret)
>   return ret;
>   if (ip1 == ip2)
> @@ -3722,6 +3724,10 @@ xfs_ilock2_io_mmap(
>   else
>   xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL,
>   ip2, XFS_MMAPLOCK_EXCL);
> +
> + if (IS_DAX(inode1) && IS_DAX(inode2))
> + ret = xfs_break_two_dax_layouts(inode1, inode2);

This is wrong on many levels.

The first problem is that xfs_break_two_dax_layouts calls
xfs_break_dax_layouts twice even if inode1 == inode2, which is
unnecessary.

The second problem is that xfs_break_dax_layouts can cycle the MMAPLOCK
on the inode that it's processing.  Since there are two inodes in play
here, you must be /very/ careful about maintaining correct locking order,
which for the MMAPLOCK is increasing order of xfs_inode.i_ino.  If you
drop the MMAPLOCK for the lower-numbered inode for any reason, you have
to drop both MMAPLOCKs and try again.

In other words, you have to replace all that nice MMAPLOCK code with a
new xfs_mmaplock_two_inodes_and_break_dax_layouts function that is
structured similarly to what xfs_iolock_two_inodes_and_break_layout
does for the IOLOCK and PNFS layouts.
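
The locking rule described above (take both locks in increasing inode-number order, and start over with both locks dropped if a layout break had to cycle a lock) can be modelled with a small user-space toy. This is only a sketch under stated assumptions: every toy_ name is hypothetical, the "lock" is a plain flag rather than a real rwsem, and for simplicity it restarts when either lock was cycled, which is stricter than the rule for the lower-numbered inode alone.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an inode; only what the ordering logic needs. */
struct toy_inode {
	uint64_t ino;		/* plays the role of xfs_inode.i_ino */
	int mmap_locked;	/* 1 while the toy MMAPLOCK is held */
	int pending;		/* how many times a break must cycle the lock */
};

static void toy_lock(struct toy_inode *ip)
{
	assert(!ip->mmap_locked);
	ip->mmap_locked = 1;
}

static void toy_unlock(struct toy_inode *ip)
{
	assert(ip->mmap_locked);
	ip->mmap_locked = 0;
}

/* Returns 1 if the lock had to be dropped and retaken while waiting. */
static int toy_break_layouts(struct toy_inode *ip)
{
	assert(ip->mmap_locked);
	if (ip->pending > 0) {
		ip->pending--;
		return 1;
	}
	return 0;
}

/*
 * Lock both inodes in increasing ino order and break layouts on each.
 * If either break cycled its lock, drop everything and start over.
 * Returns the number of attempts; both locks are held on return.
 */
static int toy_lock_two_and_break(struct toy_inode *a, struct toy_inode *b)
{
	struct toy_inode *lo = (a->ino <= b->ino) ? a : b;
	struct toy_inode *hi = (lo == a) ? b : a;
	int attempts = 0;

	for (;;) {
		int cycled;

		attempts++;
		toy_lock(lo);
		if (hi != lo)	/* same inode passed twice: lock it once */
			toy_lock(hi);

		cycled = toy_break_layouts(lo);
		if (!cycled && hi != lo)
			cycled = toy_break_layouts(hi);
		if (!cycled)
			return attempts;

		if (hi != lo)
			toy_unlock(hi);
		toy_unlock(lo);
	}
}
```

The a == b case is handled by locking and breaking only once, which corresponds to the first problem pointed out above.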

> +
>   return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 88ee4c3930ae..5ef21924dddc 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -435,6 +435,7 @@ enum xfs_prealloc_flags {
>  
>  int  xfs_update_prealloc_flags(struct xfs_inode *ip,
> enum xfs_prealloc_flags flags);
> +int  xfs_break_two_dax_layouts(struct inode *inode1, struct inode *inode2);
>  int  xfs_break_layouts(struct inode *inode, uint *iolock,
>   enum layout_break_reason reason);
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index a4cd6e8a7aa0..4426bcc8a985 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -29,6 +29,7 @@
>  #include "xfs_iomap.h"
>  #include "xfs_sb.h"
>  #include "xfs_ag_resv.h"
> +#include 

Why is this necessary?

--D

>  
>  /*
>   * Copy on Write of Shared Blocks
> @@ -1324,8 +1325,8 @@ xfs_reflink_remap_prep(
>   if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
>   goto out_unlock;
>  
> - /* Don't share DAX file data for now. */
> - if (IS_DAX(inode_in) || IS_DAX(inode_out))
> + /* Don't share DAX file data with non-DAX file. */
> + if (IS_DAX(inode_in) != IS_DAX(inode_out))
>   goto out_unlock;
>  
>   if (!IS_DAX(inode_in))
> -- 
> 2.31.0
> 
> 
>

Re: [PATCH] net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh

2021-04-08 Thread patchwork-bot+netdevbpf

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Fri, 9 Apr 2021 03:01:29 +0500 you wrote:
> nlh is being checked for validity two times when it is dereferenced in
> this function. Check for validity again when updating the flags through
> nlh pointer to make the dereferencing safe.
> 
> CC: 
> Addresses-Coverity: ("NULL pointer dereference")
> Signed-off-by: Muhammad Usama Anjum 
> 
> [...]

Here is the summary with links:
  - net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
https://git.kernel.org/netdev/net/c/864db232dc70

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH 2/2] pm: allow drivers to drop #ifdef and __maybe_unused from pm callbacks

2021-04-08 Thread kernel test robot

Hi Masahiro,

I love your patch! Yet something to improve:

[auto build test ERROR on pinctrl/devel]
[also build test ERROR on pm/linux-next soc/for-next linus/master v5.12-rc6]
[cannot apply to next-20210408]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masahiro-Yamada/linux-kconfig-h-move-IF_ENABLED-out-of-linux-kconfig-h/20210409-050128
base:   https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git devel
config: nds32-randconfig-r004-20210408 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/01dfb9e1a54c14b3f491c3a5e93f1e8756042567
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Masahiro-Yamada/linux-kconfig-h-move-IF_ENABLED-out-of-linux-kconfig-h/20210409-050128
        git checkout 01dfb9e1a54c14b3f491c3a5e93f1e8756042567
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/input/rmi4/rmi_spi.c:7:
   drivers/input/rmi4/rmi_spi.c:507:26: error: 'rmi_spi_suspend' undeclared 
here (not in a function); did you mean 'rmi_driver_suspend'?
 507 |  SET_SYSTEM_SLEEP_PM_OPS(rmi_spi_suspend, rmi_spi_resume)
 |  ^~~
   include/linux/kernel.h:41:38: note: in definition of macro 'PTR_IF'
  41 | #define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
 |  ^~~
   include/linux/pm.h:308:14: note: in expansion of macro 'pm_sleep_ptr'
 308 |  .suspend  = pm_sleep_ptr(suspend_fn), \
 |  ^~~~
   drivers/input/rmi4/rmi_spi.c:507:2: note: in expansion of macro 
'SET_SYSTEM_SLEEP_PM_OPS'
 507 |  SET_SYSTEM_SLEEP_PM_OPS(rmi_spi_suspend, rmi_spi_resume)
 |  ^~~
   drivers/input/rmi4/rmi_spi.c:507:43: error: 'rmi_spi_resume' undeclared here 
(not in a function); did you mean 'rmi_spi_pm'?
 507 |  SET_SYSTEM_SLEEP_PM_OPS(rmi_spi_suspend, rmi_spi_resume)
 |   ^~
   include/linux/kernel.h:41:38: note: in definition of macro 'PTR_IF'
  41 | #define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
 |  ^~~
   include/linux/pm.h:309:14: note: in expansion of macro 'pm_sleep_ptr'
 309 |  .resume   = pm_sleep_ptr(resume_fn), \
 |  ^~~~
   drivers/input/rmi4/rmi_spi.c:507:2: note: in expansion of macro 
'SET_SYSTEM_SLEEP_PM_OPS'
 507 |  SET_SYSTEM_SLEEP_PM_OPS(rmi_spi_suspend, rmi_spi_resume)
 |  ^~~
>> drivers/input/rmi4/rmi_spi.c:508:21: error: 'rmi_spi_runtime_suspend' 
>> undeclared here (not in a function)
 508 |  SET_RUNTIME_PM_OPS(rmi_spi_runtime_suspend, rmi_spi_runtime_resume,
 | ^~~
   include/linux/kernel.h:41:38: note: in definition of macro 'PTR_IF'
  41 | #define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
 |  ^~~
   include/linux/pm.h:332:21: note: in expansion of macro 'pm_ptr'
 332 |  .runtime_suspend = pm_ptr(suspend_fn), \
 | ^~
   drivers/input/rmi4/rmi_spi.c:508:2: note: in expansion of macro 
'SET_RUNTIME_PM_OPS'
 508 |  SET_RUNTIME_PM_OPS(rmi_spi_runtime_suspend, rmi_spi_runtime_resume,
 |  ^~
>> drivers/input/rmi4/rmi_spi.c:508:46: error: 'rmi_spi_runtime_resume' 
>> undeclared here (not in a function)
 508 |  SET_RUNTIME_PM_OPS(rmi_spi_runtime_suspend, rmi_spi_runtime_resume,
 |  ^~
   include/linux/kernel.h:41:38: note: in definition of macro 'PTR_IF'
  41 | #define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
 |  ^~~
   include/linux/pm.h:333:21: note: in expansion of macro 'pm_ptr'
 333 |  .runtime_resume  = pm_ptr(resume_fn), \
 | ^~
   drivers/input/rmi4/rmi_spi.c:508:2: note: in expansion of macro 
'SET_RUNTIME_PM_OPS'
 508 |  SET_RUNTIME_PM_OPS(rmi_spi_runtime_suspend, rmi_spi_runtime_resume,
 |  ^~
--
   In file included from include/linux/list.h:9,
from include/linux/rculist.h:10,
from include/linux/pid.h:5,
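
A plausible reading of the errors above is that the suspend/resume callbacks in rmi_spi.c are still defined under an #ifdef, while the reworked macros now name them unconditionally inside PTR_IF(). The compiler can discard the pointer when the condition is 0, but the identifier must still be declared. The sketch below reproduces the pattern in user space. PTR_IF is copied from the log; everything prefixed toy_ or TOY_ is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as quoted from include/linux/kernel.h in the log. */
#define PTR_IF(cond, ptr)	((cond) ? (ptr) : NULL)

/* Hypothetical switches standing in for the kernel's config checks. */
#define TOY_PM_SLEEP	0
#define TOY_PM		1

typedef int (*toy_pm_fn)(void);

struct toy_pm_ops {
	toy_pm_fn suspend;
	toy_pm_fn runtime_suspend;
};

/*
 * Both callbacks are defined unconditionally. Wrapping toy_suspend() in
 * "#if TOY_PM_SLEEP" would make the initializer below fail to compile with
 * an "undeclared" error, because PTR_IF() still names the function.
 */
static int toy_suspend(void)
{
	return 0;
}

static int toy_runtime_suspend(void)
{
	return 0;
}

static const struct toy_pm_ops toy_ops = {
	.suspend	 = PTR_IF(TOY_PM_SLEEP, toy_suspend),
	.runtime_suspend = PTR_IF(TOY_PM, toy_runtime_suspend),
};
```

Because toy_suspend() is referenced in the initializer, the compiler does not warn about an unused static function even when the slot ends up NULL, which appears to be the property the patch title alludes to when it mentions dropping __maybe_unused.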

Re: [PATCH net v2 0/2] lantiq: GSWIP: two more fixes

2021-04-08 Thread patchwork-bot+netdevbpf

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Thu,  8 Apr 2021 20:38:26 +0200 you wrote:
> Hello,
> 
> after my last patch got accepted and is now in net as commit
> 3e6fdeb28f4c33 ("net: dsa: lantiq_gswip: Let GSWIP automatically set
> the xMII clock") [0] some more people from the OpenWrt community
> (many thanks to everyone involved) helped test the GSWIP driver: [1]
> 
> [...]

Here is the summary with links:
  - [net,v2,1/2] net: dsa: lantiq_gswip: Don't use PHY auto polling
https://git.kernel.org/netdev/net/c/3e9005be8777
  - [net,v2,2/2] net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG 
bits
https://git.kernel.org/netdev/net/c/4b5923249b8f

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH net v1] lan743x: fix ethernet frame cutoff issue

2021-04-08 Thread Andrew Lunn

Hi Sven

> Many thanks to Heiner Kallweit for suggesting this solution. 

Adding a Suggested-by: would be good. And it might sometime help
Jonathan Corbet extract some interesting statistics from the commit
messages if everybody uses the same format.

Andrew

[PATCH v2] char: tpm: fix error return code in tpm_cr50_i2c_tis_recv()

2021-04-08 Thread Zhihao Cheng

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 3a253caaad11 ("char: tpm: add i2c driver for cr50")
Reported-by: Hulk Robot 
Signed-off-by: Zhihao Cheng 
---
 drivers/char/tpm/tpm_tis_i2c_cr50.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/tpm/tpm_tis_i2c_cr50.c 
b/drivers/char/tpm/tpm_tis_i2c_cr50.c
index ec9a65e7887d..f19c227d20f4 100644
--- a/drivers/char/tpm/tpm_tis_i2c_cr50.c
+++ b/drivers/char/tpm/tpm_tis_i2c_cr50.c
@@ -483,6 +483,7 @@ static int tpm_cr50_i2c_tis_recv(struct tpm_chip *chip, u8 
*buf, size_t buf_len)
expected = be32_to_cpup((__be32 *)(buf + 2));
if (expected > buf_len) {
dev_err(&chip->dev, "Buffer too small to receive i2c data\n");
+   rc = -E2BIG;
goto out_err;
}
 
-- 
2.25.4
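
The pattern being fixed, an error path that jumps to a common label without first setting a negative return code, can be shown in isolation. The following is a user-space sketch only: toy_recv() is a hypothetical function, and the header layout (a big-endian 32-bit length at offset 2) is assumed from the be32_to_cpup(buf + 2) line in the hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical receive helper: hdr must point to at least 6 bytes.
 * Returns the expected payload length, or a negative errno on failure.
 */
static int toy_recv(const uint8_t *hdr, size_t buf_len)
{
	size_t expected;
	int rc = 0;

	assert(hdr != NULL);

	/* Big-endian 32-bit length stored at offset 2 of the header. */
	expected = ((size_t)hdr[2] << 24) | ((size_t)hdr[3] << 16) |
		   ((size_t)hdr[4] << 8) | (size_t)hdr[5];

	if (expected > buf_len) {
		/* Without this assignment the error path would return 0. */
		rc = -E2BIG;
		goto out_err;
	}

	return (int)expected;

out_err:
	/* Common cleanup would go here. */
	return rc;
}
```

A caller that treats 0 as success would otherwise continue with a buffer that was never filled.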

Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-08 Thread James Bottomley

On Thu, 2021-04-08 at 17:41 -0700, Steve Rutherford wrote:
> On Thu, Apr 8, 2021 at 2:15 PM James Bottomley 
> wrote:
> > On Thu, 2021-04-08 at 12:48 -0700, Steve Rutherford wrote:
> > > On Thu, Apr 8, 2021 at 10:43 AM James Bottomley <
> > > j...@linux.ibm.com>
> > > wrote:
> > > > On Fri, 2021-04-02 at 16:20 +0200, Paolo Bonzini wrote:
[...]
> > > > > However, it would be nice to collaborate on the low-level
> > > > > (SEC/PEI) firmware patches to detect whether a CPU is part of
> > > > > the primary VM or the mirror.  If Google has any OVMF patches
> > > > > already done for that, it would be great to combine it with
> > > > > IBM's SEV migration code and merge it into upstream OVMF.
> > > > 
> > > > We've reached the stage with our prototyping where not having
> > > > the OVMF support is blocking us from working on QEMU.  If we're
> > > > going to have to reinvent the wheel in OVMF because Google is
> > > > unwilling to publish the patches, can you at least give some
> > > > hints about how you did it?
> > > > 
> > > > Thanks,
> > > > 
> > > > James
> > > 
> > > Hey James,
> > > It's not strictly necessary to modify OVMF to make SEV VMs live
> > > migrate. If we were to modify OVMF, we would contribute those
> > > changes
> > > upstream.
> > 
> > Well, no, we already published an OVMF RFC to this list that does
> > migration.  However, the mirror approach requires a different boot
> > mechanism for the extra vCPU in the mirror.  I assume you're doing
> > this bootstrap through OVMF so the hypervisor can interrogate it to
> > get the correct entry point?  That's the code we're asking to see
> > because that's what replaces our use of the MP service in the RFC.
> > 
> > James
> 
> Hey James,
> The intention would be to have a separate, stand-alone firmware-like
> binary run by the mirror. Since the VMM is in control of where it
> places that binary in the guest physical address space and the
> initial configuration of the vCPUs, it can point the vCPUs at an
> entry point contained within that binary, rather than at the standard
> x86 reset vector.

If you want to share ASIDs you have to share the firmware that the
running VM has been attested to.  Once the VM moves from LAUNCH to
RUNNING, the PSP won't allow the VMM to inject any more firmware or do
any more attestations.  What you mirror after this point can thus only
contain what has already been measured or what the guest added.  This
is why we think there has to be a new entry path into the VM for the
mirror vCPU.

So assuming you're thinking you'll inject two pieces of firmware at
start of day: the OVMF and this separate binary and attest to both,
then you can do that, but then you have two problems:

   1. Preventing OVMF from trampling all over your separate binary while
  it's booting
   2. Launching the vCPU up into this separate binary in a way it can
  execute (needs stack and heap)

I think you can likely solve 1. by making the separate binary look like
a ROM, but then you have the problem of where you steal the RAM you
need for a heap and stack and it still brings us back to how to launch
the vCPU which was the original question.

With ES we can set the registers at launch, so a vCPU that's never
launched can still be pre-programmed with the separate binary entry
point but solving the stack and heap looks like it requires co-
operation from OVMF.

That's why we were thinking the easiest straight line approach is to
have a runtime DXE which has a declared initialization routine that
allocates memory for the stack and a heap and a separate declared entry
point for the vCPU which picks up the already allocated and mapped
stack and heap.

James

[PATCH v3] hwmon: Add driver for fsp-3y PSUs and PDUs

2021-04-08 Thread Václav Kubernát

This patch adds support for these devices:
- YH-5151E - the PDU
- YM-2151E - the PSU

The device datasheet says that the devices support PMBus 1.2, but in my
testing, a lot of the commands aren't supported and if they are, they
sometimes behave strangely or inconsistently. For example, writes to the
PAGE command require using PEC, otherwise the write won't work and the
page won't switch, even though the standard says that PEC is optional.
On the other hand, writes to SMBALERT don't require PEC. Because of
this, the driver is mostly reverse engineered with the help of a tool
called pmbus_peek written by David Brownell (and later adopted by my
colleague Jan Kundrát).

The device also has some sort of a timing issue when switching pages,
which is explained further in the code.

Because of this, the driver support is limited. It exposes only the
values that have been tested to work correctly.

Signed-off-by: Václav Kubernát 
---
 Documentation/hwmon/fsp-3y.rst |  26 
 drivers/hwmon/pmbus/Kconfig|  10 ++
 drivers/hwmon/pmbus/Makefile   |   1 +
 drivers/hwmon/pmbus/fsp-3y.c   | 236 +
 4 files changed, 273 insertions(+)
 create mode 100644 Documentation/hwmon/fsp-3y.rst
 create mode 100644 drivers/hwmon/pmbus/fsp-3y.c

diff --git a/Documentation/hwmon/fsp-3y.rst b/Documentation/hwmon/fsp-3y.rst
new file mode 100644
index ..68a547021846
--- /dev/null
+++ b/Documentation/hwmon/fsp-3y.rst
@@ -0,0 +1,26 @@
+Kernel driver fsp3y
+==
+Supported devices:
+  * 3Y POWER YH-5151E
+  * 3Y POWER YM-2151E
+
+Author: Václav Kubernát 
+
+Description
+---
+This driver implements limited support for two 3Y POWER devices.
+
+Sysfs entries
+-
+in1_input    input voltage
+in2_input    12V output voltage
+in3_input    5V output voltage
+curr1_input  input current
+curr2_input  12V output current
+curr3_input  5V output current
+fan1_input   fan rpm
+temp1_input  temperature 1
+temp2_input  temperature 2
+temp3_input  temperature 3
+power1_input input power
+power2_input output power
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 03606d4298a4..9d12d446396c 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -56,6 +56,16 @@ config SENSORS_BEL_PFE
  This driver can also be built as a module. If so, the module will
  be called bel-pfe.
 
+config SENSORS_FSP_3Y
+   tristate "FSP/3Y-Power power supplies"
+   help
+ If you say yes here you get hardware monitoring support for
+ FSP/3Y-Power hot-swap power supplies.
+ Supported models: YH-5151E, YM-2151E
+
+ This driver can also be built as a module. If so, the module will
+ be called fsp-3y.
+
 config SENSORS_IBM_CFFPS
tristate "IBM Common Form Factor Power Supply"
depends on LEDS_CLASS
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index 6a4ba0fdc1db..bfe218ad898f 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_SENSORS_PMBUS) += pmbus.o
 obj-$(CONFIG_SENSORS_ADM1266)  += adm1266.o
 obj-$(CONFIG_SENSORS_ADM1275)  += adm1275.o
 obj-$(CONFIG_SENSORS_BEL_PFE)  += bel-pfe.o
+obj-$(CONFIG_SENSORS_FSP_3Y)   += fsp-3y.o
 obj-$(CONFIG_SENSORS_IBM_CFFPS)+= ibm-cffps.o
 obj-$(CONFIG_SENSORS_INSPUR_IPSPS) += inspur-ipsps.o
 obj-$(CONFIG_SENSORS_IR35221)  += ir35221.o
diff --git a/drivers/hwmon/pmbus/fsp-3y.c b/drivers/hwmon/pmbus/fsp-3y.c
new file mode 100644
index ..f03c4e27ec8c
--- /dev/null
+++ b/drivers/hwmon/pmbus/fsp-3y.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Hardware monitoring driver for FSP 3Y-Power PSUs
+ *
+ * Copyright (c) 2021 Václav Kubernát, CESNET
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "pmbus.h"
+
+#define YM2151_PAGE_12V_LOG    0x00
+#define YM2151_PAGE_12V_REAL   0x00
+#define YM2151_PAGE_5VSB_LOG   0x01
+#define YM2151_PAGE_5VSB_REAL  0x20
+#define YH5151E_PAGE_12V_LOG   0x00
+#define YH5151E_PAGE_12V_REAL  0x00
+#define YH5151E_PAGE_5V_LOG    0x01
+#define YH5151E_PAGE_5V_REAL   0x10
+#define YH5151E_PAGE_3V3_LOG   0x02
+#define YH5151E_PAGE_3V3_REAL  0x11
+
+enum chips {
+   ym2151e,
+   yh5151e
+};
+
+struct fsp3y_data {
+   struct pmbus_driver_info info;
+   enum chips chip;
+   int page;
+};
+
+#define to_fsp3y_data(x) container_of(x, struct fsp3y_data, info)
+
+static int page_log_to_page_real(int page_log, enum chips chip)
+{
+   switch (chip) {
+   case ym2151e:
+   switch (page_log) {
+   case YM2151_PAGE_12V_LOG:
+   return YM2151_PAGE_12V_REAL;
+   case YM2151_PAGE_5VSB_LOG:
+   return YM2151_PAGE_5VSB_REAL;
+   }
+   return -EINVAL;
+   case yh5151e:
+

Re: [PATCH v4 3/3] dt-bindings: clock: add ti,lmk04832 bindings

2021-04-08 Thread Liam Beguin

On Thu Apr 8, 2021 at 4:13 PM EDT, Rob Herring wrote:
> On Tue, Apr 06, 2021 at 08:53:30PM -0400, Liam Beguin wrote:
> > From: Liam Beguin 
> > 
> > Document devicetree bindings for Texas Instruments' LMK04832.
> > The LMK04832 is a high performance clock conditioner with superior clock
> > jitter cleaning, generation, and distribution with JEDEC JESD204B
> > support.
> > 
> > Signed-off-by: Liam Beguin 
> > ---
> >  .../bindings/clock/ti,lmk04832.yaml   | 209 ++
> >  1 file changed, 209 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/clock/ti,lmk04832.yaml
> > 
> > diff --git a/Documentation/devicetree/bindings/clock/ti,lmk04832.yaml 
> > b/Documentation/devicetree/bindings/clock/ti,lmk04832.yaml
> > new file mode 100644
> > index ..a9f8b9b720fc
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/clock/ti,lmk04832.yaml
> > @@ -0,0 +1,209 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/clock/ti,lmk04832.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Clock bindings for the Texas Instruments LMK04832
> > +
> > +maintainers:
> > +  - Liam Beguin 
> > +
> > +description: |
> > +  Devicetree binding for the LMK04832, a clock conditioner with JEDEC JESD204B
> > +  support. The LMK04832 is pin compatible with the LMK0482x family.
> > +
> > +  Link to datasheet, https://www.ti.com/lit/ds/symlink/lmk04832.pdf
> > +
> > +properties:
> > +  compatible:
> > +enum:
> > +  - ti,lmk04832
> > +
> > +  reg:
> > +maxItems: 1
> > +
> > +  '#address-cells':
> > +const: 1
> > +
> > +  '#size-cells':
> > +const: 0
> > +
> > +  '#clock-cells':
> > +const: 1
> > +
> > +  spi-max-frequency:
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +description:
> > +  Maximum SPI clocking speed of the device in Hz.
>
> Already has a type and description, just need:
>
> spi-max-frequency: true
>
> (Or a range of values if you know the maximum).
>

I have the max, will use that instead.

> > +
> > +  clocks:
> > +items:
> > +  - description: PLL2 reference clock.
> > +
> > +  clock-names:
> > +items:
> > +  - const: oscin
> > +
> > +  reset-gpios:
> > +maxItems: 1
> > +
> > +  ti,spi-4wire-rdbk:
> > +description: |
> > +  Select SPI 4wire readback pin configuration.
> > +  Available readback pins are,
> > +CLKin_SEL0 0
> > +CLKin_SEL1 1
> > +RESET 2
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +enum: [0, 1, 2]
> > +default: 1
> > +
> > +  ti,vco-hz:
> > +description: Optional to set VCO frequency of the PLL in Hertz.
> > +
> > +  ti,sysref-ddly:
> > +description: SYSREF digital delay value.
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +minimum: 8
> > +maximum: 8191
> > +default: 8
> > +
> > +  ti,sysref-mux:
> > +description: |
> > +  SYSREF Mux configuration.
> > +  Available options are,
> > +Normal SYNC 0
> > +Re-clocked 1
> > +SYSREF Pulser 2
> > +SYSREF Continuous 3
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +enum: [0, 1, 2, 3]
> > +default: 3
> > +
> > +  ti,sync-mode:
> > +description: SYNC pin configuration.
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +enum: [0, 1, 2]
> > +default: 1
> > +
> > +  ti,sysref-pulse-count:
> > +description:
> > +  Number of SYSREF pulses to send when SYSREF is not in continuous mode.
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +enum: [1, 2, 4, 8]
> > +default: 4
> > +
> > +patternProperties:
> > +  "@[0-9a-d]+$":
> > +type: object
> > +description:
> > +  Child nodes used to configure output clocks.
> > +
> > +properties:
> > +  reg:
> > +description:
> > +  clock output identifier.
> > +minimum: 0
> > +maximum: 13
> > +
> > +  ti,clkout-fmt:
> > +description:
> > +  Clock output format.
> > +  Available options are,
> > +Powerdown 0x00
> > +LVDS 0x01
> > +HSDS 6 mA 0x02
> > +HSDS 8 mA 0x03
> > +LVPECL 1600 mV 0x04
> > +LVPECL 2000 mV 0x05
> > +LCPECL 0x06
> > +CML 16 mA 0x07
> > +CML 24 mA 0x08
> > +CML 32 mA 0x09
> > +CMOS (Off/Inverted) 0x0a
> > +CMOS (Normal/Off) 0x0b
> > +CMOS (Inverted/Inverted) 0x0c
> > +CMOS (Inverted/Normal) 0x0d
> > +CMOS (Normal/Inverted) 0x0e
> > +CMOS (Normal/Normal) 0x0f
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +minimum: 0
> > +maximum: 15
> > +
> > +  ti,clkout-sysref:
> > +description:
> > +  Select SYSREF clock path for output clock.
> > +type: boolean
> > +
> > +

RE: [PATCH v2 2/3] fsdax: Factor helper: dax_fault_actor()

2021-04-08 Thread ruansy.f...@fujitsu.com



> -Original Message-
> From: Darrick J. Wong 
> Sent: Friday, April 9, 2021 5:11 AM
> Subject: Re: [PATCH v2 2/3] fsdax: Factor helper: dax_fault_actor()
> 
> On Wed, Apr 07, 2021 at 09:38:22PM +0800, Shiyang Ruan wrote:
> > The core logic in the two dax page fault functions is similar. So,
> > move the logic into a common helper function. Also, to facilitate the
> > addition of new features, such as CoW, switch-case is no longer used
> > to handle different iomap types.
> >
> > Signed-off-by: Shiyang Ruan 
> > Reviewed-by: Christoph Hellwig 
> > Reviewed-by: Ritesh Harjani 
> > ---
> >  fs/dax.c | 294
> > ---
> >  1 file changed, 148 insertions(+), 146 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index f843fb8fbbf1..6dea1fc11b46 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1054,6 +1054,66 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
> > return ret;
> >  }
> >
> > +#ifdef CONFIG_FS_DAX_PMD
> > +static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> > +   struct iomap *iomap, void **entry)
> > +{
> > +   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> > +   unsigned long pmd_addr = vmf->address & PMD_MASK;
> > +   struct vm_area_struct *vma = vmf->vma;
> > +   struct inode *inode = mapping->host;
> > +   pgtable_t pgtable = NULL;
> > +   struct page *zero_page;
> > +   spinlock_t *ptl;
> > +   pmd_t pmd_entry;
> > +   pfn_t pfn;
> > +
> > +   zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm);
> > +
> > +   if (unlikely(!zero_page))
> > +   goto fallback;
> > +
> > +   pfn = page_to_pfn_t(zero_page);
> > +   *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
> > +   DAX_PMD | DAX_ZERO_PAGE, false);
> > +
> > +   if (arch_needs_pgtable_deposit()) {
> > +   pgtable = pte_alloc_one(vma->vm_mm);
> > +   if (!pgtable)
> > +   return VM_FAULT_OOM;
> > +   }
> > +
> > +   ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
> > +   if (!pmd_none(*(vmf->pmd))) {
> > +   spin_unlock(ptl);
> > +   goto fallback;
> > +   }
> > +
> > +   if (pgtable) {
> > +   pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
> > +   mm_inc_nr_ptes(vma->vm_mm);
> > +   }
> > +   pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
> > +   pmd_entry = pmd_mkhuge(pmd_entry);
> > +   set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
> > +   spin_unlock(ptl);
> > +   trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
> > +   return VM_FAULT_NOPAGE;
> > +
> > +fallback:
> > +   if (pgtable)
> > +   pte_free(vma->vm_mm, pgtable);
> > +   trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
> > +   return VM_FAULT_FALLBACK;
> > +}
> > +#else
> > +static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> > +   struct iomap *iomap, void **entry)
> > +{
> > +   return VM_FAULT_FALLBACK;
> > +}
> > +#endif /* CONFIG_FS_DAX_PMD */
> > +
> >  s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)  {
> > sector_t sector = iomap_sector(iomap, pos & PAGE_MASK); @@ -1291,6
> > +1351,64 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
> struct iomap *iomap,
> > return ret;
> >  }
> >
> > +/**
> > + * dax_fault_actor - Common actor to handle pfn insertion in PTE/PMD fault.
> > + * @vmf:   vm fault instance
> > + * @pfnp:  pfn to be returned
> > + * @xas:   the dax mapping tree of a file
> > + * @entry: an unlocked dax entry to be inserted
> > + * @pmd:   distinguish whether it is a pmd fault
> > + * @flags: iomap flags
> > + * @iomap: from iomap_begin()
> > + * @srcmap:from iomap_begin(), not equal to iomap if it is a CoW
> > + */
> > +static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
> > +   struct xa_state *xas, void **entry, bool pmd,
> > +   unsigned int flags, struct iomap *iomap, struct iomap *srcmap) {
> > +   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> > +   size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
> > +   loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
> > +   bool write = vmf->flags & FAULT_FLAG_WRITE;
> > +   bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap);
> > +   unsigned long entry_flags = pmd ? DAX_PMD : 0;
> > +   int err = 0;
> > +   pfn_t pfn;
> > +
> > +   /* if we are reading UNWRITTEN and HOLE, return a hole. */
> > +   if (!write &&
> > +   (iomap->type == IOMAP_UNWRITTEN || iomap->type == IOMAP_HOLE)) {
> > +   if (!pmd)
> > +   return dax_load_hole(xas, mapping, entry, vmf);
> > +   else
> > +   return dax_pmd_load_hole(xas, vmf, iomap, entry);
> > +   }
> > +
> > +   if (iomap->type != IOMAP_MAPPED) {
> > +   WARN_ON_ONCE(1);
> > +   return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
> > +   }
> > +
> > +   err = dax_iomap_pfn(iomap, pos,

Re: [GIT PULL] SMB3 Fixes

2021-04-08 Thread pr-tracker-bot

The pull request you sent on Thu, 8 Apr 2021 20:48:16 -0500:

> git://git.samba.org/sfrench/cifs-2.6.git tags/5.12-rc6-smb3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/17e7124aad766b3f158943acb51467f86220afe9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

Re: [PATCH v2 7/8] cxl/port: Introduce cxl_port objects

2021-04-08 Thread Dan Williams

Hi Bjorn, thanks for taking a look.

On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas  wrote:
>
> [+cc Greg, Rafael, Matthew: device model questions]
>
> Hi Dan,
>
> On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > Once the cxl_root is established then other ports in the hierarchy can
> > be attached. The cxl_port object, unlike cxl_root that is associated
> > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > host bridge.
>
> I'm not a device model expert, but I'm not sure about adding a new
> /sys/bus/cxl/devices hierarchy.  I'm under the impression that CXL
> devices will be enumerated by the PCI core as PCIe devices.

Yes, PCIe is involved, but mostly only for the CXL.io slow path
(configuration and provisioning via mailbox) when we're talking about
memory expander devices (CXL calls these Type-3). So-called "Type-3"
support is the primary driver of this infrastructure.

You might be thinking of CXL accelerator devices that will look like
plain PCIe devices that happen to participate in the CPU cache
hierarchy (CXL calls these Type-1). There will also be accelerator
devices that want to share coherent memory with the system (CXL calls
these Type-2).

The infrastructure being proposed here is primarily for the memory
expander (Type-3) device case where the PCI sysfs hierarchy is wholly
unsuited for modeling it. A single CXL memory region device may span
multiple endpoints, switches, and host bridges. It poses similar
stress to an OS device model as RAID where there is a driver for the
component contributors to an upper level device / driver that exposes
the RAID Volume (CXL memory region interleave set). The CXL memory
decode space (HDM: Host Managed Device Memory) is independent of the
PCIe MMIO BAR space.

That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
space across the CXL topology in a way that is foreign to PCIE (HDM
Decoder hierarchy).

> Doesn't
> that mean we will have one struct device in the pci_dev, and another
> one in the cxl_port?

Yes, that is the proposal.

> That seems like an issue to me.  More below.

hmm...

>
> > The cxl_port instances for PCIE Switch Ports are not
> > included here as those are to be modeled as another service device
> > registered on the pcie_port_bus_type.
>
> I'm hesitant about the idea of adding more uses of pcie_port_bus_type.
> I really dislike portdrv because it makes a parallel hierarchy:
>
>   /sys/bus/pci
>   /sys/bus/pci_express
>
> for things that really should not be different.  There's a struct
> device in pci_dev, and potentially several pcie_devices, each with
> another struct device.  We make these pcie_device things for AER, DPC,
> hotplug, etc.  E.g.,
>
>   /sys/bus/pci/devices/:00:1c.0
>   /sys/bus/pci_express/devices/:00:1c.0:pcie002  # AER
>   /sys/bus/pci_express/devices/:00:1c.0:pcie010  # BW notification
>
> These are all the same PCI device.  AER is a PCI capability.
> Bandwidth notification is just a feature of all Downstream Ports.  I
> think it makes zero sense to have extra struct devices for them.  From
> a device point of view (enumeration, power management, VM assignment),
> we can't manage them separately from the underlying PCI device.  For
> example, we have three separate "power/" directories, but obviously
> there's only one point of control (00:1c.0):
>
>   /sys/devices/pci:00/:00:1c.0/power/
>   /sys/devices/pci:00/:00:1c.0/:00:1c.0:pcie002/power/
>   /sys/devices/pci:00/:00:1c.0/:00:1c.0:pcie010/power/

The superfluous power/ issue can be cleaned up with
device_set_pm_not_required().

What are the other problems this poses, because in other areas this
ability to subdivide a device's functionality into sub-drivers is a
useful organization principle? So much so that several device writer
teams came together to create the auxiliary-bus for the purpose of
allowing sub-drivers to be carved off for independent functionality
similar to the portdrv organization.

That said, I'm open to CXL switch support *not* building on the
portdrv model, but I'm not yet on the same page with your concern.

Re: [PATCH] sched/fair: use signed long when compute energy delta in eas

2021-04-08 Thread Xuewen Yan

Hi
>
> Hi,
> > Hi
> >
> > On Wed, Apr 7, 2021 at 10:11 PM Pierre  wrote:
> > >
> > > Hi,
> > > > I test the patch, but the overflow still exists.
> > > > In the "sched/fair: Use pd_cache to speed up
> > find_energy_efficient_cpu()"
> > > > I wonder why recompute the cpu util when cpu==dst_cpu in
> > compute_energy(),
> > > > when the dst_cpu's util change, it also would cause the overflow.
> > >
> > > The patches aim to cache the energy values for the CPUs whose
> > > utilization is not modified (so we don't have to compute it multiple
> > > times). The values cached are the 'base values' of the CPUs, i.e. when
> > > the task is not placed on the CPU. When (cpu==dst_cpu) in
> > > compute_energy(), it means the energy values need to be updated instead
> > > of using the cached ones.
> > >
> > Well, would it be better to use task_util(p) + the cached values? In that
> > case, though, the cache may need more parameters.
>
> This patch-set is not significantly improving the execution time of
> feec(). The results we have so far are an improvement of 5-10% in
> execution time, with feec() being executed in < 10us. So the gain is not
> spectacular.

Well, I meant to cache all the util values and compute the energy from the
cache. When (cpu==dst_cpu), use the cached values instead of updating util,
and compute util from the cache rather than by calling effective_cpu_util().
I added more parameters to pd_cache:
struct pd_cache {
	unsigned long util;
	unsigned long util_est;
	unsigned long util_cfs;
	unsigned long util_irq;
	unsigned long util_rt;
	unsigned long util_dl;
	unsigned long bw_dl;
	unsigned long freq_util;
	unsigned long nrg_util;
};
This way, util is not updated during feec(). I tested this, and the
negative delta disappeared.
Maybe this is not a good method, but it does work.
>
> >
> > > You are right, there is still a possibility to have a negative delta
> > > with the patches at:
> > >
> > > https://gitlab.arm.com/linux-arm/linux-power/-/commits/eas/next/integration-20210129
> > > Adding a check before subtracting the values, and bailing out in such
> > > case would avoid this, such as at:
> > > https://gitlab.arm.com/linux-arm/linux-pg/-/commits/feec_bail_out/
> > 
> > >
> > In your patch, you bail out in that case with "goto fail", which means
> > EAS is not used in that case. However, in real workloads this case often
> > occurs when selecting a CPU for a small task. As a result, the small task
> > would not be placed according to EAS; might that affect power consumption?
> With this patch (bailing out), the percentage of feec() returning due to
> a negative delta I get are:
> on a Juno-r2, with 2 big CPUs and 4 CPUs (capacity of 383), with a
> workload running during 5s with task having a period of 16 ms and:
>   - 50 tasks at 1%:   0.14%
>   - 30 tasks at 1%:   0.54%
>   - 10 tasks at 1%: < 0.1%
>   - 30 tasks at 5%: < 0.1%
>   - 10 tasks at 5%: < 0.1%
> It doesn't happen so often to me. If we bail out of feec(), the task will
> still have another opportunity in the next call. However I agree this
> can lead to a bad placement when this happens.
> >
> > > I think a similar modification should be done in your patch. Even though
> > > this is a good idea to group the calls to compute_energy() to reduce the
> > > chances of having updates of utilization values in between the
> > > compute_energy() calls,
> > > there is still a chance to have updates. I think it happened when I
> > > applied your patch.
> > >
> > > About changing the delta(s) from 'unsigned long' to 'long', I am not
> > > > sure of the meaning of having a negative delta. I think it would be
> > > better to check and fail before it happens instead.
> > >
> > > Regards
> > >
>
>
>
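
For readers following the thread: the overflow discussed above comes from
subtracting two 'unsigned long' energy values when the first may have been
sampled before a utilization update and the second after it. Below is a
minimal userspace sketch of the check-before-subtract approach Pierre
describes. The helper name energy_delta() and its parameters are
hypothetical and do not appear in the kernel sources; this is only an
illustration of the idea, not the code from either patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code: compute cur - base only when the
 * result is non-negative. On failure *delta is left untouched so the
 * caller can bail out instead of using a wrapped value.
 */
static bool energy_delta(unsigned long base, unsigned long cur,
			 unsigned long *delta)
{
	if (cur < base)
		return false;	/* values changed between the two reads */

	*delta = cur - base;
	return true;
}
```

Without the check, cur - base with cur < base wraps around to a value close
to ULONG_MAX rather than producing a small negative number, which is why the
thread debates either bailing out or switching the delta to a signed type.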

[PATCH -next] mmc: sdhci-st: Remove unnecessary error log

2021-04-08 Thread Laibin Qiu

devm_ioremap_resource() already logs an error on failure, so there is
no need to log it again.

Reported-by: Hulk Robot 
Signed-off-by: Laibin Qiu 
---
 drivers/mmc/host/sdhci-st.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/mmc/host/sdhci-st.c b/drivers/mmc/host/sdhci-st.c
index 78941ac3a1d6..d41582c21aa3 100644
--- a/drivers/mmc/host/sdhci-st.c
+++ b/drivers/mmc/host/sdhci-st.c
@@ -400,10 +400,8 @@ static int sdhci_st_probe(struct platform_device *pdev)
res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
   "top-mmc-delay");
pdata->top_ioaddr = devm_ioremap_resource(&pdev->dev, res);
-   if (IS_ERR(pdata->top_ioaddr)) {
-   dev_warn(&pdev->dev, "FlashSS Top Dly registers not available");
+   if (IS_ERR(pdata->top_ioaddr))
pdata->top_ioaddr = NULL;
-   }
 
pltfm_host->clk = clk;
pdata->icnclk = icnclk;
-- 
2.25.1

RE: [PATCH v5 2/5] media: dt-bindings: media: renesas,drif: Convert to json-schema

2021-04-08 Thread Fabrizio Castro

Hi Rob,

thanks for your feedback.

> From: Rob Herring 
> Sent: 07 April 2021 19:27
> Subject: Re: [PATCH v5 2/5] media: dt-bindings: media: renesas,drif:
> Convert to json-schema
> 
> On Thu, Jan 14, 2021 at 7:02 AM Geert Uytterhoeven 
> wrote:
> >
> > Hi Fabrizio, Rob,
> >
> > On Wed, Oct 21, 2020 at 3:53 PM Fabrizio Castro
> >  wrote:
> > > Convert the Renesas DRIF bindings to DT schema and update
> > > MAINTAINERS accordingly.
> > >
> > > Signed-off-by: Fabrizio Castro 
> > > Reviewed-by: Lad Prabhakar 
> > > Reviewed-by: Laurent Pinchart 
> > > Reviewed-by: Geert Uytterhoeven 
> > > Reviewed-by: Rob Herring 
> >
> > Thanks for your patch!
> >
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/media/renesas,drif.yaml
> >
> > > +  clock-names:
> > > +maxItems: 1
> > > +items:
> > > +  - const: fck
> >
> > With latest dt-schema, "make dt_binding_check" complains:
> >
> > Documentation/devicetree/bindings/media/renesas,drif.yaml:
> > properties:clock-names:maxItems: False schema does not allow 1
> > Documentation/devicetree/bindings/media/renesas,drif.yaml:
> > ignoring, error in schema: properties: clock-names: maxItems
> 
> Seems this just got applied, and now this error is in linux-next.

I'll send a patch to fix the problem shortly.

Thanks,
Fab

> 
> >
> > Using
> >
> >clock-names:
> >  const: fck
> >
> > Fixes that.
> >
> > However, I'm wondering why I do not get a complaint about the similar
> > clock/clock-names in
> > Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml.
> > Because they're part of an else branch?
> 
> Probably. if/then/else schemas have fewer checks as they can be
> incomplete (only additional constraints on the top-level schema).
> 
> Rob

Re: [PATCH v4 0/5] Next revision of the L1D flush patches

2021-04-08 Thread Kees Cook

*thread necromancy*
https://lore.kernel.org/lkml/20210108121056.21940-1-sbl...@amazon.com/

On Mon, Jan 25, 2021 at 09:27:38AM +, Singh, Balbir wrote:
> On Fri, 2021-01-08 at 23:10 +1100, Balbir Singh wrote:
> > Implement a mechanism that allows tasks to conditionally flush
> > their L1D cache (mitigation mechanism suggested in [2]). The previous
> > posts of these patches were sent for inclusion (see [3]) and were not
> > included due to the concern for the need for additional checks,
> > those checks were:
> > 
> > 1. Implement this mechanism only for CPUs affected by the L1TF bug
> > 2. Disable the software fallback
> > 3. Provide an override to enable this mechanism
> > 4. Be SMT aware in the implementation
> > [...]
> Ping on any review comments? Suggested refactoring?

Hi!

I'd still really like to see this -- it's a big hammer, but that's the
point for cases where some new flaw appears and we can point to the
toolbox and say "you can mitigate it with this while you wait for new
kernel/CPU."

Any further thoughts from x86 maintainers? This seems like it addressed
all of tglx's review comments.

-- 
Kees Cook

Re: [PATCH v1 2/2] drivers/gpu/drm: don't select DMA_CMA or CMA from aspeed or etnaviv

2021-04-08 Thread Arnd Bergmann

On Thu, Apr 8, 2021 at 6:45 PM David Hildenbrand  wrote:
> On 08.04.21 14:49, Linus Walleij wrote:
> > On Thu, Apr 8, 2021 at 2:01 PM David Hildenbrand  wrote:
> >
> >>> This is something you could do using a hidden helper symbol like
> >>>
> >>> config DRM_ASPEED_GFX
> >>>  bool "Aspeed display driver"
> >>>  select DRM_WANT_CMA
> >>>
> >>> config DRM_WANT_CMA
> >>>  bool
> >>>  help
> >>> Select this from any driver that benefits from CMA being 
> >>> enabled
> >>>
> >>> config DMA_CMA
> >>>  bool "Use CMA helpers for DRM"
> >>>  default DRM_WANT_CMA
> >>>
> >>>Arnd
> >>>
> >>
> >> That's precisely what I had first, with an additional "WANT_CMA" --  but
> >> looking at the number of such existing options (I was able to spot 1 !)
> >
> > If you do this it probably makes sense to fix a few other drivers
> > Kconfig in the process. It's not just a problem with your driver.
> > "my" drivers:
> >
>
> :) I actually wanted to convert them to "depends on DMA_CMA" but ran
> into recursive dependencies ...
>
> > drivers/gpu/drm/mcde/Kconfig
> > drivers/gpu/drm/pl111/Kconfig
> > drivers/gpu/drm/tve200/Kconfig

Right, this is the main problem caused by using 'select' to
force-enable symbols that other drivers depend on.

Usually, the answer is to be consistent about the use of 'select'
and 'depends on', using the former only to enable symbols that
are hidden, while using 'depends on' for anything that is an
actual build time dependency.

> I was assuming these are "real" dependencies. Will it also work without
> DMA_CMA?

I think in this case, it is fairly likely to work without DMA_CMA when the
probe function gets called during a fresh boot, but fairly likely to fail if
it gets called after the system has run for long enough to fragment the
free memory.

The point of DMA_CMA is to make it work reliably.

  Arnd

Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory

2021-04-08 Thread Shakeel Butt

On Thu, Apr 8, 2021 at 11:01 AM Yang Shi  wrote:
>
> On Thu, Apr 8, 2021 at 10:19 AM Shakeel Butt  wrote:
> >
> > Hi Tim,
> >
> > On Mon, Apr 5, 2021 at 11:08 AM Tim Chen  wrote:
> > >
> > > Traditionally, all memory is DRAM.  Some DRAM might be closer/faster than
> > > others NUMA wise, but a byte of media has about the same cost whether it
> > > > is close or far.  But, with new memory tiers such as Persistent Memory
> > > > (PMEM), there is a choice between fast/expensive DRAM and slow/cheap
> > > > PMEM.
> > > >
> > > > The fast/expensive memory lives in the top tier of the memory hierarchy.
> > >
> > > Previously, the patchset
> > > [PATCH 00/10] [v7] Migrate Pages in lieu of discard
> > > https://lore.kernel.org/linux-mm/20210401183216.443c4...@viggo.jf.intel.com/
> > > provides a mechanism to demote cold pages from DRAM node into PMEM.
> > >
> > > And the patchset
> > > [PATCH 0/6] [RFC v6] NUMA balancing: optimize memory placement for memory 
> > > tiering system
> > > https://lore.kernel.org/linux-mm/20210311081821.138467-1-ying.hu...@intel.com/
> > > provides a mechanism to promote hot pages in PMEM to the DRAM node
> > > leveraging autonuma.
> > >
> > > The two patchsets together keep the hot pages in DRAM and colder pages
> > > in PMEM.
> >
> > Thanks for working on this as this is becoming more and more important
> > particularly in the data centers where memory is a big portion of the
> > cost.
> >
> > I see you have responded to Michal and I will add my more specific
> > response there. Here I wanted to give my high level concern regarding
> > using v1's soft limit like semantics for top tier memory.
> >
> > This patch series aims to distribute/partition top tier memory between
> > jobs of different priorities. We want high priority jobs to have
> > preferential access to the top tier memory and we don't want low
> > priority jobs to hog the top tier memory.
> >
> > Using v1's soft limit like behavior can potentially cause high
> > priority jobs to stall to make enough space on top tier memory on
> > their allocation path and I think this patchset is aiming to reduce
> > that impact by making kswapd do that work. However I think the more
> > concerning issue is the low priority job hogging the top tier memory.
> >
> > The possible ways the low priority job can hog the top tier memory are
> > by allocating non-movable memory or by mlocking the memory. (Oh there
> > is also pinning the memory but I don't know if there is a user api to
> > pin memory?) For the mlocked memory, you need to either modify the
> > reclaim code or use a different mechanism for demoting cold memory.
>
> Do you mean long term pin? RDMA should be able to simply pin the
> memory for weeks. A lot of transient pins come from Direct I/O. They
> should be less concerned.
>
> The low priority jobs should be able to be restricted by cpuset, for
> example, just keep them on second tier memory nodes. Then all the
> above problems are gone.
>

Yes, that's an extreme way to overcome the issue, but we can be less
extreme by just (hard) limiting the top tier usage of low priority
jobs.

> >
> > Basically I am saying we should put the upfront control (limit) on the
> > usage of top tier memory by the jobs.
>
> This sounds similar to what I talked about in LSFMM 2019
> (https://lwn.net/Articles/787418/). We used to have some potential
> usecase which divides DRAM:PMEM ratio for different jobs or memcgs
> when I was with Alibaba.
>
> In the first place I thought about per NUMA node limit, but it was
> very hard to configure it correctly for users unless you know exactly
> about your memory usage and hot/cold memory distribution.
>
> I'm wondering, just off the top of my head, if we could extend the
> semantic of low and min limit. For example, just redefine low and min
> to "the limit on top tier memory". Then we could have low priority
> jobs have 0 low/min limit.
>

The low and min limits have semantics similar to the v1's soft limit
for this situation i.e. letting the low priority job occupy top tier
memory and depending on reclaim to take back the excess top tier
memory use of such jobs.

I have some thoughts on NUMA node limits which I will share in the other thread.

Re: [PATCH v4 1/4] KVM: x86: Fix a spurious -E2BIG in KVM_GET_EMULATED_CPUID

2021-04-08 Thread Sean Christopherson

On Thu, Apr 08, 2021, Emanuele Giuseppe Esposito wrote:
> When retrieving emulated CPUID entries, check for an insufficient array
> size if and only if KVM is actually inserting an entry.
> If userspace has a priori knowledge of the exact array size,
> KVM_GET_EMULATED_CPUID will incorrectly fail due to effectively requiring
> an extra, unused entry.
> 
> Fixes: 433f4ba19041 ("KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)")
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  arch/x86/kvm/cpuid.c | 33 -
>  1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6bd2f8b830e4..d30194081892 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -567,34 +567,33 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct 
> kvm_cpuid_array *array,
>  
>  static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
>  {
> - struct kvm_cpuid_entry2 *entry;
> -
> - if (array->nent >= array->maxnent)
> - return -E2BIG;
> + struct kvm_cpuid_entry2 entry;
>  
> - entry = &array->entries[array->nent];
> - entry->function = func;
> - entry->index = 0;
> - entry->flags = 0;
> + memset(&entry, 0, sizeof(entry));
>  
>   switch (func) {
>   case 0:
> - entry->eax = 7;
> - ++array->nent;
> + entry.eax = 7;
>   break;
>   case 1:
> - entry->ecx = F(MOVBE);
> - ++array->nent;
> + entry.ecx = F(MOVBE);
>   break;
>   case 7:
> - entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> - entry->eax = 0;
> - entry->ecx = F(RDPID);
> - ++array->nent;
> - default:
> + entry.flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> + entry.ecx = F(RDPID);
>   break;
> + default:
> + goto out;
>   }
>  
> + /* This check is performed only when func is valid */

Sorry to keep nitpicking and bikeshedding.  Funcs aren't really "invalid", KVM
just doesn't have any features it emulates in other leafs.  Maybe be more
literal in describing what triggers the check?

/* Check the array capacity iff the entry is being copied over. */

Not a sticking point, so either way:

Reviewed-by: Sean Christopherson 

> + if (array->nent >= array->maxnent)
> + return -E2BIG;
> +
> + entry.function = func;
> + memcpy(&array->entries[array->nent++], &entry, sizeof(entry));
> +
> +out:
>   return 0;
>  }
>  
> -- 
> 2.30.2
>
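
The behaviour change in this patch can be sketched in plain userspace C:
build the entry in a local variable first, and check the array capacity
only when there is actually something to copy. Everything below (struct
entry, struct entry_array, fill_emulated(), the placeholder feature bit)
is a hypothetical model for illustration, not the KVM code itself.

```c
#include <errno.h>
#include <string.h>

struct entry {
	unsigned int function, eax, ecx;
};

struct entry_array {
	struct entry *entries;
	int maxnent;
	int nent;
};

/*
 * Hypothetical model of the patched flow: leaves with nothing to insert
 * return early, so a caller whose array is exactly the right size never
 * sees -E2BIG for them.
 */
static int fill_emulated(struct entry_array *array, unsigned int func)
{
	struct entry e;

	memset(&e, 0, sizeof(e));

	switch (func) {
	case 0:
		e.eax = 7;
		break;
	case 7:
		e.ecx = 1;	/* placeholder feature bit */
		break;
	default:
		return 0;	/* nothing emulated here, nothing to insert */
	}

	/* Check the capacity only when an entry is being copied over. */
	if (array->nent >= array->maxnent)
		return -E2BIG;

	e.function = func;
	memcpy(&array->entries[array->nent++], &e, sizeof(e));
	return 0;
}
```

With the pre-patch ordering (capacity check first), a call for an
unsupported leaf on an already full array would fail even though no entry
would have been written.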

Re: [PATCH 04/10] mm/migrate: make migrate_pages() return nr_succeeded

2021-04-08 Thread Yang Shi

On Thu, Apr 8, 2021 at 11:17 AM Oscar Salvador  wrote:
>
> On Thu, Apr 08, 2021 at 10:26:54AM -0700, Yang Shi wrote:
>
> > Thanks, Oscar. Yes, kind of. But we have to remember to initialize the
> > "nr_succeeded" pointer properly for every migrate_pages() callsite,
> > right? And it doesn't prevent returning a wrong value if
> > migrate_pages() is called multiple times by one caller, although there
> > might be no such case (calling migrate_pages() multiple times and caring
> > about nr_succeeded) for now.
>
> Hi Yang,
>
> I might be missing something but AFAICS you only need to initialize the
> nr_succeeded pointer where it matters.
> The local nr_succeeded in migrate_pages() doesn't go away, so it gets
> initialized to 0 every time you call it.
> And if you pass a valid pointer, *ret_succeeded == nr_succeeded.
>
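
The optional out-parameter pattern described in the paragraph above can be
sketched in userspace C as follows. The function process_items() and its
"non-negative item means success" rule are hypothetical stand-ins; only the
handling of the counter mirrors what is being proposed for migrate_pages().

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the proposed pattern: the counter is always a
 * local initialised to zero, and it is copied out only when the caller
 * passed a non-NULL pointer.
 */
static int process_items(const int *items, size_t n,
			 unsigned int *ret_succeeded)
{
	unsigned int nr_succeeded = 0;
	int nr_failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (items[i] >= 0)
			nr_succeeded++;
		else
			nr_failed++;
	}

	/* Callers that do not care simply pass NULL. */
	if (ret_succeeded)
		*ret_succeeded = nr_succeeded;

	return nr_failed;
}
```

Because the local counter restarts at zero on every call, a caller that
invokes the function several times gets a per-call count and has to sum the
results itself if it wants a total.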
> I am talking about this (not even compile-tested):
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 3a389633b68f..fd661cb2ce13 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -40,7 +40,8 @@ extern int migrate_page(struct address_space *mapping,
> struct page *newpage, struct page *page,
> enum migrate_mode mode);
>  extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t 
> free,
> -   unsigned long private, enum migrate_mode mode, int reason);
> +   unsigned long private, enum migrate_mode mode, int reason,
> +   unsigned int *ret_succeeded);
>  extern struct page *alloc_migration_target(struct page *page, unsigned long 
> private);
>  extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
>  extern void putback_movable_page(struct page *page);
> @@ -58,7 +59,7 @@ extern int migrate_page_move_mapping(struct address_space 
> *mapping,
>  static inline void putback_movable_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t new,
> free_page_t free, unsigned long private, enum migrate_mode 
> mode,
> -   int reason)
> +   int reason, unsigned int *ret_succeeded)
> { return -ENOSYS; }
>  static inline struct page *alloc_migration_target(struct page *page,
> unsigned long private)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e04f4476e68e..7238e8faff04 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2364,7 +2364,7 @@ compact_zone(struct compact_control *cc, struct 
> capture_control *capc)
>
> err = migrate_pages(&cc->migratepages, compaction_alloc,
> compaction_free, (unsigned long)cc, cc->mode,
> -   MR_COMPACTION);
> +   MR_COMPACTION, NULL);
>
> trace_mm_compaction_migratepages(cc->nr_migratepages, err,
> &cc->migratepages);
> diff --git a/mm/gup.c b/mm/gup.c
> index e40579624f10..b70d463aa1fc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1606,7 +1606,7 @@ static long check_and_migrate_cma_pages(struct 
> mm_struct *mm,
> put_page(pages[i]);
>
> if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
> -   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
> +   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE, NULL)) {
> /*
>  * some of the pages failed migration. Do 
> get_user_pages
>  * without migration.
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 24210c9bd843..a17e0f039076 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1852,7 +1852,8 @@ static int __soft_offline_page(struct page *page)
>
> if (isolate_page(hpage, &pagelist)) {
> ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
> -   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
> +   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE,
> +   NULL);
> if (!ret) {
> bool release = !huge;
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0cdbbfbc5757..28496376de94 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1466,7 +1466,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
> end_pfn)
> if (nodes_empty(nmask))
> node_set(mtc.nid, nmask);
> ret = migrate_pages(&source, alloc_migration_target, NULL,
> -   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> +   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG,
> +   NULL);
> if (ret) {
> list_for_each_entry(page, &source, lru) {
> pr_warn("migrating pfn %lx failed ret:%d ",
> diff --git
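
The optional out-parameter idea in Oscar's (not compile-tested) diff can be shown in isolation. This is a minimal sketch under stated assumptions: "pages" are modelled as an int array where 1 means the migration succeeds and 0 means it fails, and migrate_pages_sketch() is a hypothetical name, not the kernel function. It shows why callers passing NULL need no initialization and why a repeated call cannot leak a stale count.

```c
#include <stddef.h>

/* Hypothetical model: pages[i] != 0 means that page migrates successfully. */
static int migrate_pages_sketch(const int *pages, size_t n,
				unsigned int *ret_succeeded)
{
	unsigned int nr_succeeded = 0;	/* local counter, reset on every call */
	int nr_failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i])
			nr_succeeded++;
		else
			nr_failed++;
	}

	/* Callers that do not care pass NULL and initialize nothing. */
	if (ret_succeeded)
		*ret_succeeded = nr_succeeded;

	return nr_failed;
}
```

Because the pointer is overwritten rather than accumulated into, a caller that invokes the function twice sees only the count of the most recent call, which is the behaviour Oscar describes.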

Re: [PATCH V2 3/3] dt-bindings: pinctrl: qcom-pmic-gpio: Convert qcom pmic gpio bindings to YAML

2021-04-08 Thread Rob Herring

On Thu, Apr 01, 2021 at 06:05:45PM +0530, satya priya wrote:
> Convert Qualcomm PMIC GPIO bindings from .txt to .yaml format.
> 
> Signed-off-by: satya priya 
> ---
> Changes in V3:
>  - As per Rob's comments, fixed bot errors.
>  - Moved this patch to end of the series so that other patches are not
>blocked on this.
> 
>  .../devicetree/bindings/pinctrl/qcom,pmic-gpio.txt | 280 
>  .../bindings/pinctrl/qcom,pmic-gpio.yaml   | 281 
> +
>  2 files changed, 281 insertions(+), 280 deletions(-)
>  delete mode 100644 
> Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.txt
>  create mode 100644 
> Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml


> diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml 
> b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
> new file mode 100644
> index 000..e7e7027
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-gpio.yaml
> @@ -0,0 +1,281 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/pinctrl/qcom,pmic-gpio.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Qualcomm PMIC GPIO block
> +
> +maintainers:
> +  - Bjorn Andersson 
> +
> +description: |
> +  This binding describes the GPIO block(s) found in the 8xxx series of
> +  PMIC's from Qualcomm.
> +
> +properties:
> +  compatible:
> +items:
> +  - enum:
> +  - qcom,pm8005-gpio
> +  - qcom,pm8018-gpio
> +  - qcom,pm8038-gpio
> +  - qcom,pm8058-gpio
> +  - qcom,pm8916-gpio
> +  - qcom,pm8917-gpio
> +  - qcom,pm8921-gpio
> +  - qcom,pm8941-gpio
> +  - qcom,pm8950-gpio
> +  - qcom,pm8994-gpio
> +  - qcom,pm8998-gpio
> +  - qcom,pma8084-gpio
> +  - qcom,pmi8950-gpio
> +  - qcom,pmi8994-gpio
> +  - qcom,pmi8998-gpio
> +  - qcom,pms405-gpio
> +  - qcom,pm660-gpio
> +  - qcom,pm660l-gpio
> +  - qcom,pm8150-gpio
> +  - qcom,pm8150b-gpio
> +  - qcom,pm6150-gpio
> +  - qcom,pm6150l-gpio
> +  - qcom,pmx55-gpio
> +  - qcom,pm7325-gpio
> +  - qcom,pm8350c-gpio
> +  - qcom,pmk8350-gpio
> +  - qcom,pmr735a-gpio
> +
> +  - enum:
> +  - qcom,spmi-gpio
> +  - qcom,ssbi-gpio

Any combination of the 1st and 2nd entry is valid?

> +
> +  reg:
> +description: Register base of the GPIO block and length.

Just: 

maxItems: 1

> +
> +  interrupts:
> +description: |
> +Must contain an array of encoded interrupt specifiers for
> +each available GPIO

Need to define how many interrupts. I assume there's some max.

> +
> +  '#interrupt-cells':
> +const: 2
> +
> +  interrupt-controller: true
> +
> +  gpio-controller: true
> +
> +  gpio-ranges:
> +maxItems: 1
> +
> +  '#gpio-cells':
> +const: 2
> +description: |
> +The first cell will be used to define gpio number and the
> +second denotes the flags for this gpio
> +
> +  gpio-keys:
> +type: object
> +properties:
> +  volume-keys:
> +type: object

Needs a $ref to pinmux-node.yaml and pincfg-node.yaml.

> +properties:
> +  pins:
> +description: |
> +List of gpio pins affected by the properties specified in
> +this subnode.  Valid pins are
> + - gpio1-gpio4 for pm8005
> + - gpio1-gpio6 for pm8018
> + - gpio1-gpio12 for pm8038
> + - gpio1-gpio40 for pm8058
> + - gpio1-gpio4 for pm8916
> + - gpio1-gpio38 for pm8917
> + - gpio1-gpio44 for pm8921
> + - gpio1-gpio36 for pm8941
> + - gpio1-gpio8 for pm8950 (hole on gpio3)
> + - gpio1-gpio22 for pm8994
> + - gpio1-gpio26 for pm8998
> + - gpio1-gpio22 for pma8084
> + - gpio1-gpio2 for pmi8950
> + - gpio1-gpio10 for pmi8994
> + - gpio1-gpio12 for pms405 (holes on gpio1, gpio9
> +and gpio10)
> + - gpio1-gpio10 for pm8150 (holes on gpio2, gpio5,
> +gpio7 and gpio8)
> + - gpio1-gpio12 for pm8150b (holes on gpio3, gpio4
> + and gpio7)
> + - gpio1-gpio12 for pm8150l (hole on gpio7)
> + - gpio1-gpio10 for pm6150
> + - gpio1-gpio12 for pm6150l
> + - gpio1-gpio10 for pm7325
> + - gpio1-gpio9 for pm8350c
> + - gpio1-gpio4 for pmk8350
> + - gpio1-gpio4 for pmr735a
>

Subject: Re: [PATCH v3] kbuild: add support for zstd compressed modules

2021-04-08 Thread Piotr Gorski

No, the --rm option is essential. xz and gzip remove the input file by default
(the equivalent of --rm is built in), whereas zstd does not, which is why I used
it. I've been using zstd module compression since last December (although I set
a different compression level on mine) and everything works fine. Oleksandr also
tested it at his place and didn't report any objections.

[RFC][PATCH] mm: Split page_has_private() in two to better handle PG_private_2

2021-04-08 Thread David Howells

Hi Willy, Linus,

How about this to handle the situation with PG_private_2?  I think it handles
things according to Linus's suggestion.

David
---
mm: Split page_has_private() in two to better handle PG_private_2

Split page_has_private() into two functions:

 (1) page_needs_cleanup() to find out if a page needs the ->releasepage(),
 ->invalidatepage(), etc. address space ops calling upon it.

 This returns true when either PG_private or PG_private_2 is set.

 (2) page_private_count() which returns a count of the number of refs
 contributed to a page for attached private data.

 This returns 1 if PG_private is set and 0 otherwise.

I think the suggestion[1] is that PG_private_2 should just have a ref on
the page, but this isn't accounted in the same way as PG_private's ref.

Notes:

 (*) The following:

btrfs_migratepage()
iomap_set_range_uptodate()
iomap_migrate_page()
to_iomap_page()

 should probably all use PagePrivate() rather than page_has_private()
 since they're interested in what's attached to page->private when
 they're doing this, and not PG_private_2.

 It may not matter in these cases since page->private is probably NULL
 if PG_private is not set.

 (*) Do we actually need PG_private, or is it possible just to see if
 page->private is NULL?

 (*) There's a lot of "if (page_has_private()) try_to_release_page()"
 combos.  Does it make sense to create an inline function for this?

Signed-off-by: David Howells 
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=whwojhgemn85loh9fx-5d2-upzmv1m2zmyxvd31tkp...@mail.gmail.com/ [1]
---
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/inode.c   |2 +-
 fs/ext4/move_extent.c  |8 
 fs/fuse/dev.c  |2 +-
 fs/iomap/buffered-io.c |6 +++---
 fs/splice.c|2 +-
 include/linux/page-flags.h |   17 +++--
 include/trace/events/pagemap.h |2 +-
 mm/khugepaged.c|4 ++--
 mm/migrate.c   |   10 +-
 mm/readahead.c |2 +-
 mm/truncate.c  |   12 ++--
 mm/vmscan.c|   12 ++--
 13 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41b718cfea40..d95f8d4b3004 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -936,7 +936,7 @@ static int btree_migratepage(struct address_space *mapping,
 * Buffers may be managed in a filesystem specific way.
 * We must have no buffers or drop them.
 */
-   if (page_has_private(page) &&
+   if (page_needs_cleanup(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
return migrate_page(mapping, newpage, page, mode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7cdf65be3707..94f038d34f16 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8333,7 +8333,7 @@ static int btrfs_migratepage(struct address_space 
*mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page))
+   if (PagePrivate(page))
attach_page_private(newpage, detach_page_private(page));
 
if (PagePrivate2(page)) {
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 64a579734f93..16d0a7a73191 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -329,9 +329,9 @@ move_extent_per_page(struct file *o_filp, struct inode 
*donor_inode,
ext4_double_up_write_data_sem(orig_inode, donor_inode);
goto data_copy;
}
-   if ((page_has_private(pagep[0]) &&
+   if ((page_needs_cleanup(pagep[0]) &&
 !try_to_release_page(pagep[0], 0)) ||
-   (page_has_private(pagep[1]) &&
+   (page_needs_cleanup(pagep[1]) &&
 !try_to_release_page(pagep[1], 0))) {
*err = -EBUSY;
goto drop_data_sem;
@@ -351,8 +351,8 @@ move_extent_per_page(struct file *o_filp, struct inode 
*donor_inode,
 
/* At this point all buffers in range are uptodate, old mapping layout
 * is no longer required, try to drop it now. */
-   if ((page_has_private(pagep[0]) && !try_to_release_page(pagep[0], 0)) ||
-   (page_has_private(pagep[1]) && !try_to_release_page(pagep[1], 0))) {
+   if ((page_needs_cleanup(pagep[0]) && !try_to_release_page(pagep[0], 0)) 
||
+   (page_needs_cleanup(pagep[1]) && !try_to_release_page(pagep[1], 
0))) {
*err = -EBUSY;
goto unlock_pages;
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c0fee830a34e..76e8ca9e47fa 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -837,7 +837,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, 
struct page **pagep)
 */
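
The split described in the commit message can be sketched outside the kernel as two small predicates. The flag bits, the struct and the _sketch function names below are stand-ins invented for this example; only the semantics (cleanup needed if either flag is set, one ref counted only for PG_private) come from the description above.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for PG_private and PG_private_2. */
#define SKETCH_PG_PRIVATE	(1u << 0)
#define SKETCH_PG_PRIVATE_2	(1u << 1)

struct sketch_page {
	unsigned int flags;
};

/* True when the address space cleanup ops should be called on the page. */
static bool page_needs_cleanup_sketch(const struct sketch_page *page)
{
	return (page->flags & (SKETCH_PG_PRIVATE | SKETCH_PG_PRIVATE_2)) != 0;
}

/* Number of refs contributed by attached private data: 1 or 0. */
static int page_private_count_sketch(const struct sketch_page *page)
{
	return (page->flags & SKETCH_PG_PRIVATE) ? 1 : 0;
}
```

A page with only the second flag set needs cleanup but contributes no counted ref, which is the case the single page_has_private() test could not express.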

Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-08 Thread James Bottomley

On Thu, 2021-04-08 at 12:48 -0700, Steve Rutherford wrote:
> On Thu, Apr 8, 2021 at 10:43 AM James Bottomley 
> wrote:
> > On Fri, 2021-04-02 at 16:20 +0200, Paolo Bonzini wrote:
> > > On 02/04/21 13:58, Ashish Kalra wrote:
> > > > Hi Nathan,
> > > > 
> > > > Will you be posting a corresponding Qemu patch for this ?
> > > 
> > > Hi Ashish,
> > > 
> > > as far as I know IBM is working on QEMU patches for guest-based
> > > migration helpers.
> > 
> > Yes, that's right, we'll take on this part.
> > 
> > > However, it would be nice to collaborate on the low-level
> > > (SEC/PEI) firmware patches to detect whether a CPU is part of the
> > > primary VM or the mirror.  If Google has any OVMF patches already
> > > done for that, it would be great to combine it with IBM's SEV
> > > migration code and merge it into upstream OVMF.
> > 
> > We've reached the stage with our prototyping where not having the
> > OVMF support is blocking us from working on QEMU.  If we're going
> > to have to reinvent the wheel in OVMF because Google is unwilling
> > to publish the patches, can you at least give some hints about how
> > you did it?
> > 
> > Thanks,
> > 
> > James
> 
> Hey James,
> It's not strictly necessary to modify OVMF to make SEV VMs live
> migrate. If we were to modify OVMF, we would contribute those changes
> upstream.

Well, no, we already published an OVMF RFC to this list that does
migration.  However, the mirror approach requires a different boot
mechanism for the extra vCPU in the mirror.  I assume you're doing this
bootstrap through OVMF so the hypervisor can interrogate it to get the
correct entry point?  That's the code we're asking to see because
that's what replaces our use of the MP service in the RFC.

James

Re: [Outreachy kernel] [PATCH 1/2] media: zoran: add spaces around '<<'

2021-04-08 Thread Julia Lawall




On Fri, 9 Apr 2021, Mitali Borkar wrote:

> Added spaces around '<<' operator to improve readability and meet linux
> kernel coding style.
> Reported by checkpatch
>
> Signed-off-by: Mitali Borkar 
> ---
>  drivers/staging/media/zoran/zr36057.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/staging/media/zoran/zr36057.h 
> b/drivers/staging/media/zoran/zr36057.h
> index 71b651add35a..a2a75fd9f535 100644
> --- a/drivers/staging/media/zoran/zr36057.h
> +++ b/drivers/staging/media/zoran/zr36057.h
> @@ -30,13 +30,13 @@
>  #define ZR36057_VFESPFR_HOR_DCM  14
>  #define ZR36057_VFESPFR_VER_DCM  8
>  #define ZR36057_VFESPFR_DISP_MODE6
> -#define ZR36057_VFESPFR_YUV422  (0<<3)
> -#define ZR36057_VFESPFR_RGB888  (1<<3)
> -#define ZR36057_VFESPFR_RGB565  (2<<3)
> -#define ZR36057_VFESPFR_RGB555  (3<<3)
> -#define ZR36057_VFESPFR_ERR_DIF  (1<<2)
> -#define ZR36057_VFESPFR_PACK24  (1<<1)
> -#define ZR36057_VFESPFR_LITTLE_ENDIAN(1<<0)
> +#define ZR36057_VFESPFR_YUV422  (0 << 3)
> +#define ZR36057_VFESPFR_RGB888  (1 << 3)
> +#define ZR36057_VFESPFR_RGB565  (2 << 3)
> +#define ZR36057_VFESPFR_RGB555  (3 << 3)
> +#define ZR36057_VFESPFR_ERR_DIF  (1 << 2)
> +#define ZR36057_VFESPFR_PACK24  (1 << 1)
> +#define ZR36057_VFESPFR_LITTLE_ENDIAN(1 << 0)

Are these all aligned in the actual file?

julia

>  #define ZR36057_VDTR0x00c/* Video Display "Top" Register 
> */
>
> --
> 2.30.2
>

Re: [PATCH v4 1/1] of: unittest: overlay: ensure proper alignment of copied FDT

2021-04-08 Thread Guenter Roeck

On 4/8/21 2:28 PM, Rob Herring wrote:
> 
> Applying now so this gets into linux-next this week.
> 
The patch doesn't apply on top of today's -next; it conflicts
with "of: properly check for error returned by fdt_get_name()".

I reverted that patch and applied this one, and the DT unittests
run with it on openrisc. I do get a single test failure, but I think that
is a different problem (possibly with the test case itself).

### dt-test ### FAIL of_unittest_dma_ranges_one():923 of_dma_get_range: wrong 
DMA addr 0x
(expecting 1) on node 
/testcase-data/address-tests/bus@8000/device@1000

Tested-by: Guenter Roeck 

Guenter

Re: [PATCH] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-08 Thread Brijesh Singh

On 4/1/21 8:44 PM, Steve Rutherford wrote:
> After completion of SEND_START, but before SEND_FINISH, the source VMM can
> issue the SEND_CANCEL command to stop a migration. This is necessary so
> that a cancelled migration can restart with a new target later.
>
> Signed-off-by: Steve Rutherford 
> ---
>  .../virt/kvm/amd-memory-encryption.rst|  9 +++
>  arch/x86/kvm/svm/sev.c| 24 +++
>  include/linux/psp-sev.h   | 10 
>  include/uapi/linux/kvm.h  |  2 ++
>  4 files changed, 45 insertions(+)

Can we add a new case statement in sev_cmd_buffer_len()
[drivers/crypto/ccp/sev-dev.c] for this command ? I understand that the
command just contains the handle. I have found dyndbg very helpful. If
the command is not added in the sev_cmd_buffer_len() then we don't dump
the command buffer.

With that fixed.

Reviewed-by: Brijesh Singh

Re: [PATCH net v1] Revert "lan743x: trim all 4 bytes of the FCS; not just 2"

2021-04-08 Thread Sven Van Asbroeck

Hi Heiner,

On Thu, Apr 8, 2021 at 3:06 PM Heiner Kallweit  wrote:
>
> A completely unrelated question:
> How about VLAN packets with a 802.1Q tag? Should VLAN_ETH_HLEN be used?

That's a good question. My use-case does not involve 802.1Q though, so
I'm unable to test.

Thank you so much for your suggestion earlier, I'll put a proper
attribution to you in the new patch's commit message.

Re: [PATCH 2/4] mm/hugeltb: simplify the return code of __vma_reservation_common()

2021-04-08 Thread Mike Kravetz

On 4/7/21 7:44 PM, Miaohe Lin wrote:
> On 2021/4/8 5:23, Mike Kravetz wrote:
>> On 4/6/21 8:09 PM, Miaohe Lin wrote:
>>> On 2021/4/7 10:37, Mike Kravetz wrote:
>>>> On 4/6/21 7:05 PM, Miaohe Lin wrote:
>>>>> Hi:
>>>>> On 2021/4/7 8:53, Mike Kravetz wrote:
>>>>>> On 4/2/21 2:32 AM, Miaohe Lin wrote:
>>>>>>> It's guaranteed that the vma is associated with a resv_map, i.e. either
>>>>>>> VM_MAYSHARE or HPAGE_RESV_OWNER, when the code reaches here or we would
>>>>>>> have returned via !resv check above. So ret must be less than 0 in the
>>>>>>> 'else' case. Simplify the return code to make this clear.
>>>>>>
>>>>>> I believe we still need that ternary operator in the return statement.
>>>>>> Why?
>>>>>>
>>>>>> There are two basic types of mappings to be concerned with:
>>>>>> shared and private.
>>>>>> For private mappings, a task can 'own' the mapping as indicated by
>>>>>> HPAGE_RESV_OWNER.  Or, it may not own the mapping.  The most common way
>>>>>> to create a non-owner private mapping is to have a task with a private
>>>>>> mapping fork.  The parent process will have HPAGE_RESV_OWNER set, the
>>>>>> child process will not.  The idea is that since the child has a COW copy
>>>>>> of the mapping it should not consume reservations made by the parent.
>>>>>
>>>>> The child process will not have HPAGE_RESV_OWNER set because at fork
>>>>> time, we do:
>>>>>   /*
>>>>>    * Clear hugetlb-related page reserves for children. This only
>>>>>    * affects MAP_PRIVATE mappings. Faults generated by the child
>>>>>    * are not guaranteed to succeed, even if read-only
>>>>>    */
>>>>>   if (is_vm_hugetlb_page(tmp))
>>>>>       reset_vma_resv_huge_pages(tmp);
>>>>> i.e. we have vma->vm_private_data = (void *)0; for the child process and
>>>>> vma_resv_map() will return NULL in this case.
>>>>> Or am I missing something?
>>>>>
>>>>>> Only the parent (HPAGE_RESV_OWNER) is allowed to consume the
>>>>>> reservations.
>>>>>> Hope that makes sense?
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Miaohe Lin
>>>>>>> ---
>>>>>>>  mm/hugetlb.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index a03a50b7c410..b7864abded3d 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -2183,7 +2183,7 @@ static long __vma_reservation_common(struct hstate *h,
>>>>>>> return 1;
>>>>>>> }
>>>>>>> else
>>>>>>
>>>>>> This else also handles the case !HPAGE_RESV_OWNER.  In this case, we
>>>>>
>>>>> IMO, for the case !HPAGE_RESV_OWNER, we won't reach here. What do you think?
>>>>>
>>>>
>>>> I think you are correct.
>>>>
>>>> However, if this is true we should be able to simplify the code even
>>>> further.  There is no need to check for HPAGE_RESV_OWNER because we know
>>>> it must be set.  Correct?  If so, the code could look something like:
>>>>
>>>>     if (vma->vm_flags & VM_MAYSHARE)
>>>>         return ret;
>>>>
>>>>     /* We know private mapping with HPAGE_RESV_OWNER */
>>>>      * ...   *
>>>>      * Add that existing comment */
>>>>
>>>>     if (ret > 0)
>>>>         return 0;
>>>>     if (ret == 0)
>>>>         return 1;
>>>>     return ret;
>>>>
>>>
>>> Many thanks for good suggestion! What do you mean is this ?
>>
>> I think the below changes would work fine.
>>
>> However, this patch/discussion has made me ask the question.  Do we need
>> the HPAGE_RESV_OWNER flag?  Is the following true?
>> !(vm_flags & VM_MAYSHARE) && vma_resv_map()  ===> HPAGE_RESV_OWNER
>> !(vm_flags & VM_MAYSHARE) && !vma_resv_map() ===> !HPAGE_RESV_OWNER
>>
> 
> I agree with you.
> 
> HPAGE_RESV_OWNER is set in hugetlb_reserve_pages() and there's no way to clear
> it in the owner process. The child process inherits neither HPAGE_RESV_OWNER
> nor resv_map. So a !HPAGE_RESV_OWNER vma knows nothing about resv_map.
> 
> IMO, in !(vm_flags & VM_MAYSHARE) case, we must have:
>   !!vma_resv_map() == !!HPAGE_RESV_OWNER
> 
>> I am not suggesting we eliminate the flag and make corresponding
>> changes.  Just curious if you believe we 'could' remove the flag and
>> depend on the above conditions.
>>
>> One reason for NOT removing the flag is that the flag itself and the
>> supporting code and comments help explain what happens with hugetlb
>> reserves for COW mappings.  That code is hard to understand and the
>> existing code and comments around HPAGE_RESV_OWNER help with
>> understanding.
> 
> Agree. These codes took me several days to understand...
> 

Please prepare v2 with the changes to remove the HPAGE_RESV_OWNER check
and move the large comment.


I would prefer to leave other places that mention HPAGE_RESV_OWNER
unchanged.

Thanks,
-- 
Mike Kravetz
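
The simplified return logic sketched in the quoted discussion can be written as a small pure function for illustration. This is only a model under stated assumptions: `may_share` stands in for the `vma->vm_flags & VM_MAYSHARE` test, `ret` for the value computed earlier in __vma_reservation_common(), and the function name is hypothetical. It assumes, as agreed in the thread, that a private mapping reaching this point always has HPAGE_RESV_OWNER set.

```c
#include <stdbool.h>

/*
 * Model of the proposed tail of __vma_reservation_common():
 * shared mappings return ret unchanged; private (owner) mappings
 * swap 0 and positive values and pass errors through.
 */
static long reservation_result_sketch(bool may_share, long ret)
{
	if (may_share)
		return ret;

	/* Private mapping: known to be the reservation owner here. */
	if (ret > 0)
		return 0;
	if (ret == 0)
		return 1;
	return ret;	/* negative error code */
}
```

The point of the restructuring is that no HPAGE_RESV_OWNER test remains in the tail, since the earlier !resv check already excluded non-owner private mappings.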

Re: [PATCH 3/4] mm/hugeltb: fix potential wrong gbl_reserve value for hugetlb_acct_memory()

2021-04-08 Thread Mike Kravetz

On 4/7/21 8:26 PM, Miaohe Lin wrote:
> On 2021/4/8 11:24, Miaohe Lin wrote:
>> On 2021/4/8 4:53, Mike Kravetz wrote:
>>> On 4/7/21 12:24 AM, Miaohe Lin wrote:
>>>> Hi:
>>>> On 2021/4/7 10:49, Mike Kravetz wrote:
>>>>> On 4/2/21 2:32 AM, Miaohe Lin wrote:
>>>>>> The resv_map could be NULL since this routine can be called in the evict
>>>>>> inode path for all hugetlbfs inodes. So we could have chg = 0 and this
>>>>>> would result in a negative value when chg - freed. This is unexpected for
>>>>>> hugepage_subpool_put_pages() and hugetlb_acct_memory().
>>>>>
>>>>> I am not sure if this is possible.
>>>>>
>>>>> It is true that resv_map could be NULL.  However, I believe resv map
>>>>> can only be NULL for inodes that are not regular or link inodes.  This
>>>>> is the inode creation code in hugetlbfs_get_inode().
>>>>>
>>>>>     /*
>>>>>      * Reserve maps are only needed for inodes that can have associated
>>>>>      * page allocations.
>>>>>      */
>>>>>     if (S_ISREG(mode) || S_ISLNK(mode)) {
>>>>>         resv_map = resv_map_alloc();
>>>>>         if (!resv_map)
>>>>>             return NULL;
>>>>>     }
>>>>>
>>>>
>>>> Agree.
>>>>
>>>>> If resv_map is NULL, then no hugetlb pages can be allocated/associated
>>>>> with the file.  As a result, remove_inode_hugepages will never find any
>>>>> huge pages associated with the inode and the passed value 'freed' will
>>>>> always be zero.
>>>>>
>>>>
>>>> But I am confused now. AFAICS, remove_inode_hugepages() searches the
>>>> address_space of the inode to remove the hugepages while does not care if
>>>> inode has associated resv_map. How does it prevent hugetlb pages from being
>>>> allocated/associated with the file if resv_map is NULL? Could you please
>>>> explain this more?
>>>>
>>>
>>> Recall that there are only two ways to get huge pages associated with
>>> a hugetlbfs file: fallocate and mmap/write fault.  Directly writing to
>>> hugetlbfs files is not supported.
>>>
>>> If you take a closer look at hugetlbfs_get_inode, it has that code to
>>> allocate the resv map mentioned above as well as the following:
>>>
>>>     switch (mode & S_IFMT) {
>>>     default:
>>>         init_special_inode(inode, mode, dev);
>>>         break;
>>>     case S_IFREG:
>>>         inode->i_op = &hugetlbfs_inode_operations;
>>>         inode->i_fop = &hugetlbfs_file_operations;
>>>         break;
>>>     case S_IFDIR:
>>>         inode->i_op = &hugetlbfs_dir_inode_operations;
>>>         inode->i_fop = &simple_dir_operations;
>>>
>>>         /* directory inodes start off with i_nlink == 2 (for "." entry) */
>>>         inc_nlink(inode);
>>>         break;
>>>     case S_IFLNK:
>>>         inode->i_op = &page_symlink_inode_operations;
>>>         inode_nohighmem(inode);
>>>         break;
>>>     }
>>>
>>> Notice that only S_IFREG inodes will have i_fop == &hugetlbfs_file_operations.
>>> hugetlbfs_file_operations contain the hugetlbfs specific mmap and fallocate
>>> routines.  Hence, only files with S_IFREG inodes can potentially have
>>> associated huge pages.  S_IFLNK inodes can as well via file linking.
>>>
>>> If an inode is not S_ISREG(mode) || S_ISLNK(mode), then it will not have
>>> a resv_map.  In addition, it will not have hugetlbfs_file_operations and
>>> can not have associated huge pages.
>>>
>>
>> Many many thanks for detailed and patient explanation! :) I think I have got 
>> the idea!
>>
>>> I looked at this closely when adding commits
>>> 58b6e5e8f1ad hugetlbfs: fix memory leak for resv_map
>>> f27a5136f70a hugetlbfs: always use address space in inode for resv_map 
>>> pointer
>>>
>>> I may not be remembering all of the details correctly.  Commit f27a5136f70a
>>> added the comment that resv_map could be NULL to hugetlb_unreserve_pages.
>>>
>>
>> Since we must have freed == 0 when chg == 0, should we make this assumption
>> explicit with something like the following?
>>
>> WARN_ON(chg < freed);
>>
> 
> Or just a comment to avoid confusion ?
> 

Yes, add a comment to hugetlb_unreserve_pages saying that !resv_map
implies freed == 0.

It would also be helpful to check for (chg - freed) == 0 and skip the
calls to hugepage_subpool_put_pages() and hugetlb_acct_memory().  Both
of those routines may perform an unnecessary lock/unlock cycle in this
case.

A simple
if (chg == freed)
return 0;
before the call to hugepage_subpool_put_pages would work.
-- 
Mike Kravetz
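
As an illustration of the early return suggested above, here is a minimal sketch that counts how often the two accounting routines would be invoked. The struct, its fields and the function name are hypothetical stand-ins for this example; the real routines take locks and adjust subpool and global counters, which is not modelled here.

```c
/* Hypothetical bookkeeping: counts calls instead of doing real accounting. */
struct acct_sketch {
	long subpool_put_calls;	/* stand-in for hugepage_subpool_put_pages() */
	long acct_memory_calls;	/* stand-in for hugetlb_acct_memory() */
	long last_delta;	/* last (chg - freed) handed to the subpool */
};

static long unreserve_pages_sketch(struct acct_sketch *a, long chg, long freed)
{
	/* !resv_map implies freed == 0 and chg == 0, so this also covers it. */
	if (chg == freed)
		return 0;	/* nothing to give back; skip both calls */

	a->subpool_put_calls++;
	a->last_delta = chg - freed;
	a->acct_memory_calls++;
	return 0;
}
```

When chg equals freed, neither stand-in is called, which corresponds to avoiding the unnecessary lock/unlock cycle mentioned in the message.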

[PATCH] LoadPin: Allow filesystem switch when not enforcing

2021-04-08 Thread Kees Cook

For LoadPin to be used at all in a classic distro environment, it needs
to allow for switching filesystems (from the initramfs to the "real"
root filesystem). If the "enforce" mode is not set, reset the pinned
filesystem tracking when the pinned filesystem gets unmounted.

Signed-off-by: Kees Cook 
---
 security/loadpin/loadpin.c | 110 +
 1 file changed, 63 insertions(+), 47 deletions(-)

diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index b12f7d986b1e..ca1bbfe4a44b 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -45,7 +45,6 @@ static struct super_block *pinned_root;
 static DEFINE_SPINLOCK(pinned_root_spinlock);
 
 #ifdef CONFIG_SYSCTL
-
 static struct ctl_path loadpin_sysctl_path[] = {
{ .procname = "kernel", },
{ .procname = "loadpin", },
@@ -59,69 +58,81 @@ static struct ctl_table loadpin_sysctl_table[] = {
.maxlen = sizeof(int),
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
-   .extra1 = SYSCTL_ZERO,
+   .extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_ONE,
},
{ }
 };
 
-/*
- * This must be called after early kernel init, since then the rootdev
- * is available.
- */
-static void check_pinning_enforcement(struct super_block *mnt_sb)
+static void set_sysctl(bool is_writable)
 {
-   bool ro = false;
-
/*
 * If load pinning is not enforced via a read-only block
 * device, allow sysctl to change modes for testing.
 */
-   if (mnt_sb->s_bdev) {
-   char bdev[BDEVNAME_SIZE];
-
-   ro = bdev_read_only(mnt_sb->s_bdev);
-   bdevname(mnt_sb->s_bdev, bdev);
-   pr_info("%s (%u:%u): %s\n", bdev,
-   MAJOR(mnt_sb->s_bdev->bd_dev),
-   MINOR(mnt_sb->s_bdev->bd_dev),
-   ro ? "read-only" : "writable");
-   } else
-   pr_info("mnt_sb lacks block device, treating as: writable\n");
-
-   if (!ro) {
-   if (!register_sysctl_paths(loadpin_sysctl_path,
-  loadpin_sysctl_table))
-   pr_notice("sysctl registration failed!\n");
-   else
-   pr_info("enforcement can be disabled.\n");
-   } else
-   pr_info("load pinning engaged.\n");
+   if (is_writable)
+   loadpin_sysctl_table[0].extra1 = SYSCTL_ZERO;
+   else
+   loadpin_sysctl_table[0].extra1 = SYSCTL_ONE;
 }
 #else
-static void check_pinning_enforcement(struct super_block *mnt_sb)
+static bool set_sysctl(bool is_writable) { }
+#endif
+
+/*
+ * This must be called after early kernel init, since then the rootdev
+ * is available.
+ */
+static bool sb_is_writable(struct super_block *mnt_sb, struct block_device 
**bdev)
+{
+   bool writable = true;
+
+   *bdev = mnt_sb->s_bdev;
+   if (*bdev)
+   writable = !bdev_read_only(*bdev);
+
+   return writable;
+}
+
+static void report_writable(struct block_device *bdev)
 {
-   pr_info("load pinning engaged.\n");
+   if (bdev) {
+   char name[BDEVNAME_SIZE];
+
+   bdevname(bdev, name);
+   pr_info("%s (%u:%u): %s\n", name,
+   MAJOR(bdev->bd_dev),
+   MINOR(bdev->bd_dev),
+   load_root_writable ? "writable" : "read-only");
+   } else {
+   pr_info("pinned filesystem lacks block device, treating as: 
writable\n");
+   }
 }
-#endif
 
 static void loadpin_sb_free_security(struct super_block *mnt_sb)
 {
/*
 * When unmounting the filesystem we were using for load
 * pinning, we acknowledge the superblock release, but make sure
-* no other modules or firmware can be loaded.
+* no other modules or firmware can be loaded when we are in
+* enforcing mode. Otherwise, allow the root to be reestablished.
 */
if (!IS_ERR_OR_NULL(pinned_root) && mnt_sb == pinned_root) {
-   pinned_root = ERR_PTR(-EIO);
-   pr_info("umount pinned fs: refusing further loads\n");
+   if (enforced) {
+   pinned_root = ERR_PTR(-EIO);
+   pr_info("umount pinned fs: refusing further loads\n");
+   } else {
+   pinned_root = NULL;
+   }
}
 }
 
 static int loadpin_read_file(struct file *file, enum kernel_read_file_id id,
 bool contents)
 {
+   struct block_device *bdev = NULL;
struct super_block *load_root;
+   bool load_root_writable, first_root_pin, sysctl_needed;
const char *origin = kernel_read_file_id_str(id);
 
/*
@@ -152,26 +163,27 @@ static int loadpin_read_file(struct file *file, enum 
kernel_read_file_id id,
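
The behavioural change in loadpin_sb_free_security() can be modelled in a few lines. This is a sketch under stated assumptions: the superblock type, the sentinel object standing in for ERR_PTR(-EIO), and all _sketch names are invented for this example. Only the decision (poison the pin when enforcing, otherwise clear it so a new root can be pinned) follows the quoted hunk.

```c
#include <stdbool.h>
#include <stddef.h>

struct sb_sketch {
	int id;
};

/* Sentinel standing in for ERR_PTR(-EIO): "refuse all further loads". */
static struct sb_sketch poisoned_root_sketch;

/* NULL means "no root pinned yet"; the next load may establish one. */
static struct sb_sketch *pinned_root_sketch;

static void sb_free_security_sketch(struct sb_sketch *sb, bool enforced)
{
	bool have_valid_pin = pinned_root_sketch != NULL &&
			      pinned_root_sketch != &poisoned_root_sketch;

	if (have_valid_pin && sb == pinned_root_sketch) {
		if (enforced)
			pinned_root_sketch = &poisoned_root_sketch;
		else
			pinned_root_sketch = NULL;	/* allow re-pinning */
	}
}
```

In the non-enforcing case the pin is simply dropped, which is what lets a classic distro switch from the initramfs to the real root filesystem.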

[tip:x86/cleanups] BUILD SUCCESS 3e7bbe15ed84e3baa7dfab3aebed3a06fd39b806

2021-04-08 Thread kernel test robot

   ip22_defconfig
m68k  multi_defconfig
sh  lboxre2_defconfig
arm64alldefconfig
powerpc mpc5200_defconfig
powerpc  ep88xc_defconfig
m68k  amiga_defconfig
arm  colibri_pxa270_defconfig
powerpcmvme5100_defconfig
mipsmaltaup_xpa_defconfig
armtrizeps4_defconfig
armxcep_defconfig
ia64zx1_defconfig
sh  sh7785lcr_32bit_defconfig
powerpc  pasemi_defconfig
powerpc mpc832x_rdb_defconfig
powerpc   mpc834x_itxgp_defconfig
arm  ep93xx_defconfig
powerpc mpc83xx_defconfig
armdove_defconfig
powerpc mpc85xx_cds_defconfig
arcvdk_hs38_smp_defconfig
mips   ip32_defconfig
armrealview_defconfig
x86_64   alldefconfig
armmvebu_v7_defconfig
arm  collie_defconfig
powerpc ps3_defconfig
arm  gemini_defconfig
arm  iop32x_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
x86_64   randconfig-a004-20210408
x86_64   randconfig-a005-20210408
x86_64   randconfig-a003-20210408
x86_64   randconfig-a001-20210408
x86_64   randconfig-a002-20210408
x86_64   randconfig-a006-20210408
i386 randconfig-a006-20210408
i386 randconfig-a003-20210408
i386 randconfig-a001-20210408
i386 randconfig-a004-20210408
i386 randconfig-a005-20210408
i386 randconfig-a002-20210408
i386 randconfig-a014-20210408
i386 randconfig-a016-20210408
i386 randconfig-a011-20210408
i386 randconfig-a012-20210408
i386 randconfig-a013-20210408
i386 randconfig-a015-20210408
riscvnommu_k210_defconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv  rv32_defconfig
um   allmodconfig
um   allyesconfig
um  defconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a014-20210408
x86_64   randconfig-a015-20210408
x86_64   randconfig-a012-20210408
x86_64   randconfig-a011-20210408
x86_64   randconfig-a013-20210408
x86_64   randconfig-a016-20210408

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
