Re: [PATCH 1/2] staging: bcm2835-audio: Check if workqueue allocation failed
On Fri, Jul 13, 2018 at 09:48:16AM +0300, Dan Carpenter wrote:
> On Fri, Jul 13, 2018 at 12:54:16AM +0300, Tuomas Tynkkynen wrote:
> > @@ -424,7 +411,9 @@ int bcm2835_audio_open(struct bcm2835_alsa_stream *alsa_stream)
> >  	int status;
> >  	int ret;
> >
> > -	my_workqueue_init(alsa_stream);
> > +	alsa_stream->my_wq = alloc_workqueue("my_queue", WQ_HIGHPRI, 1);
> > +	if (!alsa_stream->my_wq)
> > +		return -ENOMEM;
> >
> >  	ret = bcm2835_audio_open_connection(alsa_stream);
> >  	if (ret) {
>
> This patch is good but if bcm2835_audio_open_connection() fails then
> we need to release alsa_stream->my_wq.

Never mind, you handle it in the next patch.  The bug *was* there in the
original code as well, so that's a legit way to split the patches.

regards,
dan carpenter
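[Editor's note] The unwind requirement Dan points out can be sketched in plain userspace C. This is a minimal analog, not the driver code: the names are hypothetical and `malloc()` stands in for `alloc_workqueue()`. The point is that any resource acquired before `bcm2835_audio_open_connection()` must be released on that call's error path.

```c
#include <stdlib.h>

struct stream { void *wq; };

/* Stand-in for bcm2835_audio_open_connection(); returns nonzero on error. */
static int open_connection(int fail) { return fail ? -5 /* -EIO */ : 0; }

/* Sketch of the fixed open flow: allocate, then unwind if the next step fails. */
static int stream_open(struct stream *s, int connection_fails)
{
	int ret;

	s->wq = malloc(64);		/* stands in for alloc_workqueue() */
	if (!s->wq)
		return -12;		/* -ENOMEM */

	ret = open_connection(connection_fails);
	if (ret) {
		free(s->wq);		/* the release Dan asked for */
		s->wq = NULL;
		return ret;
	}
	return 0;
}
```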
Re: [PATCH v3] checkpatch: Warn if missing author Signed-off-by
On Thu, Jul 12, 2018 at 12:03 PM Geert Uytterhoeven wrote:
> Print a warning if none of the Signed-off-by lines cover the patch
> author.
>
> Non-ASCII quoted printable encoding in From: headers and (lack of)
> double quotes are handled.
> Split From: headers are not fully handled: only the first part is
> compared.
>
> Signed-off-by: Geert Uytterhoeven
> ---
> Tested using a set of ca. 4000 real world commits.
>
> Most common offenders are people using:
>   - different email addresses for author and Sob,
>   - different firstname/lastname order, or other different name
>     spelling,
>   - suse.de vs. suse.com.

Pretty cool patch!

Acked-by: Linus Walleij

Yours,
Linus Walleij
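[Editor's note] checkpatch implements this check in Perl; as a rough illustration of the rule only (not the actual checkpatch logic — the name-spelling, quoting, and encoding handling described above is omitted), the author is "covered" when some Signed-off-by line carries a matching email address, compared case-insensitively. All names and addresses below are hypothetical.

```c
#include <string.h>
#include <strings.h>

/* Extract the address between '<' and '>' into buf; NULL on malformed input. */
static const char *email_of(const char *id, char *buf, size_t n)
{
	const char *l = strchr(id, '<');
	const char *r = l ? strchr(l, '>') : NULL;

	if (!l || !r || (size_t)(r - l) >= n)
		return NULL;
	memcpy(buf, l + 1, r - l - 1);
	buf[r - l - 1] = '\0';
	return buf;
}

/* True if any Signed-off-by line carries the author's email (case-insensitive). */
static int sob_covers_author(const char *author, const char *const *sobs, int n)
{
	char a[128], s[128];
	int i;

	if (!email_of(author, a, sizeof(a)))
		return 0;
	for (i = 0; i < n; i++)
		if (email_of(sobs[i], s, sizeof(s)) && !strcasecmp(a, s))
			return 1;
	return 0;
}
```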
Re: [PATCH 1/2] staging: bcm2835-audio: Check if workqueue allocation failed
On Fri, Jul 13, 2018 at 12:54:16AM +0300, Tuomas Tynkkynen wrote:
> @@ -424,7 +411,9 @@ int bcm2835_audio_open(struct bcm2835_alsa_stream *alsa_stream)
>  	int status;
>  	int ret;
>
> -	my_workqueue_init(alsa_stream);
> +	alsa_stream->my_wq = alloc_workqueue("my_queue", WQ_HIGHPRI, 1);
> +	if (!alsa_stream->my_wq)
> +		return -ENOMEM;
>
>  	ret = bcm2835_audio_open_connection(alsa_stream);
>  	if (ret) {

This patch is good but if bcm2835_audio_open_connection() fails then
we need to release alsa_stream->my_wq.

regards,
dan carpenter
[PATCH] fat: Fix potential shift wrap with FITRIM ioctl on FAT
This patch is the fix of fat-add-fitrim-ioctl-for-fat-file-system.patch.
Maybe better to merge with it (if it is easy). Anyway, please apply this
with the above patch.

From: Wentao Wang

If we keep "trimmed" as a u32, there is a potential shift wrap. It would
be a problem on a larger-than-4GB partition with FAT32. Though most tools
that call this ioctl ignore this value, it is still worth fixing.

Signed-off-by: Wentao Wang
Signed-off-by: OGAWA Hirofumi
---

 fs/fat/fatent.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN fs/fat/fatent.c~fat-fitrim-fix fs/fat/fatent.c
--- linux/fs/fat/fatent.c~fat-fitrim-fix	2018-07-13 15:39:14.417110998 +0900
+++ linux-hirofumi/fs/fat/fatent.c	2018-07-13 15:39:14.418110996 +0900
@@ -705,8 +705,8 @@ int fat_trim_fs(struct inode *inode, str
 	struct msdos_sb_info *sbi = MSDOS_SB(sb);
 	const struct fatent_operations *ops = sbi->fatent_ops;
 	struct fat_entry fatent;
-	u64 ent_start, ent_end, minlen;
-	u32 free = 0, trimmed = 0;
+	u64 ent_start, ent_end, minlen, trimmed = 0;
+	u32 free = 0;
 	unsigned long reada_blocks, reada_mask, cur_block = 0;
 	int err = 0;
_
--
OGAWA Hirofumi
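[Editor's note] The wrap is easy to demonstrate in userspace. Assuming the byte count is derived as `trimmed << sbi->cluster_bits` (as in the FITRIM patch this fixes), and taking hypothetical numbers — 2^23 clusters trimmed with 4 KiB clusters (`cluster_bits` = 12) — a u32 accumulator performs the shift in 32 bits and the 2^35-byte result silently wraps to zero:

```c
#include <stdint.h>

/* Byte count as the old code computed it: "trimmed" held in a u32,
 * so the left shift is evaluated in 32 bits and wraps modulo 2^32. */
static uint64_t trim_bytes_u32(uint32_t trimmed, int cluster_bits)
{
	return (uint64_t)(trimmed << cluster_bits);
}

/* Byte count after the fix: "trimmed" widened to u64 before shifting. */
static uint64_t trim_bytes_u64(uint64_t trimmed, int cluster_bits)
{
	return trimmed << cluster_bits;
}
```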
Re: [PATCH v2 0/4] vfs: track per-sb writeback errors and report them via fsinfo()
On Tue, Jul 10, 2018 at 10:01:23AM -0400, Jeff Layton wrote:
> v2: drop buffer.c patch to record wb errors when underlying blockdev
> flush fails. We may eventually want that, but at this point I don't have
> a clear way to test it to determine its efficacy.
>
> At LSF/MM this year, the PostgreSQL developers mentioned that they'd
> like to have some mechanism to check whether there have been any
> writeback errors on a filesystem, without necessarily flushing any of
> the cached data first.
>
> Given that we have a new fsinfo syscall being introduced, we may as well
> use it to report writeback errors on a per superblock basis. This allows
> us to provide the info that the PostgreSQL developers wanted, without
> needing to change an existing interface.
>
> This seems to do the right thing when tested by hand, but I don't yet
> have an xfstest for it, since the syscall is still quite new. Once that
> goes in and we get fsinfo support in xfs_io, it should be rather
> trivial to roll a testcase for this.

Whole patch sounds fine, you can add:

Reviewed-by: Carlos Maiolino

Cheers

> Al, if this looks ok, could you pull this into the branch where you
> have David's fsinfo patches queued up?
>
> Thanks,
> Jeff
>
> Jeff Layton (4):
>   vfs: track per-sb writeback errors
>   errseq: add a new errseq_scrape function
>   vfs: allow fsinfo to fetch the current state of s_wb_err
>   samples: extend test-fsinfo to access error_state
>
>  fs/statfs.c                 |  9 +
>  include/linux/errseq.h      |  1 +
>  include/linux/fs.h          |  3 +++
>  include/linux/pagemap.h     |  5 -
>  include/uapi/linux/fsinfo.h | 11 +++
>  lib/errseq.c                | 33 +++--
>  samples/statx/test-fsinfo.c | 13 +
>  7 files changed, 72 insertions(+), 3 deletions(-)
>
> --
> 2.17.1

--
Carlos
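[Editor's note] The per-sb tracking builds on `errseq_t`. Below is a much-simplified userspace model of its "sample, then check since" semantics — the real `lib/errseq.c` packs a positive errno, a sequence counter, and a SEEN flag into one atomic value, all of which is omitted here — showing how a consumer (such as the fsinfo interface above) can detect errors that occurred after its last sample without consuming them:

```c
typedef unsigned int errseq_t;

#define ES_ERR_MASK	0xffu	/* low byte: positive errno */
#define ES_SEQ_STEP	0x100u	/* sequence counter lives above it */

/* Record a new writeback error (err is a negative errno). */
static errseq_t es_set(errseq_t es, int err)
{
	return ((es & ~ES_ERR_MASK) + ES_SEQ_STEP) |
	       ((unsigned int)-err & ES_ERR_MASK);
}

/* Report the latest error iff it is newer than the sampled value. */
static int es_check(errseq_t es, errseq_t since)
{
	return es == since ? 0 : -(int)(es & ES_ERR_MASK);
}
```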
Re: [v2,3/3] i2c: at91: added slave mode support
On Thu, Jul 12, 2018 at 11:56:24PM +0200, Wolfram Sang wrote:
>
> > Yes sure, you can add my Ack. I would be pleased to see the slave
> > support taken.
>
> Sadly, I can't get it to apply cleanly. Could you rebase and retest?
>

Ok, I'll handle it and add my Acked-by.

Ludovic
[PATCH] PCI: Unify pci and normal dma direction definition
Current DMA direction definitions in pci-dma-compat.h and
dma-direction.h are mirrored in value. Unify them to enhance readability
and avoid possible inconsistency.

Cc: Joey Zheng
Signed-off-by: Shunyong Yang
---
 include/linux/dma-direction.h  | 2 +-
 include/linux/pci-dma-compat.h | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/dma-direction.h b/include/linux/dma-direction.h
index 3649a031893a..9d52716e9218 100644
--- a/include/linux/dma-direction.h
+++ b/include/linux/dma-direction.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_DMA_DIRECTION_H
 #define _LINUX_DMA_DIRECTION_H
 /*
- * These definitions mirror those in pci.h, so they can be used
+ * These definitions mirror those in pci-dma-compat.h, so they can be used
  * interchangeably with their PCI_ counterparts.
  */
 enum dma_data_direction {
diff --git a/include/linux/pci-dma-compat.h b/include/linux/pci-dma-compat.h
index 0dd1a3f7b309..c1c8d49b6072 100644
--- a/include/linux/pci-dma-compat.h
+++ b/include/linux/pci-dma-compat.h
@@ -8,10 +8,10 @@
 #include

 /* This defines the direction arg to the DMA mapping routines. */
-#define PCI_DMA_BIDIRECTIONAL	0
-#define PCI_DMA_TODEVICE	1
-#define PCI_DMA_FROMDEVICE	2
-#define PCI_DMA_NONE	3
+#define PCI_DMA_BIDIRECTIONAL	(DMA_BIDIRECTIONAL)
+#define PCI_DMA_TODEVICE	(DMA_TO_DEVICE)
+#define PCI_DMA_FROMDEVICE	(DMA_FROM_DEVICE)
+#define PCI_DMA_NONE	(DMA_NONE)

 static inline void *
 pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
--
1.8.3.1
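[Editor's note] The patch's point, as a standalone sketch: once the `PCI_DMA_*` names expand to the `dma_data_direction` enumerators (values as in dma-direction.h), the two sets can no longer drift apart, and the old numeric ABI values are preserved by construction:

```c
/* Values as defined in include/linux/dma-direction.h */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* After the patch, the PCI_ compat names are aliases of the enum... */
#define PCI_DMA_BIDIRECTIONAL	(DMA_BIDIRECTIONAL)
#define PCI_DMA_TODEVICE	(DMA_TO_DEVICE)
#define PCI_DMA_FROMDEVICE	(DMA_FROM_DEVICE)
#define PCI_DMA_NONE		(DMA_NONE)

/* ...so the historical numeric values hold at compile time. */
_Static_assert(PCI_DMA_BIDIRECTIONAL == 0, "bidirectional");
_Static_assert(PCI_DMA_TODEVICE == 1, "to device");
_Static_assert(PCI_DMA_FROMDEVICE == 2, "from device");
_Static_assert(PCI_DMA_NONE == 3, "none");
```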
linux-next: Tree for Jul 13
Hi all,

Changes since 20180712:

The net-next tree gained a conflict against the net tree.

The drm-intel tree gained a build failure due to an interaction with
Linus' tree, for which I added a merge fix patch.

I removed a patch from the akpm-current tree to fix the PowerPC boot
failures.

Non-merge commits (relative to Linus' tree): 5724
 5689 files changed, 206787 insertions(+), 117769 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).

If you are tracking the linux-next tree using git, you should not use
"git pull" to do so, as that will try to merge the new linux-next
release with the old one. You should use "git fetch" and check out or
reset to the new master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log files
in the Next directory.

Between each merge, the tree was built with a ppc64_defconfig for
powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and
pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally,
a simple boot test of the powerpc pseries_le_defconfig kernel in qemu
(with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 283 trees (counting Linus' and 65 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to
adding more builds.
Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (63f047771621 Merge tag 'mtd/fixes-for-4.18-rc5' of git://git.infradead.org/linux-mtd)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (6d79a7b424a5 kbuild: suppress warnings from 'getconf LFS_*')
Merging arc-current/for-curr (6e3761145a9b ARC: Fix CONFIG_SWAP)
Merging arm-current/fixes (6ef09e48c2bc ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot)
Merging arm64-fixes/for-next/fixes (2fd8eb4ad871 arm64: neon: Fix function may_use_simd() return error status)
Merging m68k-current/for-linus (b12c8a70643f m68k: Set default dma mask for platform devices)
Merging powerpc-fixes/fixes (22db552b50fa powerpc/powermac: Fix rtc read/write functions)
Merging sparc/master (1aaccb5fa0ea Merge tag 'rtc-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (70b7ff130224 tcp: allow user to create repair socket without window probes)
Merging bpf/master (c7a897843224 bpf: don't leave partial mangled prog in jit_subprogs error path)
Merging ipsec/master (7284fdf39a91 esp6: fix memleak on error path in esp6_input)
Merging netfilter/master (0026129c8629 rhashtable: add restart routine in rhashtable_free_and_destroy())
Merging ipvs/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging wireless-drivers/master (248c690a2dc8 Merge tag 'wireless-drivers-for-davem-2018-07-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers)
Merging mac80211/master (5cf3006cc81d nl80211: Add a missing break in parse_station_flags)
Merging rdma-fixes/for-rc (d63c46734c54 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path)
Merging sound-current/for-linus (c5a59d2477ab ALSA: hda/ca0132: Update a pci quirk device name)
Merging sound-asoc-fixes/for-linus (7dc4ac12ac2b Merge branch 'asoc-4.18' into asoc-linus)
Merging regmap-fixes/for-linus (1e4b044d2251 Linux 4.18-rc4)
Merging regulator-fixes/for-linus (c1362eb9f806 Merge branch 'regulator-4.18' into regulator-linus)
Merging spi-fixes/for-linus (3c81743ecf5b Merge branch 'spi-4.18' into spi-linus)
Merging pci-current/for-linus (a83a21734416 PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled)
Merging driver-core.current/driver-core-linus (722e5f2b1eec driver core: Partially revert "driver core: correct device's shutdown order")
Merging tty.current/tty-linus (021c91791a5e Linux 4.18-rc3)
Merging usb.current/usb-linus (c25c
Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields
Hi Rob, Ben, I've replied to you both inline below, hopefully it's clear enough from the context. On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote: > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery wrote: > > > > > > Hi Rob, > > > > > > Thanks for the response. > > > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist > > > > > to > > > > > provide remote management of (primarily) server platforms. BMCs are > > > > > often tightly coupled to the platform in terms of behaviour and > > > > > provide > > > > > many hardware features integral to booting and running the host > > > > > system. > > > > > > > > > > Some of these hardware features are simple, for example scratch > > > > > registers provided by the BMC that are exposed to both the host and > > > > > the > > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > > some of the provided functionality. > > > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > > integrate well into other driver models yet must be described to allow > > > > > the BMC kernel to assume control of these features. > > > > > > > > So we'll get a new binding when that happens? That will break > > > > compatibility. > > > > > > Can you please expand on this? I'm not following. > > > > If we have a subsystem in the future, then there would likely be an > > associated binding which would be different. So if you update the DT, > > then old kernels won't work with it. > > What kind of "subsystem" ? There is almost no way there could be one > for that sort of BMC tunables. We've look at several BMC chips out > there and requirements from several vendors, BIOS and system > manufacturers and it's all over the place. 
Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings. > > > > I feel like this is an argument of tradition. Maybe people have > > > been dissuaded from doing so when they don't have a reasonable use- > > > case? I'm not saying that what I'm proposing is unquestionably > > > reasonable, but I don't want to dismiss it out of hand. > > ... > > > > It comes up with system controller type blocks too that just have a > > bunch of random registers. This matches the situation at hand. > > Those change in every SoC and not in any > > controlled or ordered way that would make describing the individual > > sub-functions in DT worthwhile. "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile. Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all. > > So what's the alternative ? Because without something like what we > propose, what's going to happen is /dev/mem ... that's what people do > today. Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties. > > > > > A node per register bit doesn't scale. > > > > > > It isn't meant to scale in terms of a single system. Using it > > > extensively is very likely wrong. Separately, register-bit-led does > > > pretty much the same thing. Doesn't the scale argument apply there? > > > Who is to stop me from attaching an insane number of LEDs to a > > > system? > > > > Review. > > > > If you look, register-bit-led is rarely used outside of some ARM, Ltd. > > boards. 
It's simply quite rare to have MMIO register bits that have a > > fixed function of LED control. > > Well, same here, we hope to review what goes upstream to make it > reasonable. Otherwise it doens't matter. If a random vendor, let's say > IBM, chose to chip a system where they put an insane amount of cruft in > there, it will only affect those systems's BMC and the userspace stack > on it. > > Thankfully that stack is OpenBMC and IBM is aiming at having their > device-tree's upstream, thus reviewed, thus it won't happen. > > *Anything* can be abused. The point here is that we have a number, > thankfully rather small, maybe a dozen or two, of tunables that are > quite specific to a combination (system vendor, bmc vendor, system > model) which control a few HW features that essentially do *NOT* fit in > a subsystem. Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least i
[PATCH v4 2/2] ARM: dts: am335x: add am335x-sancloud-bbe board support
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black, but with the following differences: * Gigabit capable PHY * Extra USB hub, optional i2c control * lps3331ap barometer connected over i2c * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c * 1GiB DDR3 RAM * RTL8723 Wifi/Bluetooth connected over USB Tested on a revision G board. Signed-off-by: Koen Kooi --- v4: No changes v3: Drop oppnitro-10, not needed on the versions Sancloud is using v2: * Add missing #include * Fix Barometer compatible string v1: Initial submission arch/arm/boot/dts/Makefile| 1 + arch/arm/boot/dts/am335x-sancloud-bbe.dts | 136 ++ 2 files changed, 137 insertions(+) create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 37a3de7..83a4d61 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -695,6 +695,7 @@ dtb-$(CONFIG_SOC_AM33XX) += \ am335x-pepper.dtb \ am335x-phycore-rdk.dtb \ am335x-pocketbeagle.dtb \ + am335x-sancloud-bbe.dtb \ am335x-shc.dtb \ am335x-sbc-t335.dtb \ am335x-sl50.dtb \ diff --git a/arch/arm/boot/dts/am335x-sancloud-bbe.dts b/arch/arm/boot/dts/am335x-sancloud-bbe.dts new file mode 100644 index 000..ba5f4bd --- /dev/null +++ b/arch/arm/boot/dts/am335x-sancloud-bbe.dts @@ -0,0 +1,136 @@ +/* + * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +/dts-v1/; + +#include "am33xx.dtsi" +#include "am335x-bone-common.dtsi" +#include "am335x-boneblack-common.dtsi" +#include + +/ { + model = "SanCloud BeagleBone Enhanced"; + compatible = "sancloud,am335x-boneenhanced", "ti,am335x-bone-black", "ti,am335x-bone", "ti,am33xx"; +}; + +&am33xx_pinmux { + pinctrl-names = "default"; + + cpsw_default: cpsw_default { + pinctrl-single,pins = < + /* Slave 1 */ + AM33XX_IOPAD(0x914, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txen.rgmii1_tctl */ + AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxdv.rgmii1_rctl */ + AM33XX_IOPAD(0x91c, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd3.rgmii1_td3 */ + AM33XX_IOPAD(0x920, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd2.rgmii1_td2 */ + AM33XX_IOPAD(0x924, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd1.rgmii1_td1 */ + AM33XX_IOPAD(0x928, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd0.rgmii1_td0 */ + AM33XX_IOPAD(0x92c, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txclk.rgmii1_tclk */ + AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxclk.rgmii1_rclk */ + AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd3.rgmii1_rd3 */ + AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd2.rgmii1_rd2 */ + AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd1.rgmii1_rd1 */ + AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd0.rgmii1_rd0 */ + >; + }; + + cpsw_sleep: cpsw_sleep { + pinctrl-single,pins = < + /* Slave 1 reset value */ + AM33XX_IOPAD(0x914, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x91c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x920, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x924, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x928, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x92c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x938, 
PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE7) + >; + }; + + davinci_mdio_default: davinci_mdio_default { + pinctrl-single,pins = < + /* MDIO */ + AM33XX_IOPAD(0x948, PIN_INPUT_PULLUP | SLEWCTRL_FAST | MUX_MODE0) /* mdio_data.mdio_data */ + AM33XX_IOPAD(0x94c, PIN_OUTPUT_PULLUP | MUX_MODE0) /* mdio_clk.mdio_clk */ + >; +
[PATCH v4 1/2] dt-bindings: Add vendor prefix for Sancloud
Add vendor prefix for Sancloud Ltd.

Signed-off-by: Koen Kooi
Acked-by: Rob Herring
---
v4: Add Acked-by
v3: No changes
v2: No changes
v1: Initial submission

 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 7cad066..c7aaa1f 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -314,6 +314,7 @@ rohm	ROHM Semiconductor Co., Ltd
 roofull	Shenzhen Roofull Technology Co, Ltd
 samsung	Samsung Semiconductor
 samtec	Samtec/Softing company
+sancloud	Sancloud Ltd
 sandisk	Sandisk Corporation
 sbs	Smart Battery System
 schindler	Schindler
--
2.0.1
[PATCH v4 0/2] ARM: dts: am3355: add support for the Sancloud Beaglebone Enhanced
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black, but with the following differences: * Gigabit capable PHY * Extra USB hub, optional i2c control * lps3331ap barometer connected over i2c * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c * 1GiB DDR3 RAM * RTL8723 Wifi/Bluetooth connected over USB This series adds the Sancloud vendor prefix as well as the actual dts. v4: Add Robs Acked-by for 1/2 v3: Drop 1GHz Opp tweak v2: * Add missing #include * Fix barometer compatible string v1: Initial submission, not the dts actually tested :/ Also double checked if the kbuild error has been fixed: koen@beast:/build/pkg/linux-torvalds$ git describe v4.18-rc4-71-gd69088d koen@beast:/build/pkg/linux-torvalds$ ARCH=arm CROSS_COMPILE=arm-angstrom-linux-gnueabi- make am335x-sancloud-bbe.dtb DTC arch/arm/boot/dts/am335x-sancloud-bbe.dtb koen@beast:/build/pkg/linux-torvalds$ Same successful result on tmlind/for-next (which has v2 already) and robh/for-next Koen Kooi (2): dt-bindings: Add vendor prefix for Sancloud ARM: dts: am335x: add am335x-sancloud-bbe board support .../devicetree/bindings/vendor-prefixes.txt| 1 + arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/am335x-sancloud-bbe.dts | 146 + 3 files changed, 148 insertions(+) create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts -- 2.0.1
[PATCH v3 2/2] ARM: dts: am335x: add am335x-sancloud-bbe board support
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black, but with the following differences: * Gigabit capable PHY * Extra USB hub, optional i2c control * lps3331ap barometer connected over i2c * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c * 1GiB DDR3 RAM * RTL8723 Wifi/Bluetooth connected over USB Tested on a revision G board. Signed-off-by: Koen Kooi --- v3: Drop oppnitro-10, not needed on the versions Sancloud is using v2: * Add missing #include * Fix Barometer compatible string v1: Initial submission arch/arm/boot/dts/Makefile| 1 + arch/arm/boot/dts/am335x-sancloud-bbe.dts | 136 ++ 2 files changed, 137 insertions(+) create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 37a3de7..83a4d61 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -695,6 +695,7 @@ dtb-$(CONFIG_SOC_AM33XX) += \ am335x-pepper.dtb \ am335x-phycore-rdk.dtb \ am335x-pocketbeagle.dtb \ + am335x-sancloud-bbe.dtb \ am335x-shc.dtb \ am335x-sbc-t335.dtb \ am335x-sl50.dtb \ diff --git a/arch/arm/boot/dts/am335x-sancloud-bbe.dts b/arch/arm/boot/dts/am335x-sancloud-bbe.dts new file mode 100644 index 000..ba5f4bd --- /dev/null +++ b/arch/arm/boot/dts/am335x-sancloud-bbe.dts @@ -0,0 +1,136 @@ +/* + * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +/dts-v1/; + +#include "am33xx.dtsi" +#include "am335x-bone-common.dtsi" +#include "am335x-boneblack-common.dtsi" +#include + +/ { + model = "SanCloud BeagleBone Enhanced"; + compatible = "sancloud,am335x-boneenhanced", "ti,am335x-bone-black", "ti,am335x-bone", "ti,am33xx"; +}; + +&am33xx_pinmux { + pinctrl-names = "default"; + + cpsw_default: cpsw_default { + pinctrl-single,pins = < + /* Slave 1 */ + AM33XX_IOPAD(0x914, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txen.rgmii1_tctl */ + AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxdv.rgmii1_rctl */ + AM33XX_IOPAD(0x91c, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd3.rgmii1_td3 */ + AM33XX_IOPAD(0x920, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd2.rgmii1_td2 */ + AM33XX_IOPAD(0x924, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd1.rgmii1_td1 */ + AM33XX_IOPAD(0x928, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txd0.rgmii1_td0 */ + AM33XX_IOPAD(0x92c, PIN_OUTPUT_PULLDOWN | MUX_MODE2) /* mii1_txclk.rgmii1_tclk */ + AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxclk.rgmii1_rclk */ + AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd3.rgmii1_rd3 */ + AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd2.rgmii1_rd2 */ + AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd1.rgmii1_rd1 */ + AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE2) /* mii1_rxd0.rgmii1_rd0 */ + >; + }; + + cpsw_sleep: cpsw_sleep { + pinctrl-single,pins = < + /* Slave 1 reset value */ + AM33XX_IOPAD(0x914, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x91c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x920, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x924, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x928, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x92c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x938, 
PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE7) + AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE7) + >; + }; + + davinci_mdio_default: davinci_mdio_default { + pinctrl-single,pins = < + /* MDIO */ + AM33XX_IOPAD(0x948, PIN_INPUT_PULLUP | SLEWCTRL_FAST | MUX_MODE0) /* mdio_data.mdio_data */ + AM33XX_IOPAD(0x94c, PIN_OUTPUT_PULLUP | MUX_MODE0) /* mdio_clk.mdio_clk */ + >; + }; + + d
[PATCH v3 1/2] dt-bindings: Add vendor prefix for Sancloud
Add vendor prefix for Sancloud Ltd.

Signed-off-by: Koen Kooi
---
v3: No changes
v2: No changes
v1: Initial submission

 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 7cad066..c7aaa1f 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -314,6 +314,7 @@ rohm	ROHM Semiconductor Co., Ltd
 roofull	Shenzhen Roofull Technology Co, Ltd
 samsung	Samsung Semiconductor
 samtec	Samtec/Softing company
+sancloud	Sancloud Ltd
 sandisk	Sandisk Corporation
 sbs	Smart Battery System
 schindler	Schindler
--
2.0.1
[PATCH v3 0/2] ARM: dts: am3355: add support for the Sancloud Beaglebone Enhanced
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black, but with the following differences: * Gigabit capable PHY * Extra USB hub, optional i2c control * lps3331ap barometer connected over i2c * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c * 1GiB DDR3 RAM * RTL8723 Wifi/Bluetooth connected over USB This series adds the Sancloud vendor prefix as well as the actual dts. v3: Drop 1GHz Opp tweak v2: * Add missing #include * Fix barometer compatible string v1: Initial submission, not the dts actually tested :/ Also double checked if the kbuild error has been fixed: koen@beast:/build/pkg/linux-torvalds$ git describe v4.18-rc4-71-gd69088d koen@beast:/build/pkg/linux-torvalds$ ARCH=arm CROSS_COMPILE=arm-angstrom-linux-gnueabi- make am335x-sancloud-bbe.dtb DTC arch/arm/boot/dts/am335x-sancloud-bbe.dtb koen@beast:/build/pkg/linux-torvalds$ Same successful result on tmlind/for-next (which has v2 already) and robh/for-next Koen Kooi (2): dt-bindings: Add vendor prefix for Sancloud ARM: dts: am335x: add am335x-sancloud-bbe board support .../devicetree/bindings/vendor-prefixes.txt| 1 + arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/am335x-sancloud-bbe.dts | 146 + 3 files changed, 148 insertions(+) create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts -- 2.0.1
[PATCH v4 1/2] leds: core: Introduce generic pattern interface
From: Bjorn Andersson Some LED controllers have support for autonomously controlling brightness over time, according to some preprogrammed pattern or function. This adds a new optional operator that LED class drivers can implement if they support such functionality as well as a new device attribute to configure the pattern for a given LED. [Baolin Wang did some improvements.] Signed-off-by: Bjorn Andersson Signed-off-by: Baolin Wang --- Changes from v3: - Move the check in pattern_show() to of_led_classdev_register(). - Add more documentation to explain how to set/clear one pattern. Changes from v2: - Change kernel version to 4.19. - Force user to return error pointer if failed to issue pattern_get(). - Use strstrip() to trim trailing newline. - Other optimization. Changes from v1: - Add some comments suggested by Pavel. - Change 'delta_t' can be 0. Note: I removed the pattern repeat check and will get the repeat number by adding one extra file named 'pattern_repeat' according to previous discussion. --- Documentation/ABI/testing/sysfs-class-led | 20 + drivers/leds/led-class.c | 118 + include/linux/leds.h | 19 + 3 files changed, 157 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-class-led b/Documentation/ABI/testing/sysfs-class-led index 5f67f7a..f4b73ad 100644 --- a/Documentation/ABI/testing/sysfs-class-led +++ b/Documentation/ABI/testing/sysfs-class-led @@ -61,3 +61,23 @@ Description: gpio and backlight triggers. In case of the backlight trigger, it is useful when driving a LED which is intended to indicate a device in a standby like state. + +What: /sys/class/leds//pattern +Date: July 2018 +KernelVersion: 4.19 +Description: + Specify a pattern for the LED, for LED hardware that support + altering the brightness as a function of time. + + The pattern is given by a series of tuples, of brightness and + duration (ms). The LED is expected to traverse the series and + each brightness value for the specified duration. 
Duration of + 0 means brightness should immediately change to new value. + + As LED hardware might have different capabilities and precision + the requested pattern might be slightly adjusted by the driver + and the resulting pattern of such operation should be returned + when this file is read. + + Writing a non-empty string to this file will activate the pattern, + and an empty string will disable the pattern. diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c index 3c7e348..0992a0e 100644 --- a/drivers/leds/led-class.c +++ b/drivers/leds/led-class.c @@ -74,6 +74,119 @@ static ssize_t max_brightness_show(struct device *dev, } static DEVICE_ATTR_RO(max_brightness); +static ssize_t pattern_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct led_classdev *led_cdev = dev_get_drvdata(dev); + struct led_pattern *pattern; + size_t offset = 0; + int count, n, i; + + pattern = led_cdev->pattern_get(led_cdev, &count); + if (IS_ERR(pattern)) + return PTR_ERR(pattern); + + for (i = 0; i < count; i++) { + n = snprintf(buf + offset, PAGE_SIZE - offset, "%d %d ", +pattern[i].brightness, pattern[i].delta_t); + + if (offset + n >= PAGE_SIZE) + goto err_nospc; + + offset += n; + } + + buf[offset - 1] = '\n'; + + kfree(pattern); + return offset; + +err_nospc: + kfree(pattern); + return -ENOSPC; +} + +static ssize_t pattern_store(struct device *dev, +struct device_attribute *attr, +const char *buf, size_t size) +{ + struct led_classdev *led_cdev = dev_get_drvdata(dev); + struct led_pattern *pattern = NULL; + char *sbegin, *elem, *s; + unsigned long val; + int ret = 0, len = 0; + bool odd = true; + + sbegin = kstrndup(buf, size, GFP_KERNEL); + if (!sbegin) + return -ENOMEM; + + /* +* Trim trailing newline, if the remaining string is empty, +* clear the pattern. 
+*/ + s = strstrip(sbegin); + if (!*s) { + if (led_cdev->pattern_clear) + ret = led_cdev->pattern_clear(led_cdev); + goto out; + } + + pattern = kcalloc(size, sizeof(*pattern), GFP_KERNEL); + if (!pattern) { + ret = -ENOMEM; + goto out; + } + + /* Parse out the brightness & delta_t tuples */ + while ((elem = strsep(&s, " ")) != NULL) { + ret = kstrtoul(elem, 10, &val); + if (ret) +
[PATCH v4 2/2] leds: sc27xx: Add pattern_set/get/clear interfaces for LED controller
This patch implements the 'pattern_set', 'pattern_get' and 'pattern_clear' interfaces to support SC27XX LED breathing mode. Signed-off-by: Baolin Wang --- Changes from v3: - None. Changes from v2: - No updates. Changes from v1: - No updates. --- drivers/leds/leds-sc27xx-bltc.c | 160 +++ 1 file changed, 160 insertions(+) diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c index 9d9b7aa..898f92d 100644 --- a/drivers/leds/leds-sc27xx-bltc.c +++ b/drivers/leds/leds-sc27xx-bltc.c @@ -6,6 +6,7 @@ #include #include #include +#include #include /* PMIC global control register definition */ @@ -32,8 +33,13 @@ #define SC27XX_DUTY_MASK GENMASK(15, 0) #define SC27XX_MOD_MASKGENMASK(7, 0) +#define SC27XX_CURVE_SHIFT 8 +#define SC27XX_CURVE_L_MASKGENMASK(7, 0) +#define SC27XX_CURVE_H_MASKGENMASK(15, 8) + #define SC27XX_LEDS_OFFSET 0x10 #define SC27XX_LEDS_MAX3 +#define SC27XX_LEDS_PATTERN_CNT4 struct sc27xx_led { char name[LED_MAX_NAME_SIZE]; @@ -122,6 +128,157 @@ static int sc27xx_led_set(struct led_classdev *ldev, enum led_brightness value) return err; } +static int sc27xx_led_pattern_clear(struct led_classdev *ldev) +{ + struct sc27xx_led *leds = to_sc27xx_led(ldev); + struct regmap *regmap = leds->priv->regmap; + u32 base = sc27xx_led_get_offset(leds); + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL; + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line; + int err; + + mutex_lock(&leds->priv->lock); + + /* Reset the rise, high, fall and low time to zero. 
*/ + regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0); + regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0); + + err = regmap_update_bits(regmap, ctrl_base, + (SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0); + + mutex_unlock(&leds->priv->lock); + + return err; +} + +static int sc27xx_led_pattern_set(struct led_classdev *ldev, + struct led_pattern *pattern, + int len) +{ + struct sc27xx_led *leds = to_sc27xx_led(ldev); + u32 base = sc27xx_led_get_offset(leds); + u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL; + u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line; + struct regmap *regmap = leds->priv->regmap; + int err; + + /* +* Must contain 4 patterns to configure the rise time, high time, fall +* time and low time to enable the breathing mode. +*/ + if (len != SC27XX_LEDS_PATTERN_CNT) + return -EINVAL; + + mutex_lock(&leds->priv->lock); + + err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE0, +SC27XX_CURVE_L_MASK, pattern[0].delta_t); + if (err) + goto out; + + err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE1, +SC27XX_CURVE_L_MASK, pattern[1].delta_t); + if (err) + goto out; + + err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE0, +SC27XX_CURVE_H_MASK, +pattern[2].delta_t << SC27XX_CURVE_SHIFT); + if (err) + goto out; + + + err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE1, +SC27XX_CURVE_H_MASK, +pattern[3].delta_t << SC27XX_CURVE_SHIFT); + if (err) + goto out; + + + err = regmap_update_bits(regmap, base + SC27XX_LEDS_DUTY, +SC27XX_DUTY_MASK, +(pattern[0].brightness << SC27XX_DUTY_SHIFT) | +SC27XX_MOD_MASK); + if (err) + goto out; + + /* Enable the LED breathing mode */ + err = regmap_update_bits(regmap, ctrl_base, +SC27XX_LED_RUN << ctrl_shift, +SC27XX_LED_RUN << ctrl_shift); + +out: + mutex_unlock(&leds->priv->lock); + + return err; +} + +static struct led_pattern *sc27xx_led_pattern_get(struct led_classdev *ldev, + int *len) +{ + struct sc27xx_led *leds = to_sc27xx_led(ldev); + u32 base = sc27xx_led_get_offset(leds); + 
struct regmap *regmap = leds->priv->regmap; + struct led_pattern *pattern; + int i, err; + u32 val; + + /* +* Must allocate 4 patterns to show the rise time, high time, fall time +* and low time. +*/ + pattern = kcalloc(SC27XX_LEDS_PATTERN_CNT, sizeof(*pattern), + GFP_KERNEL); + if (!pattern) + return ERR_PTR(-ENOMEM); + + mutex_lock(&leds->priv->lock); + + err = regmap_read(regmap, base + SC27XX_LEDS_CURVE0, &val)
Re: [patch -mm] mm, oom: remove oom_lock from exit_mmap
Here is a simplified description of oom_lock... Positive effects (1) Serialize "setting TIF_MEMDIE and calling __thaw_task()/atomic_inc() from mark_oom_victim()" and "setting oom_killer_disabled = true from oom_killer_disable()". (2) Serialize all printk() messages from out_of_memory(). (3) Prevent selecting a new OOM victim when there is an !MMF_OOM_SKIP mm which the current thread should wait for. (4) Serialize blocking_notifier_call_chain() from out_of_memory() because some of the callbacks might not be thread-safe and/or a serialized call might release more memory than needed. Negative effects (A) Threads which called mutex_lock(&oom_lock) before calling out_of_memory() are blocked waiting for "__oom_reap_task_mm() from exit_mmap()" and/or "__oom_reap_task_mm() from oom_reap_task_mm()". (B) Threads which do not call out_of_memory() because mutex_trylock(&oom_lock) failed continue consuming CPU resources pointlessly. Regarding (A), we can reduce the range oom_lock serializes from "__oom_reap_task_mm()" to "setting MMF_OOM_SKIP", since oom_lock is useful for (3). Therefore, we can apply the change below on top of your patch. But I don't like sharing MMF_UNSTABLE for two purposes (the reason is explained below). Regarding (B), we can do direct OOM reaping (like my proposal does). 
--- kernel/fork.c | 5 + mm/mmap.c | 21 + mm/oom_kill.c | 57 ++--- 3 files changed, 36 insertions(+), 47 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 6747298..f37d481 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -984,6 +984,11 @@ static inline void __mmput(struct mm_struct *mm) } if (mm->binfmt) module_put(mm->binfmt->module); + if (unlikely(mm_is_oom_victim(mm))) { + mutex_lock(&oom_lock); + set_bit(MMF_OOM_SKIP, &mm->flags); + mutex_unlock(&oom_lock); + } mmdrop(mm); } diff --git a/mm/mmap.c b/mm/mmap.c index 7f918eb..203061f 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3075,19 +3075,17 @@ void exit_mmap(struct mm_struct *mm) __oom_reap_task_mm(mm); /* -* Now, set MMF_UNSTABLE to avoid racing with the oom reaper. -* This needs to be done before calling munlock_vma_pages_all(), -* which clears VM_LOCKED, otherwise the oom reaper cannot -* reliably test for it. If the oom reaper races with -* munlock_vma_pages_all(), this can result in a kernel oops if -* a pmd is zapped, for example, after follow_page_mask() has -* checked pmd_none(). +* Wait for the oom reaper to complete. This needs to be done +* before calling munlock_vma_pages_all(), which clears +* VM_LOCKED, otherwise the oom reaper cannot reliably test for +* it. If the oom reaper races with munlock_vma_pages_all(), +* this can result in a kernel oops if a pmd is zapped, for +* example, after follow_page_mask() has checked pmd_none(). * -* Taking mm->mmap_sem for write after setting MMF_UNSTABLE will -* guarantee that the oom reaper will not run on this mm again -* after mmap_sem is dropped. +* Taking mm->mmap_sem for write will guarantee that the oom +* reaper will not run on this mm again after mmap_sem is +* dropped. 
*/ - set_bit(MMF_UNSTABLE, &mm->flags); down_write(&mm->mmap_sem); up_write(&mm->mmap_sem); } @@ -3115,7 +3113,6 @@ void exit_mmap(struct mm_struct *mm) unmap_vmas(&tlb, vma, 0, -1); free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); tlb_finish_mmu(&tlb, 0, -1); - set_bit(MMF_OOM_SKIP, &mm->flags); /* * Walk the list again, actually closing and freeing it, diff --git a/mm/oom_kill.c b/mm/oom_kill.c index e6328ce..7ed4ed0 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -488,11 +488,9 @@ void __oom_reap_task_mm(struct mm_struct *mm) * Tell all users of get_user/copy_from_user etc... that the content * is no longer stable. No barriers really needed because unmapping * should imply barriers already and the reader would hit a page fault -* if it stumbled over a reaped memory. If MMF_UNSTABLE is already set, -* reaping as already occurred so nothing left to do. +* if it stumbled over a reaped memory. */ - if (test_and_set_bit(MMF_UNSTABLE, &mm->flags)) - return; + set_bit(MMF_UNSTABLE, &mm->flags); for (vma = mm->mmap ; vma; vma = vma->vm_next) { if (!can_madv_dontneed_vma(vma)) @@ -524,25 +522,9 @@ void __oom_reap_task_mm(struct mm_struct *mm) s
Re: [PATCH] perf tools: Synthesize GROUP_DESC feature in pipe mode
Hi Jiri, On Thu, Jul 12, 2018 at 9:49 AM Jiri Olsa wrote: > > On Thu, Jul 12, 2018 at 09:34:45AM -0700, Stephane Eranian wrote: > > Hi Jiri, > > On Thu, Jul 12, 2018 at 6:52 AM Jiri Olsa wrote: > > > > > > Stephane reported that pipe mode does not carry the group > > > information and thus the piped report won't display the > > > grouped output for the following command: > > > > > Thanks for fixing this quickly. > > could I have your tested/acked by? > Acked-by: Stephane Eranian > > I think we should have more testing on the pipe mode, in general. > > yea, we should > > jirka > > > > > > # perf record -e '{cycles,instructions,branches}' -a sleep 4 | perf > > > report > > > > > > It has no idea about the group setup, so it will display > > > events separately: > > > > > > # Overhead Command Shared Object ... > > > # ... ... > > > # > > >6.71% swapper [kernel.kallsyms] > > >2.28% offlineimap libpython2.7.so.1.0 > > >0.78% perf [kernel.kallsyms] > > > ... > > > > > > Fixing GROUP_DESC feature record to be synthesized in pipe mode, > > > so the report output is grouped if there's a group defined in record: > > > > > > # Overhead Command Shared... > > > # ... ... > > > # > > >7.57% 0.16% 0.30% swapper [kernel > > >1.87% 3.15% 2.46% offlineimap libpyth > > >1.33% 0.00% 0.00% perf [kernel > > > ... 
> > > > > > Cc: David Carrillo-Cisneros > > > Reported-by: Stephane Eranian > > > Link: http://lkml.kernel.org/n/tip-ybqyh8ac4g173iy3xt4px...@git.kernel.org > > > Signed-off-by: Jiri Olsa > > > --- > > > tools/perf/util/header.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c > > > index 59fcc790c865..af9aaf28f976 100644 > > > --- a/tools/perf/util/header.c > > > +++ b/tools/perf/util/header.c > > > @@ -2587,7 +2587,7 @@ static const struct feature_ops > > > feat_ops[HEADER_LAST_FEATURE] = { > > > FEAT_OPR(NUMA_TOPOLOGY, numa_topology, true), > > > FEAT_OPN(BRANCH_STACK, branch_stack, false), > > > FEAT_OPR(PMU_MAPPINGS, pmu_mappings, false), > > > - FEAT_OPN(GROUP_DESC,group_desc, false), > > > + FEAT_OPR(GROUP_DESC,group_desc, false), > > > FEAT_OPN(AUXTRACE, auxtrace, false), > > > FEAT_OPN(STAT, stat, false), > > > FEAT_OPN(CACHE, cache, true), > > > -- > > > 2.17.1 > > >
Re: [PATCH v2 2/3] dmaengine: imx-sdma: add memcpy interface
On Fri, Jul 13, 2018 at 09:08:46PM +0800, Robin Gong wrote: > Add MEMCPY capability for imx-sdma driver. > > Signed-off-by: Robin Gong > --- > drivers/dma/imx-sdma.c | 95 > -- > 1 file changed, 92 insertions(+), 3 deletions(-) > > @@ -1318,6 +1347,63 @@ static struct sdma_desc *sdma_transfer_init(struct > sdma_channel *sdmac, > return NULL; > } > > +static struct dma_async_tx_descriptor *sdma_prep_memcpy( > + struct dma_chan *chan, dma_addr_t dma_dst, > + dma_addr_t dma_src, size_t len, unsigned long flags) > +{ > + struct sdma_channel *sdmac = to_sdma_chan(chan); > + struct sdma_engine *sdma = sdmac->sdma; > + int channel = sdmac->channel; > + size_t count; > + int i = 0, param; > + struct sdma_buffer_descriptor *bd; > + struct sdma_desc *desc; > + > + if (!chan || !len) > + return NULL; > + > + dev_dbg(sdma->dev, "memcpy: %pad->%pad, len=%zu, channel=%d.\n", > + &dma_src, &dma_dst, len, channel); > + > + desc = sdma_transfer_init(sdmac, DMA_MEM_TO_MEM, > + len / SDMA_BD_MAX_CNT + 1); > + if (!desc) > + return NULL; > + > + do { > + count = min_t(size_t, len, SDMA_BD_MAX_CNT); > + bd = &desc->bd[i]; > + bd->buffer_addr = dma_src; > + bd->ext_buffer_addr = dma_dst; > + bd->mode.count = count; > + desc->chn_count += count; > + /* align with sdma->dma_device.copy_align: 4bytes */ > + bd->mode.command = 0; > + > + dma_src += count; > + dma_dst += count; > + len -= count; > + i++; NACK. Please actually look at your code and find out where you do unaligned DMA accesses. Hint: What happens when this loop body is executed more than once? Sascha > + > + param = BD_DONE | BD_EXTD | BD_CONT; > + /* last bd */ > + if (!len) { > + param |= BD_INTR; > + param |= BD_LAST; > + param &= ~BD_CONT; > + } > + > + dev_dbg(sdma->dev, "entry %d: count: %zd dma: 0x%x %s%s\n", > + i, count, bd->buffer_addr, > + param & BD_WRAP ? "wrap" : "", > + param & BD_INTR ? " intr" : ""); > + > + bd->mode.status = param; > + } while (len); > + -- Pengutronix e.K. 
| | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0| Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
Re: [PATCH 1/2] ARM: dts: imx51-zii-scu3-esb: Add switch IRQ line pinmux config
On Thu, Jul 12, 2018 at 6:37 AM Fabio Estevam wrote: > > Hi Andrey, > > On Wed, Jul 11, 2018 at 11:33 PM, Andrey Smirnov > wrote: > > > + pinctrl_switch: switchgrp { > > + fsl,pins = < > > + MX51_PAD_AUD3_BB_CK__GPIO4_20 0xc5 > > The i.MX51 Reference Manual states that 0xa5 is the default reset > value for the register IOMUXC_SW_PAD_CTL_PAD_AUD3_BB_CK. > > By reading your commit log I had the impression you wanted to provide > the default value explicitly. > > Please clarify. I wanted to avoid relying on defaults, be it register reset values or settings that the bootloader left us with. The default value of 0xa5 works, but, given that the pin is IRQ_TYPE_LEVEL_HIGH, I thought it would be better to configure it to have a pulldown. Do you want me to add that to the commit log? Thanks, Andrey Smirnov
Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow
On 2018-07-12 20:15, Bharat Kumar Gogada wrote: Currently PCI_BRIDGE_CTL_SERR is being enabled only in the ACPI flow. This bit is required for forwarding errors reported by EP devices to the upstream device. This patch enables SERR# for Type-1 PCI devices. Signed-off-by: Bharat Kumar Gogada --- drivers/pci/pcie/aer.c | 23 +++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index a2e8838..943e084 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct pci_dev *dev) if (!dev->aer_cap) return -EIO; + if (!IS_ENABLED(CONFIG_ACPI) && + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + u16 control; + + /* +* A Type-1 PCI bridge will not forward ERR_ messages coming +* from an endpoint if SERR# forwarding is not enabled. +*/ + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control); + control |= PCI_BRIDGE_CTL_SERR; + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control); + } + return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS); } EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting); @@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev) if (pcie_aer_get_firmware_first(dev)) return -EIO; + if (!IS_ENABLED(CONFIG_ACPI) && + dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + u16 control; + + /* Clear SERR Forwarding */ + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control); + control &= ~PCI_BRIDGE_CTL_SERR; + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control); + } + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS); } Should this configuration not be set by firmware? Why should Linux dictate it? Regards, Oza.
[PATCH v2 3/3] ARM: configs: imx_v6_v7_defconfig: add DMATEST support
Add DMATEST support and remove invalid options: CONFIG_BT_HCIUART_H4 is enabled by default, and CONFIG_SND_SOC_IMX_WM8962 is out of date and does not appear in any config file. Please refer to Documentation/driver-api/dmaengine/dmatest.rst to test the MEMCPY feature of imx-sdma. Signed-off-by: Robin Gong --- arch/arm/configs/imx_v6_v7_defconfig | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index e381d05..f28d4d9 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -81,7 +81,6 @@ CONFIG_CAN=y CONFIG_CAN_FLEXCAN=y CONFIG_BT=y CONFIG_BT_HCIUART=y -CONFIG_BT_HCIUART_H4=y CONFIG_BT_HCIUART_LL=y CONFIG_CFG80211=y CONFIG_CFG80211_WEXT=y @@ -282,7 +281,6 @@ CONFIG_SND_SOC_FSL_ASRC=y CONFIG_SND_IMX_SOC=y CONFIG_SND_SOC_PHYCORE_AC97=y CONFIG_SND_SOC_EUKREA_TLV320=y -CONFIG_SND_SOC_IMX_WM8962=y CONFIG_SND_SOC_IMX_ES8328=y CONFIG_SND_SOC_IMX_SGTL5000=y CONFIG_SND_SOC_IMX_SPDIF=y @@ -371,6 +369,7 @@ CONFIG_DMADEVICES=y CONFIG_FSL_EDMA=y CONFIG_IMX_SDMA=y CONFIG_MXS_DMA=y +CONFIG_DMATEST=m CONFIG_STAGING=y CONFIG_STAGING_MEDIA=y CONFIG_VIDEO_IMX_MEDIA=y -- 2.7.4
[PATCH v2 2/3] dmaengine: imx-sdma: add memcpy interface
Add MEMCPY capability for imx-sdma driver. Signed-off-by: Robin Gong --- drivers/dma/imx-sdma.c | 95 -- 1 file changed, 92 insertions(+), 3 deletions(-) diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c index e3d5e73..ef50f2c 100644 --- a/drivers/dma/imx-sdma.c +++ b/drivers/dma/imx-sdma.c @@ -342,6 +342,7 @@ struct sdma_desc { * @pc_from_device:script address for those device_2_memory * @pc_to_device: script address for those memory_2_device * @device_to_device: script address for those device_2_device + * @pc_to_pc: script address for those memory_2_memory * @flags: loop mode or not * @per_address: peripheral source or destination address in common case * destination address in p_2_p case @@ -367,6 +368,7 @@ struct sdma_channel { enum dma_slave_buswidth word_size; unsigned intpc_from_device, pc_to_device; unsigned intdevice_to_device; + unsigned intpc_to_pc; unsigned long flags; dma_addr_t per_address, per_address2; unsigned long event_mask[2]; @@ -869,14 +871,16 @@ static void sdma_get_pc(struct sdma_channel *sdmac, * These are needed once we start to support transfers between * two peripherals or memory-to-memory transfers */ - int per_2_per = 0; + int per_2_per = 0, emi_2_emi = 0; sdmac->pc_from_device = 0; sdmac->pc_to_device = 0; sdmac->device_to_device = 0; + sdmac->pc_to_pc = 0; switch (peripheral_type) { case IMX_DMATYPE_MEMORY: + emi_2_emi = sdma->script_addrs->ap_2_ap_addr; break; case IMX_DMATYPE_DSP: emi_2_per = sdma->script_addrs->bp_2_ap_addr; @@ -949,6 +953,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac, sdmac->pc_from_device = per_2_emi; sdmac->pc_to_device = emi_2_per; sdmac->device_to_device = per_2_per; + sdmac->pc_to_pc = emi_2_emi; } static int sdma_load_context(struct sdma_channel *sdmac) @@ -965,6 +970,8 @@ static int sdma_load_context(struct sdma_channel *sdmac) load_address = sdmac->pc_from_device; else if (sdmac->direction == DMA_DEV_TO_DEV) load_address = sdmac->device_to_device; + else if (sdmac->direction == 
DMA_MEM_TO_MEM) + load_address = sdmac->pc_to_pc; else load_address = sdmac->pc_to_device; @@ -1214,10 +1221,28 @@ static int sdma_alloc_chan_resources(struct dma_chan *chan) { struct sdma_channel *sdmac = to_sdma_chan(chan); struct imx_dma_data *data = chan->private; + struct imx_dma_data mem_data; int prio, ret; - if (!data) - return -EINVAL; + /* +* MEMCPY may never setup chan->private by filter function such as +* dmatest, thus create 'struct imx_dma_data mem_data' for this case. +* Please note in any other slave case, you have to setup chan->private +* with 'struct imx_dma_data' in your own filter function if you want to +* request dma channel by dma_request_channel() rather than +* dma_request_slave_channel(). Otherwise, 'MEMCPY in case?' will appear +* to warn you to correct your filter function. +*/ + if (!data) { + dev_dbg(sdmac->sdma->dev, "MEMCPY in case?\n"); + mem_data.priority = 2; + mem_data.peripheral_type = IMX_DMATYPE_MEMORY; + mem_data.dma_request = 0; + mem_data.dma_request2 = 0; + data = &mem_data; + + sdma_get_pc(sdmac, IMX_DMATYPE_MEMORY); + } switch (data->priority) { case DMA_PRIO_HIGH: @@ -1307,6 +1332,10 @@ static struct sdma_desc *sdma_transfer_init(struct sdma_channel *sdmac, if (sdma_alloc_bd(desc)) goto err_desc_out; + /* No slave_config called in MEMCPY case, so do here */ + if (direction == DMA_MEM_TO_MEM) + sdma_config_ownership(sdmac, false, true, false); + if (sdma_load_context(sdmac)) goto err_desc_out; @@ -1318,6 +1347,63 @@ static struct sdma_desc *sdma_transfer_init(struct sdma_channel *sdmac, return NULL; } +static struct dma_async_tx_descriptor *sdma_prep_memcpy( + struct dma_chan *chan, dma_addr_t dma_dst, + dma_addr_t dma_src, size_t len, unsigned long flags) +{ + struct sdma_channel *sdmac = to_sdma_chan(chan); + struct sdma_engine *sdma = sdmac->sdma; + int channel = sdmac->channel; + size_t count; + int i = 0, param; + struct sdma_buffer_descriptor *bd; + struct sdma_desc *desc; + + if (!chan || !len) + return NULL; + + 
dev
[lkp-robot] [xarray] f0b90e702f: BUG:soft_lockup-CPU##stuck_for#s
FYI, we noticed the following commit (built with gcc-7): commit: f0b90e702fe74fa575b7382ec3474d341098d5b1 ("xarray: Add XArray unconditional store operations") https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master in testcase: boot on test machine: qemu-system-i386 -enable-kvm -cpu Haswell,+smep,+smap -m 360M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): ++++ || 3d730c4294 | f0b90e702f | ++++ | boot_successes | 0 | 0 | | boot_failures | 14 | 25 | | WARNING:at_mm/slab_common.c:#kmalloc_slab | 14 | 25 | | EIP:kmalloc_slab | 14 | 25 | | Mem-Info | 14 | 25 | | INFO:trying_to_register_non-static_key | 14 | 25 | | BUG:unable_to_handle_kernel| 14 || | Oops:#[##] | 14 || | EIP:__pci_epf_register_driver | 14 || | Kernel_panic-not_syncing:Fatal_exception | 14 || | BUG:soft_lockup-CPU##stuck_for#s | 0 | 25 | | EIP:xa_entry | 0 | 5 | | Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 25 | | EIP:xa_is_node | 0 | 8 | | EIP:xas_load | 0 | 2 | | EIP:debug_lockdep_rcu_enabled | 0 | 1 | | EIP:xa_load| 0 | 3 | | EIP:xas_descend| 0 | 2 | | EIP:xa_head| 0 | 1 | | EIP:xas_start | 0 | 3 | ++++ [ 44.03] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[swapper/0:1] [ 44.03] irq event stamp: 1072387 [ 44.03] hardirqs last enabled at (1072387): [<4106ebde>] console_unlock+0x3f3/0x42d [ 44.03] hardirqs last disabled at (1072386): [<4106e84f>] console_unlock+0x64/0x42d [ 44.03] softirqs last enabled at (1072364): [<417ecbeb>] __do_softirq+0x183/0x1b3 [ 44.03] softirqs last disabled at (1072357): [<41007967>] do_softirq_own_stack+0x1d/0x23 [ 44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 4.18.0-rc3-00012-gf0b90e7 #169 [ 44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 44.03] EIP: xa_is_node+0x0/0x1a [ 44.03] Code: 89 73 08 89 7b 0c eb 0b 39 43 14 72 0c 8b 75 ec 8b 7d f0 89 73 10 89 7b 14 8d 4d ec 89 d8 e8 88 fe ff ff 5a 59 5b 5e 5f 5d c3 <89> c2 55 83 e2 03 83 fa 02 89 e5 0f 94 c2 3d 00 10 00 00 0f 97 c0 [ 44.03] EAX: 4c93caf2 EBX: 5442fec0 ECX: 4c93caf2 EDX: 0001 [ 44.03] ESI: EDI: EBP: 5442feb4 ESP: 5442feac [ 44.03] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00200293 [ 44.03] CR0: 80050033 CR2: CR3: 01d27000 CR4: 000406b0 [ 44.03] Call Trace: [ 44.03] ? xas_load+0x26/0x2f [ 44.03] ? xa_load+0x35/0x52 [ 44.03] ? xarray_checks+0x8c2/0x984 [ 44.03] ? check_xa_tag_1+0x308/0x308 [ 44.03] ? do_one_initcall+0x6a/0x13c [ 44.03] ? parse_args+0xd9/0x1e3 [ 44.03] ? kernel_init_freeable+0xe1/0x172 [ 44.03] ? rest_init+0xaf/0xaf [ 44.03] ? kernel_init+0x8/0xd0 [ 44.03] ? ret_from_fork+0x19/0x24 [ 44.03] Kernel panic - not syncing: softlockup: hung tasks [ 44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GWL 4.18.0-rc3-00012-gf0b90e7 #169 [ 44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 44.03] Call Trace: [ 44.03] ? dump_stack+0x79/0xab [ 44.03] ? panic+0x99/0x1d8 [ 44.03] ? watchdog_timer_fn+0x1ac/0x1d3 [ 44.03] ? __hrtimer_run_queues+0xa0/0x114 [ 44.03] ? watchdog+0x16/0x16 [ 44.03] ? hrtimer_run_queues+0xd2/0xe5 [ 44.03] ? run_local_timers+0x15/0x39 [ 44.03] ? update_process_times+0x18/0x39 [ 44.03] ? 
tick_nohz_handler+0xba/0xfb [ 44.03] ? smp_apic_timer_interrupt+0x54/0x67 [ 44.03] ? apic_timer_interrupt+0x41/0x48 [ 44.03] ? siphash_2u64+0x54f/0x7de [ 44.03] ? minmax_running_min+0x6f/0x6f [ 44.03] ? xas_load+0x26/0x2f [ 44.03] ? xa_load+0x35/0x52 [ 44.03] ? xarray_checks+0x8c2/0x984 [ 44.03] ? check_xa_tag_1+0x308/0x3
[PATCH v2 1/3] dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'
Add macro SDMA_BD_MAX_CNT to replace '0xffff'. Signed-off-by: Robin Gong --- drivers/dma/imx-sdma.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c index 3b622d6..e3d5e73 100644 --- a/drivers/dma/imx-sdma.c +++ b/drivers/dma/imx-sdma.c @@ -185,6 +185,7 @@ * Mode/Count of data node descriptors - IPCv2 */ struct sdma_mode_count { +#define SDMA_BD_MAX_CNT0xffff u32 count : 16; /* size of the buffer pointed by this BD */ u32 status : 8; /* E,R,I,C,W,D status bits stored here */ u32 command : 8; /* command mostly used for channel 0 */ @@ -1344,9 +1345,9 @@ static struct dma_async_tx_descriptor *sdma_prep_slave_sg( count = sg_dma_len(sg); - if (count > 0xffff) { + if (count > SDMA_BD_MAX_CNT) { dev_err(sdma->dev, "SDMA channel %d: maximum bytes for sg entry exceeded: %d > %d\n", - channel, count, 0xffff); + channel, count, SDMA_BD_MAX_CNT); goto err_bd_out; } @@ -1421,9 +1422,9 @@ static struct dma_async_tx_descriptor *sdma_prep_dma_cyclic( sdmac->flags |= IMX_DMA_SG_LOOP; - if (period_len > 0xffff) { + if (period_len > SDMA_BD_MAX_CNT) { dev_err(sdma->dev, "SDMA channel %d: maximum period size exceeded: %zu > %d\n", - channel, period_len, 0xffff); + channel, period_len, SDMA_BD_MAX_CNT); goto err_bd_out; } @@ -1970,7 +1971,7 @@ static int sdma_probe(struct platform_device *pdev) sdma->dma_device.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT; sdma->dma_device.device_issue_pending = sdma_issue_pending; sdma->dma_device.dev->dma_parms = &sdma->dma_parms; - dma_set_max_seg_size(sdma->dma_device.dev, 65535); + dma_set_max_seg_size(sdma->dma_device.dev, SDMA_BD_MAX_CNT); platform_set_drvdata(pdev, sdma); -- 2.7.4
[PATCH v2 0/3] add memcpy support for sdma
This patchset adds a memcpy interface for imx-sdma and, in addition, supports dmatest and enables it in the i.MX defconfig, so that DMA can be tested easily without any other device support such as uart/audio/spi... Changes from v1: 1. remove bus_width check for memcpy since only the max bus width is needed for the memcpy case to speed up the copy. 2. remove DMATEST support patch, since DMATEST is a common memcpy case. 3. split out a single patch for SDMA_BD_MAX_CNT instead of '0xffff'. 4. move sdma_config_ownership() from alloc_chan into sdma_prep_memcpy. 5. address some minor review comments. Robin Gong (3): dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff' dmaengine: imx-sdma: add memcpy interface ARM: configs: imx_v6_v7_defconfig: add DMATEST support arch/arm/configs/imx_v6_v7_defconfig | 3 +- drivers/dma/imx-sdma.c | 106 --- 2 files changed, 99 insertions(+), 10 deletions(-) -- 2.7.4
Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote: > We expect to receive PFs with SR-IOV disabled, however some host > drivers leave SR-IOV enabled at unbind. This puts us in a state where > we can potentially assign both the PF and the VF, leading to both > functionality as well as security concerns due to lack of managing the > SR-IOV state as well as vendor dependent isolation from the PF to VF. > If we were to attempt to actively disable SR-IOV on driver probe, we > risk VF bound drivers blocking, potentially risking live lock > scenarios. Therefore simply refuse to bind to PFs with SR-IOV enabled > with a warning message indicating the issue. Users can resolve this > by re-binding to the host driver and disabling SR-IOV before > attempting to use the device with vfio-pci. > > Signed-off-by: Alex Williamson Reviewed-by: Peter Xu -- Peter Xu
[RFC PATCH] vfio/pci: map prefetchable bars as writecombine
By default all BARs map with VMA access permissions as pgprot_noncached. In ARM64 pgprot_noncached is MT_DEVICE_nGnRnE which is strongly ordered and allows aligned access. This type of mapping works for NON-PREFETCHABLE bars containing EP controller registers. But it restricts PREFETCHABLE bars from doing unaligned access. In CMB NVMe drives PREFETCHABLE bars are required to map as MT_NORMAL_NC to do unaligned access. Signed-off-by: Srinath Mannam Reviewed-by: Ray Jui Reviewed-by: Vikram Prakash --- drivers/vfio/pci/vfio_pci.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b423a30..eff6b65 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1142,7 +1142,10 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) } vma->vm_private_data = vdev; - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH) + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); + else + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, -- 2.7.4
Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire
On Thu, Jul 12, 2018 at 07:05:39PM -0700, Daniel Lustig wrote: > On 7/12/2018 11:10 AM, Linus Torvalds wrote: > > On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra > > wrote: > >> > >> The locking pattern is fairly simple and shows where RCpc comes apart > >> from expectation real nice. > > > > So who does RCpc right now for the unlock-lock sequence? Somebody > > mentioned powerpc. Anybody else? > > > > How nasty would it be to make powerpc conform? I will always advocate > > tighter locking and ordering rules over looser ones.. > > > > Linus > > RISC-V probably would have been RCpc if we weren't having this discussion. > Depending on how we map atomics/acquire/release/unlock/lock, we can end up > producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc > behaviors, and we're trying to figure out which we actually need. > > I think the debate is this: > > Obviously programmers would prefer just to have RCsc and not have to figure out > all the complexity of the other options. On x86 or architectures with native > RCsc operations (like ARMv8), that's generally easy enough to get. > > For weakly-ordered architectures that use fences for ordering (including > PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go > from RCpc to either "RCtso" or RCsc. People using these architectures are > concerned about whether there's a negative performance impact from those extra > fences. > > However, some scheduler code, some RCU code, and probably some other examples > already implicitly or explicitly assume unlock()/lock() provides stronger > ordering than RCpc. Just to be clear, the RCU code uses smp_mb__after_unlock_lock() to get the ordering that it needs out of spinlocks. Maybe that is what you meant by "explicitly assume", but I figured I should clarify. Thanx, Paul
Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched
On Fri, Jul 13, 2018 at 11:47:18AM +0800, Lai Jiangshan wrote: > On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney > wrote: > > Hello! > > > > I now have a semi-reasonable prototype of changes consolidating the > > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree. > > There are likely still bugs to be fixed and probably other issues as well, > > but a prototype does exist. > > > > Assuming continued good rcutorture results and no objections, I am > > thinking in terms of this timeline: > > > > o Preparatory work and cleanups are slated for the v4.19 merge window. > > > > o The actual consolidation and post-consolidation cleanup is slated > > for the merge window after v4.19 (v5.0?). These cleanups include > > the replacements called out below within the RCU implementation > > itself (but excluding kernel/rcu/sync.c, see question below). > > > > o Replacement of now-obsolete update APIs is slated for the second > > merge window after v4.19 (v5.1?). The replacements are currently > > expected to be as follows: > > > > synchronize_rcu_bh() -> synchronize_rcu() > > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited() > > call_rcu_bh() -> call_rcu() > > rcu_barrier_bh() -> rcu_barrier() > > synchronize_sched() -> synchronize_rcu() > > synchronize_sched_expedited() -> synchronize_rcu_expedited() > > call_rcu_sched() -> call_rcu() > > rcu_barrier_sched() -> rcu_barrier() > > get_state_synchronize_sched() -> get_state_synchronize_rcu() > > cond_synchronize_sched() -> cond_synchronize_rcu() > > synchronize_rcu_mult() -> synchronize_rcu() > > > > I have done light testing of these replacements with good results. > > > > Any objections to this timeline? > > > > I also have some questions on the ultimate end point. I have default > > choices, which I will likely take if there is no discussion. > > > > o > > Currently, I am thinking in terms of keeping the per-flavor > > read-side functions. 
For example, rcu_read_lock_bh() would > > continue to disable softirq, and would also continue to tell > > lockdep about the RCU-bh read-side critical section. However, > > synchronize_rcu() will wait for all flavors of read-side critical > > sections, including those introduced by (say) preempt_disable(), > > so there will no longer be any possibility of mismatching (say) > > RCU-bh readers with RCU-sched updaters. > > > > I could imagine other ways of handling this, including: > > > > a. Eliminate rcu_read_lock_bh() in favor of > > local_bh_disable() and so on. Rely on lockdep > > instrumentation of these other functions to identify RCU > > readers, introducing such instrumentation as needed. I am > > not a fan of this approach because of the large number of > > places in the Linux kernel where interrupts, preemption, > > and softirqs are enabled or disabled "behind the scenes". > > > > b. Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(), > > and required callers to also disable softirqs, preemption, > > or whatever as needed. I am not a fan of this approach > > because it seems a lot less convenient to users of RCU-bh > > and RCU-sched. > > > > At the moment, I therefore favor keeping the RCU-bh and RCU-sched > > read-side APIs. But are there better approaches? > > Hello, Paul > > Since local_bh_disable() will be guaranteed to be protected by RCU > and more general. I'm afraid it will be preferred over > rcu_read_lock_bh() which will be gradually being phased out. > > In other words, keeping the RCU-bh read-side APIs will be a slower > version of the option A. So will the same approach for the RCU-sched. > But it'll still be better than the hurrying option A, IMHO. I am OK with the read-side RCU-bh and RCU-sched interfaces going away, it is just that I am not willing to put all that much effort into it myself. ;-) Unless there is a good reason for me to hurry it along, of course. 
Thanx, Paul > Thanks, > Lai > > > > > o How should kernel/rcu/sync.c be handled? Here are some > > possibilities: > > > > a. Leave the full gp_ops[] array and simply translate > > the obsolete update-side functions to their RCU > > equivalents. > > > > b. Leave the current gp_ops[] array, but only have > > the RCU_SYNC entry. The __INIT_HELD field would > > be set to a function that was OK with being in an > > RCU read-side critical section, an interrupt-disabled > > section, etc. > > > >
Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched
On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney wrote: > Hello! > > I now have a semi-reasonable prototype of changes consolidating the > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree. > There are likely still bugs to be fixed and probably other issues as well, > but a prototype does exist. > > Assuming continued good rcutorture results and no objections, I am > thinking in terms of this timeline: > > o Preparatory work and cleanups are slated for the v4.19 merge window. > > o The actual consolidation and post-consolidation cleanup is slated > for the merge window after v4.19 (v5.0?). These cleanups include > the replacements called out below within the RCU implementation > itself (but excluding kernel/rcu/sync.c, see question below). > > o Replacement of now-obsolete update APIs is slated for the second > merge window after v4.19 (v5.1?). The replacements are currently > expected to be as follows: > > synchronize_rcu_bh() -> synchronize_rcu() > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited() > call_rcu_bh() -> call_rcu() > rcu_barrier_bh() -> rcu_barrier() > synchronize_sched() -> synchronize_rcu() > synchronize_sched_expedited() -> synchronize_rcu_expedited() > call_rcu_sched() -> call_rcu() > rcu_barrier_sched() -> rcu_barrier() > get_state_synchronize_sched() -> get_state_synchronize_rcu() > cond_synchronize_sched() -> cond_synchronize_rcu() > synchronize_rcu_mult() -> synchronize_rcu() > > I have done light testing of these replacements with good results. > > Any objections to this timeline? > > I also have some questions on the ultimate end point. I have default > choices, which I will likely take if there is no discussion. > > o > Currently, I am thinking in terms of keeping the per-flavor > read-side functions. For example, rcu_read_lock_bh() would > continue to disable softirq, and would also continue to tell > lockdep about the RCU-bh read-side critical section. 
However, > synchronize_rcu() will wait for all flavors of read-side critical > sections, including those introduced by (say) preempt_disable(), > so there will no longer be any possibility of mismatching (say) > RCU-bh readers with RCU-sched updaters. > > I could imagine other ways of handling this, including: > > a. Eliminate rcu_read_lock_bh() in favor of > local_bh_disable() and so on. Rely on lockdep > instrumentation of these other functions to identify RCU > readers, introducing such instrumentation as needed. I am > not a fan of this approach because of the large number of > places in the Linux kernel where interrupts, preemption, > and softirqs are enabled or disabled "behind the scenes". > > b. Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(), > and required callers to also disable softirqs, preemption, > or whatever as needed. I am not a fan of this approach > because it seems a lot less convenient to users of RCU-bh > and RCU-sched. > > At the moment, I therefore favor keeping the RCU-bh and RCU-sched > read-side APIs. But are there better approaches? Hello, Paul Since local_bh_disable() will be guaranteed to be protected by RCU and more general. I'm afraid it will be preferred over rcu_read_lock_bh() which will be gradually being phased out. In other words, keeping the RCU-bh read-side APIs will be a slower version of the option A. So will the same approach for the RCU-sched. But it'll still be better than the hurrying option A, IMHO. Thanks, Lai > > o How should kernel/rcu/sync.c be handled? Here are some > possibilities: > > a. Leave the full gp_ops[] array and simply translate > the obsolete update-side functions to their RCU > equivalents. > > b. Leave the current gp_ops[] array, but only have > the RCU_SYNC entry. The __INIT_HELD field would > be set to a function that was OK with being in an > RCU read-side critical section, an interrupt-disabled > section, etc. > > This allows for possible addition of SRCU functionality. 
> It is also a trivial change. Note that the sole user > of sync.c uses RCU_SCHED_SYNC, and this would need to > be changed to RCU_SYNC. > > But is it likely that we will ever add SRCU? > > c. Eliminate that gp_ops[] array, hard-coding the function > pointers into their call sites. > > I don't really have a preference. Left to myself, I will be lazy > and take opt
[PATCH v1 1/2] mm: fix race on soft-offlining free huge pages
There's a race condition between soft offline and hugetlb_fault which causes unexpected process killing and/or hugetlb allocation failure. The process killing is caused by the following flow:

CPU 0                  CPU 1                      CPU 2

soft offline
  get_any_page
  // find the hugetlb is free
                       mmap a hugetlb file
                       page fault
                         ...
                           hugetlb_fault
                             hugetlb_no_page
                               alloc_huge_page
                               // succeed
  soft_offline_free_page
  // set hwpoison flag
                                                  mmap the hugetlb file
                                                  page fault
                                                    ...
                                                      hugetlb_fault
                                                        hugetlb_no_page
                                                          find_lock_page
                                                          return VM_FAULT_HWPOISON
                                                    mm_fault_error
                                                      do_sigbus
                                                      // kill the process

The hugetlb allocation failure comes from the following flow:

CPU 0                          CPU 1

                               mmap a hugetlb file
                               // reserve all free page but don't fault-in
soft offline
  get_any_page
  // find the hugetlb is free
  soft_offline_free_page
  // set hwpoison flag
  dissolve_free_huge_page
  // fail because all free hugepages are reserved
                               page fault
                                 ...
                                   hugetlb_fault
                                     hugetlb_no_page
                                       alloc_huge_page
                                         ...
                                           dequeue_huge_page_node_exact
                                           // ignore hwpoisoned hugepage
                                           // and finally fail due to no-mem

The root cause of this is that the current soft-offline code is written on the assumption that the PageHWPoison flag should be set first to avoid accessing the corrupted data. This makes sense for memory_failure() or hard offline, but not for soft offline, which is about corrected (not uncorrected) errors and is safe from data loss. This patch changes the soft offline semantics so that the PageHWPoison flag is set only after containment of the error page completes successfully.
Reported-by: Xishi Qiu Suggested-by: Xishi Qiu Signed-off-by: Naoya Horiguchi --- mm/hugetlb.c| 11 +-- mm/memory-failure.c | 22 -- mm/migrate.c| 2 -- 3 files changed, 21 insertions(+), 14 deletions(-) diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c index 430be42..937c142 100644 --- v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c @@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, /* * Dissolve a given free hugepage into free buddy pages. This function does * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the - * number of free hugepages would be reduced below the number of reserved - * hugepages. + * dissolution fails because a give page is not a free hugepage, or because + * free hugepages are fully reserved. */ int dissolve_free_huge_page(struct page *page) { - int rc = 0; + int rc = -EBUSY; spin_lock(&hugetlb_lock); if (PageHuge(page) && !page_count(page)) { struct page *head = compound_head(page); struct hstate *h = page_hstate(head); int nid = page_to_nid(head); - if (h->free_huge_pages - h->resv_huge_pages == 0) { - rc = -EBUSY; + if (h->free_huge_pages - h->resv_huge_pages == 0) goto out; - } /* * Move PageHWPoison flag from head page to the raw error page, * which makes any subpages rather than the error page reusable. @@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page) h->free_huge_pages_node[nid]--; h->max_huge_pages--; update_and_free_page(h, head); + rc = 0; } out: spin_unlock(&hugetlb_lock); diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c index 9d142b9..c63d982 100644 --- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c @@ -1598,8 +1598,18 @@ static int
[PATCH v1 0/2] mm: soft-offline: fix race against page allocation
Xishi recently reported an issue about a race on reusing the target pages of soft offlining. Discussion and analysis showed that we need to make sure that setting PG_hwpoison is done in the right place, under zone->lock, for soft offline. 1/2 handles the free hugepage case, and 2/2 handles the free buddy page case.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (2):
      mm: fix race on soft-offlining free huge pages
      mm: soft-offline: close the race against page allocation

 include/linux/page-flags.h |  5 +
 include/linux/swapops.h    | 10 --
 mm/hugetlb.c               | 11 +--
 mm/memory-failure.c        | 44 +++-
 mm/migrate.c               |  4 +---
 mm/page_alloc.c            | 29 +
 6 files changed, 75 insertions(+), 28 deletions(-)
[PATCH v1 2/2] mm: soft-offline: close the race against page allocation
A process can be killed with SIGBUS(BUS_MCEERR_AR) when it tries to allocate a page that was just freed on the way of soft-offline. This is undesirable because soft-offline (which is about corrected error) is less aggressive than hard-offline (which is about uncorrected error), and we can make soft-offline fail and keep using the page for good reason like "system is busy." Two main changes of this patch are: - setting migrate type of the target page to MIGRATE_ISOLATE. As done in free_unref_page_commit(), this makes kernel bypass pcplist when freeing the page. So we can assume that the page is in freelist just after put_page() returns, - setting PG_hwpoison on free page under zone->lock which protects freelists, so this allows us to avoid setting PG_hwpoison on a page that is decided to be allocated soon. Reported-by: Xishi Qiu Signed-off-by: Naoya Horiguchi --- include/linux/page-flags.h | 5 + include/linux/swapops.h| 10 -- mm/memory-failure.c| 26 +- mm/migrate.c | 2 +- mm/page_alloc.c| 29 + 5 files changed, 56 insertions(+), 16 deletions(-) diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h index 901943e..74bee8c 100644 --- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h @@ -369,8 +369,13 @@ PAGEFLAG_FALSE(Uncached) PAGEFLAG(HWPoison, hwpoison, PF_ANY) TESTSCFLAG(HWPoison, hwpoison, PF_ANY) #define __PG_HWPOISON (1UL << PG_hwpoison) +extern bool set_hwpoison_free_buddy_page(struct page *page); #else PAGEFLAG_FALSE(HWPoison) +static inline bool set_hwpoison_free_buddy_page(struct page *page) +{ + return 0; +} #define __PG_HWPOISON 0 #endif diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h index 9c0eb4d..fe8e08b 100644 --- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h +++ 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h @@ -335,11 +335,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry) return swp_type(entry) == SWP_HWPOISON; } -static inline bool test_set_page_hwpoison(struct page *page) -{ - return TestSetPageHWPoison(page); -} - static inline void num_poisoned_pages_inc(void) { atomic_long_inc(&num_poisoned_pages); @@ -362,11 +357,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp) return 0; } -static inline bool test_set_page_hwpoison(struct page *page) -{ - return false; -} - static inline void num_poisoned_pages_inc(void) { } diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c index c63d982..794687a 100644 --- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c +++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c @@ -57,6 +57,7 @@ #include #include #include +#include #include "internal.h" #include "ras/ras_event.h" @@ -1697,6 +1698,7 @@ static int __soft_offline_page(struct page *page, int flags) static int soft_offline_in_use_page(struct page *page, int flags) { int ret; + int mt; struct page *hpage = compound_head(page); if (!PageHuge(page) && PageTransHuge(hpage)) { @@ -1715,23 +1717,37 @@ static int soft_offline_in_use_page(struct page *page, int flags) put_hwpoison_page(hpage); } + /* +* Setting MIGRATE_ISOLATE here ensures that the page will be linked +* to free list immediately (not via pcplist) when released after +* successful page migration. Otherwise we can't guarantee that the +* page is really free after put_page() returns, so +* set_hwpoison_free_buddy_page() highly likely fails. 
+*/ + mt = get_pageblock_migratetype(page); + set_pageblock_migratetype(page, MIGRATE_ISOLATE); if (PageHuge(page)) ret = soft_offline_huge_page(page, flags); else ret = __soft_offline_page(page, flags); - + set_pageblock_migratetype(page, mt); return ret; } -static void soft_offline_free_page(struct page *page) +static int soft_offline_free_page(struct page *page) { int rc = 0; struct page *head = compound_head(page); if (PageHuge(head)) rc = dissolve_free_huge_page(page); - if (!rc && !TestSetPageHWPoison(page)) - num_poisoned_pages_inc(); + if (!rc) { + if (set_hwpoison_free_buddy_page(page)) + num_poisoned_pages_inc(); + else + rc = -EBUSY; + } + return rc; } /** @@ -1775,7 +1791,7 @@ int soft_offline_page(struct page *page, int flags) if (ret > 0) ret = soft_offline
Re: [PATCH 5/5] f2fs: do not __punch_discard_cmd in lfs mode
On 2018/7/12 23:09, Yunlong Song wrote: > In lfs mode, it is better to submit and wait for discard of the > new_blkaddr's overall section, rather than punch it which makes > more small discards and is not friendly with flash alignment. And > f2fs does not have to wait discard of each new_blkaddr except for the > start_block of each section with this patch. For non-zoned block device, unaligned discard can be allowed; and if synchronous discard is very slow, it will block block allocator here, rather than that, I prefer just punch 4k lba of discard entry for performance. If you don't want to encounter this condition, I suggest issue large size discard more quickly. Thanks, > > Signed-off-by: Yunlong Song > --- > fs/f2fs/segment.c | 76 > ++- > fs/f2fs/segment.h | 7 - > 2 files changed, 75 insertions(+), 8 deletions(-) > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index f6c20e0..bce321a 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -893,7 +893,19 @@ static void __remove_discard_cmd(struct f2fs_sb_info > *sbi, > static void f2fs_submit_discard_endio(struct bio *bio) > { > struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private; > + struct f2fs_sb_info *sbi = F2FS_SB(dc->bdev->bd_super); > > + if (test_opt(sbi, LFS)) { > + unsigned int segno = GET_SEGNO(sbi, dc->lstart); > + unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); > + int cnt = (dc->len >> sbi->log_blocks_per_seg) / > + sbi->segs_per_sec; > + > + while (cnt--) { > + set_bit(secno, FREE_I(sbi)->discard_secmap); > + secno++; > + } > + } > dc->error = blk_status_to_errno(bio->bi_status); > dc->state = D_DONE; > complete_all(&dc->wait); > @@ -1349,8 +1361,15 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info > *sbi, block_t blkaddr) > dc = (struct discard_cmd *)f2fs_lookup_rb_tree(&dcc->root, > NULL, blkaddr); > if (dc) { > - if (dc->state == D_PREP) { > + if (dc->state == D_PREP && !test_opt(sbi, LFS)) > __punch_discard_cmd(sbi, dc, blkaddr); > + else if (dc->state == 
D_PREP && test_opt(sbi, LFS)) { > + struct discard_policy dpolicy; > + > + __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1); > + __submit_discard_cmd(sbi, &dpolicy, dc); > + dc->ref++; > + need_wait = true; > } else { > dc->ref++; > need_wait = true; > @@ -2071,9 +2090,10 @@ static void get_new_segment(struct f2fs_sb_info *sbi, > unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg); > unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg); > unsigned int left_start = hint; > - bool init = true; > + bool init = true, check_discard = test_opt(sbi, LFS) ? true : false; > int go_left = 0; > int i; > + unsigned long *free_secmap; > > spin_lock(&free_i->segmap_lock); > > @@ -2084,11 +2104,25 @@ static void get_new_segment(struct f2fs_sb_info *sbi, > goto got_it; > } > find_other_zone: > - secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint); > + if (check_discard) { > + int entries = f2fs_bitmap_size(MAIN_SECS(sbi)) / > sizeof(unsigned long); > + > + free_secmap = free_i->tmp_secmap; > + for (i = 0; i < entries; i++) > + free_secmap[i] = (!(free_i->free_secmap[i] ^ > + free_i->discard_secmap[i])) | > free_i->free_secmap[i]; > + } else > + free_secmap = free_i->free_secmap; > + > + secno = find_next_zero_bit(free_secmap, MAIN_SECS(sbi), hint); > if (secno >= MAIN_SECS(sbi)) { > if (dir == ALLOC_RIGHT) { > - secno = find_next_zero_bit(free_i->free_secmap, > + secno = find_next_zero_bit(free_secmap, > MAIN_SECS(sbi), 0); > + if (secno >= MAIN_SECS(sbi) && check_discard) { > + check_discard = false; > + goto find_other_zone; > + } > f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi)); > } else { > go_left = 1; > @@ -2098,13 +2132,17 @@ static void get_new_segment(struct f2fs_sb_info *sbi, > if (go_left == 0) > goto skip_left; > > - while (test_bit(left_start, free_i->free_secmap)) { > + while (test_bit(left_start, free_secmap)) { > if (left_start > 0) { > left_start--; > continue; >
Re: [PATCH v2 1/3] dt-bindings: thermal: Add binding document for SR thermal
Hi Rob, I have provided my inputs for the purpose of having multiple nodes. Please get back if you have any comments or suggestions. Regards, Srinath. On Tue, Jul 3, 2018 at 4:15 PM, Srinath Mannam wrote: > Hi Rob, > > Kindly provide your feedback. > > Regards, > Srinath. > > On Fri, Jun 22, 2018 at 11:21 AM, Srinath Mannam > wrote: >> Hi Rob, >> >> Please find my comments for the reason to have multiple DT nodes. >> >> On Thu, Jun 21, 2018 at 1:22 AM, Rob Herring wrote: >>> On Mon, Jun 18, 2018 at 02:01:17PM +0530, Srinath Mannam wrote: From: Pramod Kumar Add binding document for supported thermal implementation in Stingray. Signed-off-by: Pramod Kumar Reviewed-by: Ray Jui Reviewed-by: Scott Branden Reviewed-by: Srinath Mannam --- .../bindings/thermal/brcm,sr-thermal.txt | 45 ++ 1 file changed, 45 insertions(+) create mode 100644 Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt diff --git a/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt new file mode 100644 index 000..33f9e11 --- /dev/null +++ b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt @@ -0,0 +1,45 @@ +* Broadcom Stingray Thermal + +This binding describes thermal sensors that is part of Stingray SoCs. + +Required properties: +- compatible : Must be "brcm,sr-thermal" +- reg : memory where tmon data will be available. + +Example: + tmons { + compatible = "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + tmon_ihost0: thermal@8f10 { + compatible = "brcm,sr-thermal"; + reg = <0x8f10 0x4>; + }; >>> >>> You still haven't given me a compelling reason why you need a node per >>> register. >>> >>> You have a single range of registers. Make this 1 node. >>> >> >> We Have two reasons to have multiple nodes.. >> 1. Our chip has multiple functional blocks. Each functional block has >> its own thermal zone. >> Functional blocks and their thermal zones enabled/disabled based on end >> product. 
>> Few functional blocks need to disabled for few products so thermal >> zones also need to disable. >> In that case, nodes of specific thermal zones are removed from DTS >> file of corresponding product. >> >> 2. Thermal framework provides sysfs interface to configure thermal >> zones and read temperature of thermal zone. >> To configure individual thermal zone, we need to have separate DT node. >> Same to read temperature of individual thermal zone. >> Ex: To read temperature of thermal zone 0. >> cat /sys/class/thermal/thermal_zone0/temp >> To configure trip temperature of thermal zone 0. >> echo 11 > /sys/class/thermal/thermal_zone0/trip_point_0_temp >> >> Also to avoid driver source change for the multiple products it is >> clean to have multiple DT nodes. >> >>> Rob
general protection fault in propagate_entity_cfs_rq
Hello, syzbot found the following crash on: HEAD commit:6fd066604123 Merge branch 'bpf-arm-jit-improvements' git tree: bpf-next console output: https://syzkaller.appspot.com/x/log.txt?x=11e9267840 kernel config: https://syzkaller.appspot.com/x/.config?x=a501a01deaf0fe9 dashboard link: https://syzkaller.appspot.com/bug?extid=2e37f794f31be5667a88 compiler: gcc (GCC) 8.0.1 20180413 (experimental) syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1014db9440 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11f81e7840 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+2e37f794f31be5667...@syzkaller.appspotmail.com IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0 kernel/sched/fair.c:10039 Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89 85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f 85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00 RSP: 0018:8801daf06c90 EFLAGS: 00010003 RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc R10: 019d6e0b R11: R12: 11003b5e0e3b R13: 11003a7e0e84 R14: 8801d3f06800 R15: FS: () GS:8801daf0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fb1b24d7e78 CR3: 0001ab04b000 CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: detach_entity_cfs_rq+0x6e3/0xf50 kernel/sched/fair.c:10059 migrate_task_rq_fair+0xba/0x290 kernel/sched/fair.c:6709 set_task_cpu+0x131/0x770 
kernel/sched/core.c:1194 detach_task.isra.89+0xdb/0x150 kernel/sched/fair.c:7438 detach_tasks kernel/sched/fair.c:7525 [inline] load_balance+0xf0b/0x3640 kernel/sched/fair.c:8884 rebalance_domains+0x82a/0xd90 kernel/sched/fair.c:9262 run_rebalance_domains+0x365/0x4c0 kernel/sched/fair.c:9884 __do_softirq+0x2e8/0xb17 kernel/softirq.c:288 invoke_softirq kernel/softirq.c:368 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:408 exiting_irq arch/x86/include/asm/apic.h:527 [inline] smp_apic_timer_interrupt+0x186/0x730 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54 Code: c7 48 89 45 d8 e8 5a 04 24 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8 49 04 24 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90 RSP: 0018:8801d9af7c38 EFLAGS: 0286 ORIG_RAX: ff13 RAX: dc00 RBX: 11003b35ef8a RCX: 81667982 RDX: 111e3610 RSI: 0004 RDI: 88f1b080 RBP: 8801d9af7c38 R08: ed003b5e46d7 R09: ed003b5e46d6 R10: ed003b5e46d6 R11: 8801daf236b3 R12: 0001 R13: 8801d9af7cf0 R14: 899edd20 R15: arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0xc7/0x450 arch/x86/kernel/process.c:500 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:491 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x3aa/0x570 kernel/sched/idle.c:262 cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368 start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:265 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace cb0cd83b57bb4bba ]--- RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0 kernel/sched/fair.c:10039 Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89 85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f 85 5e 11 00 00 4c 8b a1 58 01 00 
00 0f 1f 44 00 00 RSP: 0018:8801daf06c90 EFLAGS: 00010003 RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc R10: 019d6e0b R11: R12: 11003b5e0e3b R13: 11003a7e0e84 R14: 8801d3f06800 R15: FS: () GS:8801daf0() knlGS: CS: 0010 DS: ES: CR0: 800500
Re: [PATCH 1/2] tracing: kprobes: Prohibit probing on notrace functions
On Thu, 12 Jul 2018 13:54:12 -0400 Francis Deslauriers wrote: > From: Masami Hiramatsu > > Prohibit kprobe-events probing on notrace function. > Since probing on the notrace function can cause recursive > event call. In most case those are just skipped, but > in some case it falls into infinite recursive call. BTW, I'm considering adding an option to allow putting kprobes on notrace functions, just for debugging ftrace with kprobes. That is a "developer only" option, so it should generally be disabled, but we still need it for debugging ftrace. Or should I introduce a separate kprobes module for debugging it? Thank you, -- Masami Hiramatsu
Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR
On Fri, Jul 13, 2018 at 10:05:50AM +0800, jiangyiwen wrote: > > @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client > > *clnt) > > { > > int ret; > > struct p9_fid *fid; > > - unsigned long flags; > > > > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt); > > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL); > > if (!fid) > > return NULL; > > > > - ret = p9_idpool_get(clnt->fidpool); > > - if (ret < 0) > > - goto error; > > - fid->fid = ret; > > - > > memset(&fid->qid, 0, sizeof(struct p9_qid)); > > fid->mode = -1; > > fid->uid = current_fsuid(); > > fid->clnt = clnt; > > fid->rdir = NULL; > > - spin_lock_irqsave(&clnt->lock, flags); > > - list_add(&fid->flist, &clnt->fidlist); > > - spin_unlock_irqrestore(&clnt->lock, flags); > > + fid->fid = 0; > > > > - return fid; > > + idr_preload(GFP_KERNEL); > > It is best to use GFP_NOFS instead, or else it may cause some > unpredictable problem, because when out of memory it will > reclaim memory from v9fs. Earlier in this function, fid was allocated with GFP_KERNEL: > > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL); > > + spin_lock_irq(&clnt->lock); > > + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1, > > + GFP_NOWAIT); > > + spin_unlock_irq(&clnt->lock); > > use spin_lock instead, clnt->lock is not used in irq context. I don't think that's right. What about p9_fid_destroy? It was already using spin_lock_irqsave(), so I just assumed that whoever wrote that code at least considered that it might be called from interrupt context. Also consider p9_free_req() which shares the same lock. We could get rid of clnt->lock altogether as there's a lock embedded in each IDR, but that'll introduce an unwanted dependence on the RDMA tree in this merge window. 
> > @@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt) > > > > v9fs_put_trans(clnt->trans_mod); > > > > - list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) { > > + idr_for_each_entry(&clnt->fids, fid, id) { > > pr_info("Found fid %d not clunked\n", fid->fid); > > p9_fid_destroy(fid); > > } > > > > - if (clnt->fidpool) > > - p9_idpool_destroy(clnt->fidpool); > > - > > I suggest add idr_destroy in the end. Why? p9_fid_destroy calls idr_remove() for each fid, so it'll already be empty. Thanks for all the review, to everyone who's submitted review. This is a really healthy community.
Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]
On Thu, Jul 12, 2018 at 11:54:41PM +0100, David Howells wrote: > > Would that mean then that doing: > > mount /dev/sda3 /a > mount /dev/sda3 /b > > would then fail on the second command because /dev/sda3 is already open > exclusively? Good point. One workaround would be to require an open with O_PATH instead. - Ted
Re: [PATCH 0/2] scsi: arcmsr: fix error of resuming from hibernation
Ching, > This patch series are against to mkp's 4.19/scsi-queue. > > 1. Fix error of resuming from hibernation for adapter type E. > 2. Update driver version to v1.40.00.09-20180709 Applied to 4.19/scsi-queue, thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire
On 7/12/2018 2:45 AM, Will Deacon wrote: > On Thu, Jul 12, 2018 at 11:34:32AM +0200, Peter Zijlstra wrote: >> On Thu, Jul 12, 2018 at 09:40:40AM +0200, Peter Zijlstra wrote: >>> And I think if we raise atomic*_acquire() to require TSO (but ideally >>> raise it to RCsc) we're there. >> >> To clarify, just the RmW-acquire. Things like atomic_read_acquire() can >> stay smp_load_acquire() and be RCpc. > > I don't have strong opinions about strengthening RmW atomics to TSO, so > if it helps to unblock Alan's patch (which doesn't go near this!) then I'll > go with it. The important part is that we continue to allow roach motel > into the RmW for other accesses in the non-fully-ordered cases. > > Daniel -- your AMO instructions are cool with this, right? It's just the > fence-based implementations that will need help? > > Will Right, let me pull this part out of the overly-long response I just gave on the thread with Linus :) if we pair AMOs with AMOs, we get RCsc, and everything is fine. If we start mixing in fences (mostly because we don't currently have native load-acquire or store-release opcodes), then that's when all the rest of the complexity comes in. Dan
RE: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms
Hi, Shawn Although the commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch") was reverted to avoid the boot failure on some i.MX platforms, adding the "regulator-always-on" property for those critical pfuze switches is still the right approach and makes sense no matter how the pfuze regulator's switch ON/OFF function is eventually implemented, so the patches below should be applicable anyway: ARM: dts: imx6sll-evk: make pfuze100 sw4 always on ARM: dts: make pfuze switch always-on for imx platforms ARM: dts: imx6sl-evk: keep sw4 always on Let me know your thoughts, thanks! Anson Huang Best Regards! > -----Original Message----- > From: Anson Huang > Sent: Wednesday, June 27, 2018 9:31 AM > To: shawn...@kernel.org; s.ha...@pengutronix.de; ker...@pengutronix.de; > Fabio Estevam ; robh...@kernel.org; > mark.rutl...@arm.com > Cc: dl-linux-imx ; linux-arm-ker...@lists.infradead.org; > devicet...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms > > commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch") > will cause those unreferenced switches being turned off if > "regulator-always-on" is NOT present, as pfuze switches are normally used by > critical modules which must be always ON or shared by many peripherals which > do NOT implement power domain control, so just make sure all switches > always ON to avoid any system issue caused by unexpectedly turning off > switches. > > Fixes: 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch") > Signed-off-by: Anson Huang > Reviewed-by: Fabio Estevam > --- > changes since V1: > improve the way of referencing commit, and add fix tag.
> arch/arm/boot/dts/imx6q-display5.dtsi | 1 + > arch/arm/boot/dts/imx6q-mccmon6.dts| 1 + > arch/arm/boot/dts/imx6q-novena.dts | 1 + > arch/arm/boot/dts/imx6q-pistachio.dts | 1 + > arch/arm/boot/dts/imx6qdl-gw54xx.dtsi | 1 + > arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 1 + > arch/arm/boot/dts/imx6sx-sdb-reva.dts | 1 + > 7 files changed, 7 insertions(+) > > diff --git a/arch/arm/boot/dts/imx6q-display5.dtsi > b/arch/arm/boot/dts/imx6q-display5.dtsi > index 85232c7..33d266f 100644 > --- a/arch/arm/boot/dts/imx6q-display5.dtsi > +++ b/arch/arm/boot/dts/imx6q-display5.dtsi > @@ -326,6 +326,7 @@ > sw4_reg: sw4 { > regulator-min-microvolt = <800000>; > regulator-max-microvolt = <3300000>; > + regulator-always-on; > }; > > swbst_reg: swbst { > diff --git a/arch/arm/boot/dts/imx6q-mccmon6.dts > b/arch/arm/boot/dts/imx6q-mccmon6.dts > index b7e9f38..e6429c5 100644 > --- a/arch/arm/boot/dts/imx6q-mccmon6.dts > +++ b/arch/arm/boot/dts/imx6q-mccmon6.dts > @@ -166,6 +166,7 @@ > sw4_reg: sw4 { > regulator-min-microvolt = <800000>; > regulator-max-microvolt = <3300000>; > + regulator-always-on; > }; > > swbst_reg: swbst { > diff --git a/arch/arm/boot/dts/imx6q-novena.dts > b/arch/arm/boot/dts/imx6q-novena.dts > index fcd824d..0b3c651 100644 > --- a/arch/arm/boot/dts/imx6q-novena.dts > +++ b/arch/arm/boot/dts/imx6q-novena.dts > @@ -341,6 +341,7 @@ > reg_sw4: sw4 { > regulator-min-microvolt = <800000>; > regulator-max-microvolt = <3300000>; > + regulator-always-on; > }; > > reg_swbst: swbst { > diff --git a/arch/arm/boot/dts/imx6q-pistachio.dts > b/arch/arm/boot/dts/imx6q-pistachio.dts > index a31e83c..6ea09f9 100644 > --- a/arch/arm/boot/dts/imx6q-pistachio.dts > +++ b/arch/arm/boot/dts/imx6q-pistachio.dts > @@ -253,6 +253,7 @@ > sw4_reg: sw4 { > regulator-min-microvolt = <800000>; > regulator-max-microvolt = <3300000>; > + regulator-always-on; > }; > > swbst_reg: swbst { > diff --git a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi > b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi > index a1a6fb5..281cae5 100644 > ---
a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi > +++ b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi > @@ -268,6 +268,7 @@ > sw4_reg: sw4 { > regulator-min-microvolt = <800000>; > regulator-max-microvolt = <3300000>; > + regulator-always-on; > }; > > swbst_reg: swbst { > diff --git a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi > b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi > index 15744ad..6e46a19 100644 > --- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi > +++ b/arch
[PATCH] mm, swap: Make CONFIG_THP_SWAP depend on CONFIG_SWAP
From: Huang Ying CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable to optimize swapping for THP (Transparent Huge Page) without basic swapping support. In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y, split_swap_cluster() will not be built because it is in swapfile.c, but it will be called in huge_memory.c. This doesn't trigger a build error in practice because the call site is enclosed by PageSwapCache(), which is defined to be constant 0 when CONFIG_SWAP=n. But this is fragile and should be fixed. The comments are fixed too to reflect the latest progress. Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out") Signed-off-by: "Huang, Ying" Reviewed-by: Dan Williams Reviewed-by: Naoya Horiguchi Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Zi Yan Cc: Daniel Jordan --- mm/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index b78e7cd4e9fe..97114c94239c 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -419,10 +419,11 @@ config ARCH_WANTS_THP_SWAP config THP_SWAP def_bool y - depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP + depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP help Swap transparent huge pages in one piece, without splitting. - XXX: For now this only does clustered swap space allocation. + XXX: For now, swap cluster backing transparent huge page + will be split after swapout. For selection by architectures with reasonable THP sizes. -- 2.16.4
Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR
On 2018/7/12 5:02, Matthew Wilcox wrote: > The p9_idpool being used to allocate the IDs uses an IDR to allocate > the IDs ... which we then keep in a doubly-linked list, rather than in > the IDR which allocated them. We can use an IDR directly which saves > two pointers per p9_fid, and a tiny memory allocation per p9_client. > > Signed-off-by: Matthew Wilcox > --- > include/net/9p/client.h | 9 +++-- > net/9p/client.c | 44 +++-- > 2 files changed, 19 insertions(+), 34 deletions(-) > > diff --git a/include/net/9p/client.h b/include/net/9p/client.h > index 7af9d769b97d..e405729cd1c7 100644 > --- a/include/net/9p/client.h > +++ b/include/net/9p/client.h > @@ -27,6 +27,7 @@ > #define NET_9P_CLIENT_H > > #include > +#include > > /* Number of requests per row */ > #define P9_ROW_MAXTAG 255 > @@ -128,8 +129,7 @@ struct p9_req_t { > * @proto_version: 9P protocol version to use > * @trans_mod: module API instantiated with this client > * @trans: tranport instance state and API > - * @fidpool: fid handle accounting for session > - * @fidlist: List of active fid handles > + * @fids: All active FID handles > * @tagpool - transaction id accounting for session > * @reqs - 2D array of requests > * @max_tag - current maximum tag id allocated > @@ -169,8 +169,7 @@ struct p9_client { > } tcp; > } trans_opts; > > - struct p9_idpool *fidpool; > - struct list_head fidlist; > + struct idr fids; > > struct p9_idpool *tagpool; > struct p9_req_t *reqs[P9_ROW_MAXTAG]; > @@ -188,7 +187,6 @@ struct p9_client { > * @iounit: the server reported maximum transaction size for this file > * @uid: the numeric uid of the local user who owns this handle > * @rdir: readdir accounting structure (allocated on demand) > - * @flist: per-client-instance fid tracking > * @dlist: per-dentry fid tracking > * > * TODO: This needs lots of explanation. 
> @@ -204,7 +202,6 @@ struct p9_fid { > > void *rdir; > > - struct list_head flist; > struct hlist_node dlist;/* list of all fids attached to a > dentry */ > }; > > diff --git a/net/9p/client.c b/net/9p/client.c > index 389a2904b7b3..b89c7298267c 100644 > --- a/net/9p/client.c > +++ b/net/9p/client.c > @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client > *clnt) > { > int ret; > struct p9_fid *fid; > - unsigned long flags; > > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt); > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL); > if (!fid) > return NULL; > > - ret = p9_idpool_get(clnt->fidpool); > - if (ret < 0) > - goto error; > - fid->fid = ret; > - > memset(&fid->qid, 0, sizeof(struct p9_qid)); > fid->mode = -1; > fid->uid = current_fsuid(); > fid->clnt = clnt; > fid->rdir = NULL; > - spin_lock_irqsave(&clnt->lock, flags); > - list_add(&fid->flist, &clnt->fidlist); > - spin_unlock_irqrestore(&clnt->lock, flags); > + fid->fid = 0; > > - return fid; > + idr_preload(GFP_KERNEL); It is best to use GFP_NOFS instead, or else it may cause some unpredictable problem, because when out of memory it will reclaim memory from v9fs. > + spin_lock_irq(&clnt->lock); > + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1, > + GFP_NOWAIT); > + spin_unlock_irq(&clnt->lock); use spin_lock instead, clnt->lock is not used in irq context. 
> + idr_preload_end(); > + > + if (!ret) > + return fid; > > -error: > kfree(fid); > return NULL; > } > @@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid) > > p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid); > clnt = fid->clnt; > - p9_idpool_put(fid->fid, clnt->fidpool); > spin_lock_irqsave(&clnt->lock, flags); > - list_del(&fid->flist); > + idr_remove(&clnt->fids, fid->fid); > spin_unlock_irqrestore(&clnt->lock, flags); > kfree(fid->rdir); > kfree(fid); > @@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char > *dev_name, char *options) > memcpy(clnt->name, client_id, strlen(client_id) + 1); > > spin_lock_init(&clnt->lock); > - INIT_LIST_HEAD(&clnt->fidlist); > + idr_init(&clnt->fids); > > err = p9_tag_init(clnt); > if (err < 0) > @@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char > *dev_name, char *options) > goto destroy_tagpool; > } > > - clnt->fidpool = p9_idpool_create(); > - if (IS_ERR(clnt->fidpool)) { > - err = PTR_ERR(clnt->fidpool); > - goto put_trans; > - } > - > p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n", >clnt, clnt->trans_mod, clnt->msize, clnt->proto_version); > > err = clnt->trans_mod->create(clnt, dev_name, options); > if (err) > -
Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire
On 7/12/2018 11:10 AM, Linus Torvalds wrote: > On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra wrote: >> >> The locking pattern is fairly simple and shows where RCpc comes apart >> from expectation real nice. > > So who does RCpc right now for the unlock-lock sequence? Somebody > mentioned powerpc. Anybody else? > > How nasty would be be to make powerpc conform? I will always advocate > tighter locking and ordering rules over looser ones.. > > Linus RISC-V probably would have been RCpc if we weren't having this discussion. Depending on how we map atomics/acquire/release/unlock/lock, we can end up producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc behaviors, and we're trying to figure out which we actually need. I think the debate is this: Obviously programmers would prefer just to have RCsc and not have to figure out all the complexity of the other options. On x86 or architectures with native RCsc operations (like ARMv8), that's generally easy enough to get. For weakly-ordered architectures that use fences for ordering (including PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go from RCpc to either "RCtso" or RCsc. People using these architectures are concerned about whether there's a negative performance impact from those extra fences. However, some scheduler code, some RCU code, and probably some other examples already implicitly or explicitly assume unlock()/lock() provides stronger ordering than RCpc. 
So, we have to decide whether to: 1) define unlock()/lock() to enforce "RCtso" or RCsc, insert more fences on PowerPC and RISC-V accordingly, and probably negatively regress PowerPC 2) leave unlock()/lock() as enforcing only RCpc, fix any code that currently assumes something stronger than RCpc is being provided, and hope people don't get it wrong in the future 3) some mixture like having unlock()/lock() be "RCtso" but smp_store_release()/ smp_cond_load_acquire() be only RCpc Also, FWIW, if other weakly-ordered architectures come along in the future and also use any kind of lightweight fence rather than native RCsc operations, they'll likely be in the same boat as RISC-V and Power here, in the sense of not providing RCsc by default either. Is that a fair assessment everyone? I can also not-so-briefly summarize RISC-V's status here, since I think there's been a bunch of confusion about where we're coming from: First of all, I promise we're not trying to start a fight about all this :) We're trying to understand the LKMM requirements so we know what instructions to use. With that, the easy case: RISC-V is RCsc if we use AMOs or load-reserved/ store-conditional, all of which have RCsc .aq and .rl bits: (a) ... amoswap.w.rl x0, x0, [lock] // unlock() ... loop: amoswap.w.aq a0, t1, [lock] // lock() bnez a0, loop // lock() (b) ... (a) is ordered before (b) here, regardless of (a) and (b). Likewise for our load-reserved/store-conditional instructions, which also have .aq and .rl. That's similar to how ARM behaves, and is no problem. We're happy with that too. Unfortunately, we don't (currently?) have plain load-acquire or store-release opcodes in the ISA. (That's a different discussion...) For those, we need fences instead. And that's where it gets messier. RISC-V *would* end up providing only RCpc if we use what I'd argue is the most "natural" fence-based mapping for store-release operations, and then pair that with LR/SC: (a) ...
fence rw,w // unlock() sw x0, [lock] // unlock() ... loop: lr.w.aq a0, [lock] // lock() sc.w t1, [lock] // lock() bnez loop // lock() (b) ... However, if (a) and (b) are loads to different addresses, then (a) is not ordered before (b) here. One unpaired RCsc operation is not a full fence. Clearly "fence rw,w" is not sufficient if the scheduler, RCU, and elsewhere depend on "RCtso" or RCsc. RISC-V can get back to "RCtso", matching PowerPC, by using a stronger fence: (a) ... fence.tso // unlock(), fence.tso == fence rw,w + fence r,r sw x0, [lock] // unlock() ... loop: lr.w.aq a0, [lock] // lock() sc.w t1, [lock] // lock() bnez loop // lock() (b) ... (a) is ordered before (b), unless (a) is a store and (b) is a load to a different address. (Modeling note: this example is why I asked for Alan's v3 patch over the v2 patch, which I believe would only have worked if the fence.tso were at the end) To get full RCsc here, we'd need a fence rw,rw in between the unlock store and the lock load, much like PowerPC would I believe need a heavyweight sync: (a) ... fence rw,w // unlock() sw x0, [lock] // unlock() ... fence rw,rw// can attach either to lock() or to unlock() ... loop: lr.w.aq a0, [lock] // lock() sc.w t1, [lock] // lock() bnez loop // lock() (b) ... In general, RISC-V's fence.tso will suffice wherever PowerPC's lwsync does, and RISC-V's fence r
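Stepping back from the ISA details, the lock()/unlock() shape being debated is an RmW-acquire paired with a store-release. A C11 userspace sketch of that pairing (illustrative only — the RCpc/RCtso/RCsc question above concerns what a *third* observer sees across an unlock-then-lock, not mutual exclusion, which plain acquire/release already provides, as this sketch relies on):

```c
#include <pthread.h>
#include <stdatomic.h>

/* lock() is an RmW-acquire (cf. amoswap.w.aq in the examples above);
 * unlock() is a store-release (cf. "fence rw,w; sw").
 */
static atomic_int lock_word;
static int shared_counter;   /* protected by lock_word */

static void lock(void)
{
    int expected = 0;

    while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;   /* CAS failure overwrote expected; retry from 0 */
}

static void unlock(void)
{
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock();
        shared_counter++;   /* data-race-free under the lock */
        unlock();
    }
    return NULL;
}
```

Mutual exclusion holds here on every architecture; the thread above is about which *additional* global ordering (RCpc, "RCtso", or RCsc) the unlock/lock pair should guarantee to other CPUs.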
Re: [PATCH v13 06/18] x86/xen/time: initialize pv xen time in init_hypervisor_platform
> -void __ref xen_init_time_ops(void) > +void __init xen_init_time_ops(void) > { > pv_time_ops = xen_time_ops; > > @@ -542,17 +542,11 @@ void __init xen_hvm_init_time_ops(void) > return; > > if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { > - printk(KERN_INFO "Xen doesn't support pvclock on HVM," > - "disable pv timer\n"); > + pr_info("Xen doesn't support pvclock on HVM, disable pv > timer"); > return; > } > - > - pv_time_ops = xen_time_ops; > + xen_init_time_ops(); > x86_init.timers.setup_percpu_clockev = xen_time_init; > x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; Boris reported a bug on HVM, which causes a panic in x86_late_time_init(). It is introduced here: xen_init_time_ops() sets: x86_init.timers.timer_init = xen_time_init; which was hpet_time_init() in HVM. However, we might not even need hpet here. Thus, adding x86_init.timers.timer_init = x86_init_noop; to the end of xen_hvm_init_time_ops() should be sufficient. Thank you, Pavel
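Pavel's suggested fix hinges on x86_init being a table of function pointers with overridable defaults: point the timer hook at a no-op so the late-init path never runs the stale hpet setup. A userspace sketch of that pattern (structure and names merely mirror the kernel ones for illustration):

```c
/* Miniature analogue of the x86_init hook table: a struct of function
 * pointers, where the HVM path swaps in a no-op before late init runs.
 */
static int hpet_initialized;

static void hpet_time_init(void) { hpet_initialized = 1; }
static void x86_init_noop(void) { }

struct timers_ops {
    void (*timer_init)(void);
};

/* Default wiring, as left behind by xen_init_time_ops() in the bug. */
static struct timers_ops timers = { .timer_init = hpet_time_init };

/* What the end of xen_hvm_init_time_ops() would add per the mail above. */
static void xen_hvm_fixup(void)
{
    timers.timer_init = x86_init_noop;
}

/* x86_late_time_init() just calls through the hook; it must never
 * reach an inappropriate (here: hpet) implementation on HVM.
 */
static void late_time_init(void)
{
    timers.timer_init();
}
```

With the fixup applied before late init, the hpet path is skipped entirely — the userspace analogue of avoiding the reported panic.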
Proposal
Hello I have a business proposal of mutual benefits i would like to discuss with you i asked before and i still await your positive response thanks
Re: [PATCH v3 1/2] leds: core: Introduce generic pattern interface
Hi Jacek, On 13 July 2018 at 05:41, Jacek Anaszewski wrote: > Hi Baolin, > > > On 07/12/2018 02:24 PM, Baolin Wang wrote: >> >> Hi Jacek, >> >> On 12 July 2018 at 05:10, Jacek Anaszewski >> wrote: >>> >>> Hi Baolin. >>> >>> >>> On 07/11/2018 01:02 PM, Baolin Wang wrote: Hi Jacek and Pavel, On 29 June 2018 at 13:03, Baolin Wang wrote: > > > From: Bjorn Andersson > > Some LED controllers have support for autonomously controlling > brightness over time, according to some preprogrammed pattern or > function. > > This adds a new optional operator that LED class drivers can implement > if they support such functionality as well as a new device attribute to > configure the pattern for a given LED. > > [Baolin Wang did some minor improvements.] > > Signed-off-by: Bjorn Andersson > Signed-off-by: Baolin Wang > --- > Changes from v2: >- Change kernel version to 4.19. >- Force user to return error pointer if failed to issue > pattern_get(). >- Use strstrip() to trim trailing newline. >- Other optimization. > > Changes from v1: >- Add some comments suggested by Pavel. >- Change 'delta_t' can be 0. > > Note: I removed the pattern repeat check and will get the repeat number > by adding > one extra file named 'pattern_repeat' according to previous discussion. > --- Do you have any comments for this version patch set? Thanks. >>> >>> >>> >>> I tried modifying uleds.c driver to support pattern ops, but >>> I'm getting segfault when doing "cat pattern". I didn't give >>> it serious testing and analysis - will do it at weekend. >>> >>> It also drew my attention to the issue of desired pattern sysfs >>> interface semantics on uninitialized pattern. In your implementation >>> user seems to be unable to determine if the pattern is activated >>> or not. We should define the semantics for this use case and >>> describe it in the documentation. Possibly pattern could >>> return alone new line character then. >> >> >> I am not sure I get your points correctly. 
If user writes values to >> pattern interface which means we activated the pattern. >> If I am wrong, could you elaborate on the issue you concerned? Thanks. > > > Now I see, that writing empty string disables the pattern, right? > It should be explicitly stated in the pattern file documentation. Yes, you are right. OK, I will add some documentation for this. Thanks. >>> This is the code snippet I've used for testing pattern interface: >>> >>> static struct led_pattern ptrn[10]; >>> static int ptrn_len; >>> >>> static int uled_pattern_clear(struct led_classdev *ldev) >>> { >>> return 0; >>> } >>> >>> static int uled_pattern_set(struct led_classdev *ldev, >>>struct led_pattern *pattern, >>>int len) >>> { >>> int i; >>> >>> for (i = 0; i < len; i++) { >>> ptrn[i].brightness = pattern[i].brightness; >>> ptrn[i].delta_t = pattern[i].delta_t; >>> } >>> >>> ptrn_len = len; >>> >>> return 0; >>> } >>> >>> static struct led_pattern *uled_pattern_get(struct led_classdev *ldev, >>>int *len) >>> { >>> int i; >>> >>> for (i = 0; i < ptrn_len; i++) { >>> ptrn[i].brightness = 3; >>> ptrn[i].delta_t = 5; >>> } >>> >>> *len = ptrn_len; >>> >>> return ptrn; >>> >>> } >> >> >> The reason you met segfault when doing "cat pattern" is you should not >> return one static pattern array, since in pattern_show() it will help >> to free the pattern memory, could you change to return one pattern >> pointer with dynamic allocation like my patch 2? > > > Thanks for pointing this out. > > >Documentation/ABI/testing/sysfs-class-led | 17 + >drivers/leds/led-class.c | 119 > + >include/linux/leds.h | 19 + >3 files changed, 155 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-class-led > b/Documentation/ABI/testing/sysfs-class-led > index 5f67f7a..e01ac55 100644 > --- a/Documentation/ABI/testing/sysfs-class-led > +++ b/Documentation/ABI/testing/sysfs-class-led > @@ -61,3 +61,20 @@ Description: > gpio and backlight triggers. 
In case of the backlight > trigger, > it is useful when driving a LED which is intended to > indicate > a device in a standby like state. > + > +What: /sys/class/leds//pattern > +Date: June 2018 > +KernelVersion: 4.19 > +Description: > + Specify a pattern for the LED, for LED hardware that support > + altering the brightness as a functio
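The segfault Baolin diagnoses above is an ownership bug: pattern_show() frees the buffer that pattern_get() returns, so the callback must hand back heap memory rather than a static array. A minimal userspace sketch of that contract (simplified types; this is not the LED core API itself):

```c
#include <stdlib.h>
#include <string.h>

struct led_pattern {
    int delta_t;
    int brightness;
};

/* Callee returns a heap-allocated copy; the caller owns and frees it.
 * Returning a pointer to a static array here is exactly what made the
 * uleds test driver crash, since the caller's free() is unconditional.
 */
static struct led_pattern *pattern_get(const struct led_pattern *src,
                                       int len)
{
    struct led_pattern *p = malloc(sizeof(*p) * len);

    if (!p)
        return NULL;
    memcpy(p, src, sizeof(*p) * len);
    return p;
}
```

Any driver implementing the hook follows the same rule: allocate per call, never return storage the caller is forbidden to free.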
Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller
Hi Rob, Jerome, Kevin see my comments On 07/13/18 08:15, Rob Herring wrote: > On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan wrote: >> >> HI Rob >> >> see my comments >> >> On 07/12/2018 10:17 PM, Rob Herring wrote: >>> On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan wrote: Hi Rob see my comments On 07/12/18 03:43, Rob Herring wrote: > On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote: >> Document the MMC sub clock controller driver, the potential consumer >> of this driver is MMC or NAND. > > So you all have decided to properly model this now? > Yes, ;-) >> >> Signed-off-by: Yixun Lan >> --- >> .../bindings/clock/amlogic,mmc-clkc.txt | 31 +++ >> 1 file changed, 31 insertions(+) >> create mode 100644 >> Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt >> >> diff --git >> a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt >> b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt >> new file mode 100644 >> index ..ff6b4bf3ecf9 >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt >> @@ -0,0 +1,31 @@ >> +* Amlogic MMC Sub Clock Controller Driver >> + >> +The Amlogic MMC clock controller generates and supplies clock to support >> +MMC and NAND controller >> + >> +Required Properties: >> + >> +- compatible: should be: >> +"amlogic,meson-gx-mmc-clkc" >> +"amlogic,meson-axg-mmc-clkc" >> + >> +- #clock-cells: should be 1. >> +- clocks: phandles to clocks corresponding to the clock-names property >> +- clock-names: list of parent clock names >> +- "clkin0", "clkin1" >> + >> +Parent node should have the following properties : >> +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc" > > You don't need "simple-mfd" and probably not syscon either. The order is > wrong too. Most specific first. > Ok, I will drop "simple-mfd".. but the syscon is a must, since this mmc clock model access registers via the regmap interface >>> >>> A syscon compatible should not be the only way to get a regmap. 
>> do you have any suggestion about other function that I can use? is >> devm_regmap_init_mmio() feasible >> >>> Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient. >>> >> I'm not sure what's the valid point of removing compatible 'syscon' in >> driver/mfd/syscon.c, sounds this will break a lot DT/or need to fix? >> will you propose a patch for this? then I can certainly adjust here > > Removing the 2 lines will simply allow any node to be a syscon. If > there's a specific driver for a node, then that makes sense to allow > that. > >> >>> Why do you need a regmap in the first place? What else needs to access >>> this register directly? >> Yes, the SD_EMMC_CLOCK register contain several bits which not fit well >> into common clock model, and they need to be access in the NAND or eMMC >> driver itself, Martin had explained this in early thread[1] >> >> In this register >> Bit[31] select NAND or eMMC function >> Bit[25] enable SDIO IRQ >> Bit[24] Clock always on >> Bit[15:14] SRAM Power down >> >> [1] >> https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com >> >>> Don't you need a patch removing the clock code >>> from within the emmc driver? It's not even using regmap, so using >>> regmap here doesn't help. >>> >> No, and current eMMC driver still use iomap to access the register > > Which means a read-modify-write can corrupt the register value if both > users don't access thru regmap. Changes are probably infrequent enough > that you get lucky... What you say here is true.
We try to guarantee that only one of NAND or eMMC is enabled, so there is no race condition. As an example of the use cases: 1) to enable the NAND driver, we a) enable both mmc-clkc and the NAND driver in DT, so they can access the register through the regmap interface, and b) disable the eMMC DT node; 2) to enable the eMMC driver, we a) enable the eMMC node, which accesses the register via iomap (for now), and b) disable both mmc-clkc and NAND in DT. >> I think we probably would like to take a two-step approach. >> first, from the hardware perspective, the NAND and eMMC(port C) driver >> can't exist at same time, since they share the pins, clock, internal >> ram, So we have to only enable one of NAND or eMMC in DT, not enable >> both of them. > > Yes, of course. > >> Second, we might like to convert eMMC driver to also use mmc-clkc model. > > IMO, this should be done as part of merging this series. Otherwise, we > have duplicated code for the same thing. IMO, I'd leave this out of this series, since this patch series is quite complete by itself. Although, the downside is code duplication. Still, I need to hear Jerome's or Kevin's opinion, to see
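The register contention discussed above is why a single read-modify-write helper in the style of regmap_update_bits() matters: SD_EMMC_CLOCK mixes clock divider fields with unrelated control bits, and two independent writers can clobber each other's bits. A userspace sketch of such a helper's semantics (bit positions taken from the thread, divider field width illustrative; this is not the regmap API):

```c
#include <stdint.h>

/* One shared "register" and a single RMW helper, mimicking what
 * regmap_update_bits(map, reg, mask, val) guarantees: only the bits
 * selected by mask change, everything else is preserved.
 */
static uint32_t sd_emmc_clock;

#define CLK_SELECT_NAND  (1u << 31)  /* Bit[31]: NAND vs eMMC function */
#define CLK_ALWAYS_ON    (1u << 24)  /* Bit[24]: clock always on */
#define CLK_DIV_MASK     0x3fu       /* divider field; width illustrative */

static void update_bits(uint32_t *reg, uint32_t mask, uint32_t val)
{
    *reg = (*reg & ~mask) | (val & mask);
}
```

If both the clock controller and the MMC/NAND driver route every write through one such helper (serialized by a common lock, as regmap does internally), the divider and the control bits can no longer corrupt each other.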
Re: Bug report about KASLR and ZONE_MOVABLE
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote: >Hi Michal, > >On 07/12/18 at 02:32pm, Michal Hocko wrote: >> On Thu 12-07-18 14:01:15, Chao Fan wrote: >> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: >> > >Hi Baoquan, >> > > >> > >At 07/11/2018 08:40 PM, Baoquan He wrote: >> > >> Please try this v3 patch: >> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 >> > >> From: Baoquan He >> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 >> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text >> > >> >> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting >> > >> PFN movable zone begins in each node, kernel text position is not >> > >> considered. KASLR may put kernel after which movable zone begins. >> > >> >> > >> Fix it by finding movable zone after kernel text on that node. >> > >> >> > >> Signed-off-by: Baoquan He >> > > >> > > >> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or >> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the >> > >last node, or more. >> > >> > I think it may not fail. >> > There is a 'restart' to do another pass. >> > >> > > >> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying >> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix >> > >this in the KASLR side. >> > > >> > >> > I think it's better to fix here, but not KASLR side. >> > Cause much more code will be change if doing it in KASLR side. >> > Since we didn't parse 'kernelcore' in compressed code, and you can see >> > the distribution of ZONE_MOVABLE need so much code, so we do not need >> > to do so much job in KASLR side. But here, several lines will be OK. >> >> I am not able to find the beginning of the email thread right now. Could >> you summarize what is the actual problem please? > >The bug is found on x86 now. 
> >When added "kernelcore=" or "movablecore=" into kernel command line, >kernel memory is spread evenly among nodes. However, this is right when >KASLR is not enabled, then kernel will be at 16M of place in x86 arch. >If KASLR enabled, it could be put any place from 16M to 64T randomly. > >Consider a scenario, we have 10 nodes, and each node has 20G memory, and >we specify "kernelcore=50%", means each node will take 10G for >kernelcore, 10G for movable area. But this doesn't take kernel position >into consideration. E.g if kernel is put at 15G of 2nd node, namely >node1. Then we think on node1 there's 10G for kernelcore, 10G for >movable, in fact there's only 5G available for movable, just after >kernel. > >I made a v4 patch which possibly can fix it. > > >From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001 >From: Baoquan He >Date: Fri, 13 Jul 2018 07:49:29 +0800 >Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text > >In find_zone_movable_pfns_for_nodes(), when try to find the starting >PFN movable zone begins at in each node, kernel text position is not >considered. KASLR may put kernel after which movable zone begins. > >Fix it by finding movable zone after kernel text on that node. > >Signed-off-by: Baoquan He You can post it as alone PATCH, then I will test it next week. 
Thanks, Chao Fan >--- > mm/page_alloc.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c >index 1521100f1e63..5bc1a47dafda 100644 >--- a/mm/page_alloc.c >+++ b/mm/page_alloc.c >@@ -6547,7 +6547,7 @@ static unsigned long __init >early_calculate_totalpages(void) > static void __init find_zone_movable_pfns_for_nodes(void) > { > int i, nid; >- unsigned long usable_startpfn; >+ unsigned long usable_startpfn, kernel_endpfn, arch_startpfn; > unsigned long kernelcore_node, kernelcore_remaining; > /* save the state before borrow the nodemask */ > nodemask_t saved_node_state = node_states[N_MEMORY]; >@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void) > if (!required_kernelcore || required_kernelcore >= totalpages) > goto out; > >+ kernel_endpfn = PFN_UP(__pa_symbol(_end)); > /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ >- usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; >+ arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; > > restart: > /* Spread kernelcore memory as evenly as possible throughout nodes */ >@@ -6659,6 +6660,16 @@ static void __init >find_zone_movable_pfns_for_nodes(void) > unsigned long start_pfn, end_pfn; > > /* >+ * KASLR may put kernel near tail of node memory, >+ * start after kernel on that node to find PFN >+ * at which zone begins. >+ */ >+ if (pfn_to_nid(kernel_endpfn) == nid) >+ usable_startpfn = max(arch_startpfn, kernel_endpfn); >
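The heart of the v4 patch above is one comparison per node: if the KASLR-placed kernel image ends inside a node, ZONE_MOVABLE on that node cannot start below the kernel's end PFN. That arithmetic can be sketched standalone (PFN values invented purely for illustration):

```c
/* Mirror of the max(arch_startpfn, kernel_endpfn) logic in the patch:
 * on the node that holds the kernel image, the usable start PFN for
 * ZONE_MOVABLE must not fall below the kernel's end PFN.
 */
static unsigned long movable_startpfn(unsigned long arch_startpfn,
                                      unsigned long kernel_endpfn,
                                      int nid, int kernel_nid)
{
    if (nid == kernel_nid && kernel_endpfn > arch_startpfn)
        return kernel_endpfn;
    return arch_startpfn;
}
```

In the 10-node example from the mail, this is what shrinks node1's movable area to the 5G actually available after the randomly placed kernel, instead of the 10G the even split would assume.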
REGRESSION: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock
On Mon, 2018-07-02 at 15:16 +0200, Ulf Hansson wrote: > On 4 June 2018 at 17:35, Aapo Vienamo wrote: > > The sdhci get_max_clock callback is set to > > sdhci_pltfm_clk_get_max_clock > > and tegra_sdhci_get_max_clock is removed. It appears that the > > sdhci-tegra specific callback was originally introduced due to the > > requirement that the host clock has to be twice the bus clock on > > DDR50 > > mode. As far as I can tell the only effect the removal has on DDR50 > > mode > > is in cases where the parent clock is unable to supply the > > requested > > clock rate, causing the DDR50 mode to run at a lower frequency. > > Currently the DDR50 mode isn't enabled on any of the SoCs and would > > also > > require configuring the SDHCI clock divider register to function > > properly. > > > > The problem with tegra_sdhci_get_max_clock is that it divides the > > clock > > rate by two and thus artificially limits the maximum frequency of > > faster > > signaling modes which don't have the host-bus frequency ratio > > requirement > > of DDR50 such as SDR104 and HS200. Furthermore, the call to > > clk_round_rate() may return an error which isn't handled by > > tegra_sdhci_get_max_clock. > > > > Signed-off-by: Aapo Vienamo > > Thanks, applied for next! 
> > Kind regards > Uffe > > > --- > > drivers/mmc/host/sdhci-tegra.c | 15 ++- > > 1 file changed, 2 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/mmc/host/sdhci-tegra.c > > b/drivers/mmc/host/sdhci-tegra.c > > index 970d38f6..c8745b5 100644 > > --- a/drivers/mmc/host/sdhci-tegra.c > > +++ b/drivers/mmc/host/sdhci-tegra.c > > @@ -234,17 +234,6 @@ static void > > tegra_sdhci_set_uhs_signaling(struct sdhci_host *host, > > sdhci_set_uhs_signaling(host, timing); > > } > > > > -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host > > *host) > > -{ > > - struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); > > - > > - /* > > -* DDR modes require the host to run at double the card > > frequency, so > > -* the maximum rate we can support is half of the module > > input clock. > > -*/ > > - return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2; > > -} > > - > > static void tegra_sdhci_set_tap(struct sdhci_host *host, unsigned > > int tap) > > { > > u32 reg; > > @@ -309,7 +298,7 @@ static const struct sdhci_ops tegra_sdhci_ops = > > { > > .platform_execute_tuning = tegra_sdhci_execute_tuning, > > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling, > > .voltage_switch = tegra_sdhci_voltage_switch, > > - .get_max_clock = tegra_sdhci_get_max_clock, > > + .get_max_clock = sdhci_pltfm_clk_get_max_clock, > > }; > > > > static const struct sdhci_pltfm_data sdhci_tegra20_pdata = { > > @@ -357,7 +346,7 @@ static const struct sdhci_ops > > tegra114_sdhci_ops = { > > .platform_execute_tuning = tegra_sdhci_execute_tuning, > > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling, > > .voltage_switch = tegra_sdhci_voltage_switch, > > - .get_max_clock = tegra_sdhci_get_max_clock, > > + .get_max_clock = sdhci_pltfm_clk_get_max_clock, > > }; > > > > static const struct sdhci_pltfm_data sdhci_tegra114_pdata = { > > -- > > 2.7.4 Hm, for us this definitely breaks stuff. 
While using Stefan's patch set [1] we not only run eMMC at DDR52, even SD cards run stably at SDR104. With this patch however the clock gets crippled to 45.33 resp. 48 MHz always. This is observed both on Apalis/Colibri T30 as well as Apalis TK1. Current next-20180712 just with Stefan's 3 patches: root@apalis-t30:~# cat /sys/kernel/debug/mmc1/ios clock: 4800 Hz actual clock: 4533 Hz vdd:21 (3.3 ~ 3.4 V) bus mode: 2 (push-pull) chip select:0 (don't care) power mode: 2 (on) bus width: 3 (8 bits) timing spec:8 (mmc DDR52) signal voltage: 1 (1.80 V) driver type:0 (driver type B) root@apalis-t30:~# hdparm -t /dev/mmcblk1 /dev/mmcblk1: Timing buffered disk reads: 218 MB in 3.03 seconds = 71.95 MB/sec root@apalis-t30:~# cat /sys/kernel/debug/mmc2/ios clock: 4800 Hz actual clock: 4800 Hz vdd:21 (3.3 ~ 3.4 V) bus mode: 2 (push-pull) chip select:0 (don't care) power mode: 2 (on) bus width: 2 (4 bits) timing spec:6 (sd uhs SDR104) signal voltage: 1 (1.80 V) driver type:0 (driver type B) root@apali
Re: [PATCH 22/32] vfs: Provide documentation for new mount API [ver #9]
On 07/10/2018 03:43 PM, David Howells wrote: > Provide documentation for the new mount API. > > Signed-off-by: David Howells > --- > > Documentation/filesystems/mount_api.txt | 439 > +++ > 1 file changed, 439 insertions(+) > create mode 100644 Documentation/filesystems/mount_api.txt Hi, I would review this but it sounds like I should just wait for the next version. -- ~Randy
Re: [PATCH v8 2/2] regulator: add QCOM RPMh regulator driver
On 07/12/2018 09:54 AM, Mark Brown wrote: > On Mon, Jul 09, 2018 at 04:44:14PM -0700, David Collins wrote: >> On 07/02/2018 03:28 AM, Mark Brown wrote: >>> On Fri, Jun 22, 2018 at 05:46:14PM -0700, David Collins wrote: +static unsigned int rpmh_regulator_pmic4_ldo_of_map_mode(unsigned int mode) +{ + static const unsigned int of_mode_map[RPMH_REGULATOR_MODE_COUNT] = { + [RPMH_REGULATOR_MODE_RET] = REGULATOR_MODE_STANDBY, + [RPMH_REGULATOR_MODE_LPM] = REGULATOR_MODE_IDLE, + [RPMH_REGULATOR_MODE_AUTO] = REGULATOR_MODE_INVALID, + [RPMH_REGULATOR_MODE_HPM] = REGULATOR_MODE_FAST, + }; > >>> Same here, based on that it looks like auto mode is a good map for >>> normal. > >> LDO type regulators physically do not support AUTO mode. That is why I >> specified REGULATOR_MODE_INVALID in the mapping. > > The other question here is why this is even in the table if it's not > valid (I'm not seeing a need for the MODE_COUNT define)? I thought that having a table would be more concise and easier to follow. I can change this to a switch case statement. Take care, David -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH V2 00/19] C-SKY(csky) Linux Kernel Port
On Thu, Jul 12, 2018 at 10:04:10AM -0600, Sandra Loosemore wrote: > On 07/12/2018 06:51 AM, Guo Ren wrote: > >On Wed, Jul 11, 2018 at 10:51:33AM +0100, David Howells wrote: > >>Can you say what the --target tuple should be so that I can add the arch to > >>my > >>collection of Fedora cross-binutils and cross-gcc tools built from upstream > >>binutils and gcc sources? > >Mentor Graphics are helping us upstream gcc and binutils. > > > >@Sandra, > > > >Could you help me reply to the question? > > Neither binutils nor gcc support for C-SKY are in the upstream repositories > yet. We should be resubmitting the binutils port soon (with bug fixes to > address the test failures that caused it to be rejected the last time), and > the gcc port will follow that shortly. > > The target triplets we have been testing are csky-elf, csky-linux-gnu, and > csky-linux-uclibc. Note that the gcc port will only support v2 > processors/ABI so that is the default ABI for these triplets. > > I'm not familiar with the Fedora tools, but to build a complete toolchain > you'll need library support as well and I'm not sure what the submission > status/plans for that are. E.g. Mentor did a newlib/libgloss port for local > testing of the ELF toolchain and provided it to C-SKY, but pushing that to > the upstream repository ourselves is not on our todo list. > > -Sandra Thank you, Sandra. Guo Ren
[PATCH 16/18] tools/accounting: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch tools/accounting/getdelays.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c index 9f420d98b5fb..66817a7a4fce 100644 --- a/tools/accounting/getdelays.c +++ b/tools/accounting/getdelays.c @@ -314,8 +314,7 @@ int main(int argc, char *argv[]) err(1, "Invalid rcv buf size\n"); break; case 'm': - strncpy(cpumask, optarg, sizeof(cpumask)); - cpumask[sizeof(cpumask) - 1] = '\0'; + strlcpy(cpumask, optarg, sizeof(cpumask)); maskset = 1; printf("cpumask %s maskset %d\n", cpumask, maskset); break; -- 2.17.1
[PATCH 18/18] cpupower: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch tools/power/cpupower/bench/parse.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/power/cpupower/bench/parse.c b/tools/power/cpupower/bench/parse.c index 9ba8a44ad2a7..1566b89989b2 100644 --- a/tools/power/cpupower/bench/parse.c +++ b/tools/power/cpupower/bench/parse.c @@ -221,9 +221,8 @@ int prepare_config(const char *path, struct config *config) sscanf(val, "%u", &config->cpu); else if (strcmp("governor", opt) == 0) { - strncpy(config->governor, val, + strlcpy(config->governor, val, sizeof(config->governor)); - config->governor[sizeof(config->governor) - 1] = '\0'; } else if (strcmp("priority", opt) == 0) { -- 2.17.1
[PATCH 17/18] perf: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch tools/perf/util/bpf-loader.h | 3 +-- tools/perf/util/util.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h index 5d3aefd6fae7..8d08a1fc97a0 100644 --- a/tools/perf/util/bpf-loader.h +++ b/tools/perf/util/bpf-loader.h @@ -143,10 +143,9 @@ __bpf_strerror(char *buf, size_t size) { if (!size) return 0; - strncpy(buf, + strlcpy(buf, "ERROR: eBPF object loading is disabled during compiling.\n", size); - buf[size - 1] = '\0'; return 0; } diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index eac5b858a371..8b9e3aa7aad3 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -459,8 +459,7 @@ fetch_kernel_version(unsigned int *puint, char *str, return -1; if (str && str_size) { - strncpy(str, utsname.release, str_size); - str[str_size - 1] = '\0'; + strlcpy(str, utsname.release, str_size); } if (!puint || int_ver_ready) -- 2.17.1
[PATCH 13/18] ibmvscsi: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch drivers/scsi/ibmvscsi/ibmvscsi.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 17df76f0be3c..79eb8af03a19 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -1274,14 +1274,12 @@ static void send_mad_capabilities(struct ibmvscsi_host_data *hostdata) if (hostdata->client_migrated) hostdata->caps.flags |= cpu_to_be32(CLIENT_MIGRATED); - strncpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev), + strlcpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev), sizeof(hostdata->caps.name)); - hostdata->caps.name[sizeof(hostdata->caps.name) - 1] = '\0'; location = of_get_property(of_node, "ibm,loc-code", NULL); location = location ? location : dev_name(hostdata->dev); - strncpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc)); - hostdata->caps.loc[sizeof(hostdata->caps.loc) - 1] = '\0'; + strlcpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc)); req->common.type = cpu_to_be32(VIOSRP_CAPABILITIES_TYPE); req->buffer = cpu_to_be64(hostdata->caps_addr); -- 2.17.1
[PATCH 15/18] blktrace: change strncpy+truncation to strlcpy
Using strlcpy fixes this new gcc warning: kernel/trace/blktrace.c: In function ‘do_blk_trace_setup’: kernel/trace/blktrace.c:497:2: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation] strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); ^ Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch kernel/trace/blktrace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 987d9a9ae283..2478d9838eab 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -494,8 +494,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!buts->buf_size || !buts->buf_nr) return -EINVAL; - strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); - buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0'; + strlcpy(buts->name, name, BLKTRACE_BDEV_SIZE); /* * some device names have larger paths - convert the slashes -- 2.17.1
[PATCH 12/18] test_power: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch drivers/power/supply/test_power.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/power/supply/test_power.c b/drivers/power/supply/test_power.c index 57246cdbd042..64adf630f64f 100644 --- a/drivers/power/supply/test_power.c +++ b/drivers/power/supply/test_power.c @@ -297,8 +297,7 @@ static int map_get_value(struct battery_property_map *map, const char *key, char buf[MAX_KEYLENGTH]; int cr; - strncpy(buf, key, MAX_KEYLENGTH); - buf[MAX_KEYLENGTH-1] = '\0'; + strlcpy(buf, key, MAX_KEYLENGTH); cr = strnlen(buf, MAX_KEYLENGTH) - 1; if (cr < 0) -- 2.17.1
[PATCH 14/18] kdb_support: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch kernel/debug/kdb/kdb_support.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c index 990b3cc526c8..1f6a4b6bde0b 100644 --- a/kernel/debug/kdb/kdb_support.c +++ b/kernel/debug/kdb/kdb_support.c @@ -119,8 +119,7 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab) * What was Rusty smoking when he wrote that code? */ if (symtab->sym_name != knt1) { - strncpy(knt1, symtab->sym_name, knt1_size); - knt1[knt1_size-1] = '\0'; + strlcpy(knt1, symtab->sym_name, knt1_size); } for (i = 0; i < ARRAY_SIZE(kdb_name_table); ++i) { if (kdb_name_table[i] && -- 2.17.1
[PATCH 05/18] iio: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch drivers/iio/common/st_sensors/st_sensors_core.c | 3 +-- drivers/iio/pressure/st_pressure_i2c.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c b/drivers/iio/common/st_sensors/st_sensors_core.c index 57db19182e95..26fbd1bd9413 100644 --- a/drivers/iio/common/st_sensors/st_sensors_core.c +++ b/drivers/iio/common/st_sensors/st_sensors_core.c @@ -380,8 +380,7 @@ void st_sensors_of_name_probe(struct device *dev, return; /* The name from the OF match takes precedence if present */ - strncpy(name, of_id->data, len); - name[len - 1] = '\0'; + strlcpy(name, of_id->data, len); } EXPORT_SYMBOL(st_sensors_of_name_probe); #else diff --git a/drivers/iio/pressure/st_pressure_i2c.c b/drivers/iio/pressure/st_pressure_i2c.c index fbb59059e942..2026a1012012 100644 --- a/drivers/iio/pressure/st_pressure_i2c.c +++ b/drivers/iio/pressure/st_pressure_i2c.c @@ -94,9 +94,8 @@ static int st_press_i2c_probe(struct i2c_client *client, if ((ret < 0) || (ret >= ST_PRESS_MAX)) return -ENODEV; - strncpy(client->name, st_press_id_table[ret].name, + strlcpy(client->name, st_press_id_table[ret].name, sizeof(client->name)); - client->name[sizeof(client->name) - 1] = '\0'; } else if (!id) return -ENODEV; -- 2.17.1
[PATCH 08/18] myricom: change strncpy+truncation to strlcpy
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci Signed-off-by: Dominique Martinet --- Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the first patch of the series) for the motivation behind this patch drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c index b2d2ec8c11e2..f7178cdb6bd8 100644 --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c @@ -553,8 +553,7 @@ myri10ge_validate_firmware(struct myri10ge_priv *mgp, } /* save firmware version for ethtool */ - strncpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version)); - mgp->fw_version[sizeof(mgp->fw_version) - 1] = '\0'; + strlcpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version)); sscanf(mgp->fw_version, "%d.%d.%d", &mgp->fw_ver_major, &mgp->fw_ver_minor, &mgp->fw_ver_tiny); -- 2.17.1
Re: [bug] kpti, perf_event, bts: sporadic truncated trace
On Thu, 12 Jul 2018, Metzger, Markus T wrote: > Hello, > > Starting with 4.15 I noticed that BTS is sporadically missing the tail > of the trace in the perf_event data buffer. It shows as > > [decode error (1): instruction overflow] > > in GDB. Chances to see this are higher the longer the debuggee is > running. With this [1] tiny patch to one of GDB's tests, I am able to > reproduce it reliably on my box. To run the test, use: > > $ make -s check RUNTESTFLAGS="gdb.btrace/exception.exp" > > from the gdb/ sub-directory in the GDB build directory. > > The issue remains when I use 'nopti' on the kernel command-line. > > > Bisecting yielded commit > > c1961a4 x86/events/intel/ds: Map debug buffers in cpu_entry_area > > I reverted the commit on top of v4.17 [2] and the issue disappears > when I use 'nopti' on the kernel command-line. > > regards, > markus. > > > [1] > diff --git a/gdb/testsuite/gdb.btrace/exception.exp > b/gdb/testsuite/gdb.btrace/exception.exp > index 9408d61..a24ddd3 100755 > --- a/gdb/testsuite/gdb.btrace/exception.exp > +++ b/gdb/testsuite/gdb.btrace/exception.exp > @@ -36,16 +36,12 @@ if ![runto_main] { > gdb_test_no_output "set record function-call-history-size 0" > > # set bp > -set bp_1 [gdb_get_line_number "bp.1" $srcfile] > set bp_2 [gdb_get_line_number "bp.2" $srcfile] > -gdb_breakpoint $bp_1 > gdb_breakpoint $bp_2 > > -# trace the code between the two breakpoints > -gdb_continue_to_breakpoint "cont to bp.1" ".*$srcfile:$bp_1\r\n.*" > # increase the BTS buffer size - the trace can be quite big > -gdb_test_no_output "set record btrace bts buffer-size 128000" > -gdb_test_no_output "record btrace" > +gdb_test_no_output "set record btrace bts buffer-size 1024000" > +gdb_test_no_output "record btrace bts" > gdb_continue_to_breakpoint "cont to bp.2" ".*$srcfile:$bp_2\r\n.*" > > # show the flat branch trace > > > [2] > diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c [ snipped the revert ] Although my name was kept on that commit 
as a generous courtesy, it did change a lot after leaving my fingers - and I was never the best person to be making perf changes in the first place! I'm sorry to hear that it's breaking you, I've spent a little while looking through its final state, most of it looks fine to me, but I notice one discrepancy whose effect I cannot predict at all, but there's a chance that it has something to do with what you're seeing. A little "optimization" crept into alloc_bts_buffer() along the way, which now places bts_interrupt_threshold not on a record boundary. And Stephane has shown me the sentence in Vol 3B, 17.4.9, which says "This address must point to an offset from the BTS buffer base that is a multiple of the BTS record size." Please give the patch below a try, and let us know if it helps (if it does not, then I think we'll need perfier expertise than I can give). Hugh --- 4.18-rc4/arch/x86/events/intel/ds.c 2018-06-03 14:15:21.0 -0700 +++ linux/arch/x86/events/intel/ds.c 2018-07-12 17:38:28.471378616 -0700 @@ -408,9 +408,11 @@ static int alloc_bts_buffer(int cpu) ds->bts_buffer_base = (unsigned long) cea; ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL); ds->bts_index = ds->bts_buffer_base; - max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE); - ds->bts_absolute_maximum = ds->bts_buffer_base + max; - ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16); + max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE; + ds->bts_absolute_maximum = ds->bts_buffer_base + + max * BTS_RECORD_SIZE; + ds->bts_interrupt_threshold = ds->bts_absolute_maximum - + (max / 16) * BTS_RECORD_SIZE; return 0; }
Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention
On 2018/7/12 5:02, Matthew Wilcox wrote: > Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC > return value was getting all the way back to userspace, and that's > confusing for a userspace program which isn't expecting read() to tell it > there's no space left on the filesystem. The best error we can return to > indicate a temporary failure caused by lack of client resources is ENOMEM. > > Maybe it would be better to sleep until a FID is available, but that's > not a change I'm comfortable making. > > Signed-off-by: Matthew Wilcox Reviewed-by: Yiwen Jiang > --- > net/9p/client.c | 23 +-- > 1 file changed, 9 insertions(+), 14 deletions(-) > > diff --git a/net/9p/client.c b/net/9p/client.c > index 999eceb8af98..389a2904b7b3 100644 > --- a/net/9p/client.c > +++ b/net/9p/client.c > @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client > *clnt) > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt); > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL); > if (!fid) > - return ERR_PTR(-ENOMEM); > + return NULL; > > ret = p9_idpool_get(clnt->fidpool); > - if (ret < 0) { > - ret = -ENOSPC; > + if (ret < 0) > goto error; > - } > fid->fid = ret; > > memset(&fid->qid, 0, sizeof(struct p9_qid)); > @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client > *clnt) > > error: > kfree(fid); > - return ERR_PTR(ret); > + return NULL; > } > > static void p9_fid_destroy(struct p9_fid *fid) > @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, > struct p9_fid *afid, > p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n", >afid ? 
afid->fid : -1, uname, aname); > fid = p9_fid_create(clnt); > - if (IS_ERR(fid)) { > - err = PTR_ERR(fid); > - fid = NULL; > + if (!fid) { > + err = -ENOMEM; > goto error; > } > fid->uid = n_uname; > @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, > uint16_t nwname, > clnt = oldfid->clnt; > if (clone) { > fid = p9_fid_create(clnt); > - if (IS_ERR(fid)) { > - err = PTR_ERR(fid); > - fid = NULL; > + if (!fid) { > + err = -ENOMEM; > goto error; > } > > @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid > *file_fid, > err = 0; > clnt = file_fid->clnt; > attr_fid = p9_fid_create(clnt); > - if (IS_ERR(attr_fid)) { > - err = PTR_ERR(attr_fid); > - attr_fid = NULL; > + if (!attr_fid) { > + err = -ENOMEM; > goto error; > } > p9_debug(P9_DEBUG_9P, >
Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields
On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote: > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery wrote: > > > > Hi Rob, > > > > Thanks for the response. > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote: > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote: > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to > > > > provide remote management of (primarily) server platforms. BMCs are > > > > often tightly coupled to the platform in terms of behaviour and provide > > > > many hardware features integral to booting and running the host system. > > > > > > > > Some of these hardware features are simple, for example scratch > > > > registers provided by the BMC that are exposed to both the host and the > > > > BMC. In other cases there's a single bit switch to enable or disable > > > > some of the provided functionality. > > > > > > > > The documentation defines bindings for fields in registers that do not > > > > integrate well into other driver models yet must be described to allow > > > > the BMC kernel to assume control of these features. > > > > > > So we'll get a new binding when that happens? That will break > > > compatibility. > > > > Can you please expand on this? I'm not following. > > If we have a subsystem in the future, then there would likely be an > associated binding which would be different. So if you update the DT, > then old kernels won't work with it. What kind of "subsystem" ? There is almost no way there could be one for that sort of BMC tunables. We've look at several BMC chips out there and requirements from several vendors, BIOS and system manufacturers and it's all over the place. > > I feel like this is an argument of tradition. Maybe people have > > been dissuaded from doing so when they don't have a reasonable use- > > case? I'm not saying that what I'm proposing is unquestionably > > reasonable, but I don't want to dismiss it out of hand. > > One of experience. 
The one that stands out is clock bindings. > Initially we were doing a node per clock modelling which could end up > being 100s of nodes and is difficult to get right (with DT being an > ABI). > > It comes up with system controller type blocks too that just have a > bunch of random registers. Those change in every SoC and not in any > controlled or ordered way that would make describing the individual > sub-functions in DT worthwhile. So what's the alternative ? Because without something like what we propose, what's going to happen is /dev/mem ... that's what people do today. > > > A node per register bit doesn't scale. > > > > It isn't meant to scale in terms of a single system. Using it > > extensively is very likely wrong. Separately, register-bit-led does > > pretty much the same thing. Doesn't the scale argument apply there? > > Who is to stop me from attaching an insane number of LEDs to a > > system? > > Review. > > If you look, register-bit-led is rarely used outside of some ARM, Ltd. > boards. It's simply quite rare to have MMIO register bits that have a > fixed function of LED control. Well, same here, we hope to review what goes upstream to make it reasonable. Otherwise it doesn't matter. If a random vendor, let's say IBM, chose to ship a system where they put an insane amount of cruft in there, it will only affect those systems' BMC and the userspace stack on it. Thankfully that stack is OpenBMC and IBM is aiming at having their device-trees upstream, thus reviewed, thus it won't happen. *Anything* can be abused. The point here is that we have a number, thankfully rather small, maybe a dozen or two, of tunables that are quite specific to a combination (system vendor, bmc vendor, system model) which control a few HW features that essentially do *NOT* fit in a subsystem. For everything that does, we have created proper drivers (and are doing more). 
> > Obviously if there are lots of systems using it sparingly and > > legitimately then maybe there's a scale issue, but isn't that just > > a reality of different hardware designs? Whoever is implementing > > support for the system is going to have to describe the hardware > > one way or another. > > > > > > > > Maybe this should be modelled using GPIO binding? There's a line there > > > too as to whether the signals are "general purpose" or not. > > > > I don't think so, mainly because some of the things it is intended to be > > used for are not GPIOs. For instance, take the DAC mux I've described in > > the patch. It doesn't directly influence anything external to the SoC (i.e. > > it's certainly not a traditional GPIO in any sense). However, it does > > *indirectly* influence the SoC's behaviour by muxing the DAC internally > > between: > > > > 0. VGA device exposed on the host PCIe bus > > 1. The "Graphics CRT" controller > > 2. VGA port A > > 3. VGA port B > > And this mux control is fixed in the SoC design? This specific family of SoC (Aspeed) supports those 4 configurations.
[PATCH -next] fsi: sbefifo: Fix missing unlock on error in sbefifo_dump_ffdc()
Add the missing unlock before return from function sbefifo_dump_ffdc() in the error handling case. Fixes: 9f4a8a2d7f9d ("fsi/sbefifo: Add driver for the SBE FIFO") Signed-off-by: Wei Yongjun --- drivers/fsi/fsi-sbefifo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c index 6b31cc24..35f2749 100644 --- a/drivers/fsi/fsi-sbefifo.c +++ b/drivers/fsi/fsi-sbefifo.c @@ -150,6 +150,7 @@ static void sbefifo_dump_ffdc(struct device *dev, const __be32 *ffdc, u32 w0, w1, w2, i; if (ffdc_sz < 3) { dev_err(dev, "SBE invalid FFDC package size %zd\n", ffdc_sz); + mutex_unlock(&sbefifo_ffdc_mutex); return; } w0 = be32_to_cpu(*(ffdc++));
Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]
> On Jul 12, 2018, at 5:03 PM, David Howells wrote: > > Andy Lutomirski wrote: > I tend to think that this *should* fail using the new API. The semantics of the second mount request are bizarre at best. >>> >>> You still have to support existing behaviour lest you break userspace. >>> >> >> I assume the existing behavior is that a bind mount is created? If so, the >> new mount(8) tool could do it in user code. > > You have a race there. > > Also you can't currently directly create a bind mount from userspace as you > can only bind from another path point - which you may not be able to access > (either by permission failure or because it's not in your mount namespace). > Are you trying to preserve the magic bind semantics with the new API? If you are, I think it should be by explicit opt in only. Otherwise you risk having your shiny new way to specify fs options get ignored when the magic bind mount happens.
Re: [PATCH v5 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
Hi, On Thu, Jul 12, 2018 at 11:35:45PM +0530, Taniya Das wrote: > The CPUfreq HW present in some QCOM chipsets offloads the steps necessary > for changing the frequency of CPUs. The driver implements the cpufreq > driver interface for this hardware engine. > > Signed-off-by: Saravana Kannan > Signed-off-by: Taniya Das > --- > drivers/cpufreq/Kconfig.arm | 10 ++ > drivers/cpufreq/Makefile | 1 + > drivers/cpufreq/qcom-cpufreq-hw.c | 344 > ++ > 3 files changed, 355 insertions(+) > create mode 100644 drivers/cpufreq/qcom-cpufreq-hw.c > > diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm > index 52f5f1a..141ec3e 100644 > --- a/drivers/cpufreq/Kconfig.arm > +++ b/drivers/cpufreq/Kconfig.arm > @@ -312,3 +312,13 @@ config ARM_PXA2xx_CPUFREQ > This add the CPUFreq driver support for Intel PXA2xx SOCs. > > If in doubt, say N. > + > +config ARM_QCOM_CPUFREQ_HW > + bool "QCOM CPUFreq HW driver" > + help > + Support for the CPUFreq HW driver. > + Some QCOM chipsets have a HW engine to offload the steps > + necessary for changing the frequency of the CPUs. Firmware loaded > + in this engine exposes a programming interafce to the High-level OS. > + The driver implements the cpufreq driver interface for this HW engine. > + Say Y if you want to support CPUFreq HW. 
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile > index fb4a2ec..1226a3e 100644 > --- a/drivers/cpufreq/Makefile > +++ b/drivers/cpufreq/Makefile > @@ -86,6 +86,7 @@ obj-$(CONFIG_ARM_TEGRA124_CPUFREQ) += tegra124-cpufreq.o > obj-$(CONFIG_ARM_TEGRA186_CPUFREQ) += tegra186-cpufreq.o > obj-$(CONFIG_ARM_TI_CPUFREQ) += ti-cpufreq.o > obj-$(CONFIG_ARM_VEXPRESS_SPC_CPUFREQ) += vexpress-spc-cpufreq.o > +obj-$(CONFIG_ARM_QCOM_CPUFREQ_HW)+= qcom-cpufreq-hw.o > > > > ## > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c > b/drivers/cpufreq/qcom-cpufreq-hw.c > new file mode 100644 > index 000..fa25a95 > --- /dev/null > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c > @@ -0,0 +1,344 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2018, The Linux Foundation. All rights reserved. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define INIT_RATE3UL > +#define XO_RATE 1920UL > +#define LUT_MAX_ENTRIES 40U > +#define CORE_COUNT_VAL(val) (((val) & (GENMASK(18, 16))) >> 16) > +#define LUT_ROW_SIZE 32 > + > +enum { > + REG_ENABLE, > + REG_LUT_TABLE, > + REG_PERF_STATE, > + > + REG_ARRAY_SIZE, > +}; > + > +struct cpufreq_qcom { > + struct cpufreq_frequency_table *table; > + struct device *dev; > + const u16 *reg_offset; > + void __iomem *base; > + cpumask_t related_cpus; > + unsigned int max_cores; Same comment as on v4: Why *max*_cores? This seems to be the number of CPUs in a cluster and qcom_read_lut() expects the core count read from the LUT to match exactly. Maybe it's the name from the datasheet? Should it still be 'num_cores' or similer? > +static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS]; It would be an option to limit this to the number of CPU clusters and allocate it dynamically when the driver is initialized (key = first core in the cluster). Probably not worth the hassle with the limited number of cores though. 
> +static int qcom_read_lut(struct platform_device *pdev, > + struct cpufreq_qcom *c) > +{ > + struct device *dev = &pdev->dev; > + unsigned int offset; > + u32 data, src, lval, i, core_count, prev_cc, prev_freq, cur_freq; > + > + c->table = devm_kcalloc(dev, LUT_MAX_ENTRIES + 1, > + sizeof(*c->table), GFP_KERNEL); > + if (!c->table) > + return -ENOMEM; > + > + offset = c->reg_offset[REG_LUT_TABLE]; > + > + for (i = 0; i < LUT_MAX_ENTRIES; i++) { > + data = readl_relaxed(c->base + offset + i * LUT_ROW_SIZE); > + src = ((data & GENMASK(31, 30)) >> 30); > + lval = (data & GENMASK(7, 0)); > + core_count = CORE_COUNT_VAL(data); > + > + if (src == 0) > + c->table[i].frequency = INIT_RATE / 1000; > + else > + c->table[i].frequency = XO_RATE * lval / 1000; You changed the condition from '!src' to 'src == 0'. My suggestion on v4 was in part about a negative condition, but also about the order. If it doesn't obstruct the code otherwise I think for an if-else branch it is good practice to handle the more common case first and then the 'exception'. I would expect most entries to have an actual rate. Just a nit in any case, feel free to ignore if you prefer as is. > +static int qcom_cpu_resource
Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller
On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan wrote: > > HI Rob > > see my comments > > On 07/12/2018 10:17 PM, Rob Herring wrote: > > On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan wrote: > >> > >> Hi Rob > >> > >> see my comments > >> > >> On 07/12/18 03:43, Rob Herring wrote: > >>> On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote: > Document the MMC sub clock controller driver, the potential consumer > of this driver is MMC or NAND. > >>> > >>> So you all have decided to properly model this now? > >>> > >> Yes, ;-) > >> > > Signed-off-by: Yixun Lan > --- > .../bindings/clock/amlogic,mmc-clkc.txt | 31 +++ > 1 file changed, 31 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt > > diff --git > a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt > b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt > new file mode 100644 > index ..ff6b4bf3ecf9 > --- /dev/null > +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt > @@ -0,0 +1,31 @@ > +* Amlogic MMC Sub Clock Controller Driver > + > +The Amlogic MMC clock controller generates and supplies clock to support > +MMC and NAND controller > + > +Required Properties: > + > +- compatible: should be: > +"amlogic,meson-gx-mmc-clkc" > +"amlogic,meson-axg-mmc-clkc" > + > +- #clock-cells: should be 1. > +- clocks: phandles to clocks corresponding to the clock-names property > +- clock-names: list of parent clock names > +- "clkin0", "clkin1" > + > +Parent node should have the following properties : > +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc" > >>> > >>> You don't need "simple-mfd" and probably not syscon either. The order is > >>> wrong too. Most specific first. > >>> > >> Ok, I will drop "simple-mfd".. > >> > >> but the syscon is a must, since this mmc clock model access registers > >> via the regmap interface > > > > A syscon compatible should not be the only way to get a regmap. 
> do you have any suggestion about other function that I can use? is > devm_regmap_init_mmio() feasible > > > Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient. > > > I'm not sure what's the valid point of removing compatible 'syscon' in > driver/mfd/syscon.c, sounds this will break a lot DT/or need to fix? > will you propose a patch for this? then I can certainly adjust here Removing the 2 lines will simply allow any node to be a syscon. If there's a specific driver for a node, then that makes sense to allow that. > > > Why do you need a regmap in the first place? What else needs to access > > this register directly? > Yes, the SD_EMMC_CLOCK register contain several bits which not fit well > into common clock model, and they need to be access in the NAND or eMMC > driver itself, Martin had explained this in early thread[1] > > In this register > Bit[31] select NAND or eMMC function > Bit[25] enable SDIO IRQ > Bit[24] Clock always on > Bit[15:14] SRAM Power down > > [1] > https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com > > > Don't you need a patch removing the clock code > > from within the emmc driver? It's not even using regmap, so using > > regmap here doesn't help. > > > No, and current eMMC driver still use iomap to access the register Which means a read-modify-write can corrupt the register value if both users don't access thru regmap. Changes are probably infrequent enough that you get lucky... > I think we probably would like to take two steps approach. > first, from the hardware perspective, the NAND and eMMC(port C) driver > can't exist at same time, since they share the pins, clock, internal > ram, So we have to only enable one of NAND or eMMC in DT, not enable > both of them. Yes, of course. > Second, we might like to convert eMMC driver to also use mmc-clkc model. IMO, this should be done as part of merging this series. Otherwise, we have duplicated code for the same thing. Rob
Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote: > We expect to receive PFs with SR-IOV disabled, however some host > drivers leave SR-IOV enabled at unbind. This puts us in a state where > we can potentially assign both the PF and the VF, leading to both > functionality as well as security concerns due to lack of managing the > SR-IOV state as well as vendor dependent isolation from the PF to VF. > If we were to attempt to actively disable SR-IOV on driver probe, we > risk VF bound drivers blocking, potentially risking live lock > scenarios. Therefore simply refuse to bind to PFs with SR-IOV enabled > with a warning message indicating the issue. Users can resolve this > by re-binding to the host driver and disabling SR-IOV before > attempting to use the device with vfio-pci. > > Signed-off-by: Alex Williamson Reviewed-by: David Gibson > --- > drivers/vfio/pci/vfio_pci.c | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index b423a309a6e0..f372f209c5c2 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -1189,6 +1189,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const > struct pci_device_id *id) > if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) > return -EINVAL; > > + /* > + * Prevent binding to PFs with VFs enabled, this too easily allows > + * userspace instance with VFs and PFs from the same device, which > + * cannot work. Disabling SR-IOV here would initiate removing the > + * VFs, which would unbind the driver, which is prone to blocking > + * if that VF is also in use by vfio-pci. Just reject these PFs > + * and let the user sort it out. 
> + */ > + if (pci_num_vf(pdev)) { > + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); > + return -EBUSY; > + } > + > group = vfio_iommu_group_get(&pdev->dev); > if (!group) > return -EINVAL; > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ _way_ _around_! http://www.ozlabs.org/~dgibson
[PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
From: Omar Sandoval Now we only need kclist_lock from user context and at fs init time, and the following changes need to sleep while holding the kclist_lock. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 32 +++- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index ddeeb3a5a015..def92fccb167 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -59,8 +59,8 @@ struct memelfnote }; static LIST_HEAD(kclist_head); -static DEFINE_RWLOCK(kclist_lock); -static int kcore_need_update = 1; +static DECLARE_RWSEM(kclist_lock); +static atomic_t kcore_need_update = ATOMIC_INIT(1); /* This doesn't grab kclist_lock, so it should only be used at init time. */ void @@ -117,8 +117,8 @@ static void __kcore_update_ram(struct list_head *list) struct kcore_list *tmp, *pos; LIST_HEAD(garbage); - write_lock(&kclist_lock); - if (kcore_need_update) { + down_write(&kclist_lock); + if (atomic_cmpxchg(&kcore_need_update, 1, 0)) { list_for_each_entry_safe(pos, tmp, &kclist_head, list) { if (pos->type == KCORE_RAM || pos->type == KCORE_VMEMMAP) @@ -127,9 +127,8 @@ static void __kcore_update_ram(struct list_head *list) list_splice_tail(list, &kclist_head); } else list_splice(list, &garbage); - kcore_need_update = 0; proc_root_kcore->size = get_kcore_size(&nphdr, &size); - write_unlock(&kclist_lock); + up_write(&kclist_lock); free_kclist_ents(&garbage); } @@ -452,11 +451,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) int nphdr; unsigned long start; - read_lock(&kclist_lock); + down_read(&kclist_lock); size = get_kcore_size(&nphdr, &elf_buflen); if (buflen == 0 || *fpos >= size) { - read_unlock(&kclist_lock); + up_read(&kclist_lock); return 0; } @@ -473,11 +472,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) tsz = buflen; elf_buf = kzalloc(elf_buflen, GFP_ATOMIC); if (!elf_buf) { - read_unlock(&kclist_lock); + up_read(&kclist_lock); return -ENOMEM; } 
elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen); - read_unlock(&kclist_lock); + up_read(&kclist_lock); if (copy_to_user(buffer, elf_buf + *fpos, tsz)) { kfree(elf_buf); return -EFAULT; @@ -492,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) if (buflen == 0) return acc; } else - read_unlock(&kclist_lock); + up_read(&kclist_lock); /* * Check to see if our file offset matches with any of @@ -505,12 +504,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) while (buflen) { struct kcore_list *m; - read_lock(&kclist_lock); + down_read(&kclist_lock); list_for_each_entry(m, &kclist_head, list) { if (start >= m->addr && start < (m->addr+m->size)) break; } - read_unlock(&kclist_lock); + up_read(&kclist_lock); if (&m->list == &kclist_head) { if (clear_user(buffer, tsz)) @@ -563,7 +562,7 @@ static int open_kcore(struct inode *inode, struct file *filp) if (!filp->private_data) return -ENOMEM; - if (kcore_need_update) + if (atomic_read(&kcore_need_update)) kcore_update_ram(); if (i_size_read(inode) != proc_root_kcore->size) { inode_lock(inode); @@ -593,9 +592,8 @@ static int __meminit kcore_callback(struct notifier_block *self, switch (action) { case MEM_ONLINE: case MEM_OFFLINE: - write_lock(&kclist_lock); - kcore_need_update = 1; - write_unlock(&kclist_lock); + atomic_set(&kcore_need_update, 1); + break; } return NOTIFY_OK; } -- 2.18.0
[PATCH v2 5/7] proc/kcore: clean up ELF header generation
From: Omar Sandoval Currently, the ELF file header, program headers, and note segment are allocated all at once, in some icky code dating back to 2.3. Programs tend to read the file header, then the program headers, then the note segment, all separately, so this is a waste of effort. It's cleaner and more efficient to handle the three separately. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 350 +++- 1 file changed, 141 insertions(+), 209 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index f1ae848c7bcc..a7e730b40154 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -49,15 +49,6 @@ static struct proc_dir_entry *proc_root_kcore; #definekc_offset_to_vaddr(o) ((o) + PAGE_OFFSET) #endif -/* An ELF note in memory */ -struct memelfnote -{ - const char *name; - int type; - unsigned int datasz; - void *data; -}; - static LIST_HEAD(kclist_head); static DECLARE_RWSEM(kclist_lock); static atomic_t kcore_need_update = ATOMIC_INIT(1); @@ -73,7 +64,8 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type) list_add_tail(&new->list, &kclist_head); } -static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) +static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len, +size_t *data_offset) { size_t try, size; struct kcore_list *m; @@ -87,15 +79,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) size = try; *nphdr = *nphdr + 1; } - *elf_buflen = sizeof(struct elfhdr) + - (*nphdr + 2)*sizeof(struct elf_phdr) + - 3 * ((sizeof(struct elf_note)) + -roundup(sizeof(CORE_STR), 4)) + - roundup(sizeof(struct elf_prstatus), 4) + - roundup(sizeof(struct elf_prpsinfo), 4) + - roundup(arch_task_struct_size, 4); - *elf_buflen = PAGE_ALIGN(*elf_buflen); - return size + *elf_buflen; + + *phdrs_len = *nphdr * sizeof(struct elf_phdr); + *notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) + + ALIGN(sizeof(struct elf_prstatus), 4) + + ALIGN(sizeof(struct elf_prpsinfo), 4) + + ALIGN(arch_task_struct_size, 
4)); + *data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len + + *notes_len); + return *data_offset + size; } #ifdef CONFIG_HIGHMEM @@ -241,7 +233,7 @@ static int kcore_update_ram(void) LIST_HEAD(list); LIST_HEAD(garbage); int nphdr; - size_t size; + size_t phdrs_len, notes_len, data_offset; struct kcore_list *tmp, *pos; int ret = 0; @@ -263,7 +255,8 @@ static int kcore_update_ram(void) } list_splice_tail(&list, &kclist_head); - proc_root_kcore->size = get_kcore_size(&nphdr, &size); + proc_root_kcore->size = get_kcore_size(&nphdr, &phdrs_len, ¬es_len, + &data_offset); out: up_write(&kclist_lock); @@ -274,228 +267,168 @@ static int kcore_update_ram(void) return ret; } -/*/ -/* - * determine size of ELF note - */ -static int notesize(struct memelfnote *en) +static void append_kcore_note(char *notes, size_t *i, const char *name, + unsigned int type, const void *desc, + size_t descsz) { - int sz; - - sz = sizeof(struct elf_note); - sz += roundup((strlen(en->name) + 1), 4); - sz += roundup(en->datasz, 4); - - return sz; -} /* end notesize() */ - -/*/ -/* - * store a note in the header buffer - */ -static char *storenote(struct memelfnote *men, char *bufp) -{ - struct elf_note en; - -#define DUMP_WRITE(addr,nr) do { memcpy(bufp,addr,nr); bufp += nr; } while(0) - - en.n_namesz = strlen(men->name) + 1; - en.n_descsz = men->datasz; - en.n_type = men->type; - - DUMP_WRITE(&en, sizeof(en)); - DUMP_WRITE(men->name, en.n_namesz); - - /* XXX - cast from long long to long to avoid need for libgcc.a */ - bufp = (char*) roundup((unsigned long)bufp,4); - DUMP_WRITE(men->data, men->datasz); - bufp = (char*) roundup((unsigned long)bufp,4); - -#undef DUMP_WRITE - - return bufp; -} /* end storenote() */ - -/* - * store an ELF coredump header in the supplied buffer - * nphdr is the number of elf_phdr to insert - */ -static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff) -{ - struct elf_prstatus prstatus; /* NT_PRSTATUS */ - struct elf_prpsinfo prpsinfo; /* NT_PRPSINFO 
*/ - struct elf_phdr *nhdr, *phdr; - stru
[PATCH v2 4/7] proc/kcore: hold lock during read
From: Omar Sandoval Now that we're using an rwsem, we can hold it during the entirety of read_kcore() and have a common return path. This is preparation for the next change. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 70 - 1 file changed, 40 insertions(+), 30 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 33667db6e370..f1ae848c7bcc 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -440,19 +440,18 @@ static ssize_t read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) { char *buf = file->private_data; - ssize_t acc = 0; size_t size, tsz; size_t elf_buflen; int nphdr; unsigned long start; + size_t orig_buflen = buflen; + int ret = 0; down_read(&kclist_lock); size = get_kcore_size(&nphdr, &elf_buflen); - if (buflen == 0 || *fpos >= size) { - up_read(&kclist_lock); - return 0; - } + if (buflen == 0 || *fpos >= size) + goto out; /* trim buflen to not go beyond EOF */ if (buflen > size - *fpos) @@ -465,28 +464,26 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) tsz = elf_buflen - *fpos; if (buflen < tsz) tsz = buflen; - elf_buf = kzalloc(elf_buflen, GFP_ATOMIC); + elf_buf = kzalloc(elf_buflen, GFP_KERNEL); if (!elf_buf) { - up_read(&kclist_lock); - return -ENOMEM; + ret = -ENOMEM; + goto out; } elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen); - up_read(&kclist_lock); if (copy_to_user(buffer, elf_buf + *fpos, tsz)) { kfree(elf_buf); - return -EFAULT; + ret = -EFAULT; + goto out; } kfree(elf_buf); buflen -= tsz; *fpos += tsz; buffer += tsz; - acc += tsz; /* leave now if filled buffer already */ if (buflen == 0) - return acc; - } else - up_read(&kclist_lock); + goto out; + } /* * Check to see if our file offset matches with any of @@ -499,25 +496,29 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) while (buflen) { struct kcore_list *m; - down_read(&kclist_lock); list_for_each_entry(m, &kclist_head, list) { if (start >= m->addr && start < 
(m->addr+m->size)) break; } - up_read(&kclist_lock); if (&m->list == &kclist_head) { - if (clear_user(buffer, tsz)) - return -EFAULT; + if (clear_user(buffer, tsz)) { + ret = -EFAULT; + goto out; + } } else if (m->type == KCORE_VMALLOC) { vread(buf, (char *)start, tsz); /* we have to zero-fill user buffer even if no read */ - if (copy_to_user(buffer, buf, tsz)) - return -EFAULT; + if (copy_to_user(buffer, buf, tsz)) { + ret = -EFAULT; + goto out; + } } else if (m->type == KCORE_USER) { /* User page is handled prior to normal kernel page: */ - if (copy_to_user(buffer, (char *)start, tsz)) - return -EFAULT; + if (copy_to_user(buffer, (char *)start, tsz)) { + ret = -EFAULT; + goto out; + } } else { if (kern_addr_valid(start)) { /* @@ -525,26 +526,35 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) * hardened user copy kernel text checks. */ if (probe_kernel_read(buf, (void *) start, tsz)) { - if (clear_user(buffer, tsz)) - return -EFAULT; + if (clear_user(buffer, tsz)) { + ret = -EFAULT; + goto out; + } } el
[PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add()
From: Omar Sandoval kclist_add() is only called at init time, so there's no point in grabbing any locks. We're also going to replace the rwlock with a rwsem, which we don't want to try grabbing during early boot. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 66c373230e60..ddeeb3a5a015 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -62,6 +62,7 @@ static LIST_HEAD(kclist_head); static DEFINE_RWLOCK(kclist_lock); static int kcore_need_update = 1; +/* This doesn't grab kclist_lock, so it should only be used at init time. */ void kclist_add(struct kcore_list *new, void *addr, size_t size, int type) { @@ -69,9 +70,7 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type) new->size = size; new->type = type; - write_lock(&kclist_lock); list_add_tail(&new->list, &kclist_head); - write_unlock(&kclist_lock); } static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) -- 2.18.0
[PATCH v2 6/7] proc/kcore: optimize multiple page reads
From: Omar Sandoval The current code does a full search of the segment list every time for every page. This is wasteful, since it's almost certain that the next page will be in the same segment. Instead, check if the previous segment covers the current page before doing the list search. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index a7e730b40154..d1b875afc359 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -428,10 +428,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) tsz = buflen; + m = NULL; while (buflen) { - list_for_each_entry(m, &kclist_head, list) { - if (start >= m->addr && start < (m->addr+m->size)) - break; + /* +* If this is the first iteration or the address is not within +* the previous entry, search for a matching entry. +*/ + if (!m || start < m->addr || start >= m->addr + m->size) { + list_for_each_entry(m, &kclist_head, list) { + if (start >= m->addr && + start < m->addr + m->size) + break; + } } if (&m->list == &kclist_head) { -- 2.18.0
[PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore
From: Omar Sandoval The vmcoreinfo information is useful for runtime debugging tools, not just for crash dumps. A lot of this information can be determined by other means, but this is much more convenient. Signed-off-by: Omar Sandoval --- fs/proc/Kconfig| 1 + fs/proc/kcore.c| 18 -- include/linux/crash_core.h | 2 ++ kernel/crash_core.c| 4 ++-- 4 files changed, 21 insertions(+), 4 deletions(-) diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig index 0eaeb41453f5..817c02b13b1d 100644 --- a/fs/proc/Kconfig +++ b/fs/proc/Kconfig @@ -31,6 +31,7 @@ config PROC_FS config PROC_KCORE bool "/proc/kcore support" if !ARM depends on PROC_FS && MMU + select CRASH_CORE help Provides a virtual ELF core file of the live kernel. This can be read with gdb and other ELF tools. No modifications can be diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index d1b875afc359..bef78923b387 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -10,6 +10,7 @@ * Safe accesses to vmalloc/direct-mapped discontiguous areas, Kanoj Sarcar */ +#include #include #include #include @@ -81,10 +82,13 @@ static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len, } *phdrs_len = *nphdr * sizeof(struct elf_phdr); - *notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) + + *notes_len = (4 * sizeof(struct elf_note) + + 3 * ALIGN(sizeof(CORE_STR), 4) + + VMCOREINFO_NOTE_NAME_BYTES + ALIGN(sizeof(struct elf_prstatus), 4) + ALIGN(sizeof(struct elf_prpsinfo), 4) + - ALIGN(arch_task_struct_size, 4)); + ALIGN(arch_task_struct_size, 4) + + ALIGN(vmcoreinfo_size, 4)); *data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len + *notes_len); return *data_offset + size; @@ -406,6 +410,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) sizeof(prpsinfo)); append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current, arch_task_struct_size); + /* +* vmcoreinfo_size is mostly constant after init time, but it +* can be changed by crash_save_vmcoreinfo(). 
Racing here with a +* panic on another CPU before the machine goes down is insanely +* unlikely, but it's better to not leave potential buffer +* overflows lying around, regardless. +*/ + append_kcore_note(notes, &i, VMCOREINFO_NOTE_NAME, 0, + vmcoreinfo_data, + min(vmcoreinfo_size, notes_len - i)); tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos); if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) { diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index b511f6d24b42..525510a9f965 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -60,6 +60,8 @@ phys_addr_t paddr_vmcoreinfo_note(void); #define VMCOREINFO_CONFIG(name) \ vmcoreinfo_append_str("CONFIG_%s=y\n", #name) +extern unsigned char *vmcoreinfo_data; +extern size_t vmcoreinfo_size; extern u32 *vmcoreinfo_note; Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, diff --git a/kernel/crash_core.c b/kernel/crash_core.c index b66aced5e8c2..d02c58b94460 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -14,8 +14,8 @@ #include /* vmcoreinfo stuff */ -static unsigned char *vmcoreinfo_data; -static size_t vmcoreinfo_size; +unsigned char *vmcoreinfo_data; +size_t vmcoreinfo_size; u32 *vmcoreinfo_note; /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */ -- 2.18.0
[PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race
From: Omar Sandoval There's a theoretical race condition that will cause /proc/kcore to miss a memory hotplug event: CPU0 CPU1 // hotplug event 1 kcore_need_update = 1 open_kcore() open_kcore() kcore_update_ram()kcore_update_ram() // Walk RAM // Walk RAM __kcore_update_ram() __kcore_update_ram() kcore_need_update = 0 // hotplug event 2 kcore_need_update = 1 kcore_need_update = 0 Note that CPU1 set up the RAM kcore entries with the state after hotplug event 1 but cleared the flag for hotplug event 2. The RAM entries will therefore be stale until there is another hotplug event. This is an extremely unlikely sequence of events, but the fix makes the synchronization saner, anyways: we serialize the entire update sequence, which means that whoever clears the flag will always succeed in replacing the kcore list. Signed-off-by: Omar Sandoval --- fs/proc/kcore.c | 93 +++-- 1 file changed, 44 insertions(+), 49 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index def92fccb167..33667db6e370 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -98,53 +98,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) return size + *elf_buflen; } -static void free_kclist_ents(struct list_head *head) -{ - struct kcore_list *tmp, *pos; - - list_for_each_entry_safe(pos, tmp, head, list) { - list_del(&pos->list); - kfree(pos); - } -} -/* - * Replace all KCORE_RAM/KCORE_VMEMMAP information with passed list. 
- */ -static void __kcore_update_ram(struct list_head *list) -{ - int nphdr; - size_t size; - struct kcore_list *tmp, *pos; - LIST_HEAD(garbage); - - down_write(&kclist_lock); - if (atomic_cmpxchg(&kcore_need_update, 1, 0)) { - list_for_each_entry_safe(pos, tmp, &kclist_head, list) { - if (pos->type == KCORE_RAM - || pos->type == KCORE_VMEMMAP) - list_move(&pos->list, &garbage); - } - list_splice_tail(list, &kclist_head); - } else - list_splice(list, &garbage); - proc_root_kcore->size = get_kcore_size(&nphdr, &size); - up_write(&kclist_lock); - - free_kclist_ents(&garbage); -} - - #ifdef CONFIG_HIGHMEM /* * If no highmem, we can assume [0...max_low_pfn) continuous range of memory * because memory hole is not as big as !HIGHMEM case. * (HIGHMEM is special because part of memory is _invisible_ from the kernel.) */ -static int kcore_update_ram(void) +static int kcore_ram_list(struct list_head *head) { - LIST_HEAD(head); struct kcore_list *ent; - int ret = 0; ent = kmalloc(sizeof(*ent), GFP_KERNEL); if (!ent) @@ -152,9 +114,8 @@ static int kcore_update_ram(void) ent->addr = (unsigned long)__va(0); ent->size = max_low_pfn << PAGE_SHIFT; ent->type = KCORE_RAM; - list_add(&ent->list, &head); - __kcore_update_ram(&head); - return ret; + list_add(&ent->list, head); + return 0; } #else /* !CONFIG_HIGHMEM */ @@ -253,11 +214,10 @@ kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg) return 1; } -static int kcore_update_ram(void) +static int kcore_ram_list(struct list_head *list) { int nid, ret; unsigned long end_pfn; - LIST_HEAD(head); /* Not inializedupdate now */ /* find out "max pfn" */ @@ -269,15 +229,50 @@ static int kcore_update_ram(void) end_pfn = node_end; } /* scan 0 to max_pfn */ - ret = walk_system_ram_range(0, end_pfn, &head, kclist_add_private); - if (ret) { - free_kclist_ents(&head); + ret = walk_system_ram_range(0, end_pfn, list, kclist_add_private); + if (ret) return -ENOMEM; + return 0; +} +#endif /* CONFIG_HIGHMEM */ + +static int 
kcore_update_ram(void) +{ + LIST_HEAD(list); + LIST_HEAD(garbage); + int nphdr; + size_t size; + struct kcore_list *tmp, *pos; + int ret = 0; + + down_write(&kclist_lock); + if (!atomic_cmpxchg(&kcore_need_update, 1, 0)) + goto out; + + ret = kcore_ram_list(&list); + if (ret) { + /* Couldn't get the RAM list, try again next time. */ + atomic_set(&kcore_need_update, 1); + list_splice_tail(&list, &garbage); + goto out; + } + + list_for_each_entry_safe(pos, tmp, &kclist_head, list) { + if (pos->type == KCORE_RAM || pos->type == KCORE_VMEMMAP) + list_move(&pos->list, &garbage); + } + list_splice_tail(&list, &kclist_head); + +
[PATCH v2 0/7] /proc/kcore improvements
From: Omar Sandoval Hi, This series makes a few improvements to /proc/kcore. Patches 1 and 2 are prep patches. Patch 3 is a fix/cleanup. Patch 4 is another prep patch. Patches 5 and 6 are optimizations to ->read(). Patch 7 adds vmcoreinfo to /proc/kcore (apparently I'm not the only one who wants this, see https://www.spinics.net/lists/arm-kernel/msg665103.html). I tested that the crash utility still works with this applied, and readelf is happy with it, as well. Andrew, since this didn't get any traction on the fsdevel side, and you're already carrying James' patch, could you take this through -mm? Thanks! Changes from v1: - Rebased onto v4.18-rc4 + James' patch (https://patchwork.kernel.org/patch/10519739/) in the mm tree - Fix spurious sparse warning (see the report and response in https://patchwork.kernel.org/patch/10512431/) Omar Sandoval (7): proc/kcore: don't grab lock for kclist_add() proc/kcore: replace kclist_lock rwlock with rwsem proc/kcore: fix memory hotplug vs multiple opens race proc/kcore: hold lock during read proc/kcore: clean up ELF header generation proc/kcore: optimize multiple page reads proc/kcore: add vmcoreinfo note to /proc/kcore fs/proc/Kconfig| 1 + fs/proc/kcore.c| 536 + include/linux/crash_core.h | 2 + kernel/crash_core.c| 4 +- 4 files changed, 251 insertions(+), 292 deletions(-) -- 2.18.0
Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]
Andy Lutomirski wrote: > >> I tend to think that this *should* fail using the new API. The semantics > >> of the second mount request are bizarre at best. > > > > You still have to support existing behaviour lest you break userspace. > > > > I assume the existing behavior is that a bind mount is created? If so, the > new mount(8) tool could do it in user code. You have a race there. Also you can't currently directly create a bind mount from userspace as you can only bind from another path point - which you may not be able to access (either by permission failure or because it's not in your mount namespace). David
Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang
On Thu, Jul 12, 2018 at 4:20 PM Andy Lutomirski wrote: > > > On Jul 12, 2018, at 3:06 PM, H. Peter Anvin wrote: > > > >> On 07/12/18 13:37, Alistair Strachan wrote: > >>> On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin wrote: > On 07/12/18 13:10, Alistair Strachan wrote: > The vdso{32,64}.so can fail to link with CC=clang when clang tries to > find a suitable GCC toolchain to link these libraries with. > > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: > access beyond end of merged section (782) > > This happens because the host environment leaked into the cross > compiler environment due to the way clang searches for suitable GCC > toolchains. > > >>> > >>> Is this another clang bug that you want a workaround for in the kernel? > >> > >> Clang is a retargetable compiler (specified with --target=) > >> and so it has a mechanism for searching for suitable binutils (from > >> another "GCC toolchain") to perform assembly and linkage. This > >> mechanism relies on both --target and --gcc-toolchain when > >> cross-compiling, otherwise it will fall back to searching /usr. > >> > >> The --target and --gcc-toolchain flags are already specified correctly > >> in the top level Makefile, but the vdso Makefile rolls its own linker > >> flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get > >> incorrectly dropped from the vdso $CC link command line, and an > >> inconsistency is created between the "GCC toolchain" used to generate > >> the objects for the vdso, and the linker used to link them. > >> > > > > It sounds like there needs to be a more fundamental symbol than > > KBUILD_CFLAGS to contain these kinds of things. > > How about $(CC)? I'm guessing, but I think this wasn't done originally because CC is something the user might reasonably specify on the command line (the other bit comes from CROSS_COMPILE), so doing this via CC would require us to override the CC passed in on the command line. 
Not sure how to do that, since the vdso makefile is executed with a submake, so the usual "override CC := $(CC) something else" followed by "export CC" doesn't work.
Consolidating RCU-bh, RCU-preempt, and RCU-sched
Hello!

I now have a semi-reasonable prototype of changes consolidating the RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree. There are likely still bugs to be fixed and probably other issues as well, but a prototype does exist.

Assuming continued good rcutorture results and no objections, I am thinking in terms of this timeline:

o	Preparatory work and cleanups are slated for the v4.19 merge window.

o	The actual consolidation and post-consolidation cleanup is slated for the merge window after v4.19 (v5.0?). These cleanups include the replacements called out below within the RCU implementation itself (but excluding kernel/rcu/sync.c, see question below).

o	Replacement of now-obsolete update APIs is slated for the second merge window after v4.19 (v5.1?). The replacements are currently expected to be as follows:

	synchronize_rcu_bh() -> synchronize_rcu()
	synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
	call_rcu_bh() -> call_rcu()
	rcu_barrier_bh() -> rcu_barrier()
	synchronize_sched() -> synchronize_rcu()
	synchronize_sched_expedited() -> synchronize_rcu_expedited()
	call_rcu_sched() -> call_rcu()
	rcu_barrier_sched() -> rcu_barrier()
	get_state_synchronize_sched() -> get_state_synchronize_rcu()
	cond_synchronize_sched() -> cond_synchronize_rcu()
	synchronize_rcu_mult() -> synchronize_rcu()

I have done light testing of these replacements with good results. Any objections to this timeline?

I also have some questions on the ultimate end point. I have default choices, which I will likely take if there is no discussion.

o	Currently, I am thinking in terms of keeping the per-flavor read-side functions. For example, rcu_read_lock_bh() would continue to disable softirq, and would also continue to tell lockdep about the RCU-bh read-side critical section.
	However, synchronize_rcu() will wait for all flavors of read-side critical sections, including those introduced by (say) preempt_disable(), so there will no longer be any possibility of mismatching (say) RCU-bh readers with RCU-sched updaters.

	I could imagine other ways of handling this, including:

	a.	Eliminate rcu_read_lock_bh() in favor of local_bh_disable() and so on. Rely on lockdep instrumentation of these other functions to identify RCU readers, introducing such instrumentation as needed. I am not a fan of this approach because of the large number of places in the Linux kernel where interrupts, preemption, and softirqs are enabled or disabled "behind the scenes".

	b.	Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(), and require callers to also disable softirqs, preemption, or whatever as needed. I am not a fan of this approach because it seems a lot less convenient to users of RCU-bh and RCU-sched.

	At the moment, I therefore favor keeping the RCU-bh and RCU-sched read-side APIs. But are there better approaches?

o	How should kernel/rcu/sync.c be handled? Here are some possibilities:

	a.	Leave the full gp_ops[] array and simply translate the obsolete update-side functions to their RCU equivalents.

	b.	Leave the current gp_ops[] array, but only have the RCU_SYNC entry. The __INIT_HELD field would be set to a function that was OK with being in an RCU read-side critical section, an interrupt-disabled section, etc. This allows for possible addition of SRCU functionality. It is also a trivial change. Note that the sole user of sync.c uses RCU_SCHED_SYNC, and this would need to be changed to RCU_SYNC. But is it likely that we will ever add SRCU?

	c.	Eliminate that gp_ops[] array, hard-coding the function pointers into their call sites.

	I don't really have a preference. Left to myself, I will be lazy and take option #a. Are there better approaches?
o Currently, if a lock related to the scheduler's rq or pi locks is held across rcu_read_unlock(), that lock must be held across the entire read-side critical section in order to avoid deadlock. Now that the end of the RCU read-side critical section is deferred until sometime after interrupts are re-enabled, this requirement could be lifted. However, because the end of the RCU read-side critical section is detected sometime after interrupts are re-enabled, this means that a low-priority RCU reader might remain p
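[Editor's note: the update-side replacements proposed in this message are purely mechanical renames, so a tree-wide conversion could be scripted. As a hedged illustration only (this table is transcribed from the message; the lookup helper is hypothetical, not kernel code), the mapping looks like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The obsolete-to-replacement mapping proposed in the message. */
static const char *const rcu_api_renames[][2] = {
	{ "synchronize_rcu_bh",           "synchronize_rcu" },
	{ "synchronize_rcu_bh_expedited", "synchronize_rcu_expedited" },
	{ "call_rcu_bh",                  "call_rcu" },
	{ "rcu_barrier_bh",               "rcu_barrier" },
	{ "synchronize_sched",            "synchronize_rcu" },
	{ "synchronize_sched_expedited",  "synchronize_rcu_expedited" },
	{ "call_rcu_sched",               "call_rcu" },
	{ "rcu_barrier_sched",            "rcu_barrier" },
	{ "get_state_synchronize_sched",  "get_state_synchronize_rcu" },
	{ "cond_synchronize_sched",       "cond_synchronize_rcu" },
	{ "synchronize_rcu_mult",         "synchronize_rcu" },
};

/* Return the replacement for an obsolete update-side API, or NULL. */
static const char *rcu_api_replacement(const char *old)
{
	size_t i;

	for (i = 0; i < sizeof(rcu_api_renames) / sizeof(rcu_api_renames[0]); i++)
		if (strcmp(rcu_api_renames[i][0], old) == 0)
			return rcu_api_renames[i][1];
	return NULL;
}
```

Note that several distinct old APIs (e.g. synchronize_rcu_bh() and synchronize_sched()) collapse onto the same consolidated function, which is exactly why read-side mismatches stop being possible.]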
Re: Bug report about KASLR and ZONE_MOVABLE
Hi Michal, On 07/12/18 at 02:32pm, Michal Hocko wrote: > On Thu 12-07-18 14:01:15, Chao Fan wrote: > > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote: > > >Hi Baoquan, > > > > > >At 07/11/2018 08:40 PM, Baoquan He wrote: > > >> Please try this v3 patch: > > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001 > > >> From: Baoquan He > > >> Date: Wed, 11 Jul 2018 20:31:51 +0800 > > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text > > >> > > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting > > >> PFN movable zone begins in each node, kernel text position is not > > >> considered. KASLR may put kernel after which movable zone begins. > > >> > > >> Fix it by finding movable zone after kernel text on that node. > > >> > > >> Signed-off-by: Baoquan He > > > > > > > > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or > > >'movablecore=' failed if the KASLR puts the kernel back the tail of the > > >last node, or more. > > > > I think it may not fail. > > There is a 'restart' to do another pass. > > > > > > > >Due to we have fix the mirror memory in KASLR side, and Chao is trying > > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix > > >this in the KASLR side. > > > > > > > I think it's better to fix here, but not KASLR side. > > Cause much more code will be change if doing it in KASLR side. > > Since we didn't parse 'kernelcore' in compressed code, and you can see > > the distribution of ZONE_MOVABLE need so much code, so we do not need > > to do so much job in KASLR side. But here, several lines will be OK. > > I am not able to find the beginning of the email thread right now. Could > you summarize what is the actual problem please? The bug is found on x86 now. When added "kernelcore=" or "movablecore=" into kernel command line, kernel memory is spread evenly among nodes. 
However, this is only right when KASLR is not enabled; then the kernel will sit at the 16M mark on x86. If KASLR is enabled, the kernel could be put anywhere from 16M to 64T, randomly.

Consider a scenario: we have 10 nodes, each node has 20G of memory, and we specify "kernelcore=50%", meaning each node takes 10G for kernelcore and 10G for the movable area. But this doesn't take the kernel's position into consideration. E.g. if the kernel is put at 15G of the 2nd node, namely node1, we think node1 has 10G for kernelcore and 10G for movable, when in fact only 5G is available for movable, just after the kernel.

I made a v4 patch which possibly can fix it.

>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when trying to find the starting PFN at which the movable zone begins in each node, the kernel text position is not considered. KASLR may put the kernel after the point where the movable zone begins.

Fix it by finding the movable zone after the kernel text on that node.
Signed-off-by: Baoquan He
---
 mm/page_alloc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
 	int i, nid;
-	unsigned long usable_startpfn;
+	unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
 	nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	if (!required_kernelcore || required_kernelcore >= totalpages)
 		goto out;
 
+	kernel_endpfn = PFN_UP(__pa_symbol(_end));
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+	arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		unsigned long start_pfn, end_pfn;
 
 		/*
+		 * KASLR may put kernel near tail of node memory,
+		 * start after kernel on that node to find PFN
+		 * at which zone begins.
+		 */
+		if (pfn_to_nid(kernel_endpfn) == nid)
+			usable_startpfn = max(arch_startpfn, kernel_endpfn);
+		else
+			usable_startpfn = arch_startpfn;
+
+		/*
 		 * Recalculate kernelcore_node if the division per node
 		 * now exceeds what is necessary to satisfy the req
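[Editor's note: the fix boils down to a per-node clamp — on the node containing the kernel image, ZONE_MOVABLE may begin no earlier than the end of the kernel. A minimal userspace sketch of just that arithmetic (hypothetical helper and PFN values, not the real mm code):

```c
/*
 * Per-node clamp from the patch above: on the node that holds the
 * kernel image, the movable zone must start after the kernel's end
 * PFN; on every other node the architectural lower bound applies.
 */
static unsigned long movable_start_pfn(unsigned long arch_startpfn,
				       unsigned long kernel_endpfn,
				       int nid, int kernel_nid)
{
	if (nid == kernel_nid && kernel_endpfn > arch_startpfn)
		return kernel_endpfn;
	return arch_startpfn;
}
```

In the 10-node example above, only node1 (the node carrying the kernel at 15G) gets its start clamped; the other nine nodes keep the architectural lower bound.]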
Re: [PATCH v13 16/18] sched: move sched clock initialization and merge with generic clock
Hi Pavel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc4 next-20180712]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Pavel-Tatashin/Early-boot-time-stamps/20180712-200238
config: microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=microblaze

All errors (new ones prefixed by >>):

   kernel/sched/clock.c: In function 'sched_clock_init':
>> kernel/sched/clock.c:440:2: error: implicit declaration of function 'generic_sched_clock_init'; did you mean 'sched_clock_init'? [-Werror=implicit-function-declaration]
     generic_sched_clock_init();
     ^~~~
     sched_clock_init
   cc1: some warnings being treated as errors

vim +440 kernel/sched/clock.c

   436
   437	void __init sched_clock_init(void)
   438	{
   439		sched_clock_running = 1;
 > 440		generic_sched_clock_init();
   441	}
   442

---
0-DAY kernel test infrastructure    Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all    Intel Corporation

.config.gz
Description: application/gzip
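[Editor's note: the implicit-declaration error suggests the call site sees no declaration of generic_sched_clock_init() on configs (like this microblaze one) that don't build the generic sched_clock. A common kernel idiom for that is a static-inline no-op stub in the header when the feature is off. A hedged userspace sketch of the pattern — the config macro and demo function are illustrative, not the actual fix:

```c
/* Track whether the "real" implementation ran (for the demo only). */
static int generic_inited;

/*
 * Header-side idiom: declare/define the real function when the
 * feature is configured in, otherwise provide an empty stub so
 * every caller still compiles and links.
 */
#ifdef CONFIG_GENERIC_SCHED_CLOCK_DEMO
static void generic_sched_clock_init_demo(void) { generic_inited = 1; }
#else
static inline void generic_sched_clock_init_demo(void) { }
#endif

/* Caller corresponding to sched_clock_init(): always safe to call. */
static int sched_clock_init_demo(void)
{
	generic_sched_clock_init_demo(); /* no-op when the feature is off */
	return generic_inited;
}
```

With the stub in place, the call at clock.c:440 compiles on every architecture and simply does nothing where the generic clock isn't built.]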
Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]
> On Jul 12, 2018, at 4:35 PM, David Howells wrote: > > Andy Lutomirski wrote: > >> I tend to think that this *should* fail using the new API. The semantics of >> the second mount request are bizarre at best. > > You still have to support existing behaviour lest you break userspace. > I assume the existing behavior is that a bind mount is created? If so, the new mount(8) tool could do it in user code.
Re: [PATCH v5 1/2] dt-bindings: cpufreq: Introduce QCOM CPUFREQ Firmware bindings
Quoting Taniya Das (2018-07-12 11:05:44)
[..]
> +			compatible = "qcom,kryo385";
> +			reg = <0x0 0x600>;
> +			enable-method = "psci";
> +			next-level-cache = <&L2_600>;
> +			qcom,freq-domain = <&freq_domain_table1>;
> +			L2_600: l2-cache {
> +				compatible = "cache";
> +				next-level-cache = <&L3_0>;
> +			};
> +		};
> +
> +		CPU7: cpu@700 {
> +			device_type = "cpu";
> +			compatible = "qcom,kryo385";
> +			reg = <0x0 0x700>;
> +			enable-method = "psci";
> +			next-level-cache = <&L2_700>;
> +			qcom,freq-domain = <&freq_domain_table1>;
> +			L2_700: l2-cache {
> +				compatible = "cache";
> +				next-level-cache = <&L3_0>;
> +			};
> +		};
> +	};
> +
> +	qcom,cpufreq-hw {
> +		compatible = "qcom,cpufreq-hw";
> +		#address-cells = <2>;
> +		#size-cells = <2>;
> +		ranges;
> +		freq_domain_table0: freq_table0 {
> +			reg = <0 0x17d43000 0 0x1400>;
> +		};
> +
> +		freq_domain_table1: freq_table1 {
> +			reg = <0 0x17d45800 0 0x1400>;
> +		};

It seems that we need to map the CPUs in the cpus node to the frequency domains in the cpufreq-hw node. Wouldn't that be better served via a #foo-cells and <&phandle foo-cell> property in the CPU node?

It's annoying that the cpufreq-hw node doesn't have a reg property, when it really should have one that goes over the whole register space (or is split across the frequency domains so that there are two reg properties here).
Re: [RFC PATCH 10/10] psi: aggregate ongoing stall events when somebody reads pressure
On Thu, 12 Jul 2018 13:29:42 -0400 Johannes Weiner wrote:

> Right now, psi reports pressure and stall times of already concluded stall events. For most use cases this is current enough, but certain highly latency-sensitive applications, like the Android OOM killer, might want to know about and react to stall states before they have even concluded (e.g. a prolonged reclaim cycle).
>
> This patches the procfs/cgroupfs interface such that when the pressure metrics are read, the current per-cpu states, if any, are taken into account as well.
>
> Any ongoing states are concluded, their time snapshotted, and then restarted. This requires holding the rq lock to avoid corruption. It could use some form of rq lock ratelimiting or avoidance.
>
> Requested-by: Suren Baghdasaryan
> Not-yet-signed-off-by: Johannes Weiner

What-does-that-mean:?
Re: [RFC v4 0/3] mm: zap pages with read mmap_sem in munmap for large mapping
On 7/12/18 1:04 AM, Michal Hocko wrote:

> On Wed 11-07-18 10:04:48, Yang Shi wrote:
> [...]
>> One approach is to save all the vmas on a separate list, then zap_page_range does unmap with this list.
>
> Just detach the unmapped vma chain from the mm. You can keep the existing vm_next chain and reuse it.

Yes. Other than this, we still need to do:

* Tell zap_page_range not to update vm_flags, as I did in v4 — of course without VM_DEAD this time
* Extract the pagetable free code, then do it after zap_page_range. I think I can just call free_pgd_range() directly.
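[Editor's note: the "detach the chain, then operate on it" idea discussed here is a generic pattern — unhook a singly-linked chain from its owner under the lock, then walk the detached chain at leisure. A toy sketch with a plain list (hypothetical types, nothing like the real vma/mm structures):

```c
#include <assert.h>
#include <stddef.h>

struct node { int val; struct node *next; };

/*
 * Detach the entire chain from its holder (analogous to unhooking
 * the vma chain from the mm under mmap_sem). The caller can then
 * walk the detached chain — reusing its existing next links — after
 * the holder no longer references it.
 */
static struct node *detach_all(struct node **head)
{
	struct node *chain = *head;

	*head = NULL;
	return chain;
}

/* Stand-in for "do the unmap work on the detached chain". */
static int sum_chain(struct node *chain)
{
	int s = 0;

	for (; chain; chain = chain->next)
		s += chain->val;
	return s;
}
```

The point of the pattern is that after detach_all() the owner is consistent, so expensive per-element work can proceed without holding the owner's lock.]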
Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
On Thu, 12 Jul 2018 13:29:32 -0400 Johannes Weiner wrote: > > ... > > The io file is similar to memory. Because the block layer doesn't have > a concept of hardware contention right now (how much longer is my IO > request taking due to other tasks?), it reports CPU potential lost on > all IO delays, not just the potential lost due to competition. Probably dumb question: disks aren't the only form of IO. Does it make sense to accumulate PSI for other forms of IO? Networking comes to mind...
Re: [PATCH v5 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
Quoting Taniya Das (2018-07-12 11:05:45) > The CPUfreq HW present in some QCOM chipsets offloads the steps necessary > for changing the frequency of CPUs. The driver implements the cpufreq > driver interface for this hardware engine. > > Signed-off-by: Saravana Kannan > Signed-off-by: Taniya Das > diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm > index 52f5f1a..141ec3e 100644 > --- a/drivers/cpufreq/Kconfig.arm > +++ b/drivers/cpufreq/Kconfig.arm > @@ -312,3 +312,13 @@ config ARM_PXA2xx_CPUFREQ > This add the CPUFreq driver support for Intel PXA2xx SOCs. > > If in doubt, say N. > + > +config ARM_QCOM_CPUFREQ_HW > + bool "QCOM CPUFreq HW driver" Why can't it be a module? > + help > +Support for the CPUFreq HW driver. > +Some QCOM chipsets have a HW engine to offload the steps > +necessary for changing the frequency of the CPUs. Firmware loaded > +in this engine exposes a programming interafce to the High-level OS. typo on interface. Why is High capitalized? Just say OS? > +The driver implements the cpufreq driver interface for this HW > engine. So much 'driver'. > +Say Y if you want to support CPUFreq HW. > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c > b/drivers/cpufreq/qcom-cpufreq-hw.c > new file mode 100644 > index 000..fa25a95 > --- /dev/null > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c > @@ -0,0 +1,344 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2018, The Linux Foundation. All rights reserved. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define INIT_RATE 3UL This doesn't need to be configured from DT? Or more likely be specified as some sort of PLL that is part of the clocks property so we know what the 'safe' or 'default' frequency is? > +#define XO_RATE1920UL This should come from DT via some clocks property. 
> +#define LUT_MAX_ENTRIES40U > +#define CORE_COUNT_VAL(val)(((val) & (GENMASK(18, 16))) >> 16) > +#define LUT_ROW_SIZE 32 > + > +enum { > + REG_ENABLE, > + REG_LUT_TABLE, > + REG_PERF_STATE, > + > + REG_ARRAY_SIZE, > +}; > + > +struct cpufreq_qcom { > + struct cpufreq_frequency_table *table; > + struct device *dev; > + const u16 *reg_offset; > + void __iomem *base; > + cpumask_t related_cpus; > + unsigned int max_cores; > +}; > + > +static u16 cpufreq_qcom_std_offsets[REG_ARRAY_SIZE] = { const? > + [REG_ENABLE]= 0x0, > + [REG_LUT_TABLE] = 0x110, > + [REG_PERF_STATE]= 0x920, Is the register map going to change again for the next device? It may be better to precalculate the offset for the fast switch so that the addition isn't in the hotpath. > +}; > + > +static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS]; > + > +static int > +qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy, > +unsigned int index) > +{ > + struct cpufreq_qcom *c = policy->driver_data; > + unsigned int offset = c->reg_offset[REG_PERF_STATE]; > + > + writel_relaxed(index, c->base + offset); > + > + return 0; > +} > + > +static unsigned int qcom_cpufreq_hw_get(unsigned int cpu) > +{ > + struct cpufreq_qcom *c; > + struct cpufreq_policy *policy; > + unsigned int index, offset; > + > + policy = cpufreq_cpu_get_raw(cpu); > + if (!policy) > + return 0; > + > + c = policy->driver_data; > + offset = c->reg_offset[REG_PERF_STATE]; > + > + index = readl_relaxed(c->base + offset); > + index = min(index, LUT_MAX_ENTRIES - 1); > + > + return policy->freq_table[index].frequency; > +} > + > +static unsigned int > +qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy, > + unsigned int target_freq) > +{ > + struct cpufreq_qcom *c = policy->driver_data; > + unsigned int offset; > + int index; > + > + index = cpufreq_table_find_index_l(policy, target_freq); It's unfortunate that we have to search the table in software again. Why can't we use policy->cached_resolved_idx to avoid this search twice? 
> + if (index < 0) > + return 0; > + > + offset = c->reg_offset[REG_PERF_STATE]; > + > + writel_relaxed(index, c->base + offset); > + > + return policy->freq_table[index].frequency; > +} > + > +static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy) > +{ > + struct cpufreq_qcom *c; > + > + c = qcom_freq_domain_map[policy->cpu]; > + if (!c) { > + pr_err("No scaling support for CPU%d\n", policy->cpu); > + return -ENODEV; > + } > + > + cpumask_copy(policy->cpus, &c->related_cpus); > + > + policy->fast_switch_possible = true; > +
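[Editor's note: one of the review points above is that the per-call `c->reg_offset[REG_PERF_STATE]` lookup plus addition sits in the fast-switch hot path and could be precalculated at probe time. A hedged sketch of that idea using plain memory instead of ioremapped MMIO (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

struct freq_domain_demo {
	uint32_t regs[1024];	/* stand-in for the ioremapped region */
	uint32_t *perf_state;	/* precomputed at init: base + offset */
};

/* Probe-time setup: resolve the register address exactly once. */
static void domain_init(struct freq_domain_demo *d,
			unsigned int perf_state_off)
{
	d->perf_state = &d->regs[perf_state_off / sizeof(uint32_t)];
}

/* Hot path: a single store, no table lookup or offset arithmetic. */
static void set_perf_index(struct freq_domain_demo *d, uint32_t index)
{
	*d->perf_state = index;
}
```

In the real driver the same shape would apply with `void __iomem *` and `writel_relaxed()`, so `qcom_cpufreq_hw_fast_switch()` touches only one precomputed pointer.]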
[PATCH 5/6] swap: Add __swap_entry_free_locked()
From: Huang Ying

The part of __swap_entry_free() that runs with the lock held is separated into a new function, __swap_entry_free_locked(), because we want to reuse that piece of code in some other places. This is just mechanical code refactoring; there is no functional change.

Signed-off-by: "Huang, Ying"
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Daniel Jordan
Cc: Dan Williams
---
 mm/swapfile.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index e0df8d22ac92..bc488bf36c86 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1180,16 +1180,13 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-				       swp_entry_t entry, unsigned char usage)
+static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
+					      unsigned long offset,
+					      unsigned char usage)
 {
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
 	unsigned char count;
 	unsigned char has_cache;
 
-	ci = lock_cluster_or_swap_info(p, offset);
-
 	count = p->swap_map[offset];
 	has_cache = count & SWAP_HAS_CACHE;
@@ -1217,6 +1214,17 @@ static unsigned char __swap_entry_free(struct swap_info_struct *p,
 	usage = count | has_cache;
 	p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
 
+	return usage;
+}
+
+static unsigned char __swap_entry_free(struct swap_info_struct *p,
+				       swp_entry_t entry, unsigned char usage)
+{
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+
+	ci = lock_cluster_or_swap_info(p, offset);
+	usage = __swap_entry_free_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
 
 	return usage;
-- 
2.16.4
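[Editor's note: the patch above applies the standard kernel "_locked helper" refactoring: the public entry point takes the lock and delegates to a helper whose name documents that the lock must already be held, so other lock-holding paths can reuse the helper. A toy userspace sketch of the same shape (a plain flag stands in for the cluster lock; names are illustrative):

```c
#include <assert.h>

static int counter;
static int lock_held;	/* stand-in for the real lock */

static void demo_lock(void)   { lock_held = 1; }
static void demo_unlock(void) { lock_held = 0; }

/*
 * Caller must hold the lock — the _locked suffix documents the
 * contract, exactly as with __swap_entry_free_locked() above.
 */
static int counter_add_locked(int n)
{
	counter += n;
	return counter;
}

/* Public entry point: takes the lock, then reuses the locked helper. */
static int counter_add(int n)
{
	int v;

	demo_lock();
	v = counter_add_locked(n);
	demo_unlock();
	return v;
}
```

Splitting this way keeps exactly one copy of the core logic while letting both locked and lock-taking callers share it.]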
[PATCH 2/6] mm/swapfile.c: Replace some #ifdef with IS_ENABLED()
From: Huang Ying In mm/swapfile.c, THP (Transparent Huge Page) swap specific code is enclosed by #ifdef CONFIG_THP_SWAP/#endif to avoid code dilating when THP isn't enabled. But #ifdef/#endif in .c file hurt the code readability, so Dave suggested to use IS_ENABLED(CONFIG_THP_SWAP) instead and let compiler to do the dirty job for us. This has potential to remove some duplicated code too. From output of `size`, text data bss dec hex filename THP=y: 26269 2076 340 28685700d mm/swapfile.o ifdef/endif: 24115 2028 340 264836773 mm/swapfile.o IS_ENABLED:24179 2028 340 2654767b3 mm/swapfile.o IS_ENABLED() based solution works quite well, almost as good as that of #ifdef/#endif. And from the diffstat, the removed lines are more than added lines. One #ifdef for split_swap_cluster() is kept. Because it is a public function with a stub implementation for CONFIG_THP_SWAP=n in swap.h. Signed-off-by: "Huang, Ying" Suggested-by: Dave Hansen Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Daniel Jordan Cc: Dan Williams --- mm/swapfile.c | 56 1 file changed, 16 insertions(+), 40 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index e31aa601d9c0..75c84aa763a3 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -870,7 +870,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si, return n_ret; } -#ifdef CONFIG_THP_SWAP static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot) { unsigned long idx; @@ -878,6 +877,11 @@ static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot) unsigned long offset, i; unsigned char *map; + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_WARN_ON_ONCE(1); + return 0; + } + if (cluster_list_empty(&si->free_clusters)) return 0; @@ -908,13 +912,6 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx) unlock_cluster(ci); swap_range_free(si, offset, SWAPFILE_CLUSTER); } -#else -static int swap_alloc_cluster(struct swap_info_struct 
*si, swp_entry_t *slot) -{ - VM_WARN_ON_ONCE(1); - return 0; -} -#endif /* CONFIG_THP_SWAP */ static unsigned long scan_swap_map(struct swap_info_struct *si, unsigned char usage) @@ -1260,7 +1257,6 @@ static void swapcache_free(swp_entry_t entry) } } -#ifdef CONFIG_THP_SWAP static void swapcache_free_cluster(swp_entry_t entry) { unsigned long offset = swp_offset(entry); @@ -1271,6 +1267,9 @@ static void swapcache_free_cluster(swp_entry_t entry) unsigned int i, free_entries = 0; unsigned char val; + if (!IS_ENABLED(CONFIG_THP_SWAP)) + return; + si = _swap_info_get(entry); if (!si) return; @@ -1306,6 +1305,7 @@ static void swapcache_free_cluster(swp_entry_t entry) } } +#ifdef CONFIG_THP_SWAP int split_swap_cluster(swp_entry_t entry) { struct swap_info_struct *si; @@ -1320,11 +1320,7 @@ int split_swap_cluster(swp_entry_t entry) unlock_cluster(ci); return 0; } -#else -static inline void swapcache_free_cluster(swp_entry_t entry) -{ -} -#endif /* CONFIG_THP_SWAP */ +#endif void put_swap_page(struct page *page, swp_entry_t entry) { @@ -1483,7 +1479,6 @@ int swp_swapcount(swp_entry_t entry) return count; } -#ifdef CONFIG_THP_SWAP static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, swp_entry_t entry) { @@ -1494,6 +1489,9 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, int i; bool ret = false; + if (!IS_ENABLED(CONFIG_THP_SWAP)) + return swap_swapcount(si, entry) != 0; + ci = lock_cluster_or_swap_info(si, offset); if (!ci || !cluster_is_huge(ci)) { if (map[roffset] != SWAP_HAS_CACHE) @@ -1516,7 +1514,7 @@ static bool page_swapped(struct page *page) swp_entry_t entry; struct swap_info_struct *si; - if (likely(!PageTransCompound(page))) + if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) return page_swapcount(page) != 0; page = compound_head(page); @@ -1540,10 +1538,8 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, /* hugetlbfs shouldn't call it */ VM_BUG_ON_PAGE(PageHuge(page), 
page); - if (likely(!PageTransCompound(page))) { - mapcount = atomic_read(&page->_mapcount) + 1; - if (total_mapcount) - *total_mapcount = mapcount; + if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) { + mapcount = page_t
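[Editor's note: the IS_ENABLED() conversion in patch 2/6 relies on the compiler always parsing and type-checking the disabled branch, then eliminating it as dead code — unlike #ifdef, where the =n code vanishes before the compiler sees it and can bit-rot. A minimal userspace analogue, with a plain macro standing in for the kernel's IS_ENABLED(CONFIG_THP_SWAP):

```c
#include <assert.h>

/* Userspace stand-in for IS_ENABLED(CONFIG_THP_SWAP); 0 here == "=n". */
#ifndef CONFIG_THP_SWAP_DEMO
#define CONFIG_THP_SWAP_DEMO 0
#endif

/*
 * With #ifdef, the allocation body would be removed entirely for =n
 * builds and could break unnoticed; with an if-test the whole function
 * is always compiled, and the optimizer still drops the dead branch.
 */
static int swap_alloc_cluster_demo(int *allocated)
{
	if (!CONFIG_THP_SWAP_DEMO) {
		*allocated = 0;
		return 0;
	}
	*allocated = 1;	/* would do the real cluster allocation */
	return 1;
}
```

This is the trade-off the commit message quantifies: `size` shows the IS_ENABLED build only marginally larger than the #ifdef build, while the source loses 24 lines net.]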
[PATCH 1/6] swap: Add comments to lock_cluster_or_swap_info()
From: Huang Ying

To improve the code readability.

Signed-off-by: "Huang, Ying"
Suggested-by: Dave Hansen
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Daniel Jordan
Cc: Dan Williams
---
 mm/swapfile.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d8fddfb000ec..e31aa601d9c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -297,6 +297,12 @@ static inline void unlock_cluster(struct swap_cluster_info *ci)
 	spin_unlock(&ci->lock);
 }
 
+/*
+ * At most times, fine grained cluster lock is sufficient to protect
+ * the operations on sis->swap_map. No need to acquire gross grained
+ * sis->lock. But cluster and cluster lock isn't available for HDD,
+ * so sis->lock will be instead for them.
+ */
 static inline struct swap_cluster_info *lock_cluster_or_swap_info(
 	struct swap_info_struct *si,
 	unsigned long offset)
-- 
2.16.4