linux-next: Tree for Mar 30
Hi all, Changes since 20170329: Undropped tree: xen-tip The vfs tree gained a conflict against Linus' tree. The drm tree gained conflicts against the drm-intel-fixes tree. The mailbox tree lost its build failure. The phy-next tree gained a build failure, so I used the version from next-20170329. The vhost tree gained a build failure, so I used the version from next-20170329. Non-merge commits (relative to Linus' tree): 5728 6122 files changed, 441716 insertions(+), 108564 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. Below is a summary of the state of the merge. I am currently merging 255 trees (counting Linus' and 37 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (806276b7f07a Merge branch 'for-linus' of git://git.kernel.dk/linux-block) Merging fixes/master (97da3854c526 Linux 4.11-rc3) Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading parentheses around a condition) Merging arc-current/for-curr (ae9955aeb8e4 ARC: vdk: Fix support of UIO) Merging arm-current/fixes (a1016e94cce9 ARM: wire up statx syscall) Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx) Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally) Merging powerpc-fixes/fixes (cc638a488a52 gcc-plugins: update architecture list in documentation) Merging sparc/master (0ae2d26ffe70 arch/sparc: Avoid DCTI Couples) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (8f1f7eeb22c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf) Merging ipsec/master (72ef9c4125c7 dccp: fix memory leak during tear-down of unsuccessful connection request) Merging netfilter/master (77c1c03c5b8e netfilter: nfnetlink_queue: fix secctx memory leak) Merging ipvs/master (5371bbf4b295 net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled) Merging wireless-drivers/master (6be3b6cce1e2 ath10k: fix incorrect wlan_mac_base in qca6174_regs) Merging mac80211/master (7d65f82954da mac80211: unconditionally start new netdev queues with iTXQ support) Merging sound-current/for-linus (2d7d54002e39 ALSA: seq: Fix race during FIFO resize) Merging pci-current/for-linus (9abb27c7594a PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller) Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging tty.current/tty-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging usb.current/usb-linus (a7f12a21f6b3 usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled) Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held) Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file) Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON) Merging staging.current/staging-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging input-current/for-linus (5659495a7a14 uapi: add missing install of userio.h) Merging crypto-current/master (9df0eb180c20 crypto: xts,lrw - fix out-of-bounds write after kmalloc failure) Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info) Merging vfio-fixes/for-linus (65b1adeb
linux-next: Tree for Mar 30
Hi all, Changes since 20170329: Undropped tree: xen-tip The vfs tree gained a conflict against Linus' tree. The drm tree gained conflicts against the drm-intel-fixes tree. The mailbox tree lost its build failure. The phy-next tree gained a build failure, so I used the version from next-20170329. The vhost tree gained a build failure, so I used the version from next-20170329. Non-merge commits (relative to Linus' tree): 5728 6122 files changed, 441716 insertions(+), 108564 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. Below is a summary of the state of the merge. I am currently merging 255 trees (counting Linus' and 37 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (806276b7f07a Merge branch 'for-linus' of git://git.kernel.dk/linux-block) Merging fixes/master (97da3854c526 Linux 4.11-rc3) Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading parentheses around a condition) Merging arc-current/for-curr (ae9955aeb8e4 ARC: vdk: Fix support of UIO) Merging arm-current/fixes (a1016e94cce9 ARM: wire up statx syscall) Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx) Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally) Merging powerpc-fixes/fixes (cc638a488a52 gcc-plugins: update architecture list in documentation) Merging sparc/master (0ae2d26ffe70 arch/sparc: Avoid DCTI Couples) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (8f1f7eeb22c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf) Merging ipsec/master (72ef9c4125c7 dccp: fix memory leak during tear-down of unsuccessful connection request) Merging netfilter/master (77c1c03c5b8e netfilter: nfnetlink_queue: fix secctx memory leak) Merging ipvs/master (5371bbf4b295 net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled) Merging wireless-drivers/master (6be3b6cce1e2 ath10k: fix incorrect wlan_mac_base in qca6174_regs) Merging mac80211/master (7d65f82954da mac80211: unconditionally start new netdev queues with iTXQ support) Merging sound-current/for-linus (2d7d54002e39 ALSA: seq: Fix race during FIFO resize) Merging pci-current/for-linus (9abb27c7594a PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller) Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging tty.current/tty-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging usb.current/usb-linus (a7f12a21f6b3 usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled) Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held) Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file) Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON) Merging staging.current/staging-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4) Merging input-current/for-linus (5659495a7a14 uapi: add missing install of userio.h) Merging crypto-current/master (9df0eb180c20 crypto: xts,lrw - fix out-of-bounds write after kmalloc failure) Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info) Merging vfio-fixes/for-linus (65b1adeb
Re: [PATCH v8 3/3] printk: fix double printing with earlycon
On (03/28/17 14:56), Petr Mladek wrote: [..] > > > Is it better? If not, I will send a version with console_cmdline_last. > > > > personally I'm fine with the nested loop. the latest version > > "for (last = MAX_CMDLINECONSOLES - 1; last >= 0;..." > > > > is even easier to read. > > The number of elements is bumped on a single location, so there > is not much to synchronize. The old approach was fine because > the for cycles were needed anyway, they started on the 0th element, > and NULL ended arrays are rather common practice. > > But we are searching the array from the end now. Also we use the > for cycle just to get the number here. This is not a common > practice and it makes the code more complicated and strange from > my point of view. I'm fine with either way :) [..] > > neither add_preferred_console() nor __add_preferred_console() have any > > serialization. and I assume that we can call add_preferred_console() > > concurrently, can't we? [..] > If I did not miss anything, it would seem that > __add_preferred_console() are called synchronously > and only during boot by design. thanks. I think you are right. it's console_initcall or __init. -ss
Re: [PATCH v8 3/3] printk: fix double printing with earlycon
On (03/28/17 14:56), Petr Mladek wrote: [..] > > > Is it better? If not, I will send a version with console_cmdline_last. > > > > personally I'm fine with the nested loop. the latest version > > "for (last = MAX_CMDLINECONSOLES - 1; last >= 0;..." > > > > is even easier to read. > > The number of elements is bumped on a single location, so there > is not much to synchronize. The old approach was fine because > the for cycles were needed anyway, they started on the 0th element, > and NULL ended arrays are rather common practice. > > But we are searching the array from the end now. Also we use the > for cycle just to get the number here. This is not a common > practice and it makes the code more complicated and strange from > my point of view. I'm fine with either way :) [..] > > neither add_preferred_console() nor __add_preferred_console() have any > > serialization. and I assume that we can call add_preferred_console() > > concurrently, can't we? [..] > If I did not miss anything, it would seem that > __add_preferred_console() are called synchronously > and only during boot by design. thanks. I think you are right. it's console_initcall or __init. -ss
Re: [PATCH v4 19/23] drivers/fsi: Add GPIO based FSI master
On Thu, Mar 30, 2017 at 4:13 AM, Christopher Bosticwrote: > From: Chris Bostic > > Implement a FSI master using GPIO. Will generate FSI protocol for > read and write commands to particular addresses. Sends master command > and waits for and decodes a slave response. > > Includes changes from Edward A. James and Jeremy > Kerr . > > Signed-off-by: Edward A. James > Signed-off-by: Jeremy Kerr > Signed-off-by: Chris Bostic > Signed-off-by: Joel Stanley > --- > arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts | 10 + > arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts | 12 + > drivers/fsi/Kconfig | 11 + > drivers/fsi/Makefile | 1 + > drivers/fsi/fsi-master-gpio.c | 614 > ++ > 5 files changed, 648 insertions(+) > create mode 100644 drivers/fsi/fsi-master-gpio.c > > diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > index 1d2fc1e..cf7d7d7 100644 > --- a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > +++ b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > @@ -29,6 +29,16 @@ > reg = <0x5f00 0x0100>; /* 16M */ > }; > }; > + > + gpio-fsi { > + compatible = "fsi-master-gpio", "fsi-master"; > + > + clock-gpios = < ASPEED_GPIO(A, 4) GPIO_ACTIVE_HIGH>; > + data-gpios = < ASPEED_GPIO(A, 5) GPIO_ACTIVE_HIGH>; > + mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>; > + enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>; > + trans-gpios = < ASPEED_GPIO(H, 6) GPIO_ACTIVE_HIGH>; > + }; > }; > > { > diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > index 7a3b2b5..2fd7db7 100644 > --- a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > +++ b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > @@ -29,6 +29,18 @@ > reg = <0xbf00 0x0100>; /* 16M */ > }; > }; > + > + gpio-fsi { > + compatible = "fsi-master-gpio", "fsi-master"; > + > + status = "okay"; > + > + clock-gpios = < ASPEED_GPIO(AA, 0) GPIO_ACTIVE_HIGH>; > + data-gpios = < ASPEED_GPIO(AA, 2) GPIO_ACTIVE_HIGH>; > + mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>; > + enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>; > + trans-gpios = < ASPEED_GPIO(R, 2) GPIO_ACTIVE_HIGH>; > + }; > }; I'm not sure what happened here. The changes to the device trees should not be in this patch. Cheers, Joel > > { > diff --git a/drivers/fsi/Kconfig b/drivers/fsi/Kconfig > index 04c1a0e..9cf8345 100644 > --- a/drivers/fsi/Kconfig > +++ b/drivers/fsi/Kconfig > @@ -9,4 +9,15 @@ config FSI > ---help--- > FSI - the FRU Support Interface - is a simple bus for low-level > access to POWER-based hardware. > + > +if FSI > + > +config FSI_MASTER_GPIO > + tristate "GPIO-based FSI master" > + depends on FSI && GPIOLIB > + ---help--- > + This option enables a FSI master driver using GPIO lines. > + > +endif > + > endmenu > diff --git a/drivers/fsi/Makefile b/drivers/fsi/Makefile > index db0e5e7..ed28ac0 100644 > --- a/drivers/fsi/Makefile > +++ b/drivers/fsi/Makefile > @@ -1,2 +1,3 @@ > > obj-$(CONFIG_FSI) += fsi-core.o > +obj-$(CONFIG_FSI_MASTER_GPIO) += fsi-master-gpio.o > diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c > new file mode 100644 > index 000..0bf5caa > --- /dev/null > +++ b/drivers/fsi/fsi-master-gpio.c > @@ -0,0 +1,614 @@ > +/* > + * A FSI master controller, using a simple GPIO bit-banging interface > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "fsi-master.h" > + > +#defineFSI_GPIO_STD_DLY1 /* Standard pin delay in nS */ > +#defineFSI_ECHO_DELAY_CLOCKS 16 /* Number clocks for echo > delay */ > +#defineFSI_PRE_BREAK_CLOCKS50 /* Number clocks to prep for > break */ > +#defineFSI_BREAK_CLOCKS256 /* Number of clocks to issue > break */ > +#defineFSI_POST_BREAK_CLOCKS 16000 /* Number clocks to set up > cfam */ > +#defineFSI_INIT_CLOCKS 5000/* Clock out any old data */ > +#defineFSI_GPIO_STD_DELAY 10 /* Standard GPIO delay in nS > */ > + /* todo: adjust down as low as */ > + /* possible or eliminate */ > +#defineFSI_GPIO_CMD_DPOLL 0x2 > +#defineFSI_GPIO_CMD_TERM 0x3f > +#define
Re: [PATCH v4 19/23] drivers/fsi: Add GPIO based FSI master
On Thu, Mar 30, 2017 at 4:13 AM, Christopher Bostic wrote: > From: Chris Bostic > > Implement a FSI master using GPIO. Will generate FSI protocol for > read and write commands to particular addresses. Sends master command > and waits for and decodes a slave response. > > Includes changes from Edward A. James and Jeremy > Kerr . > > Signed-off-by: Edward A. James > Signed-off-by: Jeremy Kerr > Signed-off-by: Chris Bostic > Signed-off-by: Joel Stanley > --- > arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts | 10 + > arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts | 12 + > drivers/fsi/Kconfig | 11 + > drivers/fsi/Makefile | 1 + > drivers/fsi/fsi-master-gpio.c | 614 > ++ > 5 files changed, 648 insertions(+) > create mode 100644 drivers/fsi/fsi-master-gpio.c > > diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > index 1d2fc1e..cf7d7d7 100644 > --- a/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > +++ b/arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dts > @@ -29,6 +29,16 @@ > reg = <0x5f00 0x0100>; /* 16M */ > }; > }; > + > + gpio-fsi { > + compatible = "fsi-master-gpio", "fsi-master"; > + > + clock-gpios = < ASPEED_GPIO(A, 4) GPIO_ACTIVE_HIGH>; > + data-gpios = < ASPEED_GPIO(A, 5) GPIO_ACTIVE_HIGH>; > + mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>; > + enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>; > + trans-gpios = < ASPEED_GPIO(H, 6) GPIO_ACTIVE_HIGH>; > + }; > }; > > { > diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > index 7a3b2b5..2fd7db7 100644 > --- a/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > +++ b/arch/arm/boot/dts/aspeed-bmc-opp-romulus.dts > @@ -29,6 +29,18 @@ > reg = <0xbf00 0x0100>; /* 16M */ > }; > }; > + > + gpio-fsi { > + compatible = "fsi-master-gpio", "fsi-master"; > + > + status = "okay"; > + > + clock-gpios = < ASPEED_GPIO(AA, 0) GPIO_ACTIVE_HIGH>; > + data-gpios = < ASPEED_GPIO(AA, 2) GPIO_ACTIVE_HIGH>; > + mux-gpios = < ASPEED_GPIO(A, 6) GPIO_ACTIVE_HIGH>; > + enable-gpios = < ASPEED_GPIO(D, 0) GPIO_ACTIVE_HIGH>; > + trans-gpios = < ASPEED_GPIO(R, 2) GPIO_ACTIVE_HIGH>; > + }; > }; I'm not sure what happened here. The changes to the device trees should not be in this patch. Cheers, Joel > > { > diff --git a/drivers/fsi/Kconfig b/drivers/fsi/Kconfig > index 04c1a0e..9cf8345 100644 > --- a/drivers/fsi/Kconfig > +++ b/drivers/fsi/Kconfig > @@ -9,4 +9,15 @@ config FSI > ---help--- > FSI - the FRU Support Interface - is a simple bus for low-level > access to POWER-based hardware. > + > +if FSI > + > +config FSI_MASTER_GPIO > + tristate "GPIO-based FSI master" > + depends on FSI && GPIOLIB > + ---help--- > + This option enables a FSI master driver using GPIO lines. > + > +endif > + > endmenu > diff --git a/drivers/fsi/Makefile b/drivers/fsi/Makefile > index db0e5e7..ed28ac0 100644 > --- a/drivers/fsi/Makefile > +++ b/drivers/fsi/Makefile > @@ -1,2 +1,3 @@ > > obj-$(CONFIG_FSI) += fsi-core.o > +obj-$(CONFIG_FSI_MASTER_GPIO) += fsi-master-gpio.o > diff --git a/drivers/fsi/fsi-master-gpio.c b/drivers/fsi/fsi-master-gpio.c > new file mode 100644 > index 000..0bf5caa > --- /dev/null > +++ b/drivers/fsi/fsi-master-gpio.c > @@ -0,0 +1,614 @@ > +/* > + * A FSI master controller, using a simple GPIO bit-banging interface > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "fsi-master.h" > + > +#defineFSI_GPIO_STD_DLY1 /* Standard pin delay in nS */ > +#defineFSI_ECHO_DELAY_CLOCKS 16 /* Number clocks for echo > delay */ > +#defineFSI_PRE_BREAK_CLOCKS50 /* Number clocks to prep for > break */ > +#defineFSI_BREAK_CLOCKS256 /* Number of clocks to issue > break */ > +#defineFSI_POST_BREAK_CLOCKS 16000 /* Number clocks to set up > cfam */ > +#defineFSI_INIT_CLOCKS 5000/* Clock out any old data */ > +#defineFSI_GPIO_STD_DELAY 10 /* Standard GPIO delay in nS > */ > + /* todo: adjust down as low as */ > + /* possible or eliminate */ > +#defineFSI_GPIO_CMD_DPOLL 0x2 > +#defineFSI_GPIO_CMD_TERM 0x3f > +#define FSI_GPIO_CMD_ABS_AR0x4 > + > +#defineFSI_GPIO_DPOLL_CLOCKS 100 /* < 21 will cause slave to > hang */ > + > +/* Bus errors */ > +#define
Re: [REGRESSION 4.11] Commit d8514d8edb5b ("ovl: copy up regular file using O_TMPFILE") breaks ubifs
Hi Richard, On Thu, 30 Mar 2017 00:15:31 +0200 Richard Weinbergerwrote: > Ralph, > > Am 29.03.2017 um 23:26 schrieb Ralph Sennhauser: > >> # create and link a tmpfile - then remove it > >> sudo rm -rf foo; sudo xfs_io -T -c "flink foo" . ; ls -l foo; sudo > >> rm foo > > > > next-20170328, obviously without your patch: > > > > # xfs_io -T -c "flink foo" . > > .: Not supported > > -T tells xfs_io to create a tmpfile but you have it disabled in UBIFS. Bash history says I booted the other partition than I flashed (uname just happened to match), the downside of the dual boot layout. :) # rm -rf foo # xfs_io -T -c "flink foo" . # ls -l foo -rw---1 root root 0 Mar 30 02:00 foo # rm foo [ 305.001436] UBIFS error (ubi0:0 pid 2493): ubifs_add_orphan: orphaned twice However, unlike with the overlay setup the filesystem can be mounted just fine without imediatly visible side effects. > > Anyway, can you please test the attached patch? > Amir was right, UBIFS misses a corner case in ubifs_link(). ;-\ > > I think I understand also why we never noticed this in xfstests, > UBIFS prints only a non-fatal error while umounting the filesystem in > this case. But will test tomorrow in more detail, need some sleep now. > > diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c > index 30825d882aa9..95e348a2da29 100644 > --- a/fs/ubifs/dir.c > +++ b/fs/ubifs/dir.c > @@ -748,6 +748,9 @@ static int ubifs_link(struct dentry *old_dentry, > struct inode *dir, goto out_fname; > > lock_2_inodes(dir, inode); > + if (inode->i_nlink == 0) > + ubifs_delete_orphan(c, inode->i_ino); > + > inc_nlink(inode); > ihold(inode); > inode->i_ctime = ubifs_current_time(inode); > > Thanks, > //richard With this patch I'm no longer able to reproduce _this_ issue, however, rename no longer works either: # mv /etc/config/wireless /etc/config/wireless.back mv: can't rename '/etc/config/wireless': Invalid argument # ls /etc/config/wireless /etc/config/wireless # rm /etc/config/wireless # ls /etc/config/wireless ls: /etc/config/wireless: No such file or directory Thanks Ralph
Re: [REGRESSION 4.11] Commit d8514d8edb5b ("ovl: copy up regular file using O_TMPFILE") breaks ubifs
Hi Richard, On Thu, 30 Mar 2017 00:15:31 +0200 Richard Weinberger wrote: > Ralph, > > Am 29.03.2017 um 23:26 schrieb Ralph Sennhauser: > >> # create and link a tmpfile - then remove it > >> sudo rm -rf foo; sudo xfs_io -T -c "flink foo" . ; ls -l foo; sudo > >> rm foo > > > > next-20170328, obviously without your patch: > > > > # xfs_io -T -c "flink foo" . > > .: Not supported > > -T tells xfs_io to create a tmpfile but you have it disabled in UBIFS. Bash history says I booted the other partition than I flashed (uname just happened to match), the downside of the dual boot layout. :) # rm -rf foo # xfs_io -T -c "flink foo" . # ls -l foo -rw---1 root root 0 Mar 30 02:00 foo # rm foo [ 305.001436] UBIFS error (ubi0:0 pid 2493): ubifs_add_orphan: orphaned twice However, unlike with the overlay setup the filesystem can be mounted just fine without imediatly visible side effects. > > Anyway, can you please test the attached patch? > Amir was right, UBIFS misses a corner case in ubifs_link(). ;-\ > > I think I understand also why we never noticed this in xfstests, > UBIFS prints only a non-fatal error while umounting the filesystem in > this case. But will test tomorrow in more detail, need some sleep now. > > diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c > index 30825d882aa9..95e348a2da29 100644 > --- a/fs/ubifs/dir.c > +++ b/fs/ubifs/dir.c > @@ -748,6 +748,9 @@ static int ubifs_link(struct dentry *old_dentry, > struct inode *dir, goto out_fname; > > lock_2_inodes(dir, inode); > + if (inode->i_nlink == 0) > + ubifs_delete_orphan(c, inode->i_ino); > + > inc_nlink(inode); > ihold(inode); > inode->i_ctime = ubifs_current_time(inode); > > Thanks, > //richard With this patch I'm no longer able to reproduce _this_ issue, however, rename no longer works either: # mv /etc/config/wireless /etc/config/wireless.back mv: can't rename '/etc/config/wireless': Invalid argument # ls /etc/config/wireless /etc/config/wireless # rm /etc/config/wireless # ls /etc/config/wireless ls: /etc/config/wireless: No such file or directory Thanks Ralph
[PATCH v2 3/4] zram: make deduplication feature optional
From: Joonsoo KimBenefit of deduplication is dependent on the workload so it's not preferable to always enable. Therefore, make it optional in Kconfig and device param. Default is 'off'. This option will be beneficial for users who use the zram as blockdev and stores build output to it. Signed-off-by: Joonsoo Kim --- Documentation/ABI/testing/sysfs-block-zram | 10 + Documentation/blockdev/zram.txt| 1 + drivers/block/zram/Kconfig | 14 +++ drivers/block/zram/Makefile| 5 ++- drivers/block/zram/zram_dedup.c| 31 ++- drivers/block/zram/zram_dedup.h| 33 +++- drivers/block/zram/zram_drv.c | 62 ++ drivers/block/zram/zram_drv.h | 1 + 8 files changed, 146 insertions(+), 11 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index 451b6d8..3c1945f 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -90,3 +90,13 @@ Description: device's debugging info useful for kernel developers. Its format is not documented intentionally and may change anytime without any notice. + +What: /sys/block/zram/use_dedup +Date: March 2017 +Contact: Joonsoo Kim +Description: + The use_dedup file is read-write and specifies deduplication + feature is used or not. If enabled, duplicated data is + managed by reference count and will not be stored in memory + twice. Benefit of this feature largely depends on the workload + so keep attention when use. diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 2cdc303..cbbe39b 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -168,6 +168,7 @@ max_comp_streams RWthe number of possible concurrent compress operations comp_algorithmRWshow and change the compression algorithm compact WOtrigger memory compaction debug_statROthis file is used for zram debugging purposes +use_dedupRWshow and set deduplication feature User space is advised to use the following files to read the device statistics. diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index b8ecba6..2f3dd1f 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -13,3 +13,17 @@ config ZRAM disks and maybe many more. See zram.txt for more information. + +config ZRAM_DEDUP + bool "Deduplication support for ZRAM data" + depends on ZRAM + default n + help + Deduplicate ZRAM data to reduce amount of memory consumption. + Advantage largely depends on the workload. In some cases, this + option reduces memory usage to the half. However, if there is no + duplicated data, the amount of memory consumption would be + increased due to additional metadata usage. And, there is + computation time trade-off. Please check the benefit before + enabling this option. Experiment shows the positive effect when + the zram is used as blockdev and is used to store build output. diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile index 29cb008..1f6fecd 100644 --- a/drivers/block/zram/Makefile +++ b/drivers/block/zram/Makefile @@ -1,3 +1,4 @@ -zram-y := zcomp.o zram_drv.o zram_dedup.o +zram-y := zcomp.o zram_drv.o -obj-$(CONFIG_ZRAM) += zram.o +obj-$(CONFIG_ZRAM) += zram.o +obj-$(CONFIG_ZRAM_DEDUP) += zram_dedup.o diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c index d313fc8..1df1ce1 100644 --- a/drivers/block/zram/zram_dedup.c +++ b/drivers/block/zram/zram_dedup.c @@ -27,6 +27,19 @@ u64 zram_dedup_meta_size(struct zram *zram) return (u64)atomic64_read(>stats.meta_data_size); } +static inline bool zram_dedup_enabled(struct zram_meta *meta) +{ + return meta->hash; +} + +unsigned long zram_dedup_handle(struct zram *zram, struct zram_entry *entry) +{ + if (!zram_dedup_enabled(zram->meta)) + return (unsigned long)entry; + + return entry->handle; +} + static u32 zram_dedup_checksum(unsigned char *mem) { return jhash(mem, PAGE_SIZE, 0); @@ -41,6 +54,9 @@ void zram_dedup_insert(struct zram *zram, struct zram_entry *new, struct rb_node **rb_node, *parent = NULL; struct zram_entry *entry; + if (!zram_dedup_enabled(zram->meta)) + return; + new->checksum = checksum; hash = >hash[checksum % meta->hash_size]; rb_root = >rb_root; @@ -149,6 +165,9 @@ static struct zram_entry *zram_dedup_get(struct zram *zram,
[PATCH v2 4/4] zram: compare all the entries with same checksum for deduplication
From: Joonsoo KimUntil now, we compare just one entry with same checksum when checking duplication since it is the simplest way to implement. However, for the completeness, checking all the entries is better so this patch implement to compare all the entries with same checksum. Since this event would be rare so there would be no performance loss. Signed-off-by: Joonsoo Kim --- drivers/block/zram/zram_dedup.c | 59 + 1 file changed, 48 insertions(+), 11 deletions(-) diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c index 1df1ce1..c4dfd21 100644 --- a/drivers/block/zram/zram_dedup.c +++ b/drivers/block/zram/zram_dedup.c @@ -125,6 +125,51 @@ static unsigned long zram_dedup_put(struct zram *zram, struct zram_meta *meta, return refcount; } +static struct zram_entry *__zram_dedup_get(struct zram *zram, + struct zram_hash *hash, unsigned char *mem, + struct zram_entry *entry) +{ + struct zram_entry *tmp, *prev = NULL; + struct rb_node *rb_node; + + /* find left-most entry with same checksum */ + while ((rb_node = rb_prev(>rb_node))) { + tmp = rb_entry(rb_node, struct zram_entry, rb_node); + if (tmp->checksum != entry->checksum) + break; + + entry = tmp; + } + +again: + entry->refcount++; + atomic64_add(entry->len, >stats.dup_data_size); + spin_unlock(>lock); + + if (prev) + zram_entry_free(zram, zram->meta, prev); + + if (zram_dedup_match(zram, entry, mem)) + return entry; + + spin_lock(>lock); + tmp = NULL; + rb_node = rb_next(>rb_node); + if (rb_node) + tmp = rb_entry(rb_node, struct zram_entry, rb_node); + + if (tmp && (tmp->checksum == entry->checksum)) { + prev = entry; + entry = tmp; + goto again; + } + + spin_unlock(>lock); + zram_entry_free(zram, zram->meta, entry); + + return NULL; +} + static struct zram_entry *zram_dedup_get(struct zram *zram, unsigned char *mem, u32 checksum) { @@ -139,18 +184,10 @@ static struct zram_entry *zram_dedup_get(struct zram *zram, rb_node = hash->rb_root.rb_node; while (rb_node) { entry = rb_entry(rb_node, struct zram_entry, rb_node); - if (checksum == entry->checksum) { - entry->refcount++; - atomic64_add(entry->len, >stats.dup_data_size); - spin_unlock(>lock); - - if (zram_dedup_match(zram, entry, mem)) - return entry; - - zram_entry_free(zram, meta, entry); - return NULL; - } + /* lock will be released in the following function */ + if (checksum == entry->checksum) + return __zram_dedup_get(zram, hash, mem, entry); if (checksum < entry->checksum) rb_node = rb_node->rb_left; -- 2.7.4
Re: [PATCH net-next] virtio_net: don't reset twice on XDP on/off
On 2017年03月30日 04:14, Michael S. Tsirkin wrote: We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin--- drivers/net/virtio_net.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index de42e9a..ed8f548 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1810,7 +1810,6 @@ static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp) virtnet_freeze_down(dev); _remove_vq_common(vi); - dev->config->reset(dev); virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER); Acked-by: Jason Wang
[PATCH v2 3/4] zram: make deduplication feature optional
From: Joonsoo Kim Benefit of deduplication is dependent on the workload so it's not preferable to always enable. Therefore, make it optional in Kconfig and device param. Default is 'off'. This option will be beneficial for users who use the zram as blockdev and stores build output to it. Signed-off-by: Joonsoo Kim --- Documentation/ABI/testing/sysfs-block-zram | 10 + Documentation/blockdev/zram.txt| 1 + drivers/block/zram/Kconfig | 14 +++ drivers/block/zram/Makefile| 5 ++- drivers/block/zram/zram_dedup.c| 31 ++- drivers/block/zram/zram_dedup.h| 33 +++- drivers/block/zram/zram_drv.c | 62 ++ drivers/block/zram/zram_drv.h | 1 + 8 files changed, 146 insertions(+), 11 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index 451b6d8..3c1945f 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -90,3 +90,13 @@ Description: device's debugging info useful for kernel developers. Its format is not documented intentionally and may change anytime without any notice. + +What: /sys/block/zram/use_dedup +Date: March 2017 +Contact: Joonsoo Kim +Description: + The use_dedup file is read-write and specifies deduplication + feature is used or not. If enabled, duplicated data is + managed by reference count and will not be stored in memory + twice. Benefit of this feature largely depends on the workload + so keep attention when use. diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 2cdc303..cbbe39b 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -168,6 +168,7 @@ max_comp_streams RWthe number of possible concurrent compress operations comp_algorithmRWshow and change the compression algorithm compact WOtrigger memory compaction debug_statROthis file is used for zram debugging purposes +use_dedupRWshow and set deduplication feature User space is advised to use the following files to read the device statistics. diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index b8ecba6..2f3dd1f 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -13,3 +13,17 @@ config ZRAM disks and maybe many more. See zram.txt for more information. + +config ZRAM_DEDUP + bool "Deduplication support for ZRAM data" + depends on ZRAM + default n + help + Deduplicate ZRAM data to reduce amount of memory consumption. + Advantage largely depends on the workload. In some cases, this + option reduces memory usage to the half. However, if there is no + duplicated data, the amount of memory consumption would be + increased due to additional metadata usage. And, there is + computation time trade-off. Please check the benefit before + enabling this option. Experiment shows the positive effect when + the zram is used as blockdev and is used to store build output. diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile index 29cb008..1f6fecd 100644 --- a/drivers/block/zram/Makefile +++ b/drivers/block/zram/Makefile @@ -1,3 +1,4 @@ -zram-y := zcomp.o zram_drv.o zram_dedup.o +zram-y := zcomp.o zram_drv.o -obj-$(CONFIG_ZRAM) += zram.o +obj-$(CONFIG_ZRAM) += zram.o +obj-$(CONFIG_ZRAM_DEDUP) += zram_dedup.o diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c index d313fc8..1df1ce1 100644 --- a/drivers/block/zram/zram_dedup.c +++ b/drivers/block/zram/zram_dedup.c @@ -27,6 +27,19 @@ u64 zram_dedup_meta_size(struct zram *zram) return (u64)atomic64_read(>stats.meta_data_size); } +static inline bool zram_dedup_enabled(struct zram_meta *meta) +{ + return meta->hash; +} + +unsigned long zram_dedup_handle(struct zram *zram, struct zram_entry *entry) +{ + if (!zram_dedup_enabled(zram->meta)) + return (unsigned long)entry; + + return entry->handle; +} + static u32 zram_dedup_checksum(unsigned char *mem) { return jhash(mem, PAGE_SIZE, 0); @@ -41,6 +54,9 @@ void zram_dedup_insert(struct zram *zram, struct zram_entry *new, struct rb_node **rb_node, *parent = NULL; struct zram_entry *entry; + if (!zram_dedup_enabled(zram->meta)) + return; + new->checksum = checksum; hash = >hash[checksum % meta->hash_size]; rb_root = >rb_root; @@ -149,6 +165,9 @@ static struct zram_entry *zram_dedup_get(struct zram *zram, struct zram_entry *zram_dedup_find(struct zram *zram, unsigned char *mem,
[PATCH v2 4/4] zram: compare all the entries with same checksum for deduplication
From: Joonsoo Kim Until now, we compare just one entry with same checksum when checking duplication since it is the simplest way to implement. However, for the completeness, checking all the entries is better so this patch implement to compare all the entries with same checksum. Since this event would be rare so there would be no performance loss. Signed-off-by: Joonsoo Kim --- drivers/block/zram/zram_dedup.c | 59 + 1 file changed, 48 insertions(+), 11 deletions(-) diff --git a/drivers/block/zram/zram_dedup.c b/drivers/block/zram/zram_dedup.c index 1df1ce1..c4dfd21 100644 --- a/drivers/block/zram/zram_dedup.c +++ b/drivers/block/zram/zram_dedup.c @@ -125,6 +125,51 @@ static unsigned long zram_dedup_put(struct zram *zram, struct zram_meta *meta, return refcount; } +static struct zram_entry *__zram_dedup_get(struct zram *zram, + struct zram_hash *hash, unsigned char *mem, + struct zram_entry *entry) +{ + struct zram_entry *tmp, *prev = NULL; + struct rb_node *rb_node; + + /* find left-most entry with same checksum */ + while ((rb_node = rb_prev(>rb_node))) { + tmp = rb_entry(rb_node, struct zram_entry, rb_node); + if (tmp->checksum != entry->checksum) + break; + + entry = tmp; + } + +again: + entry->refcount++; + atomic64_add(entry->len, >stats.dup_data_size); + spin_unlock(>lock); + + if (prev) + zram_entry_free(zram, zram->meta, prev); + + if (zram_dedup_match(zram, entry, mem)) + return entry; + + spin_lock(>lock); + tmp = NULL; + rb_node = rb_next(>rb_node); + if (rb_node) + tmp = rb_entry(rb_node, struct zram_entry, rb_node); + + if (tmp && (tmp->checksum == entry->checksum)) { + prev = entry; + entry = tmp; + goto again; + } + + spin_unlock(>lock); + zram_entry_free(zram, zram->meta, entry); + + return NULL; +} + static struct zram_entry *zram_dedup_get(struct zram *zram, unsigned char *mem, u32 checksum) { @@ -139,18 +184,10 @@ static struct zram_entry *zram_dedup_get(struct zram *zram, rb_node = hash->rb_root.rb_node; while (rb_node) { entry = rb_entry(rb_node, struct zram_entry, rb_node); - if (checksum == entry->checksum) { - entry->refcount++; - atomic64_add(entry->len, >stats.dup_data_size); - spin_unlock(>lock); - - if (zram_dedup_match(zram, entry, mem)) - return entry; - - zram_entry_free(zram, meta, entry); - return NULL; - } + /* lock will be released in the following function */ + if (checksum == entry->checksum) + return __zram_dedup_get(zram, hash, mem, entry); if (checksum < entry->checksum) rb_node = rb_node->rb_left; -- 2.7.4
Re: [PATCH net-next] virtio_net: don't reset twice on XDP on/off
On 2017年03月30日 04:14, Michael S. Tsirkin wrote: We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin --- drivers/net/virtio_net.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index de42e9a..ed8f548 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1810,7 +1810,6 @@ static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp) virtnet_freeze_down(dev); _remove_vq_common(vi); - dev->config->reset(dev); virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER); Acked-by: Jason Wang
[PATCH v2 1/4] zram: introduce zram_entry to prepare dedup functionality
From: Joonsoo KimFollowing patch will implement deduplication functionality in the zram and it requires an indirection layer to manage the life cycle of zsmalloc handle. To prepare that, this patch introduces zram_entry which can be used to manage the life-cycle of zsmalloc handle. Many lines are changed due to rename but core change is just simple introduction about newly data structure. Signed-off-by: Joonsoo Kim --- drivers/block/zram/zram_drv.c | 82 +-- drivers/block/zram/zram_drv.h | 6 +++- 2 files changed, 60 insertions(+), 28 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 0194441..f3949da 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -419,6 +419,32 @@ static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); static DEVICE_ATTR_RO(debug_stat); +static struct zram_entry *zram_entry_alloc(struct zram *zram, + unsigned int len, gfp_t flags) +{ + struct zram_meta *meta = zram->meta; + struct zram_entry *entry; + + entry = kzalloc(sizeof(*entry), flags); + if (!entry) + return NULL; + + entry->handle = zs_malloc(meta->mem_pool, len, flags); + if (!entry->handle) { + kfree(entry); + return NULL; + } + + return entry; +} + +static inline void zram_entry_free(struct zram_meta *meta, + struct zram_entry *entry) +{ + zs_free(meta->mem_pool, entry->handle); + kfree(entry); +} + static void zram_meta_free(struct zram_meta *meta, u64 disksize) { size_t num_pages = disksize >> PAGE_SHIFT; @@ -426,15 +452,15 @@ static void zram_meta_free(struct zram_meta *meta, u64 disksize) /* Free all pages that are still in this zram device */ for (index = 0; index < num_pages; index++) { - unsigned long handle = meta->table[index].handle; + struct zram_entry *entry = meta->table[index].entry; /* * No memory is allocated for same element filled pages. * Simply clear same page flag. */ - if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) + if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) continue; - zs_free(meta->mem_pool, handle); + zram_entry_free(meta, entry); } zs_destroy_pool(meta->mem_pool); @@ -479,7 +505,7 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize) static void zram_free_page(struct zram *zram, size_t index) { struct zram_meta *meta = zram->meta; - unsigned long handle = meta->table[index].handle; + struct zram_entry *entry = meta->table[index].entry; /* * No memory is allocated for same element filled pages. @@ -492,16 +518,16 @@ static void zram_free_page(struct zram *zram, size_t index) return; } - if (!handle) + if (!entry) return; - zs_free(meta->mem_pool, handle); + zram_entry_free(meta, entry); atomic64_sub(zram_get_obj_size(meta, index), >stats.compr_data_size); atomic64_dec(>stats.pages_stored); - meta->table[index].handle = 0; + meta->table[index].entry = NULL; zram_set_obj_size(meta, index, 0); } @@ -510,20 +536,20 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index) int ret = 0; unsigned char *cmem; struct zram_meta *meta = zram->meta; - unsigned long handle; + struct zram_entry *entry; unsigned int size; bit_spin_lock(ZRAM_ACCESS, >table[index].value); - handle = meta->table[index].handle; + entry = meta->table[index].entry; size = zram_get_obj_size(meta, index); - if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) { + if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) { bit_spin_unlock(ZRAM_ACCESS, >table[index].value); zram_fill_page(mem, PAGE_SIZE, meta->table[index].element); return 0; } - cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO); + cmem = zs_map_object(meta->mem_pool, entry->handle, ZS_MM_RO); if (size == PAGE_SIZE) { copy_page(mem, cmem); } else { @@ -532,7 +558,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index) ret = zcomp_decompress(zstrm, cmem, size, mem); zcomp_stream_put(zram->comp); } - zs_unmap_object(meta->mem_pool, handle); + zs_unmap_object(meta->mem_pool, entry->handle); bit_spin_unlock(ZRAM_ACCESS, >table[index].value); /* Should NEVER happen. Return bio error if it does. */ @@ -554,7
[PATCH v2 2/4] zram: implement deduplication in zram
From: Joonsoo KimThis patch implements deduplication feature in zram. The purpose of this work is naturally to save amount of memory usage by zram. Android is one of the biggest users to use zram as swap and it's really important to save amount of memory usage. There is a paper that reports that duplication ratio of Android's memory content is rather high [1]. And, there is a similar work on zswap that also reports that experiments has shown that around 10-15% of pages stored in zswp are duplicates and deduplicate them provides some benefits [2]. Also, there is a different kind of workload that uses zram as blockdev and store build outputs into it to reduce wear-out problem of real blockdev. In this workload, deduplication hit is very high due to temporary files and intermediate object files. Detailed analysis is on the bottom of this description. Anyway, if we can detect duplicated content and avoid to store duplicated content at different memory space, we can save memory. This patch tries to do that. Implementation is almost simple and intuitive but I should note one thing about implementation detail. To check duplication, this patch uses checksum of the page and collision of this checksum could be possible. There would be many choices to handle this situation but this patch chooses to allow entry with duplicated checksum to be added to the hash, but, not to compare all entries with duplicated checksum when checking duplication. I guess that checksum collision is quite rare event and we don't need to pay any attention to such a case. Therefore, I decided the most simplest way to implement the feature. If there is a different opinion, I can accept and go that way. Following is the result of this patch. Test result #1 (Swap): Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18 orig_data_size: 145297408 compr_data_size: 32408125 mem_used_total: 32276480 dup_data_size: 3188134 meta_data_size: 1444272 Last two metrics added to mm_stat are related to this work. First one, dup_data_size, is amount of saved memory by avoiding to store duplicated page. Later one, meta_data_size, is the amount of data structure to support deduplication. If dup > meta, we can judge that the patch improves memory usage. In Adnroid, we can save 5% of memory usage by this work. Test result #2 (Blockdev): build the kernel and store output to ext4 FS on zram Elapsed time: 249 s mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0 Elapsed time: 250 s mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038 3945792 There is no performance degradation and save 23% memory. Test result #3 (Blockdev): copy android build output dir(out/host) to ext4 FS on zram Elapsed time: out/host: 88 s mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0 Elapsed time: out/host: 100 s mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336 It shows performance degradation roughly 13% and save 24% memory. Maybe, it is due to overhead of calculating checksum and comparison. Test result #4 (Blockdev): copy android build output dir(out/target/common) to ext4 FS on zram Elapsed time: out/host: 203 s mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0 Elapsed time: out/host: 201 s mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336 Memory is saved by 42% and performance is the same. Even if there is overhead of calculating checksum and comparison, large hit ratio compensate it since hit leads to less compression attempt. I checked the detailed reason of savings on kernel build workload and there are some cases that deduplication happens. 1) *.cmd Build command is usually similar in one directory so content of these file are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file, respectively. 2) intermediate object files built-in.o and temporary object file have the similar contents. More than 50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o. 3) vmlinux .tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin have the similar contents. Android test has similar case that some of object files(.class and .so) are similar with another ones. (./host/linux-x86/lib/libartd.so and ./host/linux-x86-lib/libartd-comiler.so) Anyway, benefit seems to be largely dependent on the workload so following patch will make this feature optional. However, this feature can help some usecases so is deserved to be merged. [1]: MemScope: Analyzing Memory Duplication on Android Systems, dl.acm.org/citation.cfm?id=2797023 [2]: zswap: Optimize compressed pool memory utilization, lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2 Signed-off-by: Joonsoo Kim --- Documentation/blockdev/zram.txt | 2 + drivers/block/zram/Makefile | 2 +-
[PATCH v2 1/4] zram: introduce zram_entry to prepare dedup functionality
From: Joonsoo Kim Following patch will implement deduplication functionality in the zram and it requires an indirection layer to manage the life cycle of zsmalloc handle. To prepare that, this patch introduces zram_entry which can be used to manage the life-cycle of zsmalloc handle. Many lines are changed due to rename but core change is just simple introduction about newly data structure. Signed-off-by: Joonsoo Kim --- drivers/block/zram/zram_drv.c | 82 +-- drivers/block/zram/zram_drv.h | 6 +++- 2 files changed, 60 insertions(+), 28 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 0194441..f3949da 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -419,6 +419,32 @@ static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); static DEVICE_ATTR_RO(debug_stat); +static struct zram_entry *zram_entry_alloc(struct zram *zram, + unsigned int len, gfp_t flags) +{ + struct zram_meta *meta = zram->meta; + struct zram_entry *entry; + + entry = kzalloc(sizeof(*entry), flags); + if (!entry) + return NULL; + + entry->handle = zs_malloc(meta->mem_pool, len, flags); + if (!entry->handle) { + kfree(entry); + return NULL; + } + + return entry; +} + +static inline void zram_entry_free(struct zram_meta *meta, + struct zram_entry *entry) +{ + zs_free(meta->mem_pool, entry->handle); + kfree(entry); +} + static void zram_meta_free(struct zram_meta *meta, u64 disksize) { size_t num_pages = disksize >> PAGE_SHIFT; @@ -426,15 +452,15 @@ static void zram_meta_free(struct zram_meta *meta, u64 disksize) /* Free all pages that are still in this zram device */ for (index = 0; index < num_pages; index++) { - unsigned long handle = meta->table[index].handle; + struct zram_entry *entry = meta->table[index].entry; /* * No memory is allocated for same element filled pages. * Simply clear same page flag. */ - if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) + if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) continue; - zs_free(meta->mem_pool, handle); + zram_entry_free(meta, entry); } zs_destroy_pool(meta->mem_pool); @@ -479,7 +505,7 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize) static void zram_free_page(struct zram *zram, size_t index) { struct zram_meta *meta = zram->meta; - unsigned long handle = meta->table[index].handle; + struct zram_entry *entry = meta->table[index].entry; /* * No memory is allocated for same element filled pages. @@ -492,16 +518,16 @@ static void zram_free_page(struct zram *zram, size_t index) return; } - if (!handle) + if (!entry) return; - zs_free(meta->mem_pool, handle); + zram_entry_free(meta, entry); atomic64_sub(zram_get_obj_size(meta, index), >stats.compr_data_size); atomic64_dec(>stats.pages_stored); - meta->table[index].handle = 0; + meta->table[index].entry = NULL; zram_set_obj_size(meta, index, 0); } @@ -510,20 +536,20 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index) int ret = 0; unsigned char *cmem; struct zram_meta *meta = zram->meta; - unsigned long handle; + struct zram_entry *entry; unsigned int size; bit_spin_lock(ZRAM_ACCESS, >table[index].value); - handle = meta->table[index].handle; + entry = meta->table[index].entry; size = zram_get_obj_size(meta, index); - if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) { + if (!entry || zram_test_flag(meta, index, ZRAM_SAME)) { bit_spin_unlock(ZRAM_ACCESS, >table[index].value); zram_fill_page(mem, PAGE_SIZE, meta->table[index].element); return 0; } - cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO); + cmem = zs_map_object(meta->mem_pool, entry->handle, ZS_MM_RO); if (size == PAGE_SIZE) { copy_page(mem, cmem); } else { @@ -532,7 +558,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index) ret = zcomp_decompress(zstrm, cmem, size, mem); zcomp_stream_put(zram->comp); } - zs_unmap_object(meta->mem_pool, handle); + zs_unmap_object(meta->mem_pool, entry->handle); bit_spin_unlock(ZRAM_ACCESS, >table[index].value); /* Should NEVER happen. Return bio error if it does. */ @@ -554,7 +580,7 @@ static int zram_bvec_read(struct zram
[PATCH v2 2/4] zram: implement deduplication in zram
From: Joonsoo Kim This patch implements deduplication feature in zram. The purpose of this work is naturally to save amount of memory usage by zram. Android is one of the biggest users to use zram as swap and it's really important to save amount of memory usage. There is a paper that reports that duplication ratio of Android's memory content is rather high [1]. And, there is a similar work on zswap that also reports that experiments has shown that around 10-15% of pages stored in zswp are duplicates and deduplicate them provides some benefits [2]. Also, there is a different kind of workload that uses zram as blockdev and store build outputs into it to reduce wear-out problem of real blockdev. In this workload, deduplication hit is very high due to temporary files and intermediate object files. Detailed analysis is on the bottom of this description. Anyway, if we can detect duplicated content and avoid to store duplicated content at different memory space, we can save memory. This patch tries to do that. Implementation is almost simple and intuitive but I should note one thing about implementation detail. To check duplication, this patch uses checksum of the page and collision of this checksum could be possible. There would be many choices to handle this situation but this patch chooses to allow entry with duplicated checksum to be added to the hash, but, not to compare all entries with duplicated checksum when checking duplication. I guess that checksum collision is quite rare event and we don't need to pay any attention to such a case. Therefore, I decided the most simplest way to implement the feature. If there is a different opinion, I can accept and go that way. Following is the result of this patch. Test result #1 (Swap): Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18 orig_data_size: 145297408 compr_data_size: 32408125 mem_used_total: 32276480 dup_data_size: 3188134 meta_data_size: 1444272 Last two metrics added to mm_stat are related to this work. First one, dup_data_size, is amount of saved memory by avoiding to store duplicated page. Later one, meta_data_size, is the amount of data structure to support deduplication. If dup > meta, we can judge that the patch improves memory usage. In Adnroid, we can save 5% of memory usage by this work. Test result #2 (Blockdev): build the kernel and store output to ext4 FS on zram Elapsed time: 249 s mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0 Elapsed time: 250 s mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038 3945792 There is no performance degradation and save 23% memory. Test result #3 (Blockdev): copy android build output dir(out/host) to ext4 FS on zram Elapsed time: out/host: 88 s mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0 Elapsed time: out/host: 100 s mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336 It shows performance degradation roughly 13% and save 24% memory. Maybe, it is due to overhead of calculating checksum and comparison. Test result #4 (Blockdev): copy android build output dir(out/target/common) to ext4 FS on zram Elapsed time: out/host: 203 s mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0 Elapsed time: out/host: 201 s mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336 Memory is saved by 42% and performance is the same. Even if there is overhead of calculating checksum and comparison, large hit ratio compensate it since hit leads to less compression attempt. I checked the detailed reason of savings on kernel build workload and there are some cases that deduplication happens. 1) *.cmd Build command is usually similar in one directory so content of these file are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file, respectively. 2) intermediate object files built-in.o and temporary object file have the similar contents. More than 50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o. 3) vmlinux .tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin have the similar contents. Android test has similar case that some of object files(.class and .so) are similar with another ones. (./host/linux-x86/lib/libartd.so and ./host/linux-x86-lib/libartd-comiler.so) Anyway, benefit seems to be largely dependent on the workload so following patch will make this feature optional. However, this feature can help some usecases so is deserved to be merged. [1]: MemScope: Analyzing Memory Duplication on Android Systems, dl.acm.org/citation.cfm?id=2797023 [2]: zswap: Optimize compressed pool memory utilization, lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2 Signed-off-by: Joonsoo Kim --- Documentation/blockdev/zram.txt | 2 + drivers/block/zram/Makefile | 2 +- drivers/block/zram/zram_dedup.c | 222
[PATCH v2 0/4] zram: implement deduplication in zram
From: Joonsoo KimChanges from v1 o reogranize dedup specific functions o support Kconfig on/off o update zram documents o compare all the entries with same checksum (patch #4) This patchset implements deduplication feature in zram. Motivation is to save memory usage by zram. There are detailed description about motivation and experimental results on patch #2 so please refer it. Thanks. Joonsoo Kim (4): zram: introduce zram_entry to prepare dedup functionality zram: implement deduplication in zram zram: make deduplication feature optional zram: compare all the entries with same checksum for deduplication Documentation/ABI/testing/sysfs-block-zram | 10 + Documentation/blockdev/zram.txt| 3 + drivers/block/zram/Kconfig | 14 ++ drivers/block/zram/Makefile| 3 +- drivers/block/zram/zram_dedup.c| 288 + drivers/block/zram/zram_dedup.h| 56 ++ drivers/block/zram/zram_drv.c | 170 + drivers/block/zram/zram_drv.h | 25 ++- 8 files changed, 535 insertions(+), 34 deletions(-) create mode 100644 drivers/block/zram/zram_dedup.c create mode 100644 drivers/block/zram/zram_dedup.h -- 2.7.4
[PATCH v2 0/4] zram: implement deduplication in zram
From: Joonsoo Kim Changes from v1 o reogranize dedup specific functions o support Kconfig on/off o update zram documents o compare all the entries with same checksum (patch #4) This patchset implements deduplication feature in zram. Motivation is to save memory usage by zram. There are detailed description about motivation and experimental results on patch #2 so please refer it. Thanks. Joonsoo Kim (4): zram: introduce zram_entry to prepare dedup functionality zram: implement deduplication in zram zram: make deduplication feature optional zram: compare all the entries with same checksum for deduplication Documentation/ABI/testing/sysfs-block-zram | 10 + Documentation/blockdev/zram.txt| 3 + drivers/block/zram/Kconfig | 14 ++ drivers/block/zram/Makefile| 3 +- drivers/block/zram/zram_dedup.c| 288 + drivers/block/zram/zram_dedup.h| 56 ++ drivers/block/zram/zram_drv.c | 170 + drivers/block/zram/zram_drv.h | 25 ++- 8 files changed, 535 insertions(+), 34 deletions(-) create mode 100644 drivers/block/zram/zram_dedup.c create mode 100644 drivers/block/zram/zram_dedup.h -- 2.7.4
Re: linux-next: build failure after merge of the phy-next tree
Hi, On Thursday 30 March 2017 09:17 AM, Stephen Rothwell wrote: > Hi Kishon, > > After merging the phy-next tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field > 'phy_tuning' specified in initializer >.phy_tuning = rk3328_usb2phy_tuning, >^ > drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' > undeclared here (not in a function) >.phy_tuning = rk3328_usb2phy_tuning, > ^ > > Caused by commit > > ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328") > > I have used the phy-next tree from next-20170329 for today. Thanks for reporting this. Will fix it in my tree. -Kishon
Re: linux-next: build failure after merge of the phy-next tree
Hi, On Thursday 30 March 2017 09:17 AM, Stephen Rothwell wrote: > Hi Kishon, > > After merging the phy-next tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field > 'phy_tuning' specified in initializer >.phy_tuning = rk3328_usb2phy_tuning, >^ > drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' > undeclared here (not in a function) >.phy_tuning = rk3328_usb2phy_tuning, > ^ > > Caused by commit > > ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328") > > I have used the phy-next tree from next-20170329 for today. Thanks for reporting this. Will fix it in my tree. -Kishon
Re: [PATCH] phy: rockchip-inno-usb2: fix spelling mistake: "connecetd" -> "connected"
On Thursday 23 February 2017 05:00 AM, Colin King wrote: > From: Colin Ian King> > trivial fix to spelling mistake in dev_dbg message, also rejoin > lines to clean up checkpatch warning > > Signed-off-by: Colin Ian King merged, thanks. -Kishon > --- > drivers/phy/phy-rockchip-inno-usb2.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/phy/phy-rockchip-inno-usb2.c > b/drivers/phy/phy-rockchip-inno-usb2.c > index 4ea95c2..761a921 100644 > --- a/drivers/phy/phy-rockchip-inno-usb2.c > +++ b/drivers/phy/phy-rockchip-inno-usb2.c > @@ -554,8 +554,7 @@ static void rockchip_usb2phy_otg_sm_work(struct > work_struct *work) > case USB_CHG_STATE_DETECTED: > switch (rphy->chg_type) { > case POWER_SUPPLY_TYPE_USB: > - dev_dbg(>phy->dev, > - "sdp cable is connecetd\n"); > + dev_dbg(>phy->dev, "sdp cable is > connected\n"); > rockchip_usb2phy_power_on(rport->phy); > rport->state = OTG_STATE_B_PERIPHERAL; > notify_charger = true; > @@ -563,16 +562,14 @@ static void rockchip_usb2phy_otg_sm_work(struct > work_struct *work) > cable = EXTCON_CHG_USB_SDP; > break; > case POWER_SUPPLY_TYPE_USB_DCP: > - dev_dbg(>phy->dev, > - "dcp cable is connecetd\n"); > + dev_dbg(>phy->dev, "dcp cable is > connected\n"); > rockchip_usb2phy_power_off(rport->phy); > notify_charger = true; > sch_work = true; > cable = EXTCON_CHG_USB_DCP; > break; > case POWER_SUPPLY_TYPE_USB_CDP: > - dev_dbg(>phy->dev, > - "cdp cable is connecetd\n"); > + dev_dbg(>phy->dev, "cdp cable is > connected\n"); > rockchip_usb2phy_power_on(rport->phy); > rport->state = OTG_STATE_B_PERIPHERAL; > notify_charger = true; >
Re: [PATCH] phy: rockchip-inno-usb2: fix spelling mistake: "connecetd" -> "connected"
On Thursday 23 February 2017 05:00 AM, Colin King wrote: > From: Colin Ian King > > trivial fix to spelling mistake in dev_dbg message, also rejoin > lines to clean up checkpatch warning > > Signed-off-by: Colin Ian King merged, thanks. -Kishon > --- > drivers/phy/phy-rockchip-inno-usb2.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/phy/phy-rockchip-inno-usb2.c > b/drivers/phy/phy-rockchip-inno-usb2.c > index 4ea95c2..761a921 100644 > --- a/drivers/phy/phy-rockchip-inno-usb2.c > +++ b/drivers/phy/phy-rockchip-inno-usb2.c > @@ -554,8 +554,7 @@ static void rockchip_usb2phy_otg_sm_work(struct > work_struct *work) > case USB_CHG_STATE_DETECTED: > switch (rphy->chg_type) { > case POWER_SUPPLY_TYPE_USB: > - dev_dbg(>phy->dev, > - "sdp cable is connecetd\n"); > + dev_dbg(>phy->dev, "sdp cable is > connected\n"); > rockchip_usb2phy_power_on(rport->phy); > rport->state = OTG_STATE_B_PERIPHERAL; > notify_charger = true; > @@ -563,16 +562,14 @@ static void rockchip_usb2phy_otg_sm_work(struct > work_struct *work) > cable = EXTCON_CHG_USB_SDP; > break; > case POWER_SUPPLY_TYPE_USB_DCP: > - dev_dbg(>phy->dev, > - "dcp cable is connecetd\n"); > + dev_dbg(>phy->dev, "dcp cable is > connected\n"); > rockchip_usb2phy_power_off(rport->phy); > notify_charger = true; > sch_work = true; > cable = EXTCON_CHG_USB_DCP; > break; > case POWER_SUPPLY_TYPE_USB_CDP: > - dev_dbg(>phy->dev, > - "cdp cable is connecetd\n"); > + dev_dbg(>phy->dev, "cdp cable is > connected\n"); > rockchip_usb2phy_power_on(rport->phy); > rport->state = OTG_STATE_B_PERIPHERAL; > notify_charger = true; >
Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86
Hi Hans, [auto build test WARNING on ljones-mfd/for-mfd-next] [also build test WARNING on v4.11-rc4 next-20170329] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517 base: https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next config: i386-randconfig-x074-201713 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): warning: (MFD_AXP20X_I2C) selects I2C_DESIGNWARE_BAYTRAIL which has unmet direct dependencies (I2C && HAS_IOMEM && ACPI && (I2C_DESIGNWARE_PLATFORM=m && IOSF_MBI || I2C_DESIGNWARE_PLATFORM=y && IOSF_MBI=y)) --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86
Hi Hans, [auto build test WARNING on ljones-mfd/for-mfd-next] [also build test WARNING on v4.11-rc4 next-20170329] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517 base: https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next config: i386-randconfig-x074-201713 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): warning: (MFD_AXP20X_I2C) selects I2C_DESIGNWARE_BAYTRAIL which has unmet direct dependencies (I2C && HAS_IOMEM && ACPI && (I2C_DESIGNWARE_PLATFORM=m && IOSF_MBI || I2C_DESIGNWARE_PLATFORM=y && IOSF_MBI=y)) --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86
Hi Hans, [auto build test ERROR on ljones-mfd/for-mfd-next] [also build test ERROR on v4.11-rc4 next-20170329] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517 base: https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next config: i386-randconfig-r0-201713 (attached as .config) compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): drivers/built-in.o: In function `reset_semaphore': >> i2c-designware-baytrail.c:(.text+0x136c3b): undefined reference to >> `iosf_mbi_read' >> i2c-designware-baytrail.c:(.text+0x136c65): undefined reference to >> `iosf_mbi_write' drivers/built-in.o: In function `baytrail_i2c_acquire': i2c-designware-baytrail.c:(.text+0x136ce6): undefined reference to `iosf_mbi_write' i2c-designware-baytrail.c:(.text+0x136d26): undefined reference to `iosf_mbi_read' i2c-designware-baytrail.c:(.text+0x136da1): undefined reference to `iosf_mbi_read' drivers/built-in.o: In function `i2c_dw_eval_lock_support': >> (.text+0x136f01): undefined reference to `iosf_mbi_available' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 2/2] mfd: axp20c-i2c: Select designware i2c-bus driver on x86
Hi Hans, [auto build test ERROR on ljones-mfd/for-mfd-next] [also build test ERROR on v4.11-rc4 next-20170329] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Hans-de-Goede/mfd-intel_soc_pmic-Select-designware-i2c-bus-driver/20170330-102517 base: https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git for-mfd-next config: i386-randconfig-r0-201713 (attached as .config) compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): drivers/built-in.o: In function `reset_semaphore': >> i2c-designware-baytrail.c:(.text+0x136c3b): undefined reference to >> `iosf_mbi_read' >> i2c-designware-baytrail.c:(.text+0x136c65): undefined reference to >> `iosf_mbi_write' drivers/built-in.o: In function `baytrail_i2c_acquire': i2c-designware-baytrail.c:(.text+0x136ce6): undefined reference to `iosf_mbi_write' i2c-designware-baytrail.c:(.text+0x136d26): undefined reference to `iosf_mbi_read' i2c-designware-baytrail.c:(.text+0x136da1): undefined reference to `iosf_mbi_read' drivers/built-in.o: In function `i2c_dw_eval_lock_support': >> (.text+0x136f01): undefined reference to `iosf_mbi_available' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH] block: do not put mq context in blk_mq_alloc_request_hctx
In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't get sw context so we don't need to put the context with blk_mq_put_ctx. Unless, we will see preempt counter underflow. Cc: Sagi GrimbergCc: Omar Sandoval Cc: Jens Axboe Signed-off-by: Minchan Kim --- Maybe, it would be fixed by someone but I have noticed preempt counter undeflow problem a few weeks ago and still see the problem with linux-next. block/blk-mq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index a4546f060e80..a6f3998dc4ee 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -321,7 +321,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw, rq = blk_mq_sched_get_request(q, NULL, rw, _data); - blk_mq_put_ctx(alloc_data.ctx); blk_queue_exit(q); if (!rq) -- 2.7.4
Re: [PATCH v2] virtio_net: fix support for small rings
On 2017年03月30日 01:42, Michael S. Tsirkin wrote: When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin--- changes from v1: typo fix drivers/net/virtio_net.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9dc31dc..e5f6e34 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -29,6 +29,7 @@ #include #include #include +#include This seems unnecessary. static int napi_weight = NAPI_POLL_WEIGHT; module_param(napi_weight, int, 0444); @@ -98,6 +99,9 @@ struct receive_queue { /* RX: fragments + linear part + virtio header */ struct scatterlist sg[MAX_SKB_FRAGS + 2]; + /* Min single buffer size for mergeable buffers case. */ + unsigned int min_buf_len; + /* Name of this receive queue: input.$index */ char name[40]; }; @@ -894,13 +898,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, return err; } -static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len) +static unsigned int get_mergeable_buf_len(struct receive_queue *rq, + struct ewma_pkt_len *avg_pkt_len) { const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); unsigned int len; len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len), - GOOD_PACKET_LEN, PAGE_SIZE - hdr_len); + rq->min_buf_len - hdr_len, PAGE_SIZE - hdr_len); return ALIGN(len, L1_CACHE_BYTES); } @@ -914,7 +919,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, int err; unsigned int len, hole; - len = get_mergeable_buf_len(>mrg_avg_pkt_len); + len = get_mergeable_buf_len(rq, >mrg_avg_pkt_len); if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp))) return -ENOMEM; @@ -2086,6 +2091,21 @@ static void virtnet_del_vqs(struct virtnet_info *vi) virtnet_free_queues(vi); } +/* How large should a single buffer be so a queue full of these can fit at + * least one full packet? + * Logic below assumes the mergeable buffer header is used. + */ +static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqueue *vq) +{ + const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); + unsigned int rq_size = virtqueue_get_vring_size(vq); + unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : vi->dev->max_mtu; + unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len; + unsigned int min_buf_len = DIV_ROUND_UP(buf_len, rq_size); + + return max(min_buf_len, hdr_len); +} Consider rq_size may be large, I think this should be max(min_buf_len, GOOD_PACKET_LEN)? + static int virtnet_find_vqs(struct virtnet_info *vi) { vq_callback_t **callbacks; @@ -2151,6 +2171,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) for (i = 0; i < vi->max_queue_pairs; i++) { vi->rq[i].vq = vqs[rxq2vq(i)]; + vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq); vi->sq[i].vq = vqs[txq2vq(i)]; } @@ -2237,7 +2258,8 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue, BUG_ON(queue_index >= vi->max_queue_pairs); avg = >rq[queue_index].mrg_avg_pkt_len; - return sprintf(buf, "%u\n", get_mergeable_buf_len(avg)); + return sprintf(buf, "%u\n", + get_mergeable_buf_len(>rq[queue_index], avg)); } static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
[PATCH] block: do not put mq context in blk_mq_alloc_request_hctx
In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't get sw context so we don't need to put the context with blk_mq_put_ctx. Unless, we will see preempt counter underflow. Cc: Sagi Grimberg Cc: Omar Sandoval Cc: Jens Axboe Signed-off-by: Minchan Kim --- Maybe, it would be fixed by someone but I have noticed preempt counter undeflow problem a few weeks ago and still see the problem with linux-next. block/blk-mq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index a4546f060e80..a6f3998dc4ee 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -321,7 +321,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw, rq = blk_mq_sched_get_request(q, NULL, rw, _data); - blk_mq_put_ctx(alloc_data.ctx); blk_queue_exit(q); if (!rq) -- 2.7.4
Re: [PATCH v2] virtio_net: fix support for small rings
On 2017年03月30日 01:42, Michael S. Tsirkin wrote: When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin --- changes from v1: typo fix drivers/net/virtio_net.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9dc31dc..e5f6e34 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -29,6 +29,7 @@ #include #include #include +#include This seems unnecessary. static int napi_weight = NAPI_POLL_WEIGHT; module_param(napi_weight, int, 0444); @@ -98,6 +99,9 @@ struct receive_queue { /* RX: fragments + linear part + virtio header */ struct scatterlist sg[MAX_SKB_FRAGS + 2]; + /* Min single buffer size for mergeable buffers case. */ + unsigned int min_buf_len; + /* Name of this receive queue: input.$index */ char name[40]; }; @@ -894,13 +898,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq, return err; } -static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len) +static unsigned int get_mergeable_buf_len(struct receive_queue *rq, + struct ewma_pkt_len *avg_pkt_len) { const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); unsigned int len; len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len), - GOOD_PACKET_LEN, PAGE_SIZE - hdr_len); + rq->min_buf_len - hdr_len, PAGE_SIZE - hdr_len); return ALIGN(len, L1_CACHE_BYTES); } @@ -914,7 +919,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, int err; unsigned int len, hole; - len = get_mergeable_buf_len(>mrg_avg_pkt_len); + len = get_mergeable_buf_len(rq, >mrg_avg_pkt_len); if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp))) return -ENOMEM; @@ -2086,6 +2091,21 @@ static void virtnet_del_vqs(struct virtnet_info *vi) virtnet_free_queues(vi); } +/* How large should a single buffer be so a queue full of these can fit at + * least one full packet? + * Logic below assumes the mergeable buffer header is used. + */ +static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqueue *vq) +{ + const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); + unsigned int rq_size = virtqueue_get_vring_size(vq); + unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : vi->dev->max_mtu; + unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len; + unsigned int min_buf_len = DIV_ROUND_UP(buf_len, rq_size); + + return max(min_buf_len, hdr_len); +} Consider rq_size may be large, I think this should be max(min_buf_len, GOOD_PACKET_LEN)? + static int virtnet_find_vqs(struct virtnet_info *vi) { vq_callback_t **callbacks; @@ -2151,6 +2171,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) for (i = 0; i < vi->max_queue_pairs; i++) { vi->rq[i].vq = vqs[rxq2vq(i)]; + vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq); vi->sq[i].vq = vqs[txq2vq(i)]; } @@ -2237,7 +2258,8 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue, BUG_ON(queue_index >= vi->max_queue_pairs); avg = >rq[queue_index].mrg_avg_pkt_len; - return sprintf(buf, "%u\n", get_mergeable_buf_len(avg)); + return sprintf(buf, "%u\n", + get_mergeable_buf_len(>rq[queue_index], avg)); } static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote: > 29.03.2017 07:38, Ricardo Neri пишет: > >> Probably you could also remove > >> the sldt and str emulation for protected mode, because, > >> as I understand from this thread, wine does not > >> need those. > > I see. I would lean on keeping the emulation because I already > > implemented it :), for completeness, and because it is performed in a > > single switch. The bulk of the emulation code deals with operands. > But this is not for free. > As Andy said, you will then need a syscall and > a feature mask to be able to disable this emulation. > And AFAIK you haven't implemented that yet, so > there is something to consider. Right, I see your point. > You know the wine's > requirements now - they are very small. And > dosemu doesn't need anything at all but smsw. > And even smsw is very rare. > >>> But emulation is still needed for SMSW, right? > >> Likely so. > >> If you want, I can enable the logging of this command > >> and see if it is used by some of the DOS programs I have. > > It would be great if you could do that, if you don't mind. > OK, scheduled to the week-end. > I'll let you know. Thanks! > > >> But at least dosemu implements it, so probably it is needed. > > Right. > > > >> Of course if it is used by one of 100 DOS progs, then there > >> is an option to just add its support to dosemu2 and pretend > >> the compatibility problems did not exist. :) > > Do you mean relaying the GP fault to dosemu instead of trapping it and > > emulating it in the kernel? > Yes, that would be optimal if this does not severely break > the current setups. If we can find out that smsw is not in > the real use, we can probably do exactly that. > But other > instructions are not in real use in v86 for sure, so I > wouldn't be adding the explicit test-cases to the kernel > that will make you depend on some particular behaviour > that no one may need. > My objection was that we shouldn't > write tests before we know exactly how we want this to work. OK, if only SMSW is used then I'll keep the emulation for SMSW only.
Re: [v6 PATCH 00/21] x86: Enable User-Mode Instruction Prevention
On Wed, 2017-03-29 at 23:55 +0300, Stas Sergeev wrote: > 29.03.2017 07:38, Ricardo Neri пишет: > >> Probably you could also remove > >> the sldt and str emulation for protected mode, because, > >> as I understand from this thread, wine does not > >> need those. > > I see. I would lean on keeping the emulation because I already > > implemented it :), for completeness, and because it is performed in a > > single switch. The bulk of the emulation code deals with operands. > But this is not for free. > As Andy said, you will then need a syscall and > a feature mask to be able to disable this emulation. > And AFAIK you haven't implemented that yet, so > there is something to consider. Right, I see your point. > You know the wine's > requirements now - they are very small. And > dosemu doesn't need anything at all but smsw. > And even smsw is very rare. > >>> But emulation is still needed for SMSW, right? > >> Likely so. > >> If you want, I can enable the logging of this command > >> and see if it is used by some of the DOS programs I have. > > It would be great if you could do that, if you don't mind. > OK, scheduled to the week-end. > I'll let you know. Thanks! > > >> But at least dosemu implements it, so probably it is needed. > > Right. > > > >> Of course if it is used by one of 100 DOS progs, then there > >> is an option to just add its support to dosemu2 and pretend > >> the compatibility problems did not exist. :) > > Do you mean relaying the GP fault to dosemu instead of trapping it and > > emulating it in the kernel? > Yes, that would be optimal if this does not severely break > the current setups. If we can find out that smsw is not in > the real use, we can probably do exactly that. > But other > instructions are not in real use in v86 for sure, so I > wouldn't be adding the explicit test-cases to the kernel > that will make you depend on some particular behaviour > that no one may need. > My objection was that we shouldn't > write tests before we know exactly how we want this to work. OK, if only SMSW is used then I'll keep the emulation for SMSW only.
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Thu, Mar 30, 2017 at 02:26:26PM +1030, Jonathan Woithe wrote: > On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote: > > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??> > > wrote: > > > > > > > Darren, Andy, in light of the above I will be awaiting your review of > > > > this series. I will submit v2 afterwards, with all remarks from both > > > > you and Jonathan taken into account. > > > > > > Darren marked this series under his name to review, so, I let him to > > > speak for us. > > > > The series looks good to me. Nice work Micha??. They are logically divided > > and > > address issues in a procedural way (so I stopped commenting until I read the > > full series through as a couple of times you addressed a concern from a > > move in > > a cleanup to follow). > > > > I've applied the noted change to 7/8 and will run this through my tests, but > > don't anticipate any problems. Jonathan, if you don't have any additional > > concerns, let me know if I can add your Reviewed-by. > > With the noted change to 7/8 applied I'm happy with the resulting series. > As you noted, there is still some scope for making things more consistent, > especially with regard to error handling. However, that is really a > separate task which can be addressed in a later series. This present series > doesn't impact on this issue in any significant way so it makes sense that > be applied. > > Reviewed-by: Jonathan Woithe Merged, thanks! -- Darren Hart VMware Open Source Technology Center
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Thu, Mar 30, 2017 at 02:26:26PM +1030, Jonathan Woithe wrote: > On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote: > > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie?? > > > wrote: > > > > > > > Darren, Andy, in light of the above I will be awaiting your review of > > > > this series. I will submit v2 afterwards, with all remarks from both > > > > you and Jonathan taken into account. > > > > > > Darren marked this series under his name to review, so, I let him to > > > speak for us. > > > > The series looks good to me. Nice work Micha??. They are logically divided > > and > > address issues in a procedural way (so I stopped commenting until I read the > > full series through as a couple of times you addressed a concern from a > > move in > > a cleanup to follow). > > > > I've applied the noted change to 7/8 and will run this through my tests, but > > don't anticipate any problems. Jonathan, if you don't have any additional > > concerns, let me know if I can add your Reviewed-by. > > With the noted change to 7/8 applied I'm happy with the resulting series. > As you noted, there is still some scope for making things more consistent, > especially with regard to error handling. However, that is really a > separate task which can be addressed in a later series. This present series > doesn't impact on this issue in any significant way so it makes sense that > be applied. > > Reviewed-by: Jonathan Woithe Merged, thanks! -- Darren Hart VMware Open Source Technology Center
Re: [PATCH v6 0/4] Broadcom SBA RAID support
On Wed, Mar 29, 2017 at 11:35:43AM +0530, Anup Patel wrote: > On Tue, Mar 21, 2017 at 2:48 PM, Vinod Koulwrote: > > On Tue, Mar 21, 2017 at 02:17:21PM +0530, Anup Patel wrote: > >> On Tue, Mar 21, 2017 at 2:00 PM, Vinod Koul wrote: > >> > On Mon, Mar 06, 2017 at 03:13:24PM +0530, Anup Patel wrote: > >> >> The Broadcom SBA RAID is a stream-based device which provides > >> >> RAID5/6 offload. > >> >> > >> >> It requires a SoC specific ring manager (such as Broadcom FlexRM > >> >> ring manager) to provide ring-based programming interface. Due to > >> >> this, the Broadcom SBA RAID driver (mailbox client) implements > >> >> DMA device having one DMA channel using a set of mailbox channels > >> >> provided by Broadcom SoC specific ring manager driver (mailbox > >> >> controller). > >> >> > >> >> The Broadcom SBA RAID hardware requires PQ disk position instead > >> >> of PQ disk coefficient. To address this, we have added raid_gflog > >> >> table which will help driver to convert PQ disk coefficient to PQ > >> >> disk position. > >> >> > >> >> This patchset is based on Linux-4.11-rc1 and depends on patchset > >> >> "[PATCH v5 0/2] Broadcom FlexRM ring manager support" > >> > > >> > Okay I applied and was about to push when I noticed this :( > >> > > >> > So what is the status of this..? > >> > >> PATCH2 is Acked but PATCH1 is under-review. Currently, its > >> v6 of that patchset. > >> > >> The only dependency on that patchset is the changes in > >> brcm-message.h which are required by this BCM-SBA-RAID > >> driver. > >> > >> @Jassi, > >> Can you please have a look at PATCH v6? > > > > And I would need an immutable branch/tag once merged. I am going to keep > > this series pending till then. > > The Broadcom FlexRM patchset is pickedup by Jassi and > can be found in mailbox-for-next branch of > git://git.linaro.org/landing-teams/working/fujitsu/integration > > Both patchset (Broadcom FlexRM patchset and this one) are > also available in sba-raid-v7 branch of > https://github.com/Broadcom/arm64-linux.git Jassi, Can you provide an immutable branch/tag please for this, latter is preferred. Btw didn't find your tree in MAINTAINERS.. > > Regards, > Anup -- ~Vinod
Re: [PATCH v6 0/4] Broadcom SBA RAID support
On Wed, Mar 29, 2017 at 11:35:43AM +0530, Anup Patel wrote: > On Tue, Mar 21, 2017 at 2:48 PM, Vinod Koul wrote: > > On Tue, Mar 21, 2017 at 02:17:21PM +0530, Anup Patel wrote: > >> On Tue, Mar 21, 2017 at 2:00 PM, Vinod Koul wrote: > >> > On Mon, Mar 06, 2017 at 03:13:24PM +0530, Anup Patel wrote: > >> >> The Broadcom SBA RAID is a stream-based device which provides > >> >> RAID5/6 offload. > >> >> > >> >> It requires a SoC specific ring manager (such as Broadcom FlexRM > >> >> ring manager) to provide ring-based programming interface. Due to > >> >> this, the Broadcom SBA RAID driver (mailbox client) implements > >> >> DMA device having one DMA channel using a set of mailbox channels > >> >> provided by Broadcom SoC specific ring manager driver (mailbox > >> >> controller). > >> >> > >> >> The Broadcom SBA RAID hardware requires PQ disk position instead > >> >> of PQ disk coefficient. To address this, we have added raid_gflog > >> >> table which will help driver to convert PQ disk coefficient to PQ > >> >> disk position. > >> >> > >> >> This patchset is based on Linux-4.11-rc1 and depends on patchset > >> >> "[PATCH v5 0/2] Broadcom FlexRM ring manager support" > >> > > >> > Okay I applied and was about to push when I noticed this :( > >> > > >> > So what is the status of this..? > >> > >> PATCH2 is Acked but PATCH1 is under-review. Currently, its > >> v6 of that patchset. > >> > >> The only dependency on that patchset is the changes in > >> brcm-message.h which are required by this BCM-SBA-RAID > >> driver. > >> > >> @Jassi, > >> Can you please have a look at PATCH v6? > > > > And I would need an immutable branch/tag once merged. I am going to keep > > this series pending till then. > > The Broadcom FlexRM patchset is pickedup by Jassi and > can be found in mailbox-for-next branch of > git://git.linaro.org/landing-teams/working/fujitsu/integration > > Both patchset (Broadcom FlexRM patchset and this one) are > also available in sba-raid-v7 branch of > https://github.com/Broadcom/arm64-linux.git Jassi, Can you provide an immutable branch/tag please for this, latter is preferred. Btw didn't find your tree in MAINTAINERS.. > > Regards, > Anup -- ~Vinod
Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree
On Thu, Mar 30, 2017 at 08:59:31AM +0530, Keerthy wrote: > > > On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote: > > Keerthy, > > > > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote: > >> * Keerthy[170323 20:29]: > >>> > >>> > >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote: > * Keerthy [170321 20:45]: > > > > > > On Thursday 09 March 2017 01:35 PM, Keerthy wrote: > >> Currently the slope and offset values for calculating the > >> hot spot temperature of a particular thermal zone is part > >> of driver data. Pass them here instead and obtain the values > >> while of node parsing. > >> > >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. > > > > Have you tried on boards that need negative coefficients? > > > > https://patchwork.kernel.org/patch/9619577/ > > Yes. I retrieved the negative values nicely in the driver passed via > Device Tree. > > > > > > > Hi Eduardo, > > > > If the series looks okay could you please pull this? > > Also.. Are the dts changes safe for me to pick separately? > >>> > >>> Yes Tony they are safe to pulled separately. > >> > >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2. > > > > Keerthy, > > > > The only thing I want you to confirm is if you are really getting the > > negative coefficients, because currently of-thermal reads the array > > using an OF helper that understands only unsigned. For this reason, I > > will be queueing your patches only for next merge window, not as a fix, > > not for rc's. > > I am getting negative co-efficients. OK. That is odd. Might be simply we get still converted after simply assigning the unsigned to the int in the thermal device data struct. Anyways, still not for rc, given the amount of changes in the driver. > > > > > > >> > >> Thanks, > >> > >> Tony
Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree
On Thu, Mar 30, 2017 at 08:59:31AM +0530, Keerthy wrote: > > > On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote: > > Keerthy, > > > > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote: > >> * Keerthy [170323 20:29]: > >>> > >>> > >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote: > * Keerthy [170321 20:45]: > > > > > > On Thursday 09 March 2017 01:35 PM, Keerthy wrote: > >> Currently the slope and offset values for calculating the > >> hot spot temperature of a particular thermal zone is part > >> of driver data. Pass them here instead and obtain the values > >> while of node parsing. > >> > >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. > > > > Have you tried on boards that need negative coefficients? > > > > https://patchwork.kernel.org/patch/9619577/ > > Yes. I retrieved the negative values nicely in the driver passed via > Device Tree. > > > > > > > Hi Eduardo, > > > > If the series looks okay could you please pull this? > > Also.. Are the dts changes safe for me to pick separately? > >>> > >>> Yes Tony they are safe to pulled separately. > >> > >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2. > > > > Keerthy, > > > > The only thing I want you to confirm is if you are really getting the > > negative coefficients, because currently of-thermal reads the array > > using an OF helper that understands only unsigned. For this reason, I > > will be queueing your patches only for next merge window, not as a fix, > > not for rc's. > > I am getting negative co-efficients. OK. That is odd. Might be simply we get still converted after simply assigning the unsigned to the int in the thermal device data struct. Anyways, still not for rc, given the amount of changes in the driver. > > > > > > >> > >> Thanks, > >> > >> Tony
Re: [PATCH -mm -v7 4/9] mm, THP, swap: Add get_huge_swap_page()
Johannes Weinerwrites: > On Tue, Mar 28, 2017 at 01:32:04PM +0800, Huang, Ying wrote: >> @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void) >> >> #endif /* CONFIG_SWAP */ >> >> +#ifdef CONFIG_THP_SWAP_CLUSTER >> +static inline swp_entry_t get_huge_swap_page(void) >> +{ >> +swp_entry_t entry; >> + >> +if (get_swap_pages(1, , true)) >> +return entry; >> +else >> +return (swp_entry_t) {0}; >> +} >> +#else >> +static inline swp_entry_t get_huge_swap_page(void) >> +{ >> +return (swp_entry_t) {0}; >> +} >> +#endif > > Your introducing a function without a user, making it very hard to > judge whether the API is well-designed for the callers or not. > > I pointed this out as a systemic problem with this patch series in v3, > along with other stuff, but with the way this series is structured I'm > having a hard time seeing whether you implemented my other feedback or > whether your counter arguments to them are justified. > > I cannot review and ack these patches this way. Sorry for inconvenience, I will send a new version to combine the function definition and usage into one patch at least for you to review. But I think we can continue our discussion in the comments your raised so far firstly, what do you think about that? Best Regards, Huang, Ying
Re: [PATCH -mm -v7 4/9] mm, THP, swap: Add get_huge_swap_page()
Johannes Weiner writes: > On Tue, Mar 28, 2017 at 01:32:04PM +0800, Huang, Ying wrote: >> @@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void) >> >> #endif /* CONFIG_SWAP */ >> >> +#ifdef CONFIG_THP_SWAP_CLUSTER >> +static inline swp_entry_t get_huge_swap_page(void) >> +{ >> +swp_entry_t entry; >> + >> +if (get_swap_pages(1, , true)) >> +return entry; >> +else >> +return (swp_entry_t) {0}; >> +} >> +#else >> +static inline swp_entry_t get_huge_swap_page(void) >> +{ >> +return (swp_entry_t) {0}; >> +} >> +#endif > > Your introducing a function without a user, making it very hard to > judge whether the API is well-designed for the callers or not. > > I pointed this out as a systemic problem with this patch series in v3, > along with other stuff, but with the way this series is structured I'm > having a hard time seeing whether you implemented my other feedback or > whether your counter arguments to them are justified. > > I cannot review and ack these patches this way. Sorry for inconvenience, I will send a new version to combine the function definition and usage into one patch at least for you to review. But I think we can continue our discussion in the comments your raised so far firstly, what do you think about that? Best Regards, Huang, Ying
Re: [BUG nohz]: wrong user and system time accounting
On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote: > In other words, the tick on cpu0 is aligned > with the tick on the nohz_full cpus, and > jiffies is advanced while the nohz_full cpus > with an active tick happen to be in kernel > mode? You really want skew_tick=1, especially on big boxen. > Frederic, can you think of any reason why > the tick on nohz_full CPUs would end up aligned > with the tick on cpu0, instead of running at some > random offset? (I or low rq->clock bits as crude NOHZ collision avoidance) > A random offset, or better yet a somewhat randomized > tick length to make sure that simultaneous ticks are > fairly rare and the vtime sampling does not end up > "in phase" with the jiffies incrementing, could make > the accounting work right again. That improves jitter, especially on big boxen. I have an 8 socket box that thinks it's an extra large PC, there, collision avoidance matters hugely. I couldn't reproduce bean counting woes, no idea if collision avoidance will help that. -Mike
Re: [BUG nohz]: wrong user and system time accounting
On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote: > In other words, the tick on cpu0 is aligned > with the tick on the nohz_full cpus, and > jiffies is advanced while the nohz_full cpus > with an active tick happen to be in kernel > mode? You really want skew_tick=1, especially on big boxen. > Frederic, can you think of any reason why > the tick on nohz_full CPUs would end up aligned > with the tick on cpu0, instead of running at some > random offset? (I or low rq->clock bits as crude NOHZ collision avoidance) > A random offset, or better yet a somewhat randomized > tick length to make sure that simultaneous ticks are > fairly rare and the vtime sampling does not end up > "in phase" with the jiffies incrementing, could make > the accounting work right again. That improves jitter, especially on big boxen. I have an 8 socket box that thinks it's an extra large PC, there, collision avoidance matters hugely. I couldn't reproduce bean counting woes, no idea if collision avoidance will help that. -Mike
linux-next: build failure after merge of the vhost tree
Hi Michael, After merging the vhost tree, today's linux-next build (arm multi_v7_defconfig) failed like this: drivers/remoteproc/remoteproc_virtio.c: In function 'rproc_virtio_find_vqs': drivers/remoteproc/remoteproc_virtio.c:148:9: error: 'ctx' undeclared (first use in this function) ctx ? ctx[i] : false); ^ Caused by commit a965e977a103 ("virtio: add context flag to find vqs") I have used the vhost tree from next-20170329 for today. -- Cheers, Stephen Rothwell
linux-next: build failure after merge of the vhost tree
Hi Michael, After merging the vhost tree, today's linux-next build (arm multi_v7_defconfig) failed like this: drivers/remoteproc/remoteproc_virtio.c: In function 'rproc_virtio_find_vqs': drivers/remoteproc/remoteproc_virtio.c:148:9: error: 'ctx' undeclared (first use in this function) ctx ? ctx[i] : false); ^ Caused by commit a965e977a103 ("virtio: add context flag to find vqs") I have used the vhost tree from next-20170329 for today. -- Cheers, Stephen Rothwell
Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out
Johannes Weinerwrites: > On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote: >> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page) >> ADD_CACHE_INFO(del_total, nr); >> } >> >> +#ifdef CONFIG_THP_SWAP_CLUSTER >> +int add_to_swap_trans_huge(struct page *page, struct list_head *list) >> +{ >> +swp_entry_t entry; >> +int ret = 0; >> + >> +/* cannot split, which may be needed during swap in, skip it */ >> +if (!can_split_huge_page(page, NULL)) >> +return -EBUSY; >> +/* fallback to split huge page firstly if no PMD map */ >> +if (!compound_mapcount(page)) >> +return 0; >> +entry = get_huge_swap_page(); >> +if (!entry.val) >> +return 0; >> +if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) { >> +__swapcache_free(entry, true); >> +return -EOVERFLOW; >> +} >> +ret = add_to_swap_cache(page, entry, >> +__GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN); >> +/* -ENOMEM radix-tree allocation failure */ >> +if (ret) { >> +__swapcache_free(entry, true); >> +return 0; >> +} >> +ret = split_huge_page_to_list(page, list); >> +if (ret) { >> +delete_from_swap_cache(page); >> +return -EBUSY; >> +} >> +return 1; >> +} >> +#else >> +static inline int add_to_swap_trans_huge(struct page *page, >> + struct list_head *list) >> +{ >> +return 0; >> +} >> +#endif >> + >> /** >> * add_to_swap - allocate swap space for a page >> * @page: page we want to move to swap >> * >> * Allocate swap space for the page and add the page to the >> - * swap cache. Caller needs to hold the page lock. >> + * swap cache. Caller needs to hold the page lock. >> */ >> int add_to_swap(struct page *page, struct list_head *list) >> { >> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head >> *list) >> VM_BUG_ON_PAGE(!PageLocked(page), page); >> VM_BUG_ON_PAGE(!PageUptodate(page), page); >> >> +if (unlikely(PageTransHuge(page))) { >> +err = add_to_swap_trans_huge(page, list); >> +switch (err) { >> +case 1: >> +return 1; >> +case 0: >> +/* fallback to split firstly if return 0 */ >> +break; >> +default: >> +return 0; >> +} >> +} >> entry = get_swap_page(); >> if (!entry.val) >> return 0; > > add_to_swap_trans_huge() is too close a copy of add_to_swap(), which > makes the code error prone for future modifications to the swap slot > allocation protocol. > > This should read: > > retry: > entry = get_swap_page(page); > if (!entry.val) { > if (PageTransHuge(page)) { > split_huge_page_to_list(page, list); > goto retry; > } > return 0; > } If the swap space is used up, that is, get_swap_page() cannot allocate even 1 swap entry for a normal page. We will split THP unnecessarily with the change, but in the original code, we just skip the THP. There may be a performance regression here. Similar problem exists for mem_cgroup_try_charge_swap() too. If the mem cgroup exceeds the swap limit, the THP will be split unnecessary with the change too. > And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all > check PageTransHuge() instead of having extra parameters or separate > code paths for the huge page case. > > In general, don't try to tack this feature onto the side of the > VM. Because right now, this looks a bit like the hugetlb code, with > one big branch in the beginning that opens up an alternate > reality. Instead, these functions should handle THP all the way down > the stack, and without passing down redundant information. Yes. We should share the code as much as possible. I just have some questions as above. Could you help me on that? Best Regards, Huang, Ying
Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out
Johannes Weiner writes: > On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote: >> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page) >> ADD_CACHE_INFO(del_total, nr); >> } >> >> +#ifdef CONFIG_THP_SWAP_CLUSTER >> +int add_to_swap_trans_huge(struct page *page, struct list_head *list) >> +{ >> +swp_entry_t entry; >> +int ret = 0; >> + >> +/* cannot split, which may be needed during swap in, skip it */ >> +if (!can_split_huge_page(page, NULL)) >> +return -EBUSY; >> +/* fallback to split huge page firstly if no PMD map */ >> +if (!compound_mapcount(page)) >> +return 0; >> +entry = get_huge_swap_page(); >> +if (!entry.val) >> +return 0; >> +if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) { >> +__swapcache_free(entry, true); >> +return -EOVERFLOW; >> +} >> +ret = add_to_swap_cache(page, entry, >> +__GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN); >> +/* -ENOMEM radix-tree allocation failure */ >> +if (ret) { >> +__swapcache_free(entry, true); >> +return 0; >> +} >> +ret = split_huge_page_to_list(page, list); >> +if (ret) { >> +delete_from_swap_cache(page); >> +return -EBUSY; >> +} >> +return 1; >> +} >> +#else >> +static inline int add_to_swap_trans_huge(struct page *page, >> + struct list_head *list) >> +{ >> +return 0; >> +} >> +#endif >> + >> /** >> * add_to_swap - allocate swap space for a page >> * @page: page we want to move to swap >> * >> * Allocate swap space for the page and add the page to the >> - * swap cache. Caller needs to hold the page lock. >> + * swap cache. Caller needs to hold the page lock. >> */ >> int add_to_swap(struct page *page, struct list_head *list) >> { >> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head >> *list) >> VM_BUG_ON_PAGE(!PageLocked(page), page); >> VM_BUG_ON_PAGE(!PageUptodate(page), page); >> >> +if (unlikely(PageTransHuge(page))) { >> +err = add_to_swap_trans_huge(page, list); >> +switch (err) { >> +case 1: >> +return 1; >> +case 0: >> +/* fallback to split firstly if return 0 */ >> +break; >> +default: >> +return 0; >> +} >> +} >> entry = get_swap_page(); >> if (!entry.val) >> return 0; > > add_to_swap_trans_huge() is too close a copy of add_to_swap(), which > makes the code error prone for future modifications to the swap slot > allocation protocol. > > This should read: > > retry: > entry = get_swap_page(page); > if (!entry.val) { > if (PageTransHuge(page)) { > split_huge_page_to_list(page, list); > goto retry; > } > return 0; > } If the swap space is used up, that is, get_swap_page() cannot allocate even 1 swap entry for a normal page. We will split THP unnecessarily with the change, but in the original code, we just skip the THP. There may be a performance regression here. Similar problem exists for mem_cgroup_try_charge_swap() too. If the mem cgroup exceeds the swap limit, the THP will be split unnecessary with the change too. > And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all > check PageTransHuge() instead of having extra parameters or separate > code paths for the huge page case. > > In general, don't try to tack this feature onto the side of the > VM. Because right now, this looks a bit like the hugetlb code, with > one big branch in the beginning that opens up an alternate > reality. Instead, these functions should handle THP all the way down > the stack, and without passing down redundant information. Yes. We should share the code as much as possible. I just have some questions as above. Could you help me on that? Best Regards, Huang, Ying
Re: [PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
On Wed, 03/29 22:37, Martin K. Petersen wrote: > Fam Zhengwrites: > > Fam, > > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > > index fcfeddc..a5c7e67 100644 > > --- a/drivers/scsi/sd.c > > +++ b/drivers/scsi/sd.c > > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) > > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); > > } else > > rw_max = BLK_DEF_MAX_SECTORS; > > + rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max)); > > > > /* Combine with controller limits */ > > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); > > Instead of updating rw_max twice, how about: > > } else > rw_max = min_not_zero(logical_to_sectors(sdp, dev_max), > BLK_DEF_MAX_SECTORS); Yes, it is better. Is it okay to make the change when you apply? Fam
Re: [PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
On Wed, 03/29 22:37, Martin K. Petersen wrote: > Fam Zheng writes: > > Fam, > > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > > index fcfeddc..a5c7e67 100644 > > --- a/drivers/scsi/sd.c > > +++ b/drivers/scsi/sd.c > > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) > > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); > > } else > > rw_max = BLK_DEF_MAX_SECTORS; > > + rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max)); > > > > /* Combine with controller limits */ > > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); > > Instead of updating rw_max twice, how about: > > } else > rw_max = min_not_zero(logical_to_sectors(sdp, dev_max), > BLK_DEF_MAX_SECTORS); Yes, it is better. Is it okay to make the change when you apply? Fam
Re: [PATCH] zram: factor out partial IO routine
On (03/29/17 16:48), Minchan Kim wrote: > For architecture(PAGE_SIZE > 4K), zram have supported partial IO. > However, the mixed code for handling normal/partial IO is too mess, > error-prone to modify IO handler functions with upcoming feature > so this patch aims for cleaning up via factoring out partial IO > routines to zram_bvec_partial_[read|write] which will be disabled > for most 4K page architecures. > > x86(4K architecure) > add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-664 (-664) > function old new delta > zram_bvec_rw23012039-262 > zram_decompress_page.isra402 --402 > > So, we will save 662 bytes. > > However, as side effect, it will increase binary size in > non-4K architecure but it's not major for zram so I believe > benefit(maintainance, binary size for most architecture) is bigger. a bigger side effect is that now we double the amount of lines we need to change in certain patches and, thus, the amount of work - when we add new functionality/fix something in zram_bvec_{write, read} we also would need to touch zram_bvec_partial_{write, read}. still probably worth it. Reviewed-by: Sergey Senozhatsky-ss
Re: [PATCH] zram: factor out partial IO routine
On (03/29/17 16:48), Minchan Kim wrote: > For architecture(PAGE_SIZE > 4K), zram have supported partial IO. > However, the mixed code for handling normal/partial IO is too mess, > error-prone to modify IO handler functions with upcoming feature > so this patch aims for cleaning up via factoring out partial IO > routines to zram_bvec_partial_[read|write] which will be disabled > for most 4K page architecures. > > x86(4K architecure) > add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-664 (-664) > function old new delta > zram_bvec_rw23012039-262 > zram_decompress_page.isra402 --402 > > So, we will save 662 bytes. > > However, as side effect, it will increase binary size in > non-4K architecure but it's not major for zram so I believe > benefit(maintainance, binary size for most architecture) is bigger. a bigger side effect is that now we double the amount of lines we need to change in certain patches and, thus, the amount of work - when we add new functionality/fix something in zram_bvec_{write, read} we also would need to touch zram_bvec_partial_{write, read}. still probably worth it. Reviewed-by: Sergey Senozhatsky -ss
Re: spin_lock behavior with ARM64 big.Little/HMP
Hi Sudeep, Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata 819472 enabled ? Sorry for bringing this up after the loo-ong delay, but I've been assured that the A53 involved is > r0p1. I've also confirmed this problem on multiple internal platforms, and I'm pretty sure that it occurs on any b.L out there today. Also, we found the same problematic lock design used in the workqueue code in the kernel, causing the same livelock. It's very very rare and requires a perfect set of circumstances. If it would help I can provide a unit test if you folks would be generous enough to test it on the latest Juno or something b.L that's also upstream. Thanks, Vikram
Re: spin_lock behavior with ARM64 big.Little/HMP
Hi Sudeep, Interesting. Just curious if this is r0p0/p1 A53 ? If so, is the errata 819472 enabled ? Sorry for bringing this up after the loo-ong delay, but I've been assured that the A53 involved is > r0p1. I've also confirmed this problem on multiple internal platforms, and I'm pretty sure that it occurs on any b.L out there today. Also, we found the same problematic lock design used in the workqueue code in the kernel, causing the same livelock. It's very very rare and requires a perfect set of circumstances. If it would help I can provide a unit test if you folks would be generous enough to test it on the latest Juno or something b.L that's also upstream. Thanks, Vikram
Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible
On 03/30/17 at 11:09am, Dou Liyang wrote: > > > At 03/30/2017 11:03 AM, Dou Liyang wrote: > > Hi Baoquan, > > > > At 03/30/2017 10:08 AM, Baoquan He wrote: > > > Hi Liyang, > > > > > > This is awesome. I planned to do this after kaslr back porting, glad to > > > see your posting. I like below diagram and the idea of patch 2/6 > > > framework. Will review and see what I can do to help since rhel bug from > > > FJ is assigned to me. > > > > > > > Thanks very much for your join! We have investigated the bug almost > > half a year. :) > > > > In my opinion, > > If we plan to refactor the process of APIC initialization for the bug. > > There must be lots of work need to be done. This patchset is just the > > first step. When I test it, I am thinking about: > > > > 1. The check and logic in each enable and setup LAPIC/IOAPIC functions. > > 2. The process of IRQ remapping. > > 3. The check and init of APIC timer. > > 4. The relationship between the various switches, such as If > > the smp_found_config is 1, the acpi_lapic must be 1. > > > > And following work to me are: > > > > 1. Use more test cases to test. > > 2. learn the IOMMU. > > 3. trace the APIC timer code. > > 4. make the check logic more clear. > > > > Hope to be helpful to you. Thanks for telling, I will also check. > > > > > > And add Joerg to this thread since he knows IOMMU very well. > > > > ahh, > > --cc j...@8bytes.org, not j...@8types.org Yes, indeed. Thanks. > > > oops, Yes, I forgot it, Thanks! > > > > Thanks > > Liyang > > > > > > > > Thanks > > > Baoquan > > > > > > On 03/29/17 at 10:55pm, Dou Liyang wrote: > > > > According to Ingo's and Eric's advice[1,2], Try my best to optimize the > > > > init of Interrupt Mode for x86. > > > > > > > > The MP specification defines three different interrupt modes as follows: > > > > > > > > 1. PIC Mode > > > > 2. Virtual Wire Mode > > > > 3. Symmetic I/O Mode > > > > > > > > Currently, In kernel, > > > > > > > > 1. Setup the Virtual Wire Mode during the IRQ initialization( > > > > step 1 in the following figure). > > > > 2. Enable and Setup the Symmetic I/O Mode either during the > > > > SMP-capabe system prepares CPUs(step 2) or during the UP system > > > > initializes itself(step 3). > > > > > > > > start_kernel > > > > +---+ > > > > | > > > > +--> ... > > > > | > > > > |setup_arch > > > > +--> +---+ > > > > | > > > > |init_IRQ > > > > +-> +--+-+ > > > > | |init_ISA_irqs > > > > | +--> +-++ > > > > | | ++ > > > > +---> +--> | 1.init_bsp_APIC| > > > > | ... ++ > > > > +---> > > > > | rest_init > > > > +--->---+-+ > > > > | | kernel_init > > > > | +> +-+ > > > > | | kernel_init_freeable > > > > | +-> +-+ > > > > | | smp_prepare_cpus > > > > | +---> ++-+ > > > > | | | +---+ > > > > | | +-> |2. apic_bsp_setup | > > > > | | +---+ > > > > | | > > > > v | smp_init > > > > +---> +---++ > > > > |+---+ > > > > +--> |3. apic_bsp_setup | > > > >+---+ > > > > > > > > The purpose of this patchset is Unifing these setup steps and > > > > executing as > > > > soon as possible as follows: > > > > > > > >start_kernel > > > > ---+ > > > > | > > > > | > > > > | > > > > | init_IRQ > > > > +>---++ > > > > || > > > > || ++ > > > > |+> | 4. init_bsp_APIC | > > > > | ++ > > > > v > > > > > > > > By the way, Also fix a bug about kexec[3]. > > > > > > > > > > > > Some doubts, need help: > > > > > > > > 1. Patchset has influence on IOMMU in enable_IR_x2apic(). Not sure > > > > it can be in advance? > > > > > > > > 2. Due to > > > > > > > > Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on > > > > kexec") > > > > > > > > ..., patchset also needs TSC and uses the "cpu_khz" in > > > > setup_local_APIC(). > > > > And a warning[4] will be triggered when crashed/on kexec. Not sure > > > > how to > > > > modify? > > > > > > > > [1]. https://lkml.org/lkml/2016/8/2/929 > > > > [2]. https://lkml.org/lkml/2016/8/1/506 > > > > [3]. https://lkml.org/lkml/2016/7/25/1118 > > > > [4]. WARN_ON(max_loops <= 0) in setup_local_APIC() > > > > > > > > Dou Liyang (6): > > > > x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup() > > > > x86/apic: Construct a framework for setuping APIC mode as soon as > > > >
Re: [RFC PATCH 0/6] Unify the Interrupt Mode and setup it as soon as possible
On 03/30/17 at 11:09am, Dou Liyang wrote: > > > At 03/30/2017 11:03 AM, Dou Liyang wrote: > > Hi Baoquan, > > > > At 03/30/2017 10:08 AM, Baoquan He wrote: > > > Hi Liyang, > > > > > > This is awesome. I planned to do this after kaslr back porting, glad to > > > see your posting. I like below diagram and the idea of patch 2/6 > > > framework. Will review and see what I can do to help since rhel bug from > > > FJ is assigned to me. > > > > > > > Thanks very much for your join! We have investigated the bug almost > > half a year. :) > > > > In my opinion, > > If we plan to refactor the process of APIC initialization for the bug. > > There must be lots of work need to be done. This patchset is just the > > first step. When I test it, I am thinking about: > > > > 1. The check and logic in each enable and setup LAPIC/IOAPIC functions. > > 2. The process of IRQ remapping. > > 3. The check and init of APIC timer. > > 4. The relationship between the various switches, such as If > > the smp_found_config is 1, the acpi_lapic must be 1. > > > > And following work to me are: > > > > 1. Use more test cases to test. > > 2. learn the IOMMU. > > 3. trace the APIC timer code. > > 4. make the check logic more clear. > > > > Hope to be helpful to you. Thanks for telling, I will also check. > > > > > > And add Joerg to this thread since he knows IOMMU very well. > > > > ahh, > > --cc j...@8bytes.org, not j...@8types.org Yes, indeed. Thanks. > > > oops, Yes, I forgot it, Thanks! > > > > Thanks > > Liyang > > > > > > > > Thanks > > > Baoquan > > > > > > On 03/29/17 at 10:55pm, Dou Liyang wrote: > > > > According to Ingo's and Eric's advice[1,2], Try my best to optimize the > > > > init of Interrupt Mode for x86. > > > > > > > > The MP specification defines three different interrupt modes as follows: > > > > > > > > 1. PIC Mode > > > > 2. Virtual Wire Mode > > > > 3. Symmetic I/O Mode > > > > > > > > Currently, In kernel, > > > > > > > > 1. Setup the Virtual Wire Mode during the IRQ initialization( > > > > step 1 in the following figure). > > > > 2. Enable and Setup the Symmetic I/O Mode either during the > > > > SMP-capabe system prepares CPUs(step 2) or during the UP system > > > > initializes itself(step 3). > > > > > > > > start_kernel > > > > +---+ > > > > | > > > > +--> ... > > > > | > > > > |setup_arch > > > > +--> +---+ > > > > | > > > > |init_IRQ > > > > +-> +--+-+ > > > > | |init_ISA_irqs > > > > | +--> +-++ > > > > | | ++ > > > > +---> +--> | 1.init_bsp_APIC| > > > > | ... ++ > > > > +---> > > > > | rest_init > > > > +--->---+-+ > > > > | | kernel_init > > > > | +> +-+ > > > > | | kernel_init_freeable > > > > | +-> +-+ > > > > | | smp_prepare_cpus > > > > | +---> ++-+ > > > > | | | +---+ > > > > | | +-> |2. apic_bsp_setup | > > > > | | +---+ > > > > | | > > > > v | smp_init > > > > +---> +---++ > > > > |+---+ > > > > +--> |3. apic_bsp_setup | > > > >+---+ > > > > > > > > The purpose of this patchset is Unifing these setup steps and > > > > executing as > > > > soon as possible as follows: > > > > > > > >start_kernel > > > > ---+ > > > > | > > > > | > > > > | > > > > | init_IRQ > > > > +>---++ > > > > || > > > > || ++ > > > > |+> | 4. init_bsp_APIC | > > > > | ++ > > > > v > > > > > > > > By the way, Also fix a bug about kexec[3]. > > > > > > > > > > > > Some doubts, need help: > > > > > > > > 1. Patchset has influence on IOMMU in enable_IR_x2apic(). Not sure > > > > it can be in advance? > > > > > > > > 2. Due to > > > > > > > > Commit 8c3ba8d04924 ("x86, apic: ack all pending irqs when crashed/on > > > > kexec") > > > > > > > > ..., patchset also needs TSC and uses the "cpu_khz" in > > > > setup_local_APIC(). > > > > And a warning[4] will be triggered when crashed/on kexec. Not sure > > > > how to > > > > modify? > > > > > > > > [1]. https://lkml.org/lkml/2016/8/2/929 > > > > [2]. https://lkml.org/lkml/2016/8/1/506 > > > > [3]. https://lkml.org/lkml/2016/7/25/1118 > > > > [4]. WARN_ON(max_loops <= 0) in setup_local_APIC() > > > > > > > > Dou Liyang (6): > > > > x86/apic: Replace init_bsp_APIC() with apic_virture_wire_mode_setup() > > > > x86/apic: Construct a framework for setuping APIC mode as soon as > > > >
Re: [PATCH v9 10/15] ACPI: platform-msi: retrieve dev id from IORT
Hi all: 在 2017/3/30 11:07, Hanjun Guo 写道: > On 03/30/2017 01:32 AM, Lorenzo Pieralisi wrote: >> On Wed, Mar 29, 2017 at 05:13:54PM +0100, Lorenzo Pieralisi wrote: >>> On Wed, Mar 29, 2017 at 03:52:47PM +0100, Marc Zyngier wrote: On 29/03/17 14:00, Hanjun Guo wrote: > On 03/29/2017 08:38 PM, Lorenzo Pieralisi wrote: >> On Wed, Mar 29, 2017 at 07:52:48PM +0800, Hanjun Guo wrote: >>> Hi Lorenzo, >>> >>> On 03/29/2017 06:14 PM, Lorenzo Pieralisi wrote: Hi Hanjun, Marc, On Tue, Mar 07, 2017 at 08:40:05PM +0800, Hanjun Guo wrote: > [...] >drivers/acpi/arm64/iort.c | 24 > >drivers/irqchip/irq-gic-v3-its-platform-msi.c | 3 ++- [...] Thoughts? >>> >>> Perfect for me. Hanjun, I can cherry pick Marc's patch above, rework >>> this patch and post the resulting branch for everyone to have a final >>> test. >> >> git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git >> acpi/arm64-acpi-4.12 >> >> Please have a look and let me know if that's ok, I planned to send >> a PR to Catalin by the end of the week (first 7 patches up to >> 7fc3061df075 ("ACPI: platform: setup MSI domain for ACPI based platform >> device")). > > Perfect for me too, Lorenzo, Marc, Thank you very much. > > I'm currently in paternity leave and can't reach the machine, > I had a detail review with the patches, they looks good to me, > Ma Jun and Wei Xu will test on Hisilicon machines and give the > feedback. The sas/xge/uart are working fine on my hisilicon board with Lorenzo's branch (arm64-acpi-4.12) Thanks Majun > > Thanks > Hanjun > > . >
Re: [PATCH v9 10/15] ACPI: platform-msi: retrieve dev id from IORT
Hi all: 在 2017/3/30 11:07, Hanjun Guo 写道: > On 03/30/2017 01:32 AM, Lorenzo Pieralisi wrote: >> On Wed, Mar 29, 2017 at 05:13:54PM +0100, Lorenzo Pieralisi wrote: >>> On Wed, Mar 29, 2017 at 03:52:47PM +0100, Marc Zyngier wrote: On 29/03/17 14:00, Hanjun Guo wrote: > On 03/29/2017 08:38 PM, Lorenzo Pieralisi wrote: >> On Wed, Mar 29, 2017 at 07:52:48PM +0800, Hanjun Guo wrote: >>> Hi Lorenzo, >>> >>> On 03/29/2017 06:14 PM, Lorenzo Pieralisi wrote: Hi Hanjun, Marc, On Tue, Mar 07, 2017 at 08:40:05PM +0800, Hanjun Guo wrote: > [...] >drivers/acpi/arm64/iort.c | 24 > >drivers/irqchip/irq-gic-v3-its-platform-msi.c | 3 ++- [...] Thoughts? >>> >>> Perfect for me. Hanjun, I can cherry pick Marc's patch above, rework >>> this patch and post the resulting branch for everyone to have a final >>> test. >> >> git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git >> acpi/arm64-acpi-4.12 >> >> Please have a look and let me know if that's ok, I planned to send >> a PR to Catalin by the end of the week (first 7 patches up to >> 7fc3061df075 ("ACPI: platform: setup MSI domain for ACPI based platform >> device")). > > Perfect for me too, Lorenzo, Marc, Thank you very much. > > I'm currently in paternity leave and can't reach the machine, > I had a detail review with the patches, they looks good to me, > Ma Jun and Wei Xu will test on Hisilicon machines and give the > feedback. The sas/xge/uart are working fine on my hisilicon board with Lorenzo's branch (arm64-acpi-4.12) Thanks Majun > > Thanks > Hanjun > > . >
Re: [PATCH 4.8 50/96] firmware: fix usermode helper fallback loading
On Fri, Mar 24, 2017 at 04:01:58PM -0400, Ben Gamari wrote: > Greg Kroah-Hartmanwrites: > > > On Fri, Jan 06, 2017 at 10:54:38PM +0100, Yves-Alexis Perez wrote: > >> On Fri, 2017-01-06 at 22:43 +0100, Greg Kroah-Hartman wrote: > >> > 4.8-stable review patch. If anyone has any objections, please let me > >> > know. > >> > >> Hi Greg, > >> > >> Ben Gamari think there was a regression in that patch so I'm adding him to > >> recipients so he can voice concerns if needed. > > > > Given the lack of response, I'm going to assume all is fine :) > > > Oh dear, sorry for the late response; this was stuck in the pergatory of > my inbox. > > It's been a while since I've looked at this, but I believe the alleged > regression in this pastch is the reason I have the attached patch in my > tree. I seem to recall that it was the ath10k driver which triggered the > issue. > > Unfortunately I can't recall which driver was affected by this. I'll > have to see what happens when I revert the attached patch. > > Cheers, > > - Ben > > > > Author: Ben Gamari > Date: Mon Jan 2 00:38:05 2017 -0500 > > firmware_class: Ensure buf is non-NULL in __fw_load_abort > > I have observed that this can be NULL. > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index ac350c518e0c..fd0be24911fc 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -546,7 +546,8 @@ static void __fw_load_abort(struct firmware_buf *buf) > * There is a small window in which user can write to 'loading' > * between loading done and disappearance of 'loading' > */ > - if (fw_state_is_done(>fw_st)) > + > + if (!buf || fw_state_is_done(>fw_st)) > return; > > list_del_init(>pending_list); We discussed this a while back and went with a more elegant fix, see: commit 191e885a2e130e639bb0c8ee350d7047294f2ce6 Luis
Re: [PATCH 4.8 50/96] firmware: fix usermode helper fallback loading
On Fri, Mar 24, 2017 at 04:01:58PM -0400, Ben Gamari wrote: > Greg Kroah-Hartman writes: > > > On Fri, Jan 06, 2017 at 10:54:38PM +0100, Yves-Alexis Perez wrote: > >> On Fri, 2017-01-06 at 22:43 +0100, Greg Kroah-Hartman wrote: > >> > 4.8-stable review patch. If anyone has any objections, please let me > >> > know. > >> > >> Hi Greg, > >> > >> Ben Gamari think there was a regression in that patch so I'm adding him to > >> recipients so he can voice concerns if needed. > > > > Given the lack of response, I'm going to assume all is fine :) > > > Oh dear, sorry for the late response; this was stuck in the pergatory of > my inbox. > > It's been a while since I've looked at this, but I believe the alleged > regression in this pastch is the reason I have the attached patch in my > tree. I seem to recall that it was the ath10k driver which triggered the > issue. > > Unfortunately I can't recall which driver was affected by this. I'll > have to see what happens when I revert the attached patch. > > Cheers, > > - Ben > > > > Author: Ben Gamari > Date: Mon Jan 2 00:38:05 2017 -0500 > > firmware_class: Ensure buf is non-NULL in __fw_load_abort > > I have observed that this can be NULL. > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > index ac350c518e0c..fd0be24911fc 100644 > --- a/drivers/base/firmware_class.c > +++ b/drivers/base/firmware_class.c > @@ -546,7 +546,8 @@ static void __fw_load_abort(struct firmware_buf *buf) > * There is a small window in which user can write to 'loading' > * between loading done and disappearance of 'loading' > */ > - if (fw_state_is_done(>fw_st)) > + > + if (!buf || fw_state_is_done(>fw_st)) > return; > > list_del_init(>pending_list); We discussed this a while back and went with a more elegant fix, see: commit 191e885a2e130e639bb0c8ee350d7047294f2ce6 Luis
Re: [PATCH 1/8] platform/x86: fujitsu-laptop: move backlight input device setup to a separate function
On Wed, Mar 29, 2017 at 12:54:15PM -0700, Darren Hart wrote: > On Mon, Mar 20, 2017 at 10:32:17AM +0100, Micha?? K??pie?? wrote: > > + > > + return error; > > This return path could be cleaned up a bit: > > error = input_register_device(input); > if (error) > input_free_device(input); > > return error; > > But, this driver uses this "error/return 0" pattern pretty consistently, > whereas > most of the kernel uses ret instead of error, and will return ret on success > and > failure, relying on it being 0 in the successful case. Over the whole driver, > we'd save several lines with the conversion and be more consistent with the > rest > of the kernel. But, local consistency is important too. Jonathan, do you have > a > preference for this driver? I have no strong preferences, except to say that clarity is important. As I eluded to a few minutes ago, I agree that there's scope to address error handling and there is a case to be made for bringing it into line with the rest of the kernel. I think this can be addressed in a separate patch series though. The present series under consideration doesn't make the situation any worse (it actually improves it in some places) and introduces worthwhile changes. As such I don't see that we gain anything by delaying it in order to address what is, at the end of the day, a separate concern. Regards jonathan
Re: [PATCH 1/8] platform/x86: fujitsu-laptop: move backlight input device setup to a separate function
On Wed, Mar 29, 2017 at 12:54:15PM -0700, Darren Hart wrote: > On Mon, Mar 20, 2017 at 10:32:17AM +0100, Micha?? K??pie?? wrote: > > + > > + return error; > > This return path could be cleaned up a bit: > > error = input_register_device(input); > if (error) > input_free_device(input); > > return error; > > But, this driver uses this "error/return 0" pattern pretty consistently, > whereas > most of the kernel uses ret instead of error, and will return ret on success > and > failure, relying on it being 0 in the successful case. Over the whole driver, > we'd save several lines with the conversion and be more consistent with the > rest > of the kernel. But, local consistency is important too. Jonathan, do you have > a > preference for this driver? I have no strong preferences, except to say that clarity is important. As I eluded to a few minutes ago, I agree that there's scope to address error handling and there is a case to be made for bringing it into line with the rest of the kernel. I think this can be addressed in a separate patch series though. The present series under consideration doesn't make the situation any worse (it actually improves it in some places) and introduces worthwhile changes. As such I don't see that we gain anything by delaying it in order to address what is, at the end of the day, a separate concern. Regards jonathan
[PATCH] powerpc/prom: Increase RMA size to 512MB
>From 3ae8d1ed31b01b92b172fe20e4560cfbfab135ec Mon Sep 17 00:00:00 2001 From: rootDate: Mon, 27 Mar 2017 19:43:14 -0400 Subject: [PATCH] powerpc/prom: Increase RMA size to 512MB When booting very large systems with a large initrd, we run out of space for either the RTAS or the flattened device tree (FDT). Boot fails with messages like: Could not allocate memory for RTAS or No memory for flatten_device_tree (no room) Increasing the minimum RMA size to 512MB fixes the problem. This should not have an impact on smaller LPARs (with 256MB memory), as the firmware will cap the RMA to the memory assigned to the LPAR. Fix is based on input/discussions with Michael Ellerman. Thanks to Praveen K. Pandey for testing on a large system. Reported-by: Praveen K. Pandey Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/kernel/prom_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 1c1b44e..dd8a04f 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -815,7 +815,7 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { .virt_base = cpu_to_be32(0x), .virt_size = cpu_to_be32(0x), .load_base = cpu_to_be32(0x), - .min_rma = cpu_to_be32(256),/* 256MB min RMA */ + .min_rma = cpu_to_be32(512),/* 512MB min RMA */ .min_load = cpu_to_be32(0x),/* full client load */ .min_rma_percent = 0, /* min RMA percentage of total RAM */ .max_pft_size = 48, /* max log_2(hash table size) */ -- 1.8.3.1
[PATCH] powerpc/prom: Increase RMA size to 512MB
>From 3ae8d1ed31b01b92b172fe20e4560cfbfab135ec Mon Sep 17 00:00:00 2001 From: root Date: Mon, 27 Mar 2017 19:43:14 -0400 Subject: [PATCH] powerpc/prom: Increase RMA size to 512MB When booting very large systems with a large initrd, we run out of space for either the RTAS or the flattened device tree (FDT). Boot fails with messages like: Could not allocate memory for RTAS or No memory for flatten_device_tree (no room) Increasing the minimum RMA size to 512MB fixes the problem. This should not have an impact on smaller LPARs (with 256MB memory), as the firmware will cap the RMA to the memory assigned to the LPAR. Fix is based on input/discussions with Michael Ellerman. Thanks to Praveen K. Pandey for testing on a large system. Reported-by: Praveen K. Pandey Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/kernel/prom_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 1c1b44e..dd8a04f 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -815,7 +815,7 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { .virt_base = cpu_to_be32(0x), .virt_size = cpu_to_be32(0x), .load_base = cpu_to_be32(0x), - .min_rma = cpu_to_be32(256),/* 256MB min RMA */ + .min_rma = cpu_to_be32(512),/* 512MB min RMA */ .min_load = cpu_to_be32(0x),/* full client load */ .min_rma_percent = 0, /* min RMA percentage of total RAM */ .max_pft_size = 48, /* max log_2(hash table size) */ -- 1.8.3.1
Re: [PATCH] virtio_net: enable big packets for large MTU values
On 2017年03月29日 20:38, Michael S. Tsirkin wrote: If one enables e.g. jumbo frames without mergeable buffers, packets won't fit in 1500 byte buffers we use. Switch to big packet mode instead. TODO: make sizing more exact, possibly extend small packet mode to use larger pages. Signed-off-by: Michael S. Tsirkin--- drivers/net/virtio_net.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e0fb3707..9dc31dc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev) dev->mtu = mtu; dev->max_mtu = mtu; } + + /* TODO: size buffers correctly in this case. */ + if (dev->mtu > ETH_DATA_LEN) + vi->big_packets = true; } if (vi->any_header_sg) Ok, I think maybe we should fail the probe of small buffer in this case and decrease the max_mtu to ETH_DATA_LEN if small buffer is used (since user are allowed to increase the mtu). Thanks
Re: [PATCH] virtio_net: enable big packets for large MTU values
On 2017年03月29日 20:38, Michael S. Tsirkin wrote: If one enables e.g. jumbo frames without mergeable buffers, packets won't fit in 1500 byte buffers we use. Switch to big packet mode instead. TODO: make sizing more exact, possibly extend small packet mode to use larger pages. Signed-off-by: Michael S. Tsirkin --- drivers/net/virtio_net.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e0fb3707..9dc31dc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2428,6 +2428,10 @@ static int virtnet_probe(struct virtio_device *vdev) dev->mtu = mtu; dev->max_mtu = mtu; } + + /* TODO: size buffers correctly in this case. */ + if (dev->mtu > ETH_DATA_LEN) + vi->big_packets = true; } if (vi->any_header_sg) Ok, I think maybe we should fail the probe of small buffer in this case and decrease the max_mtu to ETH_DATA_LEN if small buffer is used (since user are allowed to increase the mtu). Thanks
Re: lockdep warning: console vs. mem hotplug
On (03/28/17 16:22), Michal Hocko wrote: [..] > > Sebastian, does this change make lockdep happy? > > > > it removes console drivers from the __offline_isolated_pages(). not the > > best solution I can think of, but the simplest one. > > > > --- > > > > mm/page_alloc.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index f749b7ff7c50..eb61e6ab5f4f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > > unsigned long end_pfn) > > BUG_ON(!PageBuddy(page)); > > order = page_order(page); > > #ifdef CONFIG_DEBUG_VM > > - pr_info("remove from free list %lx %d %lx\n", > > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > > pfn, 1 << order, end_pfn); > > #endif > > list_del(>lru); > > I believe this is not a proper fix. oh, absolutely. I hate it. didn't really propose it as a fix. mostly did it just to verify that there are no other lock inversions behind the one that has been reported (lockdep turns off itself when it detects the first lock dependency inversion). > Although this code is ugly and maybe it doesn't really need zone->lock > because that should be the page allocator internal thing the problem is > that printk shouldn't impose such a subtle dependency on locks. Why does > the timer needs to allocate at all? I believe Petr has answered your questions. sorry for the delay. -ss
Re: lockdep warning: console vs. mem hotplug
On (03/28/17 16:22), Michal Hocko wrote: [..] > > Sebastian, does this change make lockdep happy? > > > > it removes console drivers from the __offline_isolated_pages(). not the > > best solution I can think of, but the simplest one. > > > > --- > > > > mm/page_alloc.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index f749b7ff7c50..eb61e6ab5f4f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > > unsigned long end_pfn) > > BUG_ON(!PageBuddy(page)); > > order = page_order(page); > > #ifdef CONFIG_DEBUG_VM > > - pr_info("remove from free list %lx %d %lx\n", > > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > > pfn, 1 << order, end_pfn); > > #endif > > list_del(>lru); > > I believe this is not a proper fix. oh, absolutely. I hate it. didn't really propose it as a fix. mostly did it just to verify that there are no other lock inversions behind the one that has been reported (lockdep turns off itself when it detects the first lock dependency inversion). > Although this code is ugly and maybe it doesn't really need zone->lock > because that should be the page allocator internal thing the problem is > that printk shouldn't impose such a subtle dependency on locks. Why does > the timer needs to allocate at all? I believe Petr has answered your questions. sorry for the delay. -ss
Re: [PATCH] virtio_net: fix mergeable bufs error handling
On 2017年03月29日 20:37, Michael S. Tsirkin wrote: On xdp error we try to free head_skb without having initialized it, that's clearly bogus. Fixes: f600b6905015 ("virtio_net: Add XDP support") Cc: John FastabendSigned-off-by: Michael S. Tsirkin --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 11773d6..e0fb3707 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, u16 num_buf; struct page *page; int offset; - struct sk_buff *head_skb, *curr_skb; + struct sk_buff *head_skb = NULL, *curr_skb; struct bpf_prog *xdp_prog; unsigned int truesize; My tree (net.git HEAD is 8f1f7eeb22c16a197159cf7b35d1350695193ead) has: head_skb = NULL; just after the above codes. Thanks
Re: [PATCH] virtio_net: fix mergeable bufs error handling
On 2017年03月29日 20:37, Michael S. Tsirkin wrote: On xdp error we try to free head_skb without having initialized it, that's clearly bogus. Fixes: f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend Signed-off-by: Michael S. Tsirkin --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 11773d6..e0fb3707 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -570,7 +570,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, u16 num_buf; struct page *page; int offset; - struct sk_buff *head_skb, *curr_skb; + struct sk_buff *head_skb = NULL, *curr_skb; struct bpf_prog *xdp_prog; unsigned int truesize; My tree (net.git HEAD is 8f1f7eeb22c16a197159cf7b35d1350695193ead) has: head_skb = NULL; just after the above codes. Thanks
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote: > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie??> > wrote: > > > > > Darren, Andy, in light of the above I will be awaiting your review of > > > this series. I will submit v2 afterwards, with all remarks from both > > > you and Jonathan taken into account. > > > > Darren marked this series under his name to review, so, I let him to > > speak for us. > > The series looks good to me. Nice work Micha??. They are logically divided and > address issues in a procedural way (so I stopped commenting until I read the > full series through as a couple of times you addressed a concern from a move > in > a cleanup to follow). > > I've applied the noted change to 7/8 and will run this through my tests, but > don't anticipate any problems. Jonathan, if you don't have any additional > concerns, let me know if I can add your Reviewed-by. With the noted change to 7/8 applied I'm happy with the resulting series. As you noted, there is still some scope for making things more consistent, especially with regard to error handling. However, that is really a separate task which can be addressed in a later series. This present series doesn't impact on this issue in any significant way so it makes sense that be applied. Reviewed-by: Jonathan Woithe Regards jonathan
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Wed, Mar 29, 2017 at 08:36:50PM -0700, Darren Hart wrote: > On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > > On Wed, Mar 29, 2017 at 10:19 AM, Micha?? K??pie?? > > wrote: > > > > > Darren, Andy, in light of the above I will be awaiting your review of > > > this series. I will submit v2 afterwards, with all remarks from both > > > you and Jonathan taken into account. > > > > Darren marked this series under his name to review, so, I let him to > > speak for us. > > The series looks good to me. Nice work Micha??. They are logically divided and > address issues in a procedural way (so I stopped commenting until I read the > full series through as a couple of times you addressed a concern from a move > in > a cleanup to follow). > > I've applied the noted change to 7/8 and will run this through my tests, but > don't anticipate any problems. Jonathan, if you don't have any additional > concerns, let me know if I can add your Reviewed-by. With the noted change to 7/8 applied I'm happy with the resulting series. As you noted, there is still some scope for making things more consistent, especially with regard to error handling. However, that is really a separate task which can be addressed in a later series. This present series doesn't impact on this issue in any significant way so it makes sense that be applied. Reviewed-by: Jonathan Woithe Regards jonathan
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > >max_nr_ports) == 0) { > > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ > > > > static unsigned int features[] = { > > > >> > VIRTIO_CONSOLE_F_SIZE, > > +#ifndef CONFIG_IRQ_FORCED_THREADING > > > >> > VIRTIO_CONSOLE_F_MULTIPORT, > > +#endif > > }; > > These look kind of questionable. > Is this part needed? I would have sworn it was, but double checking, nope, it's not. Hm, so I could make a prettier bandaid with a runtime check.. but it'd remain a bandaid, so I'll go do some beans 'n' biscuits work instead. -Mike
Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > >max_nr_ports) == 0) { > > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[ > > > > static unsigned int features[] = { > > > >> > VIRTIO_CONSOLE_F_SIZE, > > +#ifndef CONFIG_IRQ_FORCED_THREADING > > > >> > VIRTIO_CONSOLE_F_MULTIPORT, > > +#endif > > }; > > These look kind of questionable. > Is this part needed? I would have sworn it was, but double checking, nope, it's not. Hm, so I could make a prettier bandaid with a runtime check.. but it'd remain a bandaid, so I'll go do some beans 'n' biscuits work instead. -Mike
Re: [PATCH net-next 8/8] vhost_net: use lockless peeking for skb array during busy polling
On 2017年03月30日 10:33, Michael S. Tsirkin wrote: On Thu, Mar 30, 2017 at 10:16:15AM +0800, Jason Wang wrote: On 2017年03月29日 20:07, Michael S. Tsirkin wrote: On Tue, Mar 21, 2017 at 12:04:47PM +0800, Jason Wang wrote: For the socket that exports its skb array, we can use lockless polling to avoid touching spinlock during busy polling. Signed-off-by: Jason Wang--- drivers/vhost/net.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 53f09f2..41153a3 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -551,10 +551,13 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) return len; } -static int sk_has_rx_data(struct sock *sk) +static int sk_has_rx_data(struct vhost_net_virtqueue *rvq, struct sock *sk) { struct socket *sock = sk->sk_socket; + if (rvq->rx_array) + return !__skb_array_empty(rvq->rx_array); + if (sock->ops->peek_len) return sock->ops->peek_len(sock); I don't see which patch adds __skb_array_empty. This is not something new, it was introduced by ad69f35d1dc0a ("skb_array: array based FIFO for skbs"). Thanks Same comment about a compiler barrier applies then. Ok, rethink about this, since skb_array could be resized, using lockless version seems wrong. For the comment of using compiler barrier, caller (vhost_net_rx_peek_head_len) uses cpu_relax(). But I haven't figured out why a compiler barrier is needed here. Could you please explain? Thanks @@ -579,7 +582,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(>dev, endtime) && - !sk_has_rx_data(sk) && + !sk_has_rx_data(rvq, sk) && vhost_vq_avail_empty(>dev, vq)) cpu_relax(); -- 2.7.4
Re: [PATCH net-next 8/8] vhost_net: use lockless peeking for skb array during busy polling
On 2017年03月30日 10:33, Michael S. Tsirkin wrote: On Thu, Mar 30, 2017 at 10:16:15AM +0800, Jason Wang wrote: On 2017年03月29日 20:07, Michael S. Tsirkin wrote: On Tue, Mar 21, 2017 at 12:04:47PM +0800, Jason Wang wrote: For the socket that exports its skb array, we can use lockless polling to avoid touching spinlock during busy polling. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 53f09f2..41153a3 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -551,10 +551,13 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) return len; } -static int sk_has_rx_data(struct sock *sk) +static int sk_has_rx_data(struct vhost_net_virtqueue *rvq, struct sock *sk) { struct socket *sock = sk->sk_socket; + if (rvq->rx_array) + return !__skb_array_empty(rvq->rx_array); + if (sock->ops->peek_len) return sock->ops->peek_len(sock); I don't see which patch adds __skb_array_empty. This is not something new, it was introduced by ad69f35d1dc0a ("skb_array: array based FIFO for skbs"). Thanks Same comment about a compiler barrier applies then. Ok, rethink about this, since skb_array could be resized, using lockless version seems wrong. For the comment of using compiler barrier, caller (vhost_net_rx_peek_head_len) uses cpu_relax(). But I haven't figured out why a compiler barrier is needed here. Could you please explain? Thanks @@ -579,7 +582,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(>dev, endtime) && - !sk_has_rx_data(sk) && + !sk_has_rx_data(rvq, sk) && vhost_vq_avail_empty(>dev, vq)) cpu_relax(); -- 2.7.4
linux-next: build failure after merge of the phy-next tree
Hi Kishon, After merging the phy-next tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 'phy_tuning' specified in initializer .phy_tuning = rk3328_usb2phy_tuning, ^ drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' undeclared here (not in a function) .phy_tuning = rk3328_usb2phy_tuning, ^ Caused by commit ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328") I have used the phy-next tree from next-20170329 for today. -- Cheers, Stephen Rothwell
linux-next: build failure after merge of the phy-next tree
Hi Kishon, After merging the phy-next tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/phy/phy-rockchip-inno-usb2.c:1148:3: error: unknown field 'phy_tuning' specified in initializer .phy_tuning = rk3328_usb2phy_tuning, ^ drivers/phy/phy-rockchip-inno-usb2.c:1148:17: error: 'rk3328_usb2phy_tuning' undeclared here (not in a function) .phy_tuning = rk3328_usb2phy_tuning, ^ Caused by commit ffa0c278e89c ("phy: rockchip-inno-usb2: add support of u2phy for rk3328") I have used the phy-next tree from next-20170329 for today. -- Cheers, Stephen Rothwell
Re: linux-next: manual merge of the tty tree with the tty.current tree
On Mon, 2017-03-20 at 10:26 +0100, Dmitry Vyukov wrote: > On Mon, Mar 20, 2017 at 10:21 AM, Dmitry Vyukovwrote: > > On Mon, Mar 20, 2017 at 3:28 AM, Stephen Rothwell > > wrote: > > > Hi Greg, > > > > > > Today's linux-next merge of the tty tree got a conflict in: > > > > > > drivers/tty/tty_ldisc.c > > > > > > between commit: > > > > > > 5362544bebe8 ("tty: don't panic on OOM in tty_set_ldisc()") > > > > > > from the tty.current tree and commit: > > > > > > 71472fa9c52b ("tty: Fix ldisc crash on reopened tty") > > > > > > from the tty tree. > > > > > > I fixed it up (see below) and can carry the fix as necessary. This > > > is now fixed as far as linux-next is concerned, but any non trivial > > > conflicts should be mentioned to your upstream maintainer when your tree > > > is submitted for merging. You may also want to consider cooperating > > > with the maintainer of the conflicting tree to minimise any particularly > > > complex conflicts. > > > > > > -- > > > Cheers, > > > Stephen Rothwell > > > > > > diff --cc drivers/tty/tty_ldisc.c > > > index b0500a0a87b8,4ee7742dced3.. > > > --- a/drivers/tty/tty_ldisc.c > > > +++ b/drivers/tty/tty_ldisc.c > > > @@@ -621,14 -669,17 +621,15 @@@ int tty_ldisc_reinit(struct tty_struct > > > tty_ldisc_put(tty->ldisc); > > > } > > > > > > - /* switch the line discipline */ > > > - tty->ldisc = ld; > > > tty_set_termios_ldisc(tty, disc); > > > - retval = tty_ldisc_open(tty, tty->ldisc); > > > + retval = tty_ldisc_open(tty, ld); > > > if (retval) { > > > - tty_ldisc_put(tty->ldisc); > > > - tty->ldisc = NULL; > > > - if (!WARN_ON(disc == N_TTY)) { > > > - tty_ldisc_put(ld); > > > - ld = NULL; > > > - } > > > ++ tty_ldisc_put(ld); > > > ++ ld = NULL; > > > } > > > + > > > + /* switch the line discipline */ > > > + smp_store_release(>ldisc, ld); > > > return retval; > > > } > > > > > > > > > Peter, > > > > Looking at your patch "tty: Fix ldisc crash on reopened tty", I think > > there is a missed barrier in tty_ldisc_ref. A single barrier does not > > have any effect, they always need to be in pairs. So I think we also > > need at least: > > > > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty) > > struct tty_ldisc *ld = NULL; > > > > if (ldsem_down_read_trylock(>ldisc_sem)) { > > - ld = tty->ldisc; > > + ld = READ_ONCE(tty->ldisc); > > + read_barrier_depends(); > > if (!ld) > > ldsem_up_read(>ldisc_sem); > > } > > > > > > Or simply: > > > > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty) > > struct tty_ldisc *ld = NULL; > > > > if (ldsem_down_read_trylock(>ldisc_sem)) { > > - ld = tty->ldisc; > > + /* pairs with smp_store_release in tty_ldisc_reinit */ > > + ld = smp_load_acquire(>ldisc); > > if (!ld) > > ldsem_up_read(>ldisc_sem); > > } > > > > > I am also surprised that callers of tty_ldisc_reinit don't hold > ldisc_sem. I thought that ldisc_sem is what's supposed to protect > changes to ldisc. That would also auto fix the crash without any > tricky barriers as flush_to_ldisc uses tty_ldisc_ref. Dmitry, Thanks for the help. Peter doesn't seem to be responding to email any more. I'm not familiar with the tty layer, but the issue that patch was suppose to fix had a similar signature to the below oops we are seeing on powerpc on boot. (sorry I don't have a repro on mainline or linux-next). Hence why I pushed on it. In the below crash the call to tty_ldisc_reinit is coming from a workqueue, so requiring the callers to hold the ldisc_sem is more tricky. Could we just hold the ldisc_sem inside tty_ldisc_reinit()? Regards, Mikey [ 9.021567] Unable to handle kernel paging request for data at address 0x2260 [ 9.022501] Faulting instruction address: 0xc06c7770 [ 9.023105] Oops: Kernel access of bad area, sig: 11 [#1] [ 9.023674] SMP NR_CPUS=2048 [ 9.023676] NUMA [ 9.023970] PowerNV [ 9.024372] Modules linked in: ofpart cmdlinepart ipmi_powernv powernv_flash ipmi_devintf mtd ipmi_msghandler ibmpowernv opal_prd uio_pdrv_genirq uio vmx_crypto ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crc32c_vpmsum fb_sys_fops drm ahci libahci tg3 [ 9.027146] CPU: 15 PID: 354 Comm: kworker/u64:2 Not tainted 4.10.0-8-generic #10-Ubuntu [ 9.027978] Workqueue: events_unbound flush_to_ldisc [ 9.028468] task: c016a7758c00 task.stack: c000fd084000 [ 9.029055] NIP: c06c7770 LR: c06c7758 CTR: c06c84b0 [ 9.029767] REGS:
Re: linux-next: manual merge of the tty tree with the tty.current tree
On Mon, 2017-03-20 at 10:26 +0100, Dmitry Vyukov wrote: > On Mon, Mar 20, 2017 at 10:21 AM, Dmitry Vyukov wrote: > > On Mon, Mar 20, 2017 at 3:28 AM, Stephen Rothwell > > wrote: > > > Hi Greg, > > > > > > Today's linux-next merge of the tty tree got a conflict in: > > > > > > drivers/tty/tty_ldisc.c > > > > > > between commit: > > > > > > 5362544bebe8 ("tty: don't panic on OOM in tty_set_ldisc()") > > > > > > from the tty.current tree and commit: > > > > > > 71472fa9c52b ("tty: Fix ldisc crash on reopened tty") > > > > > > from the tty tree. > > > > > > I fixed it up (see below) and can carry the fix as necessary. This > > > is now fixed as far as linux-next is concerned, but any non trivial > > > conflicts should be mentioned to your upstream maintainer when your tree > > > is submitted for merging. You may also want to consider cooperating > > > with the maintainer of the conflicting tree to minimise any particularly > > > complex conflicts. > > > > > > -- > > > Cheers, > > > Stephen Rothwell > > > > > > diff --cc drivers/tty/tty_ldisc.c > > > index b0500a0a87b8,4ee7742dced3.. > > > --- a/drivers/tty/tty_ldisc.c > > > +++ b/drivers/tty/tty_ldisc.c > > > @@@ -621,14 -669,17 +621,15 @@@ int tty_ldisc_reinit(struct tty_struct > > > tty_ldisc_put(tty->ldisc); > > > } > > > > > > - /* switch the line discipline */ > > > - tty->ldisc = ld; > > > tty_set_termios_ldisc(tty, disc); > > > - retval = tty_ldisc_open(tty, tty->ldisc); > > > + retval = tty_ldisc_open(tty, ld); > > > if (retval) { > > > - tty_ldisc_put(tty->ldisc); > > > - tty->ldisc = NULL; > > > - if (!WARN_ON(disc == N_TTY)) { > > > - tty_ldisc_put(ld); > > > - ld = NULL; > > > - } > > > ++ tty_ldisc_put(ld); > > > ++ ld = NULL; > > > } > > > + > > > + /* switch the line discipline */ > > > + smp_store_release(>ldisc, ld); > > > return retval; > > > } > > > > > > > > > Peter, > > > > Looking at your patch "tty: Fix ldisc crash on reopened tty", I think > > there is a missed barrier in tty_ldisc_ref. A single barrier does not > > have any effect, they always need to be in pairs. So I think we also > > need at least: > > > > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty) > > struct tty_ldisc *ld = NULL; > > > > if (ldsem_down_read_trylock(>ldisc_sem)) { > > - ld = tty->ldisc; > > + ld = READ_ONCE(tty->ldisc); > > + read_barrier_depends(); > > if (!ld) > > ldsem_up_read(>ldisc_sem); > > } > > > > > > Or simply: > > > > @@ -295,7 +295,8 @@ struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty) > > struct tty_ldisc *ld = NULL; > > > > if (ldsem_down_read_trylock(>ldisc_sem)) { > > - ld = tty->ldisc; > > + /* pairs with smp_store_release in tty_ldisc_reinit */ > > + ld = smp_load_acquire(>ldisc); > > if (!ld) > > ldsem_up_read(>ldisc_sem); > > } > > > > > I am also surprised that callers of tty_ldisc_reinit don't hold > ldisc_sem. I thought that ldisc_sem is what's supposed to protect > changes to ldisc. That would also auto fix the crash without any > tricky barriers as flush_to_ldisc uses tty_ldisc_ref. Dmitry, Thanks for the help. Peter doesn't seem to be responding to email any more. I'm not familiar with the tty layer, but the issue that patch was suppose to fix had a similar signature to the below oops we are seeing on powerpc on boot. (sorry I don't have a repro on mainline or linux-next). Hence why I pushed on it. In the below crash the call to tty_ldisc_reinit is coming from a workqueue, so requiring the callers to hold the ldisc_sem is more tricky. Could we just hold the ldisc_sem inside tty_ldisc_reinit()? Regards, Mikey [ 9.021567] Unable to handle kernel paging request for data at address 0x2260 [ 9.022501] Faulting instruction address: 0xc06c7770 [ 9.023105] Oops: Kernel access of bad area, sig: 11 [#1] [ 9.023674] SMP NR_CPUS=2048 [ 9.023676] NUMA [ 9.023970] PowerNV [ 9.024372] Modules linked in: ofpart cmdlinepart ipmi_powernv powernv_flash ipmi_devintf mtd ipmi_msghandler ibmpowernv opal_prd uio_pdrv_genirq uio vmx_crypto ip_tables x_tables autofs4 ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crc32c_vpmsum fb_sys_fops drm ahci libahci tg3 [ 9.027146] CPU: 15 PID: 354 Comm: kworker/u64:2 Not tainted 4.10.0-8-generic #10-Ubuntu [ 9.027978] Workqueue: events_unbound flush_to_ldisc [ 9.028468] task: c016a7758c00 task.stack: c000fd084000 [ 9.029055] NIP: c06c7770 LR: c06c7758 CTR: c06c84b0 [ 9.029767] REGS: c000fd0878c0 TRAP: 0300 Not tainted
Re: [PATCH] kernel.h: add IS_PTR_ALIGNED() macro
On 03/29/17 18:57, Masahiro Yamada wrote: > > Could you care to send a patch? > > Perhaps, ALIGN and PTR_ALIGN can be merged as well? > Can't promise when I'd get around to it... -hpa
Re: [PATCH] kernel.h: add IS_PTR_ALIGNED() macro
On 03/29/17 18:57, Masahiro Yamada wrote: > > Could you care to send a patch? > > Perhaps, ALIGN and PTR_ALIGN can be merged as well? > Can't promise when I'd get around to it... -hpa
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > On Wed, Mar 29, 2017 at 10:19 AM, Michał Kępieńwrote: > > > Darren, Andy, in light of the above I will be awaiting your review of > > this series. I will submit v2 afterwards, with all remarks from both > > you and Jonathan taken into account. > > Darren marked this series under his name to review, so, I let him to > speak for us. The series looks good to me. Nice work Michał. They are logically divided and address issues in a procedural way (so I stopped commenting until I read the full series through as a couple of times you addressed a concern from a move in a cleanup to follow). I've applied the noted change to 7/8 and will run this through my tests, but don't anticipate any problems. Jonathan, if you don't have any additional concerns, let me know if I can add your Reviewed-by. Thanks, -- Darren Hart VMware Open Source Technology Center
Re: [PATCH 0/8] fujitsu-laptop: use sparse keymaps for input event handling
On Wed, Mar 29, 2017 at 07:35:50PM +0300, Andy Shevchenko wrote: > On Wed, Mar 29, 2017 at 10:19 AM, Michał Kępień wrote: > > > Darren, Andy, in light of the above I will be awaiting your review of > > this series. I will submit v2 afterwards, with all remarks from both > > you and Jonathan taken into account. > > Darren marked this series under his name to review, so, I let him to > speak for us. The series looks good to me. Nice work Michał. They are logically divided and address issues in a procedural way (so I stopped commenting until I read the full series through as a couple of times you addressed a concern from a move in a cleanup to follow). I've applied the noted change to 7/8 and will run this through my tests, but don't anticipate any problems. Jonathan, if you don't have any additional concerns, let me know if I can add your Reviewed-by. Thanks, -- Darren Hart VMware Open Source Technology Center
Re: [PATCH v4 1/2] module: verify address is read-only
+++ Eddie Kovsky [26/03/17 15:08 -0600]: Implement a mechanism to check if a module's address is in the rodata or ro_after_init sections. It mimics the existing functions that test if an address is inside a module's text section. Functions that take a module as an argument will be able to verify that the module address is in a read-only section. The idea is to prevent structures (including modules) that are not read-only from being passed to functions. This implements the first half of a suggestion made by Kees Cook for the Kernel Self Protection Project: - provide mechanism to check for ro_after_init memory areas, and reject structures not marked ro_after_init in vmbus_register() Suggested-by: Kees CookSigned-off-by: Eddie Kovsky Thanks for the respin, this looks much better. Acked-by: Jessica Yu --- Changes in v4: - Rename function __module_ro_address() to __module_rodata_address() - Move module_rodata_address() stub below is_module_address() - Minor comment fixes - Verify that mod is not NULL before setting up layout variables Changes in v3: - Change function name is_module_ro_address() to is_module_rodata_address(). - Improve comments on is_module_rodata_address(). - Add a !CONFIG_MODULES stub for is_module_rodata_address. - Correct and simplify the check for the read-only memory regions in the module address. --- include/linux/module.h | 12 kernel/module.c| 53 ++ 2 files changed, 65 insertions(+) diff --git a/include/linux/module.h b/include/linux/module.h index 9ad68561d8c2..a3d17b081de3 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod) struct module *__module_text_address(unsigned long addr); struct module *__module_address(unsigned long addr); +struct module *__module_rodata_address(unsigned long addr); bool is_module_address(unsigned long addr); +bool is_module_rodata_address(unsigned long addr); bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr); bool is_module_percpu_address(unsigned long addr); bool is_module_text_address(unsigned long addr); @@ -646,6 +648,11 @@ static inline struct module *__module_address(unsigned long addr) return NULL; } +static inline struct module *__module_rodata_address(unsigned long addr) +{ + return NULL; +} + static inline struct module *__module_text_address(unsigned long addr) { return NULL; @@ -656,6 +663,11 @@ static inline bool is_module_address(unsigned long addr) return false; } +static inline bool is_module_rodata_address(unsigned long addr) +{ + return false; +} + static inline bool is_module_percpu_address(unsigned long addr) { return false; diff --git a/kernel/module.c b/kernel/module.c index 8ffcd29a4245..26bd62c61d87 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long addr) } EXPORT_SYMBOL_GPL(__module_text_address); +/** + * is_module_rodata_address - is this address inside read-only module data? + * @addr: the address to check. + * + */ +bool is_module_rodata_address(unsigned long addr) +{ + bool ret; + + preempt_disable(); + ret = __module_rodata_address(addr) != NULL; + preempt_enable(); + + return ret; +} + +/* + * __module_rodata_address - get the module whose rodata/ro_after_init sections + * contain the given address. + * @addr: the address. + * + * Must be called with preempt disabled or module mutex held so that + * module doesn't get freed during this. + */ +struct module *__module_rodata_address(unsigned long addr) +{ + struct module *mod = __module_address(addr); + + /* +* Make sure module is within the read-only section. +* range(base + text_size, base + ro_after_init_size) +* encompasses both the rodata and ro_after_init regions. +* See comment above frob_text() for the layout diagram. +*/ + if (mod) { + void *init_base = mod->init_layout.base; + unsigned int init_text_size = mod->init_layout.text_size; + unsigned int init_ro_after_init_size = mod->init_layout.ro_after_init_size; + + void *core_base = mod->core_layout.base; + unsigned int core_text_size = mod->core_layout.text_size; + unsigned int core_ro_after_init_size = mod->core_layout.ro_after_init_size; + + if (!within(addr, init_base + init_text_size, + init_ro_after_init_size - init_text_size) + && !within(addr, core_base + core_text_size, + core_ro_after_init_size - core_text_size)) + mod = NULL; + } + return mod; +} +EXPORT_SYMBOL_GPL(__module_rodata_address); + /* Don't grab lock, we're oopsing. */ void
Re: [PATCH v4 1/2] module: verify address is read-only
+++ Eddie Kovsky [26/03/17 15:08 -0600]: Implement a mechanism to check if a module's address is in the rodata or ro_after_init sections. It mimics the existing functions that test if an address is inside a module's text section. Functions that take a module as an argument will be able to verify that the module address is in a read-only section. The idea is to prevent structures (including modules) that are not read-only from being passed to functions. This implements the first half of a suggestion made by Kees Cook for the Kernel Self Protection Project: - provide mechanism to check for ro_after_init memory areas, and reject structures not marked ro_after_init in vmbus_register() Suggested-by: Kees Cook Signed-off-by: Eddie Kovsky Thanks for the respin, this looks much better. Acked-by: Jessica Yu --- Changes in v4: - Rename function __module_ro_address() to __module_rodata_address() - Move module_rodata_address() stub below is_module_address() - Minor comment fixes - Verify that mod is not NULL before setting up layout variables Changes in v3: - Change function name is_module_ro_address() to is_module_rodata_address(). - Improve comments on is_module_rodata_address(). - Add a !CONFIG_MODULES stub for is_module_rodata_address. - Correct and simplify the check for the read-only memory regions in the module address. --- include/linux/module.h | 12 kernel/module.c| 53 ++ 2 files changed, 65 insertions(+) diff --git a/include/linux/module.h b/include/linux/module.h index 9ad68561d8c2..a3d17b081de3 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod) struct module *__module_text_address(unsigned long addr); struct module *__module_address(unsigned long addr); +struct module *__module_rodata_address(unsigned long addr); bool is_module_address(unsigned long addr); +bool is_module_rodata_address(unsigned long addr); bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr); bool is_module_percpu_address(unsigned long addr); bool is_module_text_address(unsigned long addr); @@ -646,6 +648,11 @@ static inline struct module *__module_address(unsigned long addr) return NULL; } +static inline struct module *__module_rodata_address(unsigned long addr) +{ + return NULL; +} + static inline struct module *__module_text_address(unsigned long addr) { return NULL; @@ -656,6 +663,11 @@ static inline bool is_module_address(unsigned long addr) return false; } +static inline bool is_module_rodata_address(unsigned long addr) +{ + return false; +} + static inline bool is_module_percpu_address(unsigned long addr) { return false; diff --git a/kernel/module.c b/kernel/module.c index 8ffcd29a4245..26bd62c61d87 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long addr) } EXPORT_SYMBOL_GPL(__module_text_address); +/** + * is_module_rodata_address - is this address inside read-only module data? + * @addr: the address to check. + * + */ +bool is_module_rodata_address(unsigned long addr) +{ + bool ret; + + preempt_disable(); + ret = __module_rodata_address(addr) != NULL; + preempt_enable(); + + return ret; +} + +/* + * __module_rodata_address - get the module whose rodata/ro_after_init sections + * contain the given address. + * @addr: the address. + * + * Must be called with preempt disabled or module mutex held so that + * module doesn't get freed during this. + */ +struct module *__module_rodata_address(unsigned long addr) +{ + struct module *mod = __module_address(addr); + + /* +* Make sure module is within the read-only section. +* range(base + text_size, base + ro_after_init_size) +* encompasses both the rodata and ro_after_init regions. +* See comment above frob_text() for the layout diagram. +*/ + if (mod) { + void *init_base = mod->init_layout.base; + unsigned int init_text_size = mod->init_layout.text_size; + unsigned int init_ro_after_init_size = mod->init_layout.ro_after_init_size; + + void *core_base = mod->core_layout.base; + unsigned int core_text_size = mod->core_layout.text_size; + unsigned int core_ro_after_init_size = mod->core_layout.ro_after_init_size; + + if (!within(addr, init_base + init_text_size, + init_ro_after_init_size - init_text_size) + && !within(addr, core_base + core_text_size, + core_ro_after_init_size - core_text_size)) + mod = NULL; + } + return mod; +} +EXPORT_SYMBOL_GPL(__module_rodata_address); + /* Don't grab lock, we're oopsing. */ void print_modules(void) { -- 2.12.1
Re: [PATCH] virtio_console: fix uninitialized variable use
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote: > Hi Mike > if you like, pls send me your Signed-off-by and I'll > change the patch to make you an author. Nah, it's perfect as it is. While I was pretty darn sure it was generic, I intentionally posted it as diagnostic info. > More importantly, pls let me know whether this helps > resolve the issues about hybernation for you! Well, it prevents kaboom with VIRTIO_CONSOLE_F_MULTIPORT turned off, but no, it does nada wrt the warnings. -Mike
Re: [PATCH] virtio_console: fix uninitialized variable use
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote: > Hi Mike > if you like, pls send me your Signed-off-by and I'll > change the patch to make you an author. Nah, it's perfect as it is. While I was pretty darn sure it was generic, I intentionally posted it as diagnostic info. > More importantly, pls let me know whether this helps > resolve the issues about hybernation for you! Well, it prevents kaboom with VIRTIO_CONSOLE_F_MULTIPORT turned off, but no, it does nada wrt the warnings. -Mike
Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree
On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote: > Keerthy, > > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote: >> * Keerthy[170323 20:29]: >>> >>> >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote: * Keerthy [170321 20:45]: > > > On Thursday 09 March 2017 01:35 PM, Keerthy wrote: >> Currently the slope and offset values for calculating the >> hot spot temperature of a particular thermal zone is part >> of driver data. Pass them here instead and obtain the values >> while of node parsing. >> >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. > > Have you tried on boards that need negative coefficients? > > https://patchwork.kernel.org/patch/9619577/ Yes. I retrieved the negative values nicely in the driver passed via Device Tree. > > > Hi Eduardo, > > If the series looks okay could you please pull this? Also.. Are the dts changes safe for me to pick separately? >>> >>> Yes Tony they are safe to pulled separately. >> >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2. > > Keerthy, > > The only thing I want you to confirm is if you are really getting the > negative coefficients, because currently of-thermal reads the array > using an OF helper that understands only unsigned. For this reason, I > will be queueing your patches only for next merge window, not as a fix, > not for rc's. I am getting negative co-efficients. > > >> >> Thanks, >> >> Tony
Re: [PATCH v2 0/8] thermal: ti-soc-thermal: Migrate slope/offset data to device tree
On Wednesday 29 March 2017 10:07 AM, Eduardo Valentin wrote: > Keerthy, > > On Fri, Mar 24, 2017 at 07:26:10AM -0700, Tony Lindgren wrote: >> * Keerthy [170323 20:29]: >>> >>> >>> On Friday 24 March 2017 02:22 AM, Tony Lindgren wrote: * Keerthy [170321 20:45]: > > > On Thursday 09 March 2017 01:35 PM, Keerthy wrote: >> Currently the slope and offset values for calculating the >> hot spot temperature of a particular thermal zone is part >> of driver data. Pass them here instead and obtain the values >> while of node parsing. >> >> Tested for the slope and constant values on DRA7-EVM, OMAP3-BEAGLE. > > Have you tried on boards that need negative coefficients? > > https://patchwork.kernel.org/patch/9619577/ Yes. I retrieved the negative values nicely in the driver passed via Device Tree. > > > Hi Eduardo, > > If the series looks okay could you please pull this? Also.. Are the dts changes safe for me to pick separately? >>> >>> Yes Tony they are safe to pulled separately. >> >> OK applying patches 1 - 5 of this series into omap-for-v4.12/dt-v2. > > Keerthy, > > The only thing I want you to confirm is if you are really getting the > negative coefficients, because currently of-thermal reads the array > using an OF helper that understands only unsigned. For this reason, I > will be queueing your patches only for next merge window, not as a fix, > not for rc's. I am getting negative co-efficients. > > >> >> Thanks, >> >> Tony
[PATCH 0/5] firmware: move UMH locks onto fallback code
Greg, One of the eyesores on the old API was the use of the UMH lock even when we don't use any of the usermode helpers. It took quite a bit of git archeology to draw up a solution which makes me feel comfortable in moving this code out of the way given it may have added new protections we never knew about. A resolution is to make similar protections for direct FS lookups for now, and later after a release or so we can consider removing them. There is further possible work to do to clean up the UMH pollution on external code, but this can be done separately and is more a generic kernel/kmod.c thing than firmware thing. This changes makes subsequent patches able to fold the new driver data API onto the older firmware API. These all go hammered with 0-day and the firmware test scripts shipped upstream. Luis R. Rodriguez (5): firmware: share fw fallback killing on reboot/suspend firmware: always enable the reboot notifier firmware: add sanity check on shutdown/suspend firmware: move assign_firmware_buf() further up firmware: move umh try locks into the umh code .../driver-api/firmware/request_firmware.rst | 11 + drivers/base/firmware_class.c | 293 ++--- 2 files changed, 207 insertions(+), 97 deletions(-) -- 2.11.0
[PATCH 0/5] firmware: move UMH locks onto fallback code
Greg, One of the eyesores on the old API was the use of the UMH lock even when we don't use any of the usermode helpers. It took quite a bit of git archeology to draw up a solution which makes me feel comfortable in moving this code out of the way given it may have added new protections we never knew about. A resolution is to make similar protections for direct FS lookups for now, and later after a release or so we can consider removing them. There is further possible work to do to clean up the UMH pollution on external code, but this can be done separately and is more a generic kernel/kmod.c thing than firmware thing. This changes makes subsequent patches able to fold the new driver data API onto the older firmware API. These all go hammered with 0-day and the firmware test scripts shipped upstream. Luis R. Rodriguez (5): firmware: share fw fallback killing on reboot/suspend firmware: always enable the reboot notifier firmware: add sanity check on shutdown/suspend firmware: move assign_firmware_buf() further up firmware: move umh try locks into the umh code .../driver-api/firmware/request_firmware.rst | 11 + drivers/base/firmware_class.c | 293 ++--- 2 files changed, 207 insertions(+), 97 deletions(-) -- 2.11.0
Re: [PATCH v2 3/8] ARM: OMAP5: Thermal: Add slope and offset values
On Wednesday 29 March 2017 10:03 AM, Eduardo Valentin wrote: > On Thu, Mar 09, 2017 at 01:35:57PM +0530, Keerthy wrote: >> Currently the slope and offset values for calculating the >> hot spot temperature of a particular thermal zone is part >> of driver data. Pass them here instead and obtain the values >> while of node parsing. > > > The patch is fine.. but > >> >> Signed-off-by: Keerthy>> --- >> arch/arm/boot/dts/omap5.dtsi | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi >> index 222155c..eaff2a5 100644 >> --- a/arch/arm/boot/dts/omap5.dtsi >> +++ b/arch/arm/boot/dts/omap5.dtsi >> @@ -1127,6 +1127,15 @@ >> >> _thermal { >> polling-delay = <500>; /* milliseconds */ >> +coefficients = <65 (-1791)>; > > I suppose you tried this change with this patch: > https://patchwork.kernel.org/patch/9619577/ > ? > > Otherwise, I do not see how your coeff would work, right? IIRC i did not have that patch. But No issues there. I did try deliberately with negative values and i retrieved the negative values in driver nicely. Fox ex: -20 unsigned when retrieved in driver as an int coeff will give me -20. I checked with multiple negative co-efficients and i retrieved the passed negative co-efficients in driver from DT. Regards, Keerthy > >> }; >> >> /include/ "omap54xx-clocks.dtsi" >> + >> +_thermal { >> +coefficients = <117 (-2992)>; >> +}; >> + >> +_thermal { >> +coefficients = <0 2000>; >> +}; >> -- >> 1.9.1 >>
Re: [PATCH v2 3/8] ARM: OMAP5: Thermal: Add slope and offset values
On Wednesday 29 March 2017 10:03 AM, Eduardo Valentin wrote: > On Thu, Mar 09, 2017 at 01:35:57PM +0530, Keerthy wrote: >> Currently the slope and offset values for calculating the >> hot spot temperature of a particular thermal zone is part >> of driver data. Pass them here instead and obtain the values >> while of node parsing. > > > The patch is fine.. but > >> >> Signed-off-by: Keerthy >> --- >> arch/arm/boot/dts/omap5.dtsi | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi >> index 222155c..eaff2a5 100644 >> --- a/arch/arm/boot/dts/omap5.dtsi >> +++ b/arch/arm/boot/dts/omap5.dtsi >> @@ -1127,6 +1127,15 @@ >> >> _thermal { >> polling-delay = <500>; /* milliseconds */ >> +coefficients = <65 (-1791)>; > > I suppose you tried this change with this patch: > https://patchwork.kernel.org/patch/9619577/ > ? > > Otherwise, I do not see how your coeff would work, right? IIRC i did not have that patch. But No issues there. I did try deliberately with negative values and i retrieved the negative values in driver nicely. Fox ex: -20 unsigned when retrieved in driver as an int coeff will give me -20. I checked with multiple negative co-efficients and i retrieved the passed negative co-efficients in driver from DT. Regards, Keerthy > >> }; >> >> /include/ "omap54xx-clocks.dtsi" >> + >> +_thermal { >> +coefficients = <117 (-2992)>; >> +}; >> + >> +_thermal { >> +coefficients = <0 2000>; >> +}; >> -- >> 1.9.1 >>
[PATCH 3/5] firmware: add sanity check on shutdown/suspend
The firmware API should not be used after we go to suspend and after we reboot/halt. The suspend/resume case is a bit complex, so this documents that so things are clearer. We want to know about users of the API in incorrect places so that their callers are corrected, so this also adds a warn for those cases. Signed-off-by: Luis R. Rodriguez--- .../driver-api/firmware/request_firmware.rst | 11 +++ drivers/base/firmware_class.c | 96 ++ 2 files changed, 107 insertions(+) diff --git a/Documentation/driver-api/firmware/request_firmware.rst b/Documentation/driver-api/firmware/request_firmware.rst index cc0aea880824..1c2c4967cd43 100644 --- a/Documentation/driver-api/firmware/request_firmware.rst +++ b/Documentation/driver-api/firmware/request_firmware.rst @@ -44,6 +44,17 @@ request_firmware_nowait .. kernel-doc:: drivers/base/firmware_class.c :functions: request_firmware_nowait +Considerations for suspend and resume += + +During suspend and resume only the built-in firmware and the firmware cache +elements of the firmware API can be used. This is managed by fw_pm_notify(). + +fw_pm_notify + +.. kernel-doc:: drivers/base/firmware_class.c + :functions: fw_pm_notify + request firmware API expected driver use diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index e8edf22536c1..f7ac619635b4 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -260,6 +260,38 @@ static int fw_cache_piggyback_on_request(const char *name); * guarding for corner cases a global lock should be OK */ static DEFINE_MUTEX(fw_lock); +static bool __enable_firmware = false; + +static void enable_firmware(void) +{ + mutex_lock(_lock); + __enable_firmware = true; + mutex_unlock(_lock); +} + +static void disable_firmware(void) +{ + mutex_lock(_lock); + __enable_firmware = false; + mutex_unlock(_lock); +} + +/* + * When disabled only the built-in firmware and the firmware cache will be + * used to look for firmware. + */ +static bool firmware_enabled(void) +{ + bool enabled = false; + + mutex_lock(_lock); + if (__enable_firmware) + enabled = true; + mutex_unlock(_lock); + + return enabled; +} + static struct firmware_cache fw_cache; static struct firmware_buf *__allocate_fw_buf(const char *fw_name, @@ -1165,6 +1197,12 @@ _request_firmware(const struct firmware **firmware_p, const char *name, if (ret <= 0) /* error or already assigned */ goto out; + if (!firmware_enabled()) { + WARN(1, "firmware request while host is not available\n"); + ret = -EHOSTDOWN; + goto out; + } + ret = 0; timeout = firmware_loading_timeout(); if (opt_flags & FW_OPT_NOWAIT) { @@ -1697,6 +1735,59 @@ static void device_uncache_fw_images_delay(unsigned long delay) msecs_to_jiffies(delay)); } +/** + * fw_pm_notify - notifier for suspend/resume + * + * Used to modify the firmware_class state as we move in between states. + * The firmware_class implements a firmware cache to enable device driver + * to fetch firmware upon resume before the root filesystem is ready. We + * disable API calls which do not use the built-in firmware or the firmware + * cache when we know these calls will not work. + * + * The inner logic behind all this is a bit complex so it is worth summarizing + * the kernel's own suspend/resume process with context and focus on how this + * can impact the firmware API. + * + * First a review on how we go to suspend: + * + * pm_suspend() --> enter_state() --> + * sys_sync() + * suspend_prepare() --> + * __pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...); + * suspend_freeze_processes() --> + * freeze_processes() --> + * __usermodehelper_set_disable_depth(UMH_DISABLED); + * freeze all tasks ... + * freeze_kernel_threads() + * suspend_devices_and_enter() --> + * dpm_suspend_start() --> + * dpm_prepare() + * dpm_suspend() + * suspend_enter() --> + * platform_suspend_prepare() + * dpm_suspend_late() + * freeze_enter() + * syscore_suspend() + * + * When we resume we bail out of a loop from suspend_devices_and_enter() and + * unwind back out to the caller enter_state() where we were before as follows: + * + * enter_state() --> + * suspend_devices_and_enter() --> (bail from loop) + * dpm_resume_end() --> + * dpm_resume() + * dpm_complete() + * suspend_finish() --> + *
[PATCH 3/5] firmware: add sanity check on shutdown/suspend
The firmware API should not be used after we go to suspend and after we reboot/halt. The suspend/resume case is a bit complex, so this documents that so things are clearer. We want to know about users of the API in incorrect places so that their callers are corrected, so this also adds a warn for those cases. Signed-off-by: Luis R. Rodriguez --- .../driver-api/firmware/request_firmware.rst | 11 +++ drivers/base/firmware_class.c | 96 ++ 2 files changed, 107 insertions(+) diff --git a/Documentation/driver-api/firmware/request_firmware.rst b/Documentation/driver-api/firmware/request_firmware.rst index cc0aea880824..1c2c4967cd43 100644 --- a/Documentation/driver-api/firmware/request_firmware.rst +++ b/Documentation/driver-api/firmware/request_firmware.rst @@ -44,6 +44,17 @@ request_firmware_nowait .. kernel-doc:: drivers/base/firmware_class.c :functions: request_firmware_nowait +Considerations for suspend and resume += + +During suspend and resume only the built-in firmware and the firmware cache +elements of the firmware API can be used. This is managed by fw_pm_notify(). + +fw_pm_notify + +.. kernel-doc:: drivers/base/firmware_class.c + :functions: fw_pm_notify + request firmware API expected driver use diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index e8edf22536c1..f7ac619635b4 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -260,6 +260,38 @@ static int fw_cache_piggyback_on_request(const char *name); * guarding for corner cases a global lock should be OK */ static DEFINE_MUTEX(fw_lock); +static bool __enable_firmware = false; + +static void enable_firmware(void) +{ + mutex_lock(_lock); + __enable_firmware = true; + mutex_unlock(_lock); +} + +static void disable_firmware(void) +{ + mutex_lock(_lock); + __enable_firmware = false; + mutex_unlock(_lock); +} + +/* + * When disabled only the built-in firmware and the firmware cache will be + * used to look for firmware. + */ +static bool firmware_enabled(void) +{ + bool enabled = false; + + mutex_lock(_lock); + if (__enable_firmware) + enabled = true; + mutex_unlock(_lock); + + return enabled; +} + static struct firmware_cache fw_cache; static struct firmware_buf *__allocate_fw_buf(const char *fw_name, @@ -1165,6 +1197,12 @@ _request_firmware(const struct firmware **firmware_p, const char *name, if (ret <= 0) /* error or already assigned */ goto out; + if (!firmware_enabled()) { + WARN(1, "firmware request while host is not available\n"); + ret = -EHOSTDOWN; + goto out; + } + ret = 0; timeout = firmware_loading_timeout(); if (opt_flags & FW_OPT_NOWAIT) { @@ -1697,6 +1735,59 @@ static void device_uncache_fw_images_delay(unsigned long delay) msecs_to_jiffies(delay)); } +/** + * fw_pm_notify - notifier for suspend/resume + * + * Used to modify the firmware_class state as we move in between states. + * The firmware_class implements a firmware cache to enable device driver + * to fetch firmware upon resume before the root filesystem is ready. We + * disable API calls which do not use the built-in firmware or the firmware + * cache when we know these calls will not work. + * + * The inner logic behind all this is a bit complex so it is worth summarizing + * the kernel's own suspend/resume process with context and focus on how this + * can impact the firmware API. + * + * First a review on how we go to suspend: + * + * pm_suspend() --> enter_state() --> + * sys_sync() + * suspend_prepare() --> + * __pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...); + * suspend_freeze_processes() --> + * freeze_processes() --> + * __usermodehelper_set_disable_depth(UMH_DISABLED); + * freeze all tasks ... + * freeze_kernel_threads() + * suspend_devices_and_enter() --> + * dpm_suspend_start() --> + * dpm_prepare() + * dpm_suspend() + * suspend_enter() --> + * platform_suspend_prepare() + * dpm_suspend_late() + * freeze_enter() + * syscore_suspend() + * + * When we resume we bail out of a loop from suspend_devices_and_enter() and + * unwind back out to the caller enter_state() where we were before as follows: + * + * enter_state() --> + * suspend_devices_and_enter() --> (bail from loop) + * dpm_resume_end() --> + * dpm_resume() + * dpm_complete() + * suspend_finish() --> + *
Re: [PATCH v4 2/2] extable: verify address is read-only
+++ Eddie Kovsky [28/03/17 21:28 -0600]: On 03/27/17, Kees Cook wrote: On Mon, Mar 27, 2017 at 1:43 AM, kbuild test robotwrote: > Hi Eddie, > > [auto build test ERROR on next-20170323] > [cannot apply to linus/master linux/master jeyu/modules-next v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc4] > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] > > url: https://github.com/0day-ci/linux/commits/Eddie-Kovsky/module-verify-address-is-read-only/20170327-142922 > config: blackfin-BF561-EZKIT-SMP_defconfig (attached as .config) > compiler: bfin-uclinux-gcc (GCC) 6.2.0 > reproduce: > wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=blackfin > > All errors (new ones prefixed by >>): > >kernel/built-in.o: In function `core_kernel_rodata': >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' Hm, I'm confused about this. blackfin includes include/asm-generic-vmlinux.lds.h and uses the RO_DATA macro (which resolves to RO_DATA_SECTION to RO_AFTER_INIT_DATA which defines __[start|end]_data_ro_after_init. Also, it seems that commit d7c19b066dcf4bd19c4385e8065558d4e74f9e73 ("mm: kmemleak: scan .data.ro_after_init") added a potentially redundant section name (s390 already calls this __[start|end]_ro_after_init). I'd like to get this cleaned up, since having multiple names for the same thing is confusing: diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S index 000e6e91f6a0..3667d20e997f 100644 --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -62,9 +62,11 @@ SECTIONS . = ALIGN(PAGE_SIZE); __start_ro_after_init = .; + __start_data_ro_after_init = .; .data..ro_after_init : { *(.data..ro_after_init) } + __end_data_ro_after_init = .; EXCEPTION_TABLE(16) . = ALIGN(PAGE_SIZE); __end_ro_after_init = .; And it seems that this hunk is wrong (__end_ro_after_init includes s390's exception table, etc). I think we should remove the ..._data_... name and use s390's name. I'll send an adjustment patch, but we'll still need to deal with blackfin. -Kees Kees I applied your patch (mm: fix section name for .data..ro_after_init) and changed the new function in extable.c to use __[start|end]_ro_after_init instead. The new version still builds without errors on x86, which isn't surprising. I've cross compiled this for blackfin and I'm able to reproduce the build error. I'm still not sure why. As you pointed out, blackfin does appear to use 'include/asm-generic/vmlinux.lds.h'. This appears to be because blackfin is one of the 2 arches that prepends an underscore '_' to all symbols defined in C. I noticed that __{start,end}_data_ro_after_init in vmlinux.lds.h are not wrapped with VMLINUX_SYMBOL() which adds the necessary prefix for arches that have HAVE_UNDERSCORE_SYMBOL_PREFIX, hence the undefined reference. The below patch fixed the build error for me, if it works for you then I can send a formal patch. diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 4e09b28..7b262f7 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -260,9 +260,9 @@ */ #ifndef RO_AFTER_INIT_DATA #define RO_AFTER_INIT_DATA \ - __start_data_ro_after_init = .; \ + VMLINUX_SYMBOL(__start_data_ro_after_init) = .; \ *(.data..ro_after_init) \ - __end_data_ro_after_init = .; + VMLINUX_SYMBOL(__end_data_ro_after_init) = .; #endif /* Jessica
Re: [PATCH v4 2/2] extable: verify address is read-only
+++ Eddie Kovsky [28/03/17 21:28 -0600]: On 03/27/17, Kees Cook wrote: On Mon, Mar 27, 2017 at 1:43 AM, kbuild test robot wrote: > Hi Eddie, > > [auto build test ERROR on next-20170323] > [cannot apply to linus/master linux/master jeyu/modules-next v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc4] > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] > > url: https://github.com/0day-ci/linux/commits/Eddie-Kovsky/module-verify-address-is-read-only/20170327-142922 > config: blackfin-BF561-EZKIT-SMP_defconfig (attached as .config) > compiler: bfin-uclinux-gcc (GCC) 6.2.0 > reproduce: > wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=blackfin > > All errors (new ones prefixed by >>): > >kernel/built-in.o: In function `core_kernel_rodata': >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__start_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' >>> kernel/extable.c:169: undefined reference to `__end_data_ro_after_init' Hm, I'm confused about this. blackfin includes include/asm-generic-vmlinux.lds.h and uses the RO_DATA macro (which resolves to RO_DATA_SECTION to RO_AFTER_INIT_DATA which defines __[start|end]_data_ro_after_init. Also, it seems that commit d7c19b066dcf4bd19c4385e8065558d4e74f9e73 ("mm: kmemleak: scan .data.ro_after_init") added a potentially redundant section name (s390 already calls this __[start|end]_ro_after_init). I'd like to get this cleaned up, since having multiple names for the same thing is confusing: diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S index 000e6e91f6a0..3667d20e997f 100644 --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -62,9 +62,11 @@ SECTIONS . = ALIGN(PAGE_SIZE); __start_ro_after_init = .; + __start_data_ro_after_init = .; .data..ro_after_init : { *(.data..ro_after_init) } + __end_data_ro_after_init = .; EXCEPTION_TABLE(16) . = ALIGN(PAGE_SIZE); __end_ro_after_init = .; And it seems that this hunk is wrong (__end_ro_after_init includes s390's exception table, etc). I think we should remove the ..._data_... name and use s390's name. I'll send an adjustment patch, but we'll still need to deal with blackfin. -Kees Kees I applied your patch (mm: fix section name for .data..ro_after_init) and changed the new function in extable.c to use __[start|end]_ro_after_init instead. The new version still builds without errors on x86, which isn't surprising. I've cross compiled this for blackfin and I'm able to reproduce the build error. I'm still not sure why. As you pointed out, blackfin does appear to use 'include/asm-generic/vmlinux.lds.h'. This appears to be because blackfin is one of the 2 arches that prepends an underscore '_' to all symbols defined in C. I noticed that __{start,end}_data_ro_after_init in vmlinux.lds.h are not wrapped with VMLINUX_SYMBOL() which adds the necessary prefix for arches that have HAVE_UNDERSCORE_SYMBOL_PREFIX, hence the undefined reference. The below patch fixed the build error for me, if it works for you then I can send a formal patch. diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 4e09b28..7b262f7 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -260,9 +260,9 @@ */ #ifndef RO_AFTER_INIT_DATA #define RO_AFTER_INIT_DATA \ - __start_data_ro_after_init = .; \ + VMLINUX_SYMBOL(__start_data_ro_after_init) = .; \ *(.data..ro_after_init) \ - __end_data_ro_after_init = .; + VMLINUX_SYMBOL(__end_data_ro_after_init) = .; #endif /* Jessica
[PATCH] usb: host: plat: Enable xHCI plat runtime PM
Enable the xHCI plat runtime PM for parent device to suspend/resume xHCI. Also call pm_runtime_forbid() in probe() function to force users to explicitly enable runtime pm using power/control in sysfs, in case some parent devices didn't implement runtime PM callbacks. Signed-off-by: Baolin Wang--- drivers/usb/host/xhci-plat.c | 54 -- 1 file changed, 47 insertions(+), 7 deletions(-) diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index bd02a6c..2036c24 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -179,9 +179,15 @@ static int xhci_plat_probe(struct platform_device *pdev) return ret; } + pm_runtime_set_active(>dev); + pm_runtime_enable(>dev); + pm_runtime_get_noresume(>dev); + hcd = usb_create_hcd(driver, >dev, dev_name(>dev)); - if (!hcd) - return -ENOMEM; + if (!hcd) { + ret = -ENOMEM; + goto disable_runtime; + } res = platform_get_resource(pdev, IORESOURCE_MEM, 0); hcd->regs = devm_ioremap_resource(>dev, res); @@ -258,6 +264,14 @@ static int xhci_plat_probe(struct platform_device *pdev) if (ret) goto dealloc_usb2_hcd; + pm_runtime_put_noidle(>dev); + + /* +* Prevent runtime pm from being on as default, users should enable +* runtime pm using power/control in sysfs. +*/ + pm_runtime_forbid(>dev); + return 0; @@ -277,6 +291,10 @@ static int xhci_plat_probe(struct platform_device *pdev) put_hcd: usb_put_hcd(hcd); +disable_runtime: + pm_runtime_put_noidle(>dev); + pm_runtime_disable(>dev); + return ret; } @@ -298,6 +316,9 @@ static int xhci_plat_remove(struct platform_device *dev) clk_disable_unprepare(clk); usb_put_hcd(hcd); + pm_runtime_set_suspended(>dev); + pm_runtime_disable(>dev); + return 0; } @@ -325,14 +346,33 @@ static int xhci_plat_resume(struct device *dev) return xhci_resume(xhci, 0); } +#endif /* CONFIG_PM_SLEEP */ + +#ifdef CONFIG_PM +static int xhci_plat_runtime_suspend(struct device *dev) +{ + struct usb_hcd *hcd = dev_get_drvdata(dev); + struct xhci_hcd *xhci = hcd_to_xhci(hcd); + + return xhci_suspend(xhci, device_may_wakeup(dev)); +} + +static int xhci_plat_runtime_resume(struct device *dev) +{ + struct usb_hcd *hcd = dev_get_drvdata(dev); + struct xhci_hcd *xhci = hcd_to_xhci(hcd); + + return xhci_resume(xhci, 0); +} +#endif /* CONFIG_PM */ static const struct dev_pm_ops xhci_plat_pm_ops = { SET_SYSTEM_SLEEP_PM_OPS(xhci_plat_suspend, xhci_plat_resume) + + SET_RUNTIME_PM_OPS(xhci_plat_runtime_suspend, + xhci_plat_runtime_resume, + NULL) }; -#define DEV_PM_OPS (_plat_pm_ops) -#else -#define DEV_PM_OPS NULL -#endif /* CONFIG_PM */ static const struct acpi_device_id usb_xhci_acpi_match[] = { /* XHCI-compliant USB Controller */ @@ -346,7 +386,7 @@ static int xhci_plat_resume(struct device *dev) .remove = xhci_plat_remove, .driver = { .name = "xhci-hcd", - .pm = DEV_PM_OPS, + .pm = _plat_pm_ops, .of_match_table = of_match_ptr(usb_xhci_of_match), .acpi_match_table = ACPI_PTR(usb_xhci_acpi_match), }, -- 1.7.9.5
Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
On Wed, Mar 29, 2017 at 10:13 AM, Oza Ozawrote: > On Tue, Mar 28, 2017 at 7:59 PM, Robin Murphy wrote: >> On 28/03/17 06:27, Oza Oza wrote: >>> On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring wrote: On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep wrote: > it is possible that PCI device supports 64-bit DMA addressing, > and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), > however PCI host bridge may have limitations on the inbound > transaction addressing. As an example, consider NVME SSD device > connected to iproc-PCIe controller. > > Currently, the IOMMU DMA ops only considers PCI device dma_mask > when allocating an IOVA. This is particularly problematic on > ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to > PA for in-bound transactions only after PCI Host has forwarded > these transactions on SOC IO bus. This means on such ARM/ARM64 > SOCs the IOVA of in-bound transactions has to honor the addressing > restrictions of the PCI Host. > > current pcie frmework and of framework integration assumes dma-ranges > in a way where memory-mapped devices define their dma-ranges. > dma-ranges: (child-bus-address, parent-bus-address, length). > > but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. > dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; If you implement a common function, then I expect to see other users converted to use it. There's also PCI hosts in arch/powerpc that parse dma-ranges. >>> >>> the common function should be similar to what >>> of_pci_get_host_bridge_resources is doing right now. >>> it parses ranges property right now. >>> >>> the new function would look look following. >>> >>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources) >>> where resources would return the dma-ranges. >>> >>> but right now if you see the patch, of_dma_configure calls the new >>> function, which actually returns the largest possible size. >>> so this new function has to be generic in a way where other PCI hosts >>> can use it. but certainly iproc(Broadcom SOC) , rcar based SOCs can >>> use it for sure. >>> >>> although having powerpc using it; is a separate exercise, since I do >>> not have any access to other PCI hosts such as powerpc. but we can >>> workout with them on thsi forum if required. >>> >>> so overall, of_pci_get_dma_ranges has to serve following 2 purposes. >>> >>> 1) it has to return largest possible size to of_dma_configure to >>> generate largest possible dma_mask. >>> >>> 2) it also has to return resources (dma-ranges) parsed, to the users. >>> >>> so to address above needs >>> >>> of_pci_get_dma_ranges(struct device_node *dev, struct list_head >>> *resources, u64 *size) >>> >>> dev -> device node. >>> resources -> dma-ranges in allocated list. >>> size -> highest possible size to generate possible dma_mask for >>> of_dma_configure. >>> >>> let em know how this sounds. >> >> Note that the point of passing PCI host bridges into of_dma_configure() >> in the first place was to avoid having some separate PCI-specific path >> for DMA configuration. I worry that introducing bus-specific dma-ranges >> parsing largely defeats that, since we end up with the worst of both >> worlds; effectively-duplicated code, and/or a load of extra complexity >> to then attempt to reconverge the divergent paths (there really >> shouldn't be any need to allocate a list of anything). Given that >> of_translate_dma_address() is already bus-agnostic, it hardly seems >> justifiable for its caller not to be so as well, especially when it >> mostly just comes down to getting the right #address-cells value. >> >> The patch below is actually enough to make typical cases work, but is >> vile, so I'm not seriously considering it (hence I've not bothered >> making IOMMU configuration handle all circumstances). What it has served >> to do, though, is give me a clear idea of how to properly sort out the >> not-quite-right device/parent assumptions between of_dma_configure() and >> of_dma_get_range() rather than bodging around them any further - stay tuned. >> >> Robin. >> >> ->8- >> From: Robin Murphy >> Subject: [PATCH] of/pci: Use child node for DMA configuration >> >> of_dma_configure() expects to be passed an OF node representing the >> device being configured - for PCI devices we currently pass the node of >> the appropriate host controller, which sort of works for inherited >> properties which may appear at any level, like "dma-coherent", but falls >> apart for properties which actually care about specific device-parent >> relationships, like "dma-ranges". >> >> Solve this by attempting to find a suitable child node if the PCI >> hierarchy is actually represented in DT, and if not then faking one up >> as a last resort, to