Re: linux-next: Tree for Feb 4 (gpu/drm/i915/)
On 2/4/21 1:13 AM, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20210203:
>

on x86_64:

Still seeing 2 unrelated issues:

WARNING: unmet direct dependencies detected for DRM_I915_WERROR
  Depends on [n]: HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && !COMPILE_TEST [=y]
  Selected by [m]:
  - DRM_I915_DEBUG [=y] && HAS_IOMEM [=y] && EXPERT [=y] && DRM_I915 [=m]

../drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_freeze_late’:
../drivers/gpu/drm/i915/i915_gem.c:1182:2: error: implicit declaration of function ‘wbinvd_on_all_cpus’; did you mean ‘wrmsr_on_cpus’? [-Werror=implicit-function-declaration]
  wbinvd_on_all_cpus();

Full randconfig file is attached.

--
~Randy
Reported-by: Randy Dunlap

config-r7644.gz
Description: application/gzip
linux-next: Tree for Feb 4
Hi all,

Changes since 20210203:

The net-next tree gained a build failure, so I used the version from next-20210203.

The tip tree still had its boot failure so I reverted a commit.

The drivers-x86 tree gained conflicts against the drm-misc tree.

Non-merge commits (relative to Linus' tree): 7615
 7960 files changed, 298401 insertions(+), 226420 deletions(-)

I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig and htmldocs. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 333 trees (counting Linus' and 87 trees of bug fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html .

Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (61556703b610 Merge tag 'for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml)
Merging fixes/fixes (e71ba9452f0b Linux 5.11-rc2)
Merging kbuild-current/fixes (074075aea2ff scripts/clang-tools: switch explicitly to Python 3)
Merging arc-current/for-curr (7c53f6b671f4 Linux 5.11-rc3)
Merging arm-current/fixes (199a427c3a3d ARM: ensure the signal page contains defined contents)
Merging arm64-fixes/for-next/fixes (22cd5edb2d9c arm64: Use simpler arithmetics for the linear map macros)
Merging arm-soc-fixes/arm/fixes (459630a3ebb4 Merge tag 'sunxi-fixes-for-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes)
Merging drivers-memory-fixes/fixes (5c8fe583cce5 Linux 5.11-rc1)
Merging m68k-current/for-linus (2ae92e8b9b7e MAINTAINERS: Update m68k Mac entry)
Merging powerpc-fixes/fixes (24321ac668e4 powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics)
Merging s390-fixes/fixes (e82080e1f456 s390: uv: Fix sysfs max number of VCPUs reporting)
Merging sparc/master (0a95a6d1a4cd sparc: use for_each_child_of_node() macro)
Merging fscrypt-current/for-stable (d19d8d345eec fscrypt: fix inline encryption not used on new files)
Merging net/master (3aaf0a27ffc2 Merge tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux)
Merging bpf/master (6183f4d3a0a2 bpf: Check for integer overflow when using roundup_pow_of_two())
Merging ipsec/master (da64ae2d35d3 xfrm: Fix wraparound in xfrm_policy_addr_delta())
Merging netfilter/master (44a674d6f798 Merge tag 'mlx5-fixes-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux)
Merging ipvs/master (44a674d6f798 Merge tag 'mlx5-fixes-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux)
Merging wireless-drivers/master (93a1d4791c10 mt76: dma: fix a possible memory leak in mt76_add_fragment())
Merging mac80211/master (3aaf0a27ffc2 Merge tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux)
Merging rdma-fixes/for-rc (1048ba83fb1c Linux 5.11-rc6)
Merging sound-current/for-linus (4841b8e6318a ALSA: hda/realtek: modify EAPD in the ALC886)
Merging sound-asoc-fixes/for-linus (c5ea12b798b0 Merge remote-tracking branch 'asoc/for-5.11' into asoc-linus)
Merging regmap-fixes/for-linus (19c329f68089 Linux 5.11-rc4)
Merging regulator-fixes/for-linus (f874736f1250 Merge remote-tracking branch 'regulator/for-5.11' into regulator-linus)
Merging spi-fixes/for-linus (04aa85475c4c Merge remote-tracking branch 'spi/for-5.11' into spi-linus)
Merging pci-current/for-linus (7e69d07d7c3c Revert "PCI/ASPM: Save/restore L1SS Capability for suspend/resume")
Merging driver-core.current/driver-core-linus (6ee1d745b7c9 Linux 5.11-rc5)
Merging tty.current/tty-linus
linux-next: Tree for Feb 4
Hi all,

Changes since 20190201:

The vfs tree still had its build failure for which I applied a patch.

The net-next tree gained a build failure for which I applied a fix patch.

The drm-tegra tree gained a build failure so I used the version from next-20190201.

The driver-core tree lost its build failure.

Non-merge commits (relative to Linus' tree): 5280
 6047 files changed, 219992 insertions(+), 146740 deletions(-)

I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 296 trees (counting Linus' and 69 trees of bug fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html .

Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (24b888d8d598 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (d8d0c3a7f601 x86/syscalls: Mark expected switch fall-throughs)
Merging kbuild-current/fixes (6db2983cd806 kallsyms: Handle too long symbols in kallsyms.c)
Merging arc-current/for-curr (c4a8fb41246c ARCv2: lib: memcpy: fix doing prefetchw outside of buffer)
Merging arm-current/fixes (1b5ba3507842 ARM: 8824/1: fix a migrating irq bug when hotplug cpu)
Merging arm64-fixes/for-next/fixes (f7daa9c8fd19 arm64: hibernate: Clean the __hyp_text to PoC after resume)
Merging m68k-current/for-linus (bed1369f5190 m68k: Fix memblock-related crashes)
Merging powerpc-fixes/fixes (7bea7ac0ca01 powerpc/syscalls: Fix syscall tracing)
Merging sparc/master (b71acb0e3721 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (8dfb8d2cceb7 net: systemport: Fix WoL with password after deep sleep)
Merging bpf/master (e7b816415e03 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging ipsec/master (09db51241118 esp: Skip TX bytes accounting when sending from a request socket)
Merging netfilter/master (ba59fb027307 sctp: walk the list of asoc safely)
Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function must not have side effects)
Merging wireless-drivers/master (8c22d81d5535 MAINTAINERS: add entry for redpine wireless driver)
Merging mac80211/master (e005bd7ddea0 cfg80211: call disconnect_wk when AP stops)
Merging rdma-fixes/for-rc (7b21b69ab203 IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate)
Merging sound-current/for-linus (305a0ade1809 ALSA: hda - Serialize codec registrations)
Merging sound-asoc-fixes/for-linus (923ed80588cf Merge branch 'asoc-5.0' into asoc-linus)
Merging regmap-fixes/for-linus (f17b5f06cb92 Linux 5.0-rc4)
Merging regulator-fixes/for-linus (c05e202d60de Merge branch 'regulator-5.0' into regulator-linus)
Merging spi-fixes/for-linus (2186097e00f9 Merge branch 'spi-5.0' into spi-linus)
Merging pci-current/for-linus (f14bcc0add3a Revert "PCI: armada8k: Add support for gpio controlled reset signal")
Merging driver-core.current/driver-core-linus (36991ca68db9 blk-mq: protect debugfs_create_files() from failures)
Merging tty.current/tty-linus (fedb5760648a serial: fix race between flush_to_ldisc and tty_open)
Merging usb.current/usb-linus (a07ddce4df80 usb: typec: tcpm: Correct the PPS out_volt calculation)
Merging usb-gadget-fixes/fixes (a53469a68eb8 usb: phy: am335x: fix race condition in _probe)
Merging usb-serial-fixes/usb-linus (f17b5f06cb92 Linux 5.0-rc4)
Merging
linux-next: Tree for Feb 4
Hi all,

Changes since 20160203:

The gpio tree still had its build failure so I used the version from next-20160128.

The aio tree still had a build failure so I used the version from next-20160111.

The akpm-current tree lost its build failures.

Non-merge commits (relative to Linus' tree): 2245
 2187 files changed, 81671 insertions(+), 36219 deletions(-)

I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 239 trees (counting Linus' and 36 trees of patches pending for Linus' tree).

Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html .

Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (b37a05c083c8 Merge branch 'akpm' (patches from Andrew))
Merging fixes/master (92e963f50fc7 Linux 4.5-rc1)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install)
Merging arc-current/for-curr (74bf8efb5fa6 Linux 4.4-rc7)
Merging arm-current/fixes (03590cb56d5d ARM: wire up copy_file_range() syscall)
Merging m68k-current/for-linus (daf670bc9d36 m68k/defconfig: Update defconfigs for v4.5-rc1)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-fixes/fixes (19f97c983071 powerpc/book3s_32: Fix build error with checkpoint restart)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (ca0bb0798022 Add sun4v_wdt watchdog driver)
Merging net/master (34229b277480 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (a8a572a6b5f2 xfrm: dst_entries_init() per-net dst_ops)
Merging ipvs/master (b16c29191dc8 netfilter: nf_conntrack: use safer way to lock all buckets)
Merging wireless-drivers/master (f9ead9beef3f Merge tag 'iwlwifi-for-kalle-2016-01-26_2' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (212c5a5e6ba6 mac80211: minstrel: Change expected throughput unit back to Kbps)
Merging sound-current/for-linus (2e5dc73fe1c4 Merge branch 'topic/core-fixes' into for-linus)
Merging pci-current/for-linus (46560388c476 PCI: iproc: Allow multiple devices except on PAXC)
Merging driver-core.current/driver-core-linus (36f90b0a2ddd Linux 4.5-rc2)
Merging tty.current/tty-linus (36f90b0a2ddd Linux 4.5-rc2)
Merging usb.current/usb-linus (5c82171167ad xhci: Fix list corruption in urb dequeue at host removal)
Merging usb-gadget-fixes/fixes (6a4290cc28be usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.)
Merging usb-serial-fixes/usb-linus (4152b387da81 USB: option: fix Cinterion AHxx enumeration)
Merging usb-chipidea-fixes/ci-for-usb-stable (6f51bc340d2a usb: chipidea: imx: fix a possible NULL dereference)
Merging staging.current/staging-linus (5982557ac6ee Merge tag 'iio-fixes-for-4.5b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (92e963f50fc7 Linux 4.5-rc1)
Merging input-current/for-linus (d4f1b06d685d Input: vmmouse - fix absolute device registration)
Merging crypto-current/master (49a20454e0eb crypto: atmel-aes - remove calls of clk_prepare() from atomic contexts)
Merging ide/master (e04a2bd6d8c9 drivers/ide: make ide-scan-pci.c driver explicitly non-modular)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES)
Merging rr-fixes/fixes (275d7d44d802 module: Fix locking in symbol_put_addr())
Merging vfio-fixes/for-linus (16ab8a5cbea4
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:46 PM, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen wrote:
>>> "Sedat" == Sedat Dilek writes:
>>
>> Sedat> No, but I am here on a so-called WUBI installation which
>> Sedat> triggered some bugs being an exotic installation. My
>> Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition
>> Sedat> (/dev/sda2).
>>
>> I've been mulling over this for a while and can't come up with a good
>> approach. So let's just nuke these warnings.
>>
>> --
>> Martin K. Petersen Oracle Linux Engineering
>>
>>
>> block: Quiesce zeroout wrapper
>>
>> blkdev_issue_zeroout() printed a warning if a device failed a discard or
>> write same request despite advertising support for these. That's fine
>> for SCSI since we'll disable these commands if we get an error back from
>> the disk saying that they are not supported. And consequently the
>> warning only gets printed once.
>>
>> There are other types of block devices that support discard, however,
>> and these may return -EOPNOTSUPP for each command but leave discard
>> enabled in the queue limits. This will cause a warning message for every
>> blkdev_issue_zeroout() invocation.
>>
>> Remove the offending warning messages.
>>
>> Reported-by: Sedat Dilek
>> Signed-off-by: Martin K. Petersen
>> ---
>> block/blk-lib.c | 26 +++---
>> 1 file changed, 7 insertions(+), 19 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 715e948f58a4..7688ee3f5d72 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>   * @discard: whether to discard the block range
>>   *
>>   * Description:
>> -
>>   * Zero-fill a block range. If the discard flag is set and the block
>>   * device guarantees that subsequent READ operations to the block range
>>   * in question will return zeroes, the blocks will be discarded. Should
>> @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>  			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
>>  {
>>  	struct request_queue *q = bdev_get_queue(bdev);
>> -	unsigned char bdn[BDEVNAME_SIZE];
>> -
>> -	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
>>
>> -		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
>> -			return 0;
>> -
>> -		bdevname(bdev, bdn);
>> -		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
>> -	}
>> +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
>> +	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
>> +		return 0;
>>
>> -	if (bdev_write_same(bdev)) {
>> -
>> -		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
>> -					     ZERO_PAGE(0)))
>> -			return 0;
>> -
>> -		bdevname(bdev, bdn);
>> -		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
>> -	}
>> +	if (bdev_write_same(bdev) &&
>> +	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
>> +				    ZERO_PAGE(0)) == 0)
>> +		return 0;
>>
>>  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>>  }
>
> Martin, will you send a separate patch for that?
>
> Thanks.
>

Just for the sake of completeness, the patch is now in block-next:

commit 9f9ee1f2b2f94f19437ae2def7c0d6636d7fe02e
"block: Quiesce zeroout wrapper"

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next&id=9f9ee1f2b2f94f19437ae2def7c0d6636d7fe02e
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
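For readers following the patch quoted above, the fallback order it leaves in place (try discard, then WRITE SAME, then zero manually, with no warning in between) can be sketched in userspace C. This is only a rough model under stated assumptions: struct fake_bdev, the fake_* helpers and FAKE_EOPNOTSUPP are hypothetical names invented for illustration, not kernel APIs, and the blk_queue_discard() test is folded into a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FAKE_EOPNOTSUPP 95	/* placeholder error value for this sketch */

/* Hypothetical stand-in for a block device; not a kernel structure. */
struct fake_bdev {
	bool discard_zeroes_data;	/* device claims discard returns zeroes */
	bool write_same;		/* device claims WRITE SAME support */
	int discard_err;		/* what a discard request returns */
	int write_same_err;		/* what a WRITE SAME request returns */
	unsigned char data[64];		/* backing store used by the slow path */
	int manual_zero_calls;		/* how often the slow path ran */
};

/* The fast paths only report the configured result; they move no data. */
static int fake_issue_discard(struct fake_bdev *b)
{
	return b->discard_err;
}

static int fake_issue_write_same(struct fake_bdev *b)
{
	return b->write_same_err;
}

/* Slow path: explicitly write zeroes into the requested range. */
static int fake_manual_zeroout(struct fake_bdev *b, size_t off, size_t len)
{
	assert(off <= sizeof(b->data) && len <= sizeof(b->data) - off);
	memset(b->data + off, 0, len);
	b->manual_zero_calls++;
	return 0;
}

/* Same shape as the patched wrapper: fall through silently on failure. */
static int fake_issue_zeroout(struct fake_bdev *b, size_t off, size_t len,
			      bool discard)
{
	if (discard && b->discard_zeroes_data && fake_issue_discard(b) == 0)
		return 0;

	if (b->write_same && fake_issue_write_same(b) == 0)
		return 0;

	return fake_manual_zeroout(b, off, len);
}
```

A device that keeps advertising discard but answers every request with an error simply ends up on the manual path each time, which is the case the removed pr_warn() calls used to log on every invocation.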
Re: linux-next: Tree for Feb 4
On Fri, Feb 6, 2015 at 1:12 AM, Steven Rostedt wrote:
> On Fri, 6 Feb 2015 00:53:41 +0100
> Sedat Dilek wrote:
>
>> > See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>> >
>>
>> I have here...
>>
>> CONFIG_LOCKDEP=y
>
> Yep, I knew that (you wouldn't get splats without it).
>
>
>> Which old patch?
>> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?
>
> Yeah, that one. In other words, just add this patch on the kernel you
> just tested.
>
> Thanks,
>

Do you have a name with label for your patch?

- Sedat -
Re: linux-next: Tree for Feb 4
On Fri, 6 Feb 2015 00:53:41 +0100
Sedat Dilek wrote:

> > See that if (IS_ENABLED(CONFIG_LOCKDEP))?
> >
>
> I have here...
>
> CONFIG_LOCKDEP=y

Yep, I knew that (you wouldn't get splats without it).

> Which old patch?
> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?

Yeah, that one. In other words, just add this patch on the kernel you just tested.

Thanks,

-- Steve
Re: linux-next: Tree for Feb 4
[...]
>> That said, let's add this (on top of the old patch):
>>
>
> Which old patch?
> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?
>

Or did you mean "x86/mm: Omit switch_mm() tracing for offline CPUs"

- Sedat -
Re: linux-next: Tree for Feb 4
On Fri, Feb 6, 2015 at 12:11 AM, Steven Rostedt wrote:
> On Thu, 5 Feb 2015 23:16:21 +0100
> Sedat Dilek wrote:
>
>> On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt wrote:
>> > On Thu, 5 Feb 2015 22:45:59 +0100
>> > Sedat Dilek wrote:
>> >
>> >> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>> >
>> > Heh, yeah, I typed that entire line in by hand. Just be lucky that was
>> > the only typo ;-)
>> >
>> >>
>> >> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>> >> 1
>> >>
>> >> [ 391.090381] intel_pstate CPU 1 exiting
>> >> [ 391.104491] smpboot: CPU 1 is now offline
>> >>
>> >
>> > Now, if you disable that (echo 0 to that file), do you still get the
>> > rcu lockdep splat if you suspend and resume?
>> >
>>
>> YES, I get the call-trace again!
>>
>
> Bah! I see where the warning comes from. In include/linux/tracepoint.h
> we have:
>
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> 	extern struct tracepoint __tracepoint_##name; \
> 	static inline void trace_##name(proto) \
> 	{ \
> 		if (static_key_false(&__tracepoint_##name.key)) \
> 			__DO_TRACE(&__tracepoint_##name,\
> 				TP_PROTO(data_proto), \
> 				TP_ARGS(data_args), \
> 				TP_CONDITION(cond),,); \
> 		if (IS_ENABLED(CONFIG_LOCKDEP)) { \
> 			rcu_read_lock_sched_notrace(); \
> 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> 			rcu_read_unlock_sched_notrace();\
> 		} \
> 	} \
>
> See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>

I have here...

CONFIG_LOCKDEP=y

- Sedat -

> I'm recalling this. Because tracepoints require RCU, and RCU lockdep
> doesn't trigger if a tracepoint isn't enabled (because the rcu calls
> are hidden in the __DO_TRACE() behind that static_key_false), we would
> be missing lots of rcu problem tracepoints because tests were run
> without them enabled.
>
> The answer was to add this rcu check when LOCKDEP was enabled. So no,
> adding that conditional isn't going to help, because lockdep will
> trigger here, even if it were safe because of the conditional :-/.
>
> That said, let's add this (on top of the old patch):
>

Which old patch?
"tlb: Don't do trace_tlb_flush() on offline CPUs" ?

- Sedat -

> (again, not tested)
>
> Signed-off-by: Steven Rostedt
> ---
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 4b75d591eb5e..401b5bfbcdbd 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>
>  		/* Re-load page tables */
>  		load_cr3(next->pgd);
> -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +		/*
> +		 * Do not check rcu when tracing is not enabled. The
> +		 * tracepoint has a condition to not trace if the CPU is
> +		 * offline, and rcu check will complain if it is.
> +		 */
> +		trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
>  		/* Stop flush ipis for the previous mm */
>  		cpumask_clear_cpu(cpu, mm_cpumask(prev));
> @@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  			 * to make sure to use no freed page tables.
>  			 */
>  			load_cr3(next->pgd);
> -			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +			/*
> +			 * Do not check rcu when tracing is not enabled. The
> +			 * tracepoint has a condition to not trace if the CPU is
> +			 * offline, and rcu check will complain if it is.
> +			 */
> +			trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
> +						    TLB_FLUSH_ALL);
>  			load_LDT_nolock(&next->context);
>  		}
>  	}
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index e08e21e5f601..747a05aceb60 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
>  			rcu_read_unlock_sched_notrace();\
>  		}
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 23:16:21 +0100 Sedat Dilek wrote: > On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt wrote: > > On Thu, 5 Feb 2015 22:45:59 +0100 > > Sedat Dilek wrote: > > > >> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*: > > > > Heh, yeah, I typed that entire line in by hand. Just be lucky that was > > the only typo ;-) > > > >> > >> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable > >> 1 > >> > >> [ 391.090381] intel_pstate CPU 1 exiting > >> [ 391.104491] smpboot: CPU 1 is now offline > >> > > > > Now, if you disable that (echo 0 to that file), do you still get the > > rcu lockdep splat if you suspend and resume? > > > > YES, I get the call-trace again! > Bah! I see where the warning comes from. In include/linux/tracepoint.h we have: #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ extern struct tracepoint __tracepoint_##name; \ static inline void trace_##name(proto) \ { \ if (static_key_false(&__tracepoint_##name.key)) \ __DO_TRACE(&__tracepoint_##name,\ TP_PROTO(data_proto), \ TP_ARGS(data_args), \ TP_CONDITION(cond),,); \ if (IS_ENABLED(CONFIG_LOCKDEP)) { \ rcu_read_lock_sched_notrace(); \ rcu_dereference_sched(__tracepoint_##name.funcs);\ rcu_read_unlock_sched_notrace();\ } \ } \ See that if (IS_ENABLED(CONFIG_LOCKDEP))? I'm recalling this. Because tracepoints require RCU, and RCU lockdep doesn't trigger if a tracepoint isn't enabled (because the rcu calls are hidden in the __DO_TRACE() behind that static_key_false), we would be missing lots of rcu problem tracepoints because tests were run without them enabled. The answer was to add this rcu check when LOCKDEP was enabled. So no, adding that conditional isn't going to help, because lockdep will trigger here, even if it were safe because of the conditional :-/. 
That said, let's add this (on top of the old patch): (again, not tested) Signed-off-by: Steven Rostedt --- diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 4b75d591eb5e..401b5bfbcdbd 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, /* Re-load page tables */ load_cr3(next->pgd); - trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); + /* +* Do not check rcu when tracing is not enabled. The +* tracepoint has a condition to not trace if the CPU is +* offline, and rcu check will complain if it is. +*/ + trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); /* Stop flush ipis for the previous mm */ cpumask_clear_cpu(cpu, mm_cpumask(prev)); @@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, * to make sure to use no freed page tables. */ load_cr3(next->pgd); - trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); + /* +* Do not check rcu when tracing is not enabled. The +* tracepoint has a condition to not trace if the CPU is +* offline, and rcu check will complain if it is. +*/ + trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, + TLB_FLUSH_ALL); load_LDT_nolock(>context); } } diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index e08e21e5f601..747a05aceb60 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -179,6 +179,14 @@ extern void syscall_unregfunc(void); rcu_read_unlock_sched_notrace();\ } \ } \ + static inline void trace_##name##_rcu_nocheck(proto)\ + { \ + if (static_key_false(&__tracepoint_##name.key)) \ +
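The difference between the existing trace_##name() and the proposed trace_##name##_rcu_nocheck() can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name below (model_trace, model_rcu_check, tracepoint_enabled, and so on) is hypothetical and merely stands in for the kernel symbols quoted above, and the model does not reproduce the part of the patch that is cut off.

```c
#include <stdbool.h>

/* Userspace stand-ins; none of these are real kernel symbols. */
bool tracepoint_enabled;      /* models the static key of the tracepoint */
bool cpu_is_online = true;    /* models cpu_online(smp_processor_id()) */
bool lockdep_enabled = true;  /* models IS_ENABLED(CONFIG_LOCKDEP) */
int splat_count;              /* "suspicious RCU usage" reports so far */
int events_recorded;          /* events that actually reached a probe */

/* Models an RCU read-side check: it complains on an offline CPU. */
void model_rcu_check(void)
{
    if (!cpu_is_online)
        splat_count++;
}

/* Models __DO_TRACE(): the tracepoint condition is tested before any RCU use. */
void model_do_trace(void)
{
    if (!cpu_is_online)
        return;               /* condition fails: no RCU use, no event */
    model_rcu_check();
    events_recorded++;
}

/* Models trace_##name(): the lockdep-only check runs even with tracing off. */
void model_trace(void)
{
    if (tracepoint_enabled)
        model_do_trace();
    if (lockdep_enabled)
        model_rcu_check();
}

/* Models trace_##name##_rcu_nocheck(): no unconditional RCU check. */
void model_trace_rcu_nocheck(void)
{
    if (tracepoint_enabled)
        model_do_trace();
}
```

In this model, calling model_trace() with cpu_is_online set to false raises splat_count whether or not tracing is enabled, which matches the splat reported with the tracepoint disabled. model_trace_rcu_nocheck() leaves the counter alone in the same situation.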
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt wrote: > On Thu, 5 Feb 2015 22:45:59 +0100 > Sedat Dilek wrote: > >> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*: > > Heh, yeah, I typed that entire line in by hand. Just be lucky that was > the only typo ;-) > >> >> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable >> 1 >> >> [ 391.090381] intel_pstate CPU 1 exiting >> [ 391.104491] smpboot: CPU 1 is now offline >> > > Now, if you disable that (echo 0 to that file), do you still get the > rcu lockdep splat if you suspend and resume? > YES, I get the call-trace again! # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable 0 # cat /sys/devices/system/cpu/cpu1/online 0 [ 2470.606222] intel_pstate CPU 1 exiting [ 2470.628153] [ 2470.628155] === [ 2470.628156] [ INFO: suspicious RCU usage. ] [ 2470.628159] 3.19.0-rc7-next-20150204.9-iniza-small #1 Not tainted [ 2470.628160] --- [ 2470.628162] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage! [ 2470.628163] [ 2470.628163] other info that might help us debug this: [ 2470.628163] [ 2470.628164] [ 2470.628164] RCU used illegally from offline CPU! [ 2470.628164] rcu_scheduler_active = 1, debug_locks = 0 [ 2470.628165] no locks held by swapper/1/0. [ 2470.628166] [ 2470.628166] stack backtrace: [ 2470.628169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.9-iniza-small #1 [ 2470.628171] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 2470.628176] 0001 88011a44fe18 817ecf4d 0011 [ 2470.628179] 88011a448290 88011a44fe48 810d6b57 8800cc2660c0 [ 2470.628182] 0001 81d35160 0002 88011a44fe78 [ 2470.628183] Call Trace: [ 2470.628192] [] dump_stack+0x4c/0x65 [ 2470.628198] [] lockdep_rcu_suspicious+0xe7/0x120 [ 2470.628203] [] idle_task_exit+0x1c9/0x260 [ 2470.628208] [] play_dead_common+0xe/0x50 [ 2470.628211] [] native_play_dead+0x15/0x140 [ 2470.628216] [] arch_cpu_idle_dead+0xf/0x20 [ 2470.628219] [] cpu_startup_entry+0x37e/0x580 [ 2470.628222] smpboot: CPU 1 didn't die... [ 2470.628224] [] start_secondary+0x140/0x150 - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 22:45:59 +0100
Sedat Dilek wrote:

> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

Heh, yeah, I typed that entire line in by hand. Just be lucky that was
the only typo ;-)

>
> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
> 1
>
> [ 391.090381] intel_pstate CPU 1 exiting
> [ 391.104491] smpboot: CPU 1 is now offline
>

Now, if you disable that (echo 0 to that file), do you still get the
rcu lockdep splat if you suspend and resume?

-- Steve
Re: linux-next: Tree for Feb 4
[...] >>> >> Unfortunately, the call-trace remains when doing an offlining of cpu1. >>> >> ( It's good to see it's reproducible. ) >>> > >>> > Was the tracepoint enabled? Or was there some other rcu call that >>> > triggered this. Or would cpu_online(smp_processor_id()) return true at >>> > this point? >>> > >>> >>> Thanks Steve for jumping into this one! >>> >>> Good point. >>> I looked at my kernel-config (which I already sent :-)). >>> >>> Do I need to enable...? >>> >>> # CONFIG_RCU_TRACE is not set >>> >>> ...or even more? >>> >> >> What I meant by the tracepoint being enabled, was not that it was >> configured in (I'm assuming it was), but that you started tracing? >> >> echo 1 > /sys/kernel/debug/tracing/events/enable >> >> or >> >> echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable >> > > NO, I did not start any tracing before doing my testing. > > # cat /sys/kernel/debug/tracing/events/enable > 0 > > # echo 1 > /sys/kernel/debug/tracing/events/enable > > # cat /sys/kernel/debug/tracing/events/enable > X > > # LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable > cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such > file or directory > > Looks like I need to enable...? 
>
> # CONFIG_DEBUG_TLBFLUSH is not set
>

Here is my new kernel-config (not sure if I really need them to be enabled):

$ ./scripts/diffconfig /boot/config-3.19.0-rc7-next-20150204.7-iniza-small /boot/config-3.19.0-rc7-next-20150204.9-iniza-small
 DEBUG_TLBFLUSH n -> y
 RCU_TRACE n -> y
 TREE_RCU_TRACE n -> y

Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
1

[ 391.090381] intel_pstate CPU 1 exiting
[ 391.104491] smpboot: CPU 1 is now offline

- Sedat -
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 9:22 PM, Steven Rostedt wrote: > On Thu, 5 Feb 2015 21:07:27 +0100 > Sedat Dilek wrote: > >> > Is this Paul's version of the patch or mine? If it is just mine, do you >> > know if Paul's version triggers this too? >> > >> >> This one which entered Pauls rcu-next tree. >> >> [1] >> http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next=2b27cf7317d8a99a50bead9faccd54b46b6f0c41 > > That's mine. > > It looks like the condition will be tested before it calls and rcu > code. Which is why I was confused that it still gave a splat. Paul > posted a patch before this that did the check outside the trace point. > > This one: > > http://marc.info/?l=linux-kernel=142310961217650=2 > >> >> >> ( I did not build from scratch but re-invoking make "updated" the >> >> files touched by Steven's patch, see attached build-log. ) >> >> >> >> Unfortunately, the call-trace remains when doing an offlining of cpu1. >> >> ( It's good to see it's reproducible. ) >> > >> > Was the tracepoint enabled? Or was there some other rcu call that >> > triggered this. Or would cpu_online(smp_processor_id()) return true at >> > this point? >> > >> >> Thanks Steve for jumping into this one! >> >> Good point. >> I looked at my kernel-config (which I already sent :-)). >> >> Do I need to enable...? >> >> # CONFIG_RCU_TRACE is not set >> >> ...or even more? >> > > What I meant by the tracepoint being enabled, was not that it was > configured in (I'm assuming it was), but that you started tracing? > > echo 1 > /sys/kernel/debug/tracing/events/enable > > or > > echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable > NO, I did not start any tracing before doing my testing. 
# cat /sys/kernel/debug/tracing/events/enable
0

# echo 1 > /sys/kernel/debug/tracing/events/enable

# cat /sys/kernel/debug/tracing/events/enable
X

# LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such file or directory

Looks like I need to enable...?

# CONFIG_DEBUG_TLBFLUSH is not set

To answer your question...

# cat /sys/devices/system/cpu/cpu1/online
1

# echo 0 > /sys/devices/system/cpu/cpu1/online

# cat /sys/devices/system/cpu/cpu1/online
0

[ 375.337050] intel_pstate CPU 1 exiting
[ 375.351069] smpboot: CPU 1 is now offline

So, this did not happen this time.

- Sedat -
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 21:07:27 +0100
Sedat Dilek wrote:

> > Is this Paul's version of the patch or mine? If it is just mine, do you
> > know if Paul's version triggers this too?
> >
>
> This one which entered Pauls rcu-next tree.
>
> [1]
> http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

That's mine.

It looks like the condition will be tested before it calls any rcu
code. Which is why I was confused that it still gave a splat. Paul
posted a patch before this that did the check outside the trace point.

This one:

 http://marc.info/?l=linux-kernel=142310961217650=2

>
> >> ( I did not build from scratch but re-invoking make "updated" the
> >> files touched by Steven's patch, see attached build-log. )
> >>
> >> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> >> ( It's good to see it's reproducible. )
> >
> > Was the tracepoint enabled? Or was there some other rcu call that
> > triggered this. Or would cpu_online(smp_processor_id()) return true at
> > this point?
> >
>
> Thanks Steve for jumping into this one!
>
> Good point.
> I looked at my kernel-config (which I already sent :-)).
>
> Do I need to enable...?
>
> # CONFIG_RCU_TRACE is not set
>
> ...or even more?
>

What I meant by the tracepoint being enabled, was not that it was
configured in (I'm assuming it was), but that you started tracing?

 echo 1 > /sys/kernel/debug/tracing/events/enable

or

 echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable

-- Steve
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:58 PM, Steven Rostedt wrote: > On Thu, 5 Feb 2015 20:25:21 +0100 > Sedat Dilek wrote: > >> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney >> wrote: >> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote: >> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote: >> >> >> > Did I actually need to be >> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting? >> >> > Yep, you do need to offline at least one CPU to hit that splat. >> >> >> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? >> >> :) >> > >> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c >> > are your friends. ;-) >> > >> > The problem is that I only run RCU-relevant combinations of Kconfigs, >> > which means that I missed the ones that Sedat used to find this problem. >> > So I guess it is a good thing that others run -next testing. >> > >> >> [ Revived by a voltaren resinat pill... ] >> >> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs" >> ...and... >> applied "tlb: Don't do trace_tlb_flush() on offline CPUs" >> ...in my build-dir. > > Is this Paul's version of the patch or mine? If it is just mine, do you > know if Paul's version triggers this too? > This one which entered Pauls rcu-next tree. [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next=2b27cf7317d8a99a50bead9faccd54b46b6f0c41 >> ( I did not build from scratch but re-invoking make "updated" the >> files touched by Steven's patch, see attached build-log. ) >> >> Unfortunately, the call-trace remains when doing an offlining of cpu1. >> ( It's good to see it's reproducible. ) > > Was the tracepoint enabled? Or was there some other rcu call that > triggered this. Or would cpu_online(smp_processor_id()) return true at > this point? > Thanks Steve for jumping into this one! Good point. I looked at my kernel-config (which I already sent :-)). Do I need to enable...? 
# CONFIG_RCU_TRACE is not set ...or even more? - Sedat - > -- Steve > >> >> root# echo 0 > /sys/devices/system/cpu/cpu1/online >> >> [ 121.652796] intel_pstate CPU 1 exiting >> [ 121.666272] >> [ 121.666274] === >> [ 121.666274] [ INFO: suspicious RCU usage. ] >> [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted >> [ 121.666278] --- >> [ 121.666280] include/trace/events/tlb.h:37 suspicious >> rcu_dereference_check() usage! >> [ 121.666281] >> [ 121.666281] other info that might help us debug this: >> [ 121.666281] >> [ 121.666282] >> [ 121.666282] RCU used illegally from offline CPU! >> [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0 >> [ 121.666283] no locks held by swapper/1/0. >> [ 121.666284] >> [ 121.666284] stack backtrace: >> [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> 3.19.0-rc7-next-20150204.7-iniza-small #4 >> [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> [ 121.666293] 0001 88011a44fe18 817e39cd >> 0011 >> [ 121.666296] 88011a448290 88011a44fe48 810d6af7 >> 8800d3dfaac0 >> [ 121.666299] 0001 81d32ce0 0005 >> 88011a44fe78 >> [ 121.666300] Call Trace: >> [ 121.666308] [] dump_stack+0x4c/0x65 >> [ 121.666313] [] lockdep_rcu_suspicious+0xe7/0x120 >> [ 121.666318] [] idle_task_exit+0x1c9/0x260 >> [ 121.666322] [] play_dead_common+0xe/0x50 >> [ 121.666325] [] native_play_dead+0x15/0x140 >> [ 121.666330] [] arch_cpu_idle_dead+0xf/0x20 >> [ 121.666333] [] cpu_startup_entry+0x37e/0x580 >> [ 121.666336] [] start_secondary+0x140/0x150 >> [ 121.666744] smpboot: CPU 1 is now offline >> >> >From rcu point this is now safe? >> But another area (linux-pm?) is still affected? >> I will try to test "vanilla" pm-next if the problem exists with >> intel_pstate as suggested by Rafael. >> Hmmm, not sure how I can get the pm-next code which went into >> next-20150204 as linux-pm.git#linux-next was feeded with new stuff. 
>>
>>
>> - Sedat -
>
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 20:25:21 +0100 Sedat Dilek wrote: > On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney > wrote: > > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote: > >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote: > >> >> > Did I actually need to be > >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting? > >> > Yep, you do need to offline at least one CPU to hit that splat. > >> > >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :) > > > > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c > > are your friends. ;-) > > > > The problem is that I only run RCU-relevant combinations of Kconfigs, > > which means that I missed the ones that Sedat used to find this problem. > > So I guess it is a good thing that others run -next testing. > > > > [ Revived by a voltaren resinat pill... ] > > I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs" > ...and... > applied "tlb: Don't do trace_tlb_flush() on offline CPUs" > ...in my build-dir. Is this Paul's version of the patch or mine? If it is just mine, do you know if Paul's version triggers this too? > ( I did not build from scratch but re-invoking make "updated" the > files touched by Steven's patch, see attached build-log. ) > > Unfortunately, the call-trace remains when doing an offlining of cpu1. > ( It's good to see it's reproducible. ) Was the tracepoint enabled? Or was there some other rcu call that triggered this. Or would cpu_online(smp_processor_id()) return true at this point? -- Steve > > root# echo 0 > /sys/devices/system/cpu/cpu1/online > > [ 121.652796] intel_pstate CPU 1 exiting > [ 121.666272] > [ 121.666274] === > [ 121.666274] [ INFO: suspicious RCU usage. ] > [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted > [ 121.666278] --- > [ 121.666280] include/trace/events/tlb.h:37 suspicious > rcu_dereference_check() usage! 
> [ 121.666281] > [ 121.666281] other info that might help us debug this: > [ 121.666281] > [ 121.666282] > [ 121.666282] RCU used illegally from offline CPU! > [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0 > [ 121.666283] no locks held by swapper/1/0. > [ 121.666284] > [ 121.666284] stack backtrace: > [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > 3.19.0-rc7-next-20150204.7-iniza-small #4 > [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > [ 121.666293] 0001 88011a44fe18 817e39cd > 0011 > [ 121.666296] 88011a448290 88011a44fe48 810d6af7 > 8800d3dfaac0 > [ 121.666299] 0001 81d32ce0 0005 > 88011a44fe78 > [ 121.666300] Call Trace: > [ 121.666308] [] dump_stack+0x4c/0x65 > [ 121.666313] [] lockdep_rcu_suspicious+0xe7/0x120 > [ 121.666318] [] idle_task_exit+0x1c9/0x260 > [ 121.666322] [] play_dead_common+0xe/0x50 > [ 121.666325] [] native_play_dead+0x15/0x140 > [ 121.666330] [] arch_cpu_idle_dead+0xf/0x20 > [ 121.666333] [] cpu_startup_entry+0x37e/0x580 > [ 121.666336] [] start_secondary+0x140/0x150 > [ 121.666744] smpboot: CPU 1 is now offline > > >From rcu point this is now safe? > But another area (linux-pm?) is still affected? > I will try to test "vanilla" pm-next if the problem exists with > intel_pstate as suggested by Rafael. > Hmmm, not sure how I can get the pm-next code which went into > next-20150204 as linux-pm.git#linux-next was feeded with new stuff. > > > - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen wrote: >> "Sedat" == Sedat Dilek writes: > > Sedat> No, but I am here on a so-called WUBI installation which > Sedat> triggered some bugs being an exotic installation. My > Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition > Sedat> (/dev/sda2). > > I've been mulling over this for a while and can't come up with a good > approach. So let's just nuke these warnings. > > -- > Martin K. Petersen Oracle Linux Engineering > > > block: Quiesce zeroout wrapper > > blkdev_issue_zeroout() printed a warning if a device failed a discard or > write same request despite advertising support for these. That's fine > for SCSI since we'll disable these commands if we get an error back from > the disk saying that they are not supported. And consequently the > warning only gets printed once. > > There are other types of block devices that support discard, however, > and these may return -EOPNOTSUPP for each command but leave discard > enabled in the queue limits. This will cause a warning message for every > blkdev_issue_zeroout() invocation. > > Remove the offending warning messages. > > Reported-by: Sedat Dilek > Signed-off-by: Martin K. Petersen > --- > block/blk-lib.c | 26 +++--- > 1 file changed, 7 insertions(+), 19 deletions(-) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 715e948f58a4..7688ee3f5d72 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device > *bdev, sector_t sector, > * @discard: whether to discard the block range > * > * Description: > - > * Zero-fill a block range. If the discard flag is set and the block > * device guarantees that subsequent READ operations to the block range > * in question will return zeroes, the blocks will be discarded. 
Should > @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, > sector_t sector, > sector_t nr_sects, gfp_t gfp_mask, bool discard) > { > struct request_queue *q = bdev_get_queue(bdev); > - unsigned char bdn[BDEVNAME_SIZE]; > - > - if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) > { > > - if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, > 0)) > - return 0; > - > - bdevname(bdev, bdn); > - pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn); > - } > + if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data > && > + blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0) > + return 0; > > - if (bdev_write_same(bdev)) { > - > - if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, > -ZERO_PAGE(0))) > - return 0; > - > - bdevname(bdev, bdn); > - pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn); > - } > + if (bdev_write_same(bdev) && > + blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, > + ZERO_PAGE(0)) == 0) > + return 0; > > return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); > } Martin, will you send a separate patch for that? Thanks. - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
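The fallback order that the patch keeps (discard, then WRITE SAME, then manual zeroing, with no warning when an earlier step fails) can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the kernel code: struct model_bdev, model_issue_zeroout() and the two model_op_* helpers are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a block device; not the kernel's struct. */
struct model_bdev {
    bool supports_discard;     /* queue advertises discard */
    bool discard_zeroes_data;  /* discarded blocks read back as zeroes */
    bool supports_write_same;  /* queue advertises WRITE SAME */
    int (*issue_discard)(struct model_bdev *bdev);
    int (*issue_write_same)(struct model_bdev *bdev);
    int manual_zeroouts;       /* how often explicit zero writes were needed */
};

/* A command that succeeds. */
int model_op_ok(struct model_bdev *bdev)
{
    (void)bdev;
    return 0;
}

/* A command the device rejects every time, as described in the commit text. */
int model_op_unsupported(struct model_bdev *bdev)
{
    (void)bdev;
    return -EOPNOTSUPP;
}

/*
 * Try the cheap offloaded commands first and fall through quietly to
 * manual zeroing when they fail. No message is printed on fallback.
 */
int model_issue_zeroout(struct model_bdev *bdev, bool discard)
{
    if (discard && bdev->supports_discard && bdev->discard_zeroes_data &&
        bdev->issue_discard(bdev) == 0)
        return 0;

    if (bdev->supports_write_same && bdev->issue_write_same(bdev) == 0)
        return 0;

    bdev->manual_zeroouts++;   /* stands in for __blkdev_issue_zeroout() */
    return 0;
}
```

A device whose discard handler always returns -EOPNOTSUPP while the capability stays advertised simply ends up in the manual path on every call, which is the case that previously produced one warning per invocation.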
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:33 PM, Paul E. McKenney wrote: > On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote: >> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney >> wrote: >> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote: >> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote: >> >> >> > Did I actually need to be >> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting? >> >> > Yep, you do need to offline at least one CPU to hit that splat. >> >> >> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? >> >> :) >> > >> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c >> > are your friends. ;-) >> > >> > The problem is that I only run RCU-relevant combinations of Kconfigs, >> > which means that I missed the ones that Sedat used to find this problem. >> > So I guess it is a good thing that others run -next testing. >> > >> >> [ Revived by a voltaren resinat pill... ] >> >> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs" >> ...and... >> applied "tlb: Don't do trace_tlb_flush() on offline CPUs" >> ...in my build-dir. >> ( I did not build from scratch but re-invoking make "updated" the >> files touched by Steven's patch, see attached build-log. ) >> >> Unfortunately, the call-trace remains when doing an offlining of cpu1. >> ( It's good to see it's reproducible. ) >> >> root# echo 0 > /sys/devices/system/cpu/cpu1/online >> >> [ 121.652796] intel_pstate CPU 1 exiting >> [ 121.666272] >> [ 121.666274] === >> [ 121.666274] [ INFO: suspicious RCU usage. ] >> [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted >> [ 121.666278] --- >> [ 121.666280] include/trace/events/tlb.h:37 suspicious >> rcu_dereference_check() usage! >> [ 121.666281] >> [ 121.666281] other info that might help us debug this: >> [ 121.666281] >> [ 121.666282] >> [ 121.666282] RCU used illegally from offline CPU! 
>> [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0 >> [ 121.666283] no locks held by swapper/1/0. >> [ 121.666284] >> [ 121.666284] stack backtrace: >> [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> 3.19.0-rc7-next-20150204.7-iniza-small #4 >> [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> [ 121.666293] 0001 88011a44fe18 817e39cd >> 0011 >> [ 121.666296] 88011a448290 88011a44fe48 810d6af7 >> 8800d3dfaac0 >> [ 121.666299] 0001 81d32ce0 0005 >> 88011a44fe78 >> [ 121.666300] Call Trace: >> [ 121.666308] [] dump_stack+0x4c/0x65 >> [ 121.666313] [] lockdep_rcu_suspicious+0xe7/0x120 >> [ 121.666318] [] idle_task_exit+0x1c9/0x260 >> [ 121.666322] [] play_dead_common+0xe/0x50 >> [ 121.666325] [] native_play_dead+0x15/0x140 >> [ 121.666330] [] arch_cpu_idle_dead+0xf/0x20 >> [ 121.666333] [] cpu_startup_entry+0x37e/0x580 >> [ 121.666336] [] start_secondary+0x140/0x150 >> [ 121.666744] smpboot: CPU 1 is now offline >> >> >From rcu point this is now safe? >> But another area (linux-pm?) is still affected? >> I will try to test "vanilla" pm-next if the problem exists with >> intel_pstate as suggested by Rafael. >> Hmmm, not sure how I can get the pm-next code which went into >> next-20150204 as linux-pm.git#linux-next was feeded with new stuff. > > At this point, I am starting to think in terms of moving the new > CPU_DYING_IDLE notification later in the offline sequence. This will > take me a bit to get set up correctly, but I hope to have a patch > some time tomorrow (Friday), Pacific time. > Is "CPU_DYING_IDLE (notification)" rcu area? Shall I do a pm-next testing? By looking at [1] I got the commit-id/sha1 which went into next-20150204. n102: pm 12f24f2d78ce801c9330c5f682b7beb215bdbab1 If this helps you I will do. 
"For Paul" :-) - Sedat - [1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/SHA1s?id=next-20150204#n102 [2] http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next=12f24f2d78ce801c9330c5f682b7beb215bdbab1 > Thanx, Paul > >> - Sedat - > >> CHK include/config/kernel.release >> make KBUILD_SRC= >> CHK include/config/kernel.release >> CHK include/generated/uapi/linux/version.h >> CHK include/generated/utsrelease.h >> CALLscripts/checksyscalls.sh >> CHK include/generated/compile.h >> CC arch/x86/mm/init.o >> CC arch/x86/mm/init_64.o >> CC mm/mmu_context.o >> CC kernel/fork.o >> CC arch/x86/kernel/process_64.o >> CC mm/mmap.o >> CC arch/x86/kernel/ldt.o >> CC arch/x86/mm/tlb.o >> CC arch/x86/kernel/setup.o >> LD arch/x86/mm/built-in.o >> CC kernel/exit.o >> LD
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney wrote:
> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> >> > Did I actually need to be
> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> >> > Yep, you do need to offline at least one CPU to hit that splat.
> >>
> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
> >
> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
> > are your friends. ;-)
> >
> > The problem is that I only run RCU-relevant combinations of Kconfigs,
> > which means that I missed the ones that Sedat used to find this problem.
> > So I guess it is a good thing that others run -next testing.
> >
>
> [ Revived by a voltaren resinat pill... ]
>
> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
> ...and...
> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
> ...in my build-dir.
> ( I did not build from scratch but re-invoking make "updated" the
> files touched by Steven's patch, see attached build-log. )
>
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )
>
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [ 121.652796] intel_pstate CPU 1 exiting
> [ 121.666272]
> [ 121.666274] ===
> [ 121.666274] [ INFO: suspicious RCU usage. ]
> [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
> [ 121.666278] ---
> [ 121.666280] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage!
> [ 121.666281]
> [ 121.666281] other info that might help us debug this:
> [ 121.666281]
> [ 121.666282]
> [ 121.666282] RCU used illegally from offline CPU!
> [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0
> [ 121.666283] no locks held by swapper/1/0.
> [ 121.666284]
> [ 121.666284] stack backtrace:
> [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.7-iniza-small #4
> [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 121.666293] 0001 88011a44fe18 817e39cd 0011
> [ 121.666296] 88011a448290 88011a44fe48 810d6af7 8800d3dfaac0
> [ 121.666299] 0001 81d32ce0 0005 88011a44fe78
> [ 121.666300] Call Trace:
> [ 121.666308] [] dump_stack+0x4c/0x65
> [ 121.666313] [] lockdep_rcu_suspicious+0xe7/0x120
> [ 121.666318] [] idle_task_exit+0x1c9/0x260
> [ 121.666322] [] play_dead_common+0xe/0x50
> [ 121.666325] [] native_play_dead+0x15/0x140
> [ 121.666330] [] arch_cpu_idle_dead+0xf/0x20
> [ 121.666333] [] cpu_startup_entry+0x37e/0x580
> [ 121.666336] [] start_secondary+0x140/0x150
> [ 121.666744] smpboot: CPU 1 is now offline
>
> From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
> I will try to test "vanilla" pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.

At this point, I am starting to think in terms of moving the new
CPU_DYING_IDLE notification later in the offline sequence. This will
take me a bit to get set up correctly, but I hope to have a patch some
time tomorrow (Friday), Pacific time.

Thanx, Paul

> - Sedat -

> CHK include/config/kernel.release
> make KBUILD_SRC=
> CHK include/config/kernel.release
> CHK include/generated/uapi/linux/version.h
> CHK include/generated/utsrelease.h
> CALL scripts/checksyscalls.sh
> CHK include/generated/compile.h
> CC arch/x86/mm/init.o
> CC arch/x86/mm/init_64.o
> CC mm/mmu_context.o
> CC kernel/fork.o
> CC arch/x86/kernel/process_64.o
> CC mm/mmap.o
> CC arch/x86/kernel/ldt.o
> CC arch/x86/mm/tlb.o
> CC arch/x86/kernel/setup.o
> LD arch/x86/mm/built-in.o
> CC kernel/exit.o
> LD mm/built-in.o
> CC arch/x86/xen/mmu.o
> CC arch/x86/kernel/apic/ipi.o
> CC fs/exec.o
> LD arch/x86/kernel/apic/built-in.o
> CC kernel/power/snapshot.o
> CC arch/x86/kernel/cpu/common.o
> LD kernel/power/built-in.o
> LD arch/x86/xen/built-in.o
> CC kernel/sched/core.o
> LD arch/x86/kernel/cpu/built-in.o
> CC arch/x86/kernel/smp.o
> CC arch/x86/kernel/machine_kexec_64.o
> LD arch/x86/kernel/built-in.o
> LD arch/x86/built-in.o
> LD kernel/sched/built-in.o
> CC kernel/module.o
> CC fs/compat.o
> CHK kernel/config_data.h
> LD
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> > Did I actually need to be
> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> > Yep, you do need to offline at least one CPU to hit that splat.
>
> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
are your friends. ;-)

The problem is that I only run RCU-relevant combinations of Kconfigs,
which means that I missed the ones that Sedat used to find this problem.
So I guess it is a good thing that others run -next testing.

Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> > Did I actually need to be
>> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> Yep, you do need to offline at least one CPU to hit that splat.

Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 10:11:31AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:08 AM, Steven Rostedt wrote:
> > --- a/include/trace/events/tlb.h
> > +++ b/include/trace/events/tlb.h
> > @@ -13,11 +13,13 @@
> >      { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
> >      { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }
> >
> > -TRACE_EVENT(tlb_flush,
> > +TRACE_EVENT_CONDITION(tlb_flush,
> >
> >      TP_PROTO(int reason, unsigned long pages),
> >      TP_ARGS(reason, pages),
> >
> > +    TP_CONDITION(cpu_online(smp_processor_id())),
>
> That's a pretty reasonable fix, although it would be nice if the
> debugging was easier to hit.

Looks very good to me! Unless someone else speaks up, I will carry this patch.

> Did I actually need to be
> onlining/offlining CPUs to hit the splat that Sedat was reporting?

Yep, you do need to offline at least one CPU to hit that splat.

Thanx, Paul
Re: linux-next: Tree for Feb 4
On 02/05/2015 10:08 AM, Steven Rostedt wrote:
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>      { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
>      { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }
>
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>
>      TP_PROTO(int reason, unsigned long pages),
>      TP_ARGS(reason, pages),
>
> +    TP_CONDITION(cpu_online(smp_processor_id())),

That's a pretty reasonable fix, although it would be nice if the
debugging was easier to hit. Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 13:03:43 -0500 Steven Rostedt wrote:
> (not tested)
>
> Signed-off-by: Steven Rostedt
> ---
> diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
> index 13391d288107..040c1cdfe6d1 100644
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>      { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
>      { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }
>
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>
>      TP_PROTO(int reason, unsigned long pages),
>      TP_ARGS(reason, pages),
>
> +    TP_CONDITION(cpu_online(smp_processor_id()),
> +

I said it wasn't tested. I also forgot to hit save after I realized I
was missing a ')'.

-- Steve

Take two:

diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..0e7635765153 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
     { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
     { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }

-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,

     TP_PROTO(int reason, unsigned long pages),
     TP_ARGS(reason, pages),

+    TP_CONDITION(cpu_online(smp_processor_id())),
+
     TP_STRUCT__entry(
         __field( int, reason)
         __field(unsigned long, pages)
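A rough userspace model of the overhead question raised in this thread may help when comparing the two approaches. It is only a sketch under stated assumptions: `tracepoint_enabled` stands in for the tracepoint's static key, `model_cpu_online()` stands in for cpu_online(smp_processor_id()), and the counters and function names are invented for illustration; none of them are kernel interfaces. The point it tries to show is that a check at the call site is evaluated on every call, while a check placed inside the tracepoint (the TP_CONDITION approach) is evaluated only when the tracepoint is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are not kernel APIs. */
static bool tracepoint_enabled;      /* models the tracepoint's static key */
static bool this_cpu_online = true;  /* models the CPU's online state */
static int cond_evaluations;         /* how often the online check ran */
static int events_recorded;          /* how many events were traced */

/* Models cpu_online(smp_processor_id()) and counts each evaluation. */
static bool model_cpu_online(void)
{
	cond_evaluations++;
	return this_cpu_online;
}

/* Plain tracepoint: records an event only when enabled. */
static void model_trace_plain(void)
{
	if (tracepoint_enabled)
		events_recorded++;
}

/* Call-site check: the condition is evaluated on every call. */
static void flush_with_callsite_check(void)
{
	if (model_cpu_online())
		model_trace_plain();
}

/* Conditional tracepoint: the condition sits behind the enabled test. */
static void flush_with_conditional_tracepoint(void)
{
	if (tracepoint_enabled) {
		if (!model_cpu_online())
			return;
		events_recorded++;
	}
}

/* Resets the model to a known state. */
static void reset_model(bool enabled, bool online)
{
	tracepoint_enabled = enabled;
	this_cpu_online = online;
	cond_evaluations = 0;
	events_recorded = 0;
}

/* Usage sketch: with tracing off, only the call-site variant checks. */
static void demo_disabled_cost(void)
{
	reset_model(false, true);
	flush_with_callsite_check();
	assert(cond_evaluations == 1);

	reset_model(false, true);
	flush_with_conditional_tracepoint();
	assert(cond_evaluations == 0);
}
```

In the real kernel the disabled path is a patched-out jump label rather than a flag test, so the model understates how cheap the disabled case is; it only illustrates where the condition is evaluated.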
Re: linux-next: Tree for Feb 4
On Wed, 04 Feb 2015 23:14:55 -0800 Dave Hansen wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> ...
> >>> >> >     load_cr3(next->pgd);
> >>> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >>> >> > +   if (cpu_online(smp_processor_id()))
> >>> >> > +       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it? Also,
> this patch will add overhead to the code when tracing is off. It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

Note, we can move the check into the code that enables or disables
trace points. I believe, the rcu part of a tracepoint is only the call
to the callbacks. The jump_label part should be safe outside of rcu.

In that case, instead, have this, which does exactly the same thing
without having any overhead of the branch when tracing is disabled:

(not tested)

Signed-off-by: Steven Rostedt
---
diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..040c1cdfe6d1 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
     { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
     { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }

-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,

     TP_PROTO(int reason, unsigned long pages),
     TP_ARGS(reason, pages),

+    TP_CONDITION(cpu_online(smp_processor_id()),
+
     TP_STRUCT__entry(
         __field( int, reason)
         __field(unsigned long, pages)
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 03:57:12PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen wrote:
> > On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> >> The architecture-specific switch_mm() function can be called by offline
> >> CPUs, but includes event tracing, which cannot be legally carried out
> >> on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
> >> this splat by omitting the tracing when the CPU is offline.
> > ...
> >> >     load_cr3(next->pgd);
> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >> > +   if (cpu_online(smp_processor_id()))
> >> > +       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >
> > Is this, perhaps, something that we should be doing in the generic trace
> > code so that all of the trace users don't have to worry about it? Also,
> > this patch will add overhead to the code when tracing is off. It would
> > be best if we could manage to make the cpu_online() check only in the
> > cases where the tracepoint is on.
>
> Hi Dave,
>
> thanks for your feedback.
>
> I have just seen that I again see the call-trace.

When you get well, could you please send that call trace?

> Maybe you can discuss with Paul and others or offer a proposal patch.

The other possibility is to have a CONFIG_ARCH_DYING_IDLE or some such
that allows this particular flavor of x86 to invoke the CPU_DYING_IDLE
from after the call to switch_mm(). Dave, does that make sense?

My guess would be that there should be a cpu_dying_idle_generic()
invoked from cpu_idle_loop(), and a cpu_dying_idle_native() invoked at
the end of idle_task_exit().

Or can I get away with just moving the current rcu_notify_cpu() call
from cpu_idle_loop() to the end of idle_task_exit()? A quick look at
the calls to idle_task_exit() makes this look plausible. There are a
number of calls to printk() and to complete() that need help, but that
is a pre-existing issue in any case, as both these code paths have RCU
readers that are having no effect on offline CPUs.

Dave, thoughts?

> I should really do something for my recovery (influenza).
> Instead of laying lazy in my bed I thought to update my Linux kernels
> and graphics driver stack which made me happy.

Get well, being sick is bad for your health! ;-)

Thanx, Paul

> Regards,
> - Sedat -
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 11:14:55PM -0800, Dave Hansen wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> ...
> >>> >> >     load_cr3(next->pgd);
> >>> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >>> >> > +   if (cpu_online(smp_processor_id()))
> >>> >> > +       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it? Also,
> this patch will add overhead to the code when tracing is off. It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

I considered doing this in the _rcuidle piece of the trace code, but
unlike the RCU idle exit/entry in the _rcuidle stuff, the work required
to get through the RCU online/offline code is pretty heavyweight. You
end up having 16 CPUs contending for an rcu_node lock, for example.

But maybe you are instead suggesting pushing only the cpu_online() check
into the trace infrastructure. If so, fair point, and I will take a look
at this.

Thanx, Paul
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
>> The architecture-specific switch_mm() function can be called by offline
>> CPUs, but includes event tracing, which cannot be legally carried out
>> on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
>> this splat by omitting the tracing when the CPU is offline.
> ...
>> >     load_cr3(next->pgd);
>> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> > +   if (cpu_online(smp_processor_id()))
>> > +       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it? Also,
> this patch will add overhead to the code when tracing is off. It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

Hi Dave,

thanks for your feedback.

I have just seen that I again see the call-trace.

Maybe you can discuss with Paul and others or offer a proposal patch.

I should really do something for my recovery (influenza).
Instead of laying lazy in my bed I thought to update my Linux kernels
and graphics driver stack which made me happy.

Regards,
- Sedat -
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 9:22 PM, Steven Rostedt rost...@goodmis.org wrote:
> On Thu, 5 Feb 2015 21:07:27 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>>> Is this Paul's version of the patch or mine? If it is just mine, do you
>>> know if Paul's version triggers this too?
>>
>> This one which entered Pauls rcu-next tree.
>>
>> [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41
>
> That's mine. It looks like the condition will be tested before it calls
> any rcu code. Which is why I was confused that it still gave a splat.
>
> Paul posted a patch before this that did the check outside the trace
> point. This one:
>
> http://marc.info/?l=linux-kernel&m=142310961217650&w=2
>
>>>> ( I did not build from scratch but re-invoking make "updated" the
>>>> files touched by Steven's patch, see attached build-log. )
>>>>
>>>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>>>> ( It's good to see it's reproducible. )
>>>
>>> Was the tracepoint enabled? Or was there some other rcu call that
>>> triggered this. Or would cpu_online(smp_processor_id()) return true at
>>> this point?
>>
>> Thanks Steve for jumping into this one!
>>
>> Good point.
>> I looked at my kernel-config (which I already sent :-)).
>>
>> Do I need to enable...?
>>
>> # CONFIG_RCU_TRACE is not set
>>
>> ...or even more?
>
> What I meant by the tracepoint being enabled, was not that it was
> configured in (I'm assuming it was), but that you started tracing?
>
> echo 1 > /sys/kernel/debug/tracing/events/enable
>
> or
>
> echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable

NO, I did not start any tracing before doing my testing.

# cat /sys/kernel/debug/tracing/events/enable
0
# echo 1 > /sys/kernel/debug/tracing/events/enable
# cat /sys/kernel/debug/tracing/events/enable
X

# LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such file or directory

Looks like I need to enable...?

# CONFIG_DEBUG_TLBFLUSH is not set

To answer your question...

# cat /sys/devices/system/cpu/cpu1/online
1
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cat /sys/devices/system/cpu/cpu1/online
0

[ 375.337050] intel_pstate CPU 1 exiting
[ 375.351069] smpboot: CPU 1 is now offline

So, this did not happen this time.

- Sedat -
Re: linux-next: Tree for Feb 4
[...]
>>>>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>>>>> ( It's good to see it's reproducible. )
>>>>
>>>> Was the tracepoint enabled? Or was there some other rcu call that
>>>> triggered this. Or would cpu_online(smp_processor_id()) return true at
>>>> this point?
>>>
>>> Thanks Steve for jumping into this one!
>>>
>>> Good point.
>>> I looked at my kernel-config (which I already sent :-)).
>>>
>>> Do I need to enable...?
>>>
>>> # CONFIG_RCU_TRACE is not set
>>>
>>> ...or even more?
>>
>> What I meant by the tracepoint being enabled, was not that it was
>> configured in (I'm assuming it was), but that you started tracing?
>>
>> echo 1 > /sys/kernel/debug/tracing/events/enable
>>
>> or
>>
>> echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
>
> NO, I did not start any tracing before doing my testing.
>
> # cat /sys/kernel/debug/tracing/events/enable
> 0
> # echo 1 > /sys/kernel/debug/tracing/events/enable
> # cat /sys/kernel/debug/tracing/events/enable
> X
>
> # LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
> cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such file or directory
>
> Looks like I need to enable...?
>
> # CONFIG_DEBUG_TLBFLUSH is not set

Here my new kernel-config (not sure if I really need them to be enabled):

$ ./scripts/diffconfig /boot/config-3.19.0-rc7-next-20150204.7-iniza-small /boot/config-3.19.0-rc7-next-20150204.9-iniza-small
 DEBUG_TLBFLUSH n -> y
 RCU_TRACE n -> y
 TREE_RCU_TRACE n -> y

Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
1

[ 391.090381] intel_pstate CPU 1 exiting
[ 391.104491] smpboot: CPU 1 is now offline

- Sedat -
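Since the failed attempt above came down to a misspelled event name in the enable path, here is a small sketch of building that path from its two components and toggling the event. The directory layout is the one used in this thread; the helper names `event_enable_path()` and `set_event_enabled()` are hypothetical and exist only for this illustration, and the write normally needs root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Directory used in this thread; tracefs may be mounted elsewhere. */
#define EVENTS_DIR "/sys/kernel/debug/tracing/events"

/*
 * Hypothetical helper: builds "<EVENTS_DIR>/<system>/<event>/enable".
 * Returns 0 on success, -1 on an empty name or a too-small buffer.
 */
static int event_enable_path(char *buf, size_t len,
			     const char *system, const char *event)
{
	int n;

	assert(buf != NULL && system != NULL && event != NULL);
	if (system[0] == '\0' || event[0] == '\0')
		return -1;
	n = snprintf(buf, len, "%s/%s/%s/enable", EVENTS_DIR, system, event);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: writes "1" or "0" to the event's enable file.
 * Returns 0 on success, -1 on failure, for example when the event name
 * is misspelled and the file does not exist.
 */
static int set_event_enabled(const char *system, const char *event, int on)
{
	char path[256];
	FILE *f;
	int rc = 0;

	if (event_enable_path(path, sizeof(path), system, event) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs(on ? "1" : "0", f) == EOF)
		rc = -1;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

With these helpers, `set_event_enabled("tlb", "tlb_flush", 1)` would target the file that exists on the kernel tested here, while the `tlb_flushed` spelling fails at the open.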
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
> On Thu, 5 Feb 2015 22:45:59 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>
> Heh, yeah, I typed that entire line in by hand. Just be lucky that was
> the only typo ;-)
>
>> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>> 1
>>
>> [ 391.090381] intel_pstate CPU 1 exiting
>> [ 391.104491] smpboot: CPU 1 is now offline
>
> Now, if you disable that (echo 0 to that file), do you still get the rcu
> lockdep splat if you suspend and resume?

YES, I get the call-trace again!

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
0

# cat /sys/devices/system/cpu/cpu1/online
0

[ 2470.606222] intel_pstate CPU 1 exiting
[ 2470.628153]
[ 2470.628155] ===
[ 2470.628156] [ INFO: suspicious RCU usage. ]
[ 2470.628159] 3.19.0-rc7-next-20150204.9-iniza-small #1 Not tainted
[ 2470.628160] ---
[ 2470.628162] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage!
[ 2470.628163]
[ 2470.628163] other info that might help us debug this:
[ 2470.628163]
[ 2470.628164]
[ 2470.628164] RCU used illegally from offline CPU!
[ 2470.628164] rcu_scheduler_active = 1, debug_locks = 0
[ 2470.628165] no locks held by swapper/1/0.
[ 2470.628166]
[ 2470.628166] stack backtrace:
[ 2470.628169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.9-iniza-small #1
[ 2470.628171] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2470.628176] 0001 88011a44fe18 817ecf4d 0011
[ 2470.628179] 88011a448290 88011a44fe48 810d6b57 8800cc2660c0
[ 2470.628182] 0001 81d35160 0002 88011a44fe78
[ 2470.628183] Call Trace:
[ 2470.628192] [817ecf4d] dump_stack+0x4c/0x65
[ 2470.628198] [810d6b57] lockdep_rcu_suspicious+0xe7/0x120
[ 2470.628203] [810b7459] idle_task_exit+0x1c9/0x260
[ 2470.628208] [81054c4e] play_dead_common+0xe/0x50
[ 2470.628211] [81054ca5] native_play_dead+0x15/0x140
[ 2470.628216] [8102963f] arch_cpu_idle_dead+0xf/0x20
[ 2470.628219] [810cdbae] cpu_startup_entry+0x37e/0x580
[ 2470.628222] smpboot: CPU 1 didn't die...
[ 2470.628224] [81053e20] start_secondary+0x140/0x150

- Sedat -
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 22:45:59 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

Heh, yeah, I typed that entire line in by hand. Just be lucky that was
the only typo ;-)

> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
> 1
>
> [ 391.090381] intel_pstate CPU 1 exiting
> [ 391.104491] smpboot: CPU 1 is now offline

Now, if you disable that (echo 0 to that file), do you still get the rcu
lockdep splat if you suspend and resume?

-- Steve
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 23:16:21 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
> On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
>> On Thu, 5 Feb 2015 22:45:59 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>>> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>>
>> Heh, yeah, I typed that entire line in by hand. Just be lucky that was
>> the only typo ;-)
>>
>>> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>>> 1
>>>
>>> [ 391.090381] intel_pstate CPU 1 exiting
>>> [ 391.104491] smpboot: CPU 1 is now offline
>>
>> Now, if you disable that (echo 0 to that file), do you still get the rcu
>> lockdep splat if you suspend and resume?
>
> YES, I get the call-trace again!

Bah! I see where the warning comes from. In include/linux/tracepoint.h
we have:

#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
    extern struct tracepoint __tracepoint_##name; \
    static inline void trace_##name(proto) \
    { \
        if (static_key_false(&__tracepoint_##name.key)) \
            __DO_TRACE(&__tracepoint_##name, \
                TP_PROTO(data_proto), \
                TP_ARGS(data_args), \
                TP_CONDITION(cond),,); \
        if (IS_ENABLED(CONFIG_LOCKDEP)) { \
            rcu_read_lock_sched_notrace(); \
            rcu_dereference_sched(__tracepoint_##name.funcs);\
            rcu_read_unlock_sched_notrace(); \
        } \
    } \

See that if (IS_ENABLED(CONFIG_LOCKDEP))? I'm recalling this. Because
tracepoints require RCU, and RCU lockdep doesn't trigger if a tracepoint
isn't enabled (because the rcu calls are hidden in the __DO_TRACE()
behind that static_key_false), we would be missing lots of rcu problem
tracepoints because tests were run without them enabled. The answer was
to add this rcu check when LOCKDEP was enabled.

So no, adding that conditional isn't going to help, because lockdep will
trigger here, even if it were safe because of the conditional :-/.

That said, let's add this (on top of the old patch):

(again, not tested)

Signed-off-by: Steven Rostedt rost...@goodmis.org
---
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4b75d591eb5e..401b5bfbcdbd 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
         /* Re-load page tables */
         load_cr3(next->pgd);
-        trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+        /*
+         * Do not check rcu when tracing is not enabled. The
+         * tracepoint has a condition to not trace if the CPU is
+         * offline, and rcu check will complain if it is.
+         */
+        trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

         /* Stop flush ipis for the previous mm */
         cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
              * to make sure to use no freed page tables.
              */
             load_cr3(next->pgd);
-            trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+            /*
+             * Do not check rcu when tracing is not enabled. The
+             * tracepoint has a condition to not trace if the CPU is
+             * offline, and rcu check will complain if it is.
+             */
+            trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
+                                        TLB_FLUSH_ALL);
             load_LDT_nolock(&next->context);
         }
     }
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index e08e21e5f601..747a05aceb60 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
             rcu_read_unlock_sched_notrace(); \
         } \
     } \
+    static inline void trace_##name##_rcu_nocheck(proto) \
+    { \
+        if
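To make the explanation above concrete, here is a small userspace model of why the TP_CONDITION alone does not silence the splat. Everything in it is a hypothetical stand-in: the flags model the static key, CONFIG_LOCKDEP and the CPU's online state, and `splats` counts what would be "suspicious RCU usage" reports. The body of the proposed `_rcu_nocheck` variant is cut off in the patch as archived, so the model assumes it is simply the same function without the lockdep-only block; that is a guess, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are not kernel APIs. */
static bool tracepoint_enabled;      /* models the static key */
static bool lockdep_enabled = true;  /* models CONFIG_LOCKDEP=y */
static bool this_cpu_online = true;  /* models the CPU's online state */
static int splats;                   /* "suspicious RCU usage" reports */
static int events_recorded;          /* events actually traced */

/* Models an RCU dereference checked by lockdep: offline use is flagged. */
static void model_rcu_check(void)
{
	if (!this_cpu_online)
		splats++;
}

/* Models __DO_TRACE() with TP_CONDITION(cpu_online(...)). */
static void model_do_trace(void)
{
	if (!this_cpu_online)
		return;          /* condition false: skip before any RCU use */
	model_rcu_check();
	events_recorded++;
}

/* Models trace_##name(): the lockdep check runs even when disabled. */
static void model_trace(void)
{
	if (tracepoint_enabled)
		model_do_trace();
	if (lockdep_enabled)
		model_rcu_check();
}

/* Assumed shape of trace_##name##_rcu_nocheck(): no lockdep-only block. */
static void model_trace_rcu_nocheck(void)
{
	if (tracepoint_enabled)
		model_do_trace();
}

/* Offline CPU, tracepoint disabled: returns how many splats were raised. */
static int splats_when_offline(void (*trace_fn)(void))
{
	tracepoint_enabled = false;
	this_cpu_online = false;
	splats = 0;
	trace_fn();
	return splats;
}

/* Usage sketch: only the variant with the lockdep block complains. */
static void demo(void)
{
	assert(splats_when_offline(model_trace) == 1);
	assert(splats_when_offline(model_trace_rcu_nocheck) == 0);
}
```

The trade-off in the model mirrors the one described in the mail: dropping the unconditional check removes the false positive on the offline CPU, at the cost of no longer catching RCU misuse at that call site while the tracepoint is disabled.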
Re: linux-next: Tree for Feb 4
[...]
> That said, let's add this (on top of the old patch):

Which old patch?

"tlb: Don't do trace_tlb_flush() on offline CPUs" ?

Or did you mean "x86/mm: Omit switch_mm() tracing for offline CPUs"

- Sedat -
Re: linux-next: Tree for Feb 4
On Fri, Feb 6, 2015 at 12:11 AM, Steven Rostedt rost...@goodmis.org wrote:
> On Thu, 5 Feb 2015 23:16:21 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>> On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
>>> On Thu, 5 Feb 2015 22:45:59 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>>>> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>>>
>>> Heh, yeah, I typed that entire line in by hand. Just be lucky that was
>>> the only typo ;-)
>>>
>>>> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>>>> 1
>>>>
>>>> [ 391.090381] intel_pstate CPU 1 exiting
>>>> [ 391.104491] smpboot: CPU 1 is now offline
>>>
>>> Now, if you disable that (echo 0 to that file), do you still get the rcu
>>> lockdep splat if you suspend and resume?
>>
>> YES, I get the call-trace again!
>
> Bah! I see where the warning comes from. In include/linux/tracepoint.h
> we have:
>
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
>     extern struct tracepoint __tracepoint_##name; \
>     static inline void trace_##name(proto) \
>     { \
>         if (static_key_false(&__tracepoint_##name.key)) \
>             __DO_TRACE(&__tracepoint_##name, \
>                 TP_PROTO(data_proto), \
>                 TP_ARGS(data_args), \
>                 TP_CONDITION(cond),,); \
>         if (IS_ENABLED(CONFIG_LOCKDEP)) { \
>             rcu_read_lock_sched_notrace(); \
>             rcu_dereference_sched(__tracepoint_##name.funcs);\
>             rcu_read_unlock_sched_notrace(); \
>         } \
>     } \
>
> See that if (IS_ENABLED(CONFIG_LOCKDEP))?

I have here...

CONFIG_LOCKDEP=y

- Sedat -

> I'm recalling this. Because
> tracepoints require RCU, and RCU lockdep doesn't trigger if a tracepoint
> isn't enabled (because the rcu calls are hidden in the __DO_TRACE()
> behind that static_key_false), we would be missing lots of rcu problem
> tracepoints because tests were run without them enabled. The answer was
> to add this rcu check when LOCKDEP was enabled.
>
> So no, adding that conditional isn't going to help, because lockdep will
> trigger here, even if it were safe because of the conditional :-/.
>
> That said, let's add this (on top of the old patch):

Which old patch?

"tlb: Don't do trace_tlb_flush() on offline CPUs" ?

- Sedat -

> (again, not tested)
>
> Signed-off-by: Steven Rostedt rost...@goodmis.org
> ---
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 4b75d591eb5e..401b5bfbcdbd 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>          /* Re-load page tables */
>          load_cr3(next->pgd);
> -        trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +        /*
> +         * Do not check rcu when tracing is not enabled. The
> +         * tracepoint has a condition to not trace if the CPU is
> +         * offline, and rcu check will complain if it is.
> +         */
> +        trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
>          /* Stop flush ipis for the previous mm */
>          cpumask_clear_cpu(cpu, mm_cpumask(prev));
> @@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>               * to make sure to use no freed page tables.
>               */
>              load_cr3(next->pgd);
> -            trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +            /*
> +             * Do not check rcu when tracing is not enabled. The
> +             * tracepoint has a condition to not trace if the CPU is
> +             * offline, and rcu check will complain if it is.
> +             */
> +            trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
> +                                        TLB_FLUSH_ALL);
>              load_LDT_nolock(&next->context);
>          }
>      }
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index e08e21e5f601..747a05aceb60 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
>              rcu_read_unlock_sched_notrace(); \
>          } \
Re: linux-next: Tree for Feb 4
On Fri, Feb 6, 2015 at 1:12 AM, Steven Rostedt rost...@goodmis.org wrote:
> On Fri, 6 Feb 2015 00:53:41 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>>> See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>>
>> I have here...
>>
>> CONFIG_LOCKDEP=y
>
> Yep, I knew that (you wouldn't get splats without it).
>
>> Which old patch?
>>
>> tlb: Don't do trace_tlb_flush() on offline CPUs ?
>
> Yeah, that one. In other words, just add this patch on the kernel you
> just tested.
>
> Thanks,

Do you have a name with label for your patch?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Fri, 6 Feb 2015 00:53:41 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>> See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>
> I have here...
>
> CONFIG_LOCKDEP=y

Yep, I knew that (you wouldn't get splats without it).

> Which old patch?
>
> tlb: Don't do trace_tlb_flush() on offline CPUs ?

Yeah, that one. In other words, just add this patch on the kernel you
just tested.

Thanks,

-- Steve
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 03:57:12PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen d...@sr71.net wrote:
>> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
>>> The architecture-specific switch_mm() function can be called by offline
>>> CPUs, but includes event tracing, which cannot be legally carried out
>>> on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
>>> this splat by omitting the tracing when the CPU is offline.
>> ...
>>> 		load_cr3(next->pgd);
>>> -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>> +		if (cpu_online(smp_processor_id()))
>>> +			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>
>> Is this, perhaps, something that we should be doing in the generic trace
>> code so that all of the trace users don't have to worry about it? Also,
>> this patch will add overhead to the code when tracing is off. It would
>> be best if we could manage to make the cpu_online() check only in the
>> cases where the tracepoint is on.
>
> Hi Dave,
>
> thanks for your feedback.
>
> I have just seen that I again see the call-trace.

When you get well, could you please send that call trace?

> Maybe you can discuss with Paul and others or offer a proposal patch.

The other possibility is to have a CONFIG_ARCH_DYING_IDLE or some such
that allows this particular flavor of x86 to invoke the CPU_DYING_IDLE
from after the call to switch_mm(). Dave, does that make sense?

My guess would be that there should be a cpu_dying_idle_generic()
invoked from cpu_idle_loop(), and a cpu_dying_idle_native() invoked at
the end of idle_task_exit().

Or can I get away with just moving the current rcu_notify_cpu() call
from cpu_idle_loop() to the end of idle_task_exit()? A quick look at the
calls to idle_task_exit() makes this look plausible. There are a number
of calls to printk() and to complete() that need help, but that is a
pre-existing issue in any case, as both these code paths have RCU
readers that are having no effect on offline CPUs.

Dave, thoughts?

> I should really do something for my recovery (influenza).
> Instead of laying lazy in my bed I thought to update my Linux kernels
> and graphics driver stack which made me happy.

Get well, being sick is bad for your health! ;-)

							Thanx, Paul

> Regards,
> - Sedat -
Re: linux-next: Tree for Feb 4
On Wed, 04 Feb 2015 23:14:55 -0800 Dave Hansen d...@sr71.net wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
>> The architecture-specific switch_mm() function can be called by offline
>> CPUs, but includes event tracing, which cannot be legally carried out
>> on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
>> this splat by omitting the tracing when the CPU is offline.
> ...
>> 		load_cr3(next->pgd);
>> -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> +		if (cpu_online(smp_processor_id()))
>> +			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it? Also,
> this patch will add overhead to the code when tracing is off. It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

Note, we can move the check into the code that enables or disables
trace points. I believe, the rcu part of a tracepoint is only the call
to the callbacks. The jump_label part should be safe outside of rcu.

In that case, instead, have this, which does exactly the same thing
without having any overhead of the branch when tracing is disabled:

(not tested)

Signed-off-by: Steven Rostedt rost...@goodmis.org
---
diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..040c1cdfe6d1 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
 	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
 	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }

-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,

 	TP_PROTO(int reason, unsigned long pages),
 	TP_ARGS(reason, pages),

+	TP_CONDITION(cpu_online(smp_processor_id()),
+
 	TP_STRUCT__entry(
 		__field(	  int, reason)
 		__field(unsigned long,  pages)
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 13:03:43 -0500 Steven Rostedt rost...@goodmis.org wrote:
> (not tested)
>
> Signed-off-by: Steven Rostedt rost...@goodmis.org
> ---
> diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
> index 13391d288107..040c1cdfe6d1 100644
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
>  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
>
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>
>  	TP_PROTO(int reason, unsigned long pages),
>  	TP_ARGS(reason, pages),
>
> +	TP_CONDITION(cpu_online(smp_processor_id()),
> +

I said it wasn't tested. I also forgot to hit save after I realized I
was missing a ')'.

-- Steve

Take two:

diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..0e7635765153 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
 	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
 	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }

-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,

 	TP_PROTO(int reason, unsigned long pages),
 	TP_ARGS(reason, pages),

+	TP_CONDITION(cpu_online(smp_processor_id())),
+
 	TP_STRUCT__entry(
 		__field(	  int, reason)
 		__field(unsigned long,  pages)
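The idea behind the "take two" patch (evaluate cpu_online() only once the tracepoint is switched on, so the disabled case pays nothing extra) can be sketched in userspace C. Everything here is an assumption made for illustration: cond_tracepoint, cond_trace(), cpu_online_cond() and the counters are hypothetical and only model the behaviour described in the thread; they are not the kernel's TRACE_EVENT_CONDITION() implementation.

```c
#include <assert.h>
#include <stdbool.h>

static int cond_evaluations;		/* how often the condition was evaluated */
static bool fake_cpu_online = true;	/* models cpu_online(smp_processor_id()) */

/* Hypothetical condition callback; counts calls so the cost is visible. */
static bool cpu_online_cond(void)
{
	cond_evaluations++;
	return fake_cpu_online;
}

/* Hypothetical stand-in for a conditional tracepoint. */
struct cond_tracepoint {
	bool enabled;		/* models the static key */
	bool (*cond)(void);	/* models TP_CONDITION(...) */
	int hits;		/* events that would have been recorded */
};

static void cond_trace(struct cond_tracepoint *tp)
{
	assert(tp && tp->cond);

	if (!tp->enabled)	/* disabled: the condition is never evaluated */
		return;
	if (!tp->cond())	/* enabled but condition false: skip the event */
		return;
	tp->hits++;
}
```

In this model a disabled tracepoint leaves cond_evaluations untouched, while an enabled one records an event only when the modelled CPU is online.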
Re: linux-next: Tree for Feb 4
On 02/05/2015 10:08 AM, Steven Rostedt wrote:
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
>  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
>
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>
>  	TP_PROTO(int reason, unsigned long pages),
>  	TP_ARGS(reason, pages),
>
> +	TP_CONDITION(cpu_online(smp_processor_id())),

That's a pretty reasonable fix, although it would be nice if the
debugging was easier to hit.

Did I actually need to be onlining/offlining CPUs to hit the splat that
Sedat was reporting?
Re: linux-next: Tree for Feb 4
On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>> Sedat was reporting?
>
> Yep, you do need to offline at least one CPU to hit that splat.

Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>>> Sedat was reporting?
>>
>> Yep, you do need to offline at least one CPU to hit that splat.
>
> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c are
your friends. ;-)

The problem is that I only run RCU-relevant combinations of Kconfigs,
which means that I missed the ones that Sedat used to find this problem.
So I guess it is a good thing that others run -next testing.

							Thanx, Paul
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 10:11:31AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:08 AM, Steven Rostedt wrote:
>> --- a/include/trace/events/tlb.h
>> +++ b/include/trace/events/tlb.h
>> @@ -13,11 +13,13 @@
>>  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
>>  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
>>
>> -TRACE_EVENT(tlb_flush,
>> +TRACE_EVENT_CONDITION(tlb_flush,
>>
>>  	TP_PROTO(int reason, unsigned long pages),
>>  	TP_ARGS(reason, pages),
>>
>> +	TP_CONDITION(cpu_online(smp_processor_id())),
>
> That's a pretty reasonable fix, although it would be nice if the
> debugging was easier to hit.

Looks very good to me! Unless someone else speaks up, I will carry
this patch.

> Did I actually need to be onlining/offlining CPUs to hit the splat that
> Sedat was reporting?

Yep, you do need to offline at least one CPU to hit that splat.

							Thanx, Paul
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
>> On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>>> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>>>>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>>>>> Sedat was reporting?
>>>>
>>>> Yep, you do need to offline at least one CPU to hit that splat.
>>>
>>> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>>
>> For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c are
>> your friends. ;-)
>>
>> The problem is that I only run RCU-relevant combinations of Kconfigs,
>> which means that I missed the ones that Sedat used to find this problem.
>> So I guess it is a good thing that others run -next testing.
>
> [ Revived by a voltaren resinat pill... ]
>
> I reverted
>
>   x86/mm: Omit switch_mm() tracing for offline CPUs
>
> ...and... applied
>
>   tlb: Don't do trace_tlb_flush() on offline CPUs
>
> ...in my build-dir.
>
> ( I did not build from scratch but re-invoking make updated the files
> touched by Steven's patch, see attached build-log. )
>
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )
>
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [ 121.652796] intel_pstate CPU 1 exiting
> [ 121.666272]
> [ 121.666274] ===
> [ 121.666274] [ INFO: suspicious RCU usage. ]
> [ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
> [ 121.666278] ---
> [ 121.666280] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage!
> [ 121.666281]
> [ 121.666281] other info that might help us debug this:
> [ 121.666281]
> [ 121.666282]
> [ 121.666282] RCU used illegally from offline CPU!
> [ 121.666282] rcu_scheduler_active = 1, debug_locks = 0
> [ 121.666283] no locks held by swapper/1/0.
> [ 121.666284]
> [ 121.666284] stack backtrace:
> [ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.7-iniza-small #4
> [ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 121.666293] 0001 88011a44fe18 817e39cd 0011
> [ 121.666296] 88011a448290 88011a44fe48 810d6af7 8800d3dfaac0
> [ 121.666299] 0001 81d32ce0 0005 88011a44fe78
> [ 121.666300] Call Trace:
> [ 121.666308] [817e39cd] dump_stack+0x4c/0x65
> [ 121.666313] [810d6af7] lockdep_rcu_suspicious+0xe7/0x120
> [ 121.666318] [810b73f9] idle_task_exit+0x1c9/0x260
> [ 121.666322] [81054c4e] play_dead_common+0xe/0x50
> [ 121.666325] [81054ca5] native_play_dead+0x15/0x140
> [ 121.666330] [8102963f] arch_cpu_idle_dead+0xf/0x20
> [ 121.666333] [810cdb4e] cpu_startup_entry+0x37e/0x580
> [ 121.666336] [81053e20] start_secondary+0x140/0x150
> [ 121.666744] smpboot: CPU 1 is now offline
>
> From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
>
> I will try to test vanilla pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.

At this point, I am starting to think in terms of moving the new
CPU_DYING_IDLE notification later in the offline sequence. This will
take me a bit to get set up correctly, but I hope to have a patch some
time tomorrow (Friday), Pacific time.

							Thanx, Paul

> - Sedat -
>
> CHK include/config/kernel.release
> make KBUILD_SRC=
> CHK include/config/kernel.release
> CHK include/generated/uapi/linux/version.h
> CHK include/generated/utsrelease.h
> CALL scripts/checksyscalls.sh
> CHK include/generated/compile.h
> CC arch/x86/mm/init.o
> CC arch/x86/mm/init_64.o
> CC mm/mmu_context.o
> CC kernel/fork.o
> CC arch/x86/kernel/process_64.o
> CC mm/mmap.o
> CC arch/x86/kernel/ldt.o
> CC arch/x86/mm/tlb.o
> CC arch/x86/kernel/setup.o
> LD arch/x86/mm/built-in.o
> CC kernel/exit.o
> LD mm/built-in.o
> CC arch/x86/xen/mmu.o
> CC arch/x86/kernel/apic/ipi.o
> CC fs/exec.o
> LD arch/x86/kernel/apic/built-in.o
> CC kernel/power/snapshot.o
> CC arch/x86/kernel/cpu/common.o
> LD kernel/power/built-in.o
> LD arch/x86/xen/built-in.o
> CC kernel/sched/core.o
> LD arch/x86/kernel/cpu/built-in.o
> CC arch/x86/kernel/smp.o
> CC arch/x86/kernel/machine_kexec_64.o
> LD arch/x86/kernel/built-in.o
> LD arch/x86/built-in.o
> LD kernel/sched/built-in.o
> CC kernel/module.o
> CC fs/compat.o
> CHK kernel/config_data.h
> LD
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen martin.peter...@oracle.com wrote:
> Sedat == Sedat Dilek sedat.di...@gmail.com writes:
>
> Sedat> No, but I am here on a so-called WUBI installation which
> Sedat> triggered some bugs being an exotic installation. My
> Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition
> Sedat> (/dev/sda2).
>
> I've been mulling over this for a while and can't come up with a good
> approach. So let's just nuke these warnings.
>
> --
> Martin K. Petersen	Oracle Linux Engineering
>
>
> block: Quiesce zeroout wrapper
>
> blkdev_issue_zeroout() printed a warning if a device failed a discard or
> write same request despite advertising support for these. That's fine
> for SCSI since we'll disable these commands if we get an error back from
> the disk saying that they are not supported. And consequently the
> warning only gets printed once.
>
> There are other types of block devices that support discard, however,
> and these may return -EOPNOTSUPP for each command but leave discard
> enabled in the queue limits. This will cause a warning message for every
> blkdev_issue_zeroout() invocation.
>
> Remove the offending warning messages.
>
> Reported-by: Sedat Dilek sedat.di...@gmail.com
> Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
> ---
>  block/blk-lib.c | 26 +++---
>  1 file changed, 7 insertions(+), 19 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 715e948f58a4..7688ee3f5d72 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   * @discard:	whether to discard the block range
>   *
>   * Description:
> -
>   *  Zero-fill a block range.  If the discard flag is set and the block
>   *  device guarantees that subsequent READ operations to the block range
>   *  in question will return zeroes, the blocks will be discarded. Should
> @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
> -	unsigned char bdn[BDEVNAME_SIZE];
> -
> -	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
>
> -		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> -	}
> +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> +	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> +		return 0;
>
> -	if (bdev_write_same(bdev)) {
> -
> -		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> -					     ZERO_PAGE(0)))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> -	}
> +	if (bdev_write_same(bdev) &&
> +	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> +				    ZERO_PAGE(0)) == 0)
> +		return 0;
>
>  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }

Martin, will you send a separate patch for that?

Thanks.

- Sedat -
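The control flow that Martin's patch leaves behind (try discard, then WRITE SAME, then fall back to manual zeroing, without warning on failure) can be sketched as a small userspace model. This is an illustration under assumptions only: fake_bdev, fake_discard(), fake_write_same(), fake_manual_zero() and fake_issue_zeroout() are hypothetical names that stand in for the block-layer calls in the diff; they are not the kernel functions themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; each field stands in for a queue property. */
struct fake_bdev {
	bool supports_discard_zeroes;	/* discard supported and zeroes data */
	bool supports_write_same;	/* device advertises WRITE SAME */
	int discard_ret;		/* result a discard request would return */
	int write_same_ret;		/* result a WRITE SAME would return */
	int manual_zero_calls;		/* times the manual fallback ran */
};

static int fake_discard(struct fake_bdev *b) { return b->discard_ret; }
static int fake_write_same(struct fake_bdev *b) { return b->write_same_ret; }

static int fake_manual_zero(struct fake_bdev *b)
{
	b->manual_zero_calls++;
	return 0;
}

/* Models the quieter fallback chain: no warning on a failed fast path. */
static int fake_issue_zeroout(struct fake_bdev *b, bool discard)
{
	assert(b);

	if (discard && b->supports_discard_zeroes && fake_discard(b) == 0)
		return 0;

	if (b->supports_write_same && fake_write_same(b) == 0)
		return 0;

	return fake_manual_zero(b);
}
```

A device that keeps discard enabled but answers every request with -EOPNOTSUPP simply falls through to the manual path on each call, which is the case the commit message describes.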
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 20:25:21 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
>> On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>>> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>>>>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>>>>> Sedat was reporting?
>>>>
>>>> Yep, you do need to offline at least one CPU to hit that splat.
>>>
>>> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>>
>> For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c are
>> your friends. ;-)
>>
>> The problem is that I only run RCU-relevant combinations of Kconfigs,
>> which means that I missed the ones that Sedat used to find this problem.
>> So I guess it is a good thing that others run -next testing.
>
> [ Revived by a voltaren resinat pill... ]
>
> I reverted
>
>   x86/mm: Omit switch_mm() tracing for offline CPUs
>
> ...and... applied
>
>   tlb: Don't do trace_tlb_flush() on offline CPUs
>
> ...in my build-dir.

Is this Paul's version of the patch or mine? If it is just mine, do
you know if Paul's version triggers this too?

> ( I did not build from scratch but re-invoking make updated the files
> touched by Steven's patch, see attached build-log. )
>
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )

Was the tracepoint enabled? Or was there some other rcu call that
triggered this. Or would cpu_online(smp_processor_id()) return true at
this point?

-- Steve

> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [ 121.652796] intel_pstate CPU 1 exiting
> [ . . . ]
> [ 121.666744] smpboot: CPU 1 is now offline
>
> From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
>
> I will try to test vanilla pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.
>
> - Sedat -
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:58 PM, Steven Rostedt rost...@goodmis.org wrote:
> On Thu, 5 Feb 2015 20:25:21 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
>>> On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>>>> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>>>>>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>>>>>> Sedat was reporting?
>>>>>
>>>>> Yep, you do need to offline at least one CPU to hit that splat.
>>>>
>>>> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>>>
>>> For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c are
>>> your friends. ;-)
>>>
>>> The problem is that I only run RCU-relevant combinations of Kconfigs,
>>> which means that I missed the ones that Sedat used to find this problem.
>>> So I guess it is a good thing that others run -next testing.
>>
>> [ Revived by a voltaren resinat pill... ]
>>
>> I reverted
>>
>>   x86/mm: Omit switch_mm() tracing for offline CPUs
>>
>> ...and... applied
>>
>>   tlb: Don't do trace_tlb_flush() on offline CPUs
>>
>> ...in my build-dir.
>
> Is this Paul's version of the patch or mine? If it is just mine, do
> you know if Paul's version triggers this too?

This one which entered Pauls rcu-next tree.

[1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

>> ( I did not build from scratch but re-invoking make updated the files
>> touched by Steven's patch, see attached build-log. )
>>
>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>> ( It's good to see it's reproducible. )
>
> Was the tracepoint enabled? Or was there some other rcu call that
> triggered this. Or would cpu_online(smp_processor_id()) return true at
> this point?

Thanks Steve for jumping into this one!

Good point.

I looked at my kernel-config (which I already sent :-)).

Do I need to enable...?

# CONFIG_RCU_TRACE is not set

...or even more?

- Sedat -

> -- Steve
>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> [ 121.652796] intel_pstate CPU 1 exiting
>> [ . . . ]
>> [ 121.666744] smpboot: CPU 1 is now offline
>>
>> From rcu point this is now safe?
>> But another area (linux-pm?) is still affected?
>>
>> I will try to test vanilla pm-next if the problem exists with
>> intel_pstate as suggested by Rafael.
>> Hmmm, not sure how I can get the pm-next code which went into
>> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.
>>
>> - Sedat -
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 8:33 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
> On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
>> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
>>> On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>>>> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>>>>>> Did I actually need to be onlining/offlining CPUs to hit the splat that
>>>>>> Sedat was reporting?
>>>>>
>>>>> Yep, you do need to offline at least one CPU to hit that splat.
>>>>
>>>> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>>>
>>> For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c are
>>> your friends. ;-)
>>>
>>> The problem is that I only run RCU-relevant combinations of Kconfigs,
>>> which means that I missed the ones that Sedat used to find this problem.
>>> So I guess it is a good thing that others run -next testing.
>>
>> [ Revived by a voltaren resinat pill... ]
>>
>> I reverted
>>
>>   x86/mm: Omit switch_mm() tracing for offline CPUs
>>
>> ...and... applied
>>
>>   tlb: Don't do trace_tlb_flush() on offline CPUs
>>
>> ...in my build-dir.
>>
>> ( I did not build from scratch but re-invoking make updated the files
>> touched by Steven's patch, see attached build-log. )
>>
>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>> ( It's good to see it's reproducible. )
>>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> [ 121.652796] intel_pstate CPU 1 exiting
>> [ . . . ]
>> [ 121.666744] smpboot: CPU 1 is now offline
>>
>> From rcu point this is now safe?
>> But another area (linux-pm?) is still affected?
>>
>> I will try to test vanilla pm-next if the problem exists with
>> intel_pstate as suggested by Rafael.
>> Hmmm, not sure how I can get the pm-next code which went into
>> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.
>
> At this point, I am starting to think in terms of moving the new
> CPU_DYING_IDLE notification later in the offline sequence. This will
> take me a bit to get set up correctly, but I hope to have a patch some
> time tomorrow (Friday), Pacific time.

Is CPU_DYING_IDLE (notification) rcu area?

Shall I do a pm-next testing?

By looking at [1] I got the commit-id/sha1 which went into next-20150204.

n102: pm 12f24f2d78ce801c9330c5f682b7beb215bdbab1

If this helps you I will do.

For Paul :-)

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/SHA1s?id=next-20150204#n102
[2] http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=12f24f2d78ce801c9330c5f682b7beb215bdbab1

> Thanx, Paul
>
>> - Sedat -
>>
>> CHK include/config/kernel.release
>> [ . . . ]
Re: linux-next: Tree for Feb 4
On Thu, 5 Feb 2015 21:07:27 +0100 Sedat Dilek sedat.di...@gmail.com wrote:
>> Is this Paul's version of the patch or mine? If it is just mine, do
>> you know if Paul's version triggers this too?
>
> This one which entered Pauls rcu-next tree.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

That's mine. It looks like the condition will be tested before it
calls any rcu code. Which is why I was confused that it still gave a
splat.

Paul posted a patch before this that did the check outside the trace
point. This one:

http://marc.info/?l=linux-kernel&m=142310961217650&w=2

>>> ( I did not build from scratch but re-invoking make updated the files
>>> touched by Steven's patch, see attached build-log. )
>>>
>>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>>> ( It's good to see it's reproducible. )
>>
>> Was the tracepoint enabled? Or was there some other rcu call that
>> triggered this. Or would cpu_online(smp_processor_id()) return true at
>> this point?
>
> Thanks Steve for jumping into this one!
>
> Good point.
>
> I looked at my kernel-config (which I already sent :-)).
>
> Do I need to enable...?
>
> # CONFIG_RCU_TRACE is not set
>
> ...or even more?

What I meant by the tracepoint being enabled, was not that it was
configured in (I'm assuming it was), but that you started tracing?

  echo 1 > /sys/kernel/debug/tracing/events/enable

or

  echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable

-- Steve
Re: linux-next: Tree for Feb 4
On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> The architecture-specific switch_mm() function can be called by offline
> CPUs, but includes event tracing, which cannot be legally carried out
> on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
> this splat by omitting the tracing when the CPU is offline.
...
>>> >> > 	load_cr3(next->pgd);
>>> >> > -	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>> >> > +	if (cpu_online(smp_processor_id()))
>>> >> > +		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

Is this, perhaps, something that we should be doing in the generic trace
code so that all of the trace users don't have to worry about it?

Also, this patch will add overhead to the code when tracing is off. It
would be best if we could manage to make the cpu_online() check only in
the cases where the tracepoint is on.
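
The overhead point can be made concrete with a small userspace model. This is only a sketch under stated assumptions: every name below (`tlb_flush_tracing_on`, `model_cpu_online()`, `model_emit_tlb_flush()`, and the two `flush_*` variants) is hypothetical and merely stands in for the kernel's tracepoint-enabled test, `cpu_online(smp_processor_id())`, and the trace event itself. It is not kernel code and not the fix that was eventually merged.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are real kernel APIs. */
static bool tlb_flush_tracing_on;  /* models "the tracepoint is enabled" */
static bool cpu_is_online = true;  /* models cpu_online(smp_processor_id()) */
static int online_checks;          /* how many times the online test ran */
static int events_emitted;         /* how many trace events were recorded */

static bool model_cpu_online(void)
{
	online_checks++;
	return cpu_is_online;
}

static void model_emit_tlb_flush(void)
{
	events_emitted++;
}

/* Shape of the quoted patch: the online test runs on every call,
 * even when nobody is tracing. */
static void flush_check_always(void)
{
	if (model_cpu_online()) {
		if (tlb_flush_tracing_on)
			model_emit_tlb_flush();
	}
}

/* Shape of the suggestion: test "is tracing on" first, so the online
 * test is only paid for when an event could actually be emitted. */
static void flush_check_when_enabled(void)
{
	if (tlb_flush_tracing_on && model_cpu_online())
		model_emit_tlb_flush();
}
```

With tracing off, `flush_check_always()` still performs the online test on each call, while `flush_check_when_enabled()` skips it entirely; with tracing on and the CPU offline, both variants suppress the event.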
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 03:12:20AM +0100, Sedat Dilek wrote: > On Thu, Feb 5, 2015 at 2:53 AM, Sedat Dilek wrote: > > On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney > > wrote: > >> On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote: > >>> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney > >>> wrote: > >>> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote: > >>> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney > >>> >> wrote: > >>> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: > >>> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: > >>> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney > >>> >> >> > wrote: > >>> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki > >>> >> >> > > wrote: > >>> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > >>> >> > > >>> >> > [ . . . ] > >>> >> > > >>> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ... > >>> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting > >>> >> >> > > > > [ 1144.486064] > >>> >> >> > > > > [ 1144.486065] === > >>> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... > >>> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > >>> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 > >>> >> >> > > > > Not tainted > >>> >> >> > > > > [ 1144.486070] --- > >>> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > >>> >> >> > > > > rcu_dereference_check() usage! > >>> >> >> > > > > [ 1144.486073] > >>> >> >> > > > > [ 1144.486073] other info that might help us debug this: > >>> >> >> > > > > [ 1144.486073] > >>> >> >> > > > > [ 1144.486074] > >>> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU! > >>> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > >>> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0. 
> >>> >> >> > > > > [ 1144.486076] > >>> >> >> > > > > [ 1144.486076] stack backtrace: > >>> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > >>> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > >>> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > >>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK > >>> >> >> > > > > 03/28/2013 > >>> >> >> > > > > [ 1144.486085] 0001 88011a44fe18 > >>> >> >> > > > > 817e370d > >>> >> >> > > > > 0011 > >>> >> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 > >>> >> >> > > > > 810d6847 > >>> >> >> > > > > 8800c66b9600 > >>> >> >> > > > > [ 1144.486091] 0001 88011a44c000 > >>> >> >> > > > > 81cb3900 > >>> >> >> > > > > 88011a44fe78 > >>> >> >> > > > > [ 1144.486092] Call Trace: > >>> >> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 > >>> >> >> > > > > [ 1144.486104] [] > >>> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120 > >>> >> >> > > > >>> >> >> > > As near as I can tell, idle_task_exit() is running on an > >>> >> >> > > offline CPU, > >>> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), > >>> >> >> > > which uses RCU. > >>> >> >> > > And RCU is objecting to being used from a CPU that it is > >>> >> >> > > ignoring. > >>> >> >> > > > >>> >> >> > > One approach would be to push RCU's idea of when the CPU goes > >>> >> >> > > offline > >>> >> >> > > down into arch code in this case, using some Kconfig symbol and > >>> >> >> > > the usual conditional compilation. Another approach would be to > >>> >> >> > > invoke the trace calls under cpu_online(), for example, for the > >>> >> >> > > first such call in switch_mm(): > >>> >> >> > > > >>> >> >> > > if (cpu_online(smp_processor_id())) > >>> >> >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, > >>> >> >> > > TLB_FLUSH_ALL); > >>> >> >> > > > >>> >> >> > > The compiler would discard this if tracing was disabled. 
> >>> >> >> > > >>> >> >> > That looks like less intrusive to me. > >>> >> >> > >>> >> >> One possible concern is increased context-switch path length, but > >>> >> >> that > >>> >> >> would only be the case where tracing is enabled by default. > >>> >> > > >>> >> > Nevertheless, here is an untested patch. Does it help? > >>> >> > >>> >> No bedtime :-) > >>> > > >>> > Sorry! Actually, getting results tomorrow would be plenty OK by me. > >>> > > >>> >> I tried with a revert of... > >>> >> > >>> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b > >>> >> rcu: Handle outgoing CPUs on exit from idle loop > >>> >> > >>> >> ...and offlining cpu1 seems not to produce the trace... > >>> > > >>> > As expected. The trace can still appear, but the outgoing CPU needs to > >>> > be delayed by at least one jiffy on its final pass through the idle > >>> > loop. > >>> > Which can really happen in virtualized environments. > >>> > > >>> >> [ 115.280244]
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen wrote:
>> "Sedat" == Sedat Dilek writes:
>
> Sedat> No, but I am here on a so-called WUBI installation which
> Sedat> triggered some bugs being an exotic installation. My
> Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition
> Sedat> (/dev/sda2).
>
> I've been mulling over this for a while and can't come up with a good
> approach. So let's just nuke these warnings.
>
> --
> Martin K. Petersen Oracle Linux Engineering
>
>
> block: Quiesce zeroout wrapper
>
> blkdev_issue_zeroout() printed a warning if a device failed a discard or
> write same request despite advertising support for these. That's fine
> for SCSI since we'll disable these commands if we get an error back from
> the disk saying that they are not supported. And consequently the
> warning only gets printed once.
>
> There are other types of block devices that support discard, however,
> and these may return -EOPNOTSUPP for each command but leave discard
> enabled in the queue limits. This will cause a warning message for every
> blkdev_issue_zeroout() invocation.
>
> Remove the offending warning messages.
>
> Reported-by: Sedat Dilek

Thanks for the fix!

Tested-by: Sedat Dilek

- Sedat -

> Signed-off-by: Martin K. Petersen
> ---
> block/blk-lib.c | 26 +++---
> 1 file changed, 7 insertions(+), 19 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 715e948f58a4..7688ee3f5d72 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   * @discard: whether to discard the block range
>   *
>   * Description:
> -
>   * Zero-fill a block range. If the discard flag is set and the block
>   * device guarantees that subsequent READ operations to the block range
>   * in question will return zeroes, the blocks will be discarded. Should
> @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
> -	unsigned char bdn[BDEVNAME_SIZE];
> -
> -	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
>
> -		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> -	}
> +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> +	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> +		return 0;
>
> -	if (bdev_write_same(bdev)) {
> -
> -		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> -					     ZERO_PAGE(0)))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> -	}
> +	if (bdev_write_same(bdev) &&
> +	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> +				    ZERO_PAGE(0)) == 0)
> +		return 0;
>
>  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }
Re: linux-next: Tree for Feb 4
> "Sedat" == Sedat Dilek writes:

Sedat> No, but I am here on a so-called WUBI installation which
Sedat> triggered some bugs being an exotic installation. My
Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition
Sedat> (/dev/sda2).

I've been mulling over this for a while and can't come up with a good
approach. So let's just nuke these warnings.

--
Martin K. Petersen Oracle Linux Engineering


block: Quiesce zeroout wrapper

blkdev_issue_zeroout() printed a warning if a device failed a discard or
write same request despite advertising support for these. That's fine
for SCSI since we'll disable these commands if we get an error back from
the disk saying that they are not supported. And consequently the
warning only gets printed once.

There are other types of block devices that support discard, however,
and these may return -EOPNOTSUPP for each command but leave discard
enabled in the queue limits. This will cause a warning message for every
blkdev_issue_zeroout() invocation.

Remove the offending warning messages.

Reported-by: Sedat Dilek
Signed-off-by: Martin K. Petersen
---
block/blk-lib.c | 26 +++---
1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 715e948f58a4..7688ee3f5d72 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
  * @discard: whether to discard the block range
  *
  * Description:
-
  * Zero-fill a block range. If the discard flag is set and the block
  * device guarantees that subsequent READ operations to the block range
  * in question will return zeroes, the blocks will be discarded. Should
@@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned char bdn[BDEVNAME_SIZE];
-
-	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {

-		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
-			return 0;
-
-		bdevname(bdev, bdn);
-		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
-	}
+	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
+	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+		return 0;

-	if (bdev_write_same(bdev)) {
-
-		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
-					     ZERO_PAGE(0)))
-			return 0;
-
-		bdevname(bdev, bdn);
-		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
-	}
+	if (bdev_write_same(bdev) &&
+	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+				    ZERO_PAGE(0)) == 0)
+		return 0;

 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
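
The control flow the patch leaves behind (try discard, then write same, then fall back to manual zeroing, without printing anything) can be modelled in userspace. This is a rough sketch, not the kernel implementation: `struct model_dev` and every `model_*` function are hypothetical, and the result fields simply simulate what a device might return, for example `-EOPNOTSUPP` from a device that advertises discard but rejects each command.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; the field names are illustrative only. */
struct model_dev {
	bool advertises_discard;    /* models blk_queue_discard(q) */
	bool discard_zeroes_data;   /* models q->limits.discard_zeroes_data */
	bool advertises_write_same; /* models bdev_write_same(bdev) */
	int discard_result;         /* 0 on success, negative errno otherwise */
	int write_same_result;      /* 0 on success, negative errno otherwise */
	int manual_zeroouts;        /* counts how often the slow path ran */
};

static int model_discard(struct model_dev *d)
{
	return d->discard_result;
}

static int model_write_same(struct model_dev *d)
{
	return d->write_same_result;
}

static int model_manual_zeroout(struct model_dev *d)
{
	d->manual_zeroouts++;
	return 0;
}

/* Same shape as the patched wrapper: each fast path is attempted only if
 * the device advertises it, and a failure falls through silently. */
static int model_issue_zeroout(struct model_dev *d, bool discard)
{
	if (discard && d->advertises_discard && d->discard_zeroes_data &&
	    model_discard(d) == 0)
		return 0;

	if (d->advertises_write_same && model_write_same(d) == 0)
		return 0;

	return model_manual_zeroout(d);
}
```

A device that keeps discard enabled but answers `-EOPNOTSUPP` every time ends up on the manual path on each call, which is the case where the old code would have warned on every invocation.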
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 2:53 AM, Sedat Dilek wrote: > On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney > wrote: >> On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote: >>> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney >>> wrote: >>> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote: >>> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney >>> >> wrote: >>> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: >>> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: >>> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: >>> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: >>> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >>> >> > >>> >> > [ . . . ] >>> >> > >>> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ... >>> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting >>> >> >> > > > > [ 1144.486064] >>> >> >> > > > > [ 1144.486065] === >>> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... >>> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] >>> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not >>> >> >> > > > > tainted >>> >> >> > > > > [ 1144.486070] --- >>> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious >>> >> >> > > > > rcu_dereference_check() usage! >>> >> >> > > > > [ 1144.486073] >>> >> >> > > > > [ 1144.486073] other info that might help us debug this: >>> >> >> > > > > [ 1144.486073] >>> >> >> > > > > [ 1144.486074] >>> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU! >>> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >>> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0. 
>>> >> >> > > > > [ 1144.486076] >>> >> >> > > > > [ 1144.486076] stack backtrace: >>> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >>> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 >>> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK >>> >> >> > > > > 03/28/2013 >>> >> >> > > > > [ 1144.486085] 0001 88011a44fe18 >>> >> >> > > > > 817e370d >>> >> >> > > > > 0011 >>> >> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 >>> >> >> > > > > 810d6847 >>> >> >> > > > > 8800c66b9600 >>> >> >> > > > > [ 1144.486091] 0001 88011a44c000 >>> >> >> > > > > 81cb3900 >>> >> >> > > > > 88011a44fe78 >>> >> >> > > > > [ 1144.486092] Call Trace: >>> >> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 >>> >> >> > > > > [ 1144.486104] [] >>> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120 >>> >> >> > > >>> >> >> > > As near as I can tell, idle_task_exit() is running on an offline >>> >> >> > > CPU, >>> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which >>> >> >> > > uses RCU. >>> >> >> > > And RCU is objecting to being used from a CPU that it is ignoring. >>> >> >> > > >>> >> >> > > One approach would be to push RCU's idea of when the CPU goes >>> >> >> > > offline >>> >> >> > > down into arch code in this case, using some Kconfig symbol and >>> >> >> > > the usual conditional compilation. Another approach would be to >>> >> >> > > invoke the trace calls under cpu_online(), for example, for the >>> >> >> > > first such call in switch_mm(): >>> >> >> > > >>> >> >> > > if (cpu_online(smp_processor_id())) >>> >> >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, >>> >> >> > > TLB_FLUSH_ALL); >>> >> >> > > >>> >> >> > > The compiler would discard this if tracing was disabled. >>> >> >> > >>> >> >> > That looks like less intrusive to me. 
>>> >> >> >>> >> >> One possible concern is increased context-switch path length, but that >>> >> >> would only be the case where tracing is enabled by default. >>> >> > >>> >> > Nevertheless, here is an untested patch. Does it help? >>> >> >>> >> No bedtime :-) >>> > >>> > Sorry! Actually, getting results tomorrow would be plenty OK by me. >>> > >>> >> I tried with a revert of... >>> >> >>> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b >>> >> rcu: Handle outgoing CPUs on exit from idle loop >>> >> >>> >> ...and offlining cpu1 seems not to produce the trace... >>> > >>> > As expected. The trace can still appear, but the outgoing CPU needs to >>> > be delayed by at least one jiffy on its final pass through the idle loop. >>> > Which can really happen in virtualized environments. >>> > >>> >> [ 115.280244] PPP BSD Compression module registered >>> >> [ 115.288761] PPP Deflate Compression module registered >>> >> [ 162.935524] intel_pstate CPU 1 exiting >>> >> [ 162.949729] smpboot: CPU 1 is now offline >>> >> >>> >> Will try the patch. >>> > >>> > Looking forward to seeing the results! >>> > >>> >
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney wrote: > On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote: >> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney >> wrote: >> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote: >> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney >> >> wrote: >> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: >> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: >> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: >> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: >> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> >> > >> >> > [ . . . ] >> >> > >> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ... >> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting >> >> >> > > > > [ 1144.486064] >> >> >> > > > > [ 1144.486065] === >> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... >> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] >> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not >> >> >> > > > > tainted >> >> >> > > > > [ 1144.486070] --- >> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> >> >> > > > > rcu_dereference_check() usage! >> >> >> > > > > [ 1144.486073] >> >> >> > > > > [ 1144.486073] other info that might help us debug this: >> >> >> > > > > [ 1144.486073] >> >> >> > > > > [ 1144.486074] >> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU! >> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0. >> >> >> > > > > [ 1144.486076] >> >> >> > > > > [ 1144.486076] stack backtrace: >> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 >> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK >> >> >> > > > > 03/28/2013 >> >> >> > > > > [ 1144.486085] 0001 88011a44fe18 >> >> >> > > > > 817e370d >> >> >> > > > > 0011 >> >> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 >> >> >> > > > > 810d6847 >> >> >> > > > > 8800c66b9600 >> >> >> > > > > [ 1144.486091] 0001 88011a44c000 >> >> >> > > > > 81cb3900 >> >> >> > > > > 88011a44fe78 >> >> >> > > > > [ 1144.486092] Call Trace: >> >> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 >> >> >> > > > > [ 1144.486104] [] >> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120 >> >> >> > > >> >> >> > > As near as I can tell, idle_task_exit() is running on an offline >> >> >> > > CPU, >> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which >> >> >> > > uses RCU. >> >> >> > > And RCU is objecting to being used from a CPU that it is ignoring. >> >> >> > > >> >> >> > > One approach would be to push RCU's idea of when the CPU goes >> >> >> > > offline >> >> >> > > down into arch code in this case, using some Kconfig symbol and >> >> >> > > the usual conditional compilation. Another approach would be to >> >> >> > > invoke the trace calls under cpu_online(), for example, for the >> >> >> > > first such call in switch_mm(): >> >> >> > > >> >> >> > > if (cpu_online(smp_processor_id())) >> >> >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); >> >> >> > > >> >> >> > > The compiler would discard this if tracing was disabled. >> >> >> > >> >> >> > That looks like less intrusive to me. >> >> >> >> >> >> One possible concern is increased context-switch path length, but that >> >> >> would only be the case where tracing is enabled by default. >> >> > >> >> > Nevertheless, here is an untested patch. Does it help? >> >> >> >> No bedtime :-) >> > >> > Sorry! Actually, getting results tomorrow would be plenty OK by me. >> > >> >> I tried with a revert of... 
>> >> >> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b >> >> rcu: Handle outgoing CPUs on exit from idle loop >> >> >> >> ...and offlining cpu1 seems not to produce the trace... >> > >> > As expected. The trace can still appear, but the outgoing CPU needs to >> > be delayed by at least one jiffy on its final pass through the idle loop. >> > Which can really happen in virtualized environments. >> > >> >> [ 115.280244] PPP BSD Compression module registered >> >> [ 115.288761] PPP Deflate Compression module registered >> >> [ 162.935524] intel_pstate CPU 1 exiting >> >> [ 162.949729] smpboot: CPU 1 is now offline >> >> >> >> Will try the patch. >> > >> > Looking forward to seeing the results! >> > >> > Thanx, Paul >> > >> >> - Sedat - >> >> >> >> > >> >> > Thanx, Paul >> >> > >> >> >
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote: > On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney > wrote: > > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote: > >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney > >> wrote: > >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: > >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: > >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: > >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: > >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > >> > > >> > [ . . . ] > >> > > >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ... > >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting > >> >> > > > > [ 1144.486064] > >> >> > > > > [ 1144.486065] === > >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... > >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not > >> >> > > > > tainted > >> >> > > > > [ 1144.486070] --- > >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > >> >> > > > > rcu_dereference_check() usage! > >> >> > > > > [ 1144.486073] > >> >> > > > > [ 1144.486073] other info that might help us debug this: > >> >> > > > > [ 1144.486073] > >> >> > > > > [ 1144.486074] > >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU! > >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > >> >> > > > > [ 1144.486076] no locks held by swapper/1/0. > >> >> > > > > [ 1144.486076] > >> >> > > > > [ 1144.486076] stack backtrace: > >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK > >> >> > > > > 03/28/2013 > >> >> > > > > [ 1144.486085] 0001 88011a44fe18 > >> >> > > > > 817e370d > >> >> > > > > 0011 > >> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 > >> >> > > > > 810d6847 > >> >> > > > > 8800c66b9600 > >> >> > > > > [ 1144.486091] 0001 88011a44c000 > >> >> > > > > 81cb3900 > >> >> > > > > 88011a44fe78 > >> >> > > > > [ 1144.486092] Call Trace: > >> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 > >> >> > > > > [ 1144.486104] [] > >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120 > >> >> > > > >> >> > > As near as I can tell, idle_task_exit() is running on an offline > >> >> > > CPU, > >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which > >> >> > > uses RCU. > >> >> > > And RCU is objecting to being used from a CPU that it is ignoring. > >> >> > > > >> >> > > One approach would be to push RCU's idea of when the CPU goes > >> >> > > offline > >> >> > > down into arch code in this case, using some Kconfig symbol and > >> >> > > the usual conditional compilation. Another approach would be to > >> >> > > invoke the trace calls under cpu_online(), for example, for the > >> >> > > first such call in switch_mm(): > >> >> > > > >> >> > > if (cpu_online(smp_processor_id())) > >> >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); > >> >> > > > >> >> > > The compiler would discard this if tracing was disabled. > >> >> > > >> >> > That looks like less intrusive to me. > >> >> > >> >> One possible concern is increased context-switch path length, but that > >> >> would only be the case where tracing is enabled by default. > >> > > >> > Nevertheless, here is an untested patch. Does it help? > >> > >> No bedtime :-) > > > > Sorry! Actually, getting results tomorrow would be plenty OK by me. > > > >> I tried with a revert of... 
> >> > >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b > >> rcu: Handle outgoing CPUs on exit from idle loop > >> > >> ...and offlining cpu1 seems not to produce the trace... > > > > As expected. The trace can still appear, but the outgoing CPU needs to > > be delayed by at least one jiffy on its final pass through the idle loop. > > Which can really happen in virtualized environments. > > > >> [ 115.280244] PPP BSD Compression module registered > >> [ 115.288761] PPP Deflate Compression module registered > >> [ 162.935524] intel_pstate CPU 1 exiting > >> [ 162.949729] smpboot: CPU 1 is now offline > >> > >> Will try the patch. > > > > Looking forward to seeing the results! > > > > Thanx, Paul > > > >> - Sedat - > >> > >> > > >> > Thanx, Paul > >> > > >> > > >> > > >> > x86: Omit switch_mm() tracing for offline CPUs > >> > > >> > The architecture-specific
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney wrote: > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote: >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney >> wrote: >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> > >> > [ . . . ] >> > >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ... >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting >> >> > > > > [ 1144.486064] >> >> > > > > [ 1144.486065] === >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not >> >> > > > > tainted >> >> > > > > [ 1144.486070] --- >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> >> > > > > rcu_dereference_check() usage! >> >> > > > > [ 1144.486073] >> >> > > > > [ 1144.486073] other info that might help us debug this: >> >> > > > > [ 1144.486073] >> >> > > > > [ 1144.486074] >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU! >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> >> > > > > [ 1144.486076] no locks held by swapper/1/0. >> >> > > > > [ 1144.486076] >> >> > > > > [ 1144.486076] stack backtrace: >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
>> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK >> >> > > > > 03/28/2013 >> >> > > > > [ 1144.486085] 0001 88011a44fe18 817e370d >> >> > > > > 0011 >> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> >> > > > > 8800c66b9600 >> >> > > > > [ 1144.486091] 0001 88011a44c000 81cb3900 >> >> > > > > 88011a44fe78 >> >> > > > > [ 1144.486092] Call Trace: >> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 >> >> > > > > [ 1144.486104] [] >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120 >> >> > > >> >> > > As near as I can tell, idle_task_exit() is running on an offline CPU, >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which uses >> >> > > RCU. >> >> > > And RCU is objecting to being used from a CPU that it is ignoring. >> >> > > >> >> > > One approach would be to push RCU's idea of when the CPU goes offline >> >> > > down into arch code in this case, using some Kconfig symbol and >> >> > > the usual conditional compilation. Another approach would be to >> >> > > invoke the trace calls under cpu_online(), for example, for the >> >> > > first such call in switch_mm(): >> >> > > >> >> > > if (cpu_online(smp_processor_id())) >> >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); >> >> > > >> >> > > The compiler would discard this if tracing was disabled. >> >> > >> >> > That looks like less intrusive to me. >> >> >> >> One possible concern is increased context-switch path length, but that >> >> would only be the case where tracing is enabled by default. >> > >> > Nevertheless, here is an untested patch. Does it help? >> >> No bedtime :-) > > Sorry! Actually, getting results tomorrow would be plenty OK by me. > >> I tried with a revert of... >> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b >> rcu: Handle outgoing CPUs on exit from idle loop >> >> ...and offlining cpu1 seems not to produce the trace... > > As expected. 
The trace can still appear, but the outgoing CPU needs to > be delayed by at least one jiffy on its final pass through the idle loop. > Which can really happen in virtualized environments. > >> [ 115.280244] PPP BSD Compression module registered >> [ 115.288761] PPP Deflate Compression module registered >> [ 162.935524] intel_pstate CPU 1 exiting >> [ 162.949729] smpboot: CPU 1 is now offline >> >> Will try the patch. > > Looking forward to seeing the results! > > Thanx, Paul > >> - Sedat - >> >> > >> > Thanx, Paul >> > >> > >> > >> > x86: Omit switch_mm() tracing for offline CPUs >> > >> > The architecture-specific switch_mm() function can be called by offline >> > CPUs, but includes event tracing, which cannot be legally carried out >> > on offline CPUs. This results in a lockdep-RCU splat. This commit fixes >> > this splat by omitting the tracing when the CPU is offline. >> > >> > Reported-by: Sedat Dilek >> > Signed-off-by: Paul E. McKenney >> > >> > diff
Re: linux-next: Tree for Feb 4
On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >
> > [ . . . ]
> >
> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
> >> > > > > [ 1144.486064]
> >> > > > > [ 1144.486065] ===
> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> >> > > > > [ 1144.486070] ---
> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
> >> > > > > [ 1144.486073]
> >> > > > > [ 1144.486073] other info that might help us debug this:
> >> > > > > [ 1144.486073]
> >> > > > > [ 1144.486074]
> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
> >> > > > > [ 1144.486076]
> >> > > > > [ 1144.486076] stack backtrace:
> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >> > > > > [ 1144.486085] 0001 88011a44fe18 817e370d 0011
> >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600
> >> > > > > [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78
> >> > > > > [ 1144.486092] Call Trace:
> >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65
> >> > > > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120
> >> > >
> >> > > As near as I can tell, idle_task_exit() is running on an offline CPU,
> >> > > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
> >> > >
> >> > > One approach would be to push RCU's idea of when the CPU goes offline
> >> > > down into arch code in this case, using some Kconfig symbol and
> >> > > the usual conditional compilation. Another approach would be to
> >> > > invoke the trace calls under cpu_online(), for example, for the
> >> > > first such call in switch_mm():
> >> > >
> >> > > 	if (cpu_online(smp_processor_id()))
> >> > > 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >> > >
> >> > > The compiler would discard this if tracing was disabled.
> >> >
> >> > That looks like less intrusive to me.
> >>
> >> One possible concern is increased context-switch path length, but that
> >> would only be the case where tracing is enabled by default.
> >
> > Nevertheless, here is an untested patch. Does it help?
>
> No bedtime :-)

Sorry! Actually, getting results tomorrow would be plenty OK by me.

> I tried with a revert of...
>
> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
> rcu: Handle outgoing CPUs on exit from idle loop
>
> ...and offlining cpu1 seems not to produce the trace...

As expected. The trace can still appear, but the outgoing CPU needs to
be delayed by at least one jiffy on its final pass through the idle loop.
Which can really happen in virtualized environments.

> [ 115.280244] PPP BSD Compression module registered
> [ 115.288761] PPP Deflate Compression module registered
> [ 162.935524] intel_pstate CPU 1 exiting
> [ 162.949729] smpboot: CPU 1 is now offline
>
> Will try the patch.

Looking forward to seeing the results!

Thanx, Paul

> - Sedat -
>
> >
> > Thanx, Paul
> >
> >
> >
> > x86: Omit switch_mm() tracing for offline CPUs
> >
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs. This results in a lockdep-RCU splat. This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> >
> > Reported-by: Sedat Dilek
> > Signed-off-by: Paul E. McKenney
> >
> > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> > index 40269a2bf6f9..7e7f2445fbc9 100644
> > --- a/arch/x86/include/asm/mmu_context.h
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney wrote: > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > > [ . . . ] > >> > > > > [ 1144.482666] Disabling non-boot CPUs ... >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting >> > > > > [ 1144.486064] >> > > > > [ 1144.486065] === >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die... >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted >> > > > > [ 1144.486070] --- >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> > > > > rcu_dereference_check() usage! >> > > > > [ 1144.486073] >> > > > > [ 1144.486073] other info that might help us debug this: >> > > > > [ 1144.486073] >> > > > > [ 1144.486074] >> > > > > [ 1144.486074] RCU used illegally from offline CPU! >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> > > > > [ 1144.486076] no locks held by swapper/1/0. >> > > > > [ 1144.486076] >> > > > > [ 1144.486076] stack backtrace: >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
>> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> > > > > [ 1144.486085] 0001 88011a44fe18 817e370d >> > > > > 0011 >> > > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> > > > > 8800c66b9600 >> > > > > [ 1144.486091] 0001 88011a44c000 81cb3900 >> > > > > 88011a44fe78 >> > > > > [ 1144.486092] Call Trace: >> > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 >> > > > > [ 1144.486104] [] >> > > > > lockdep_rcu_suspicious+0xe7/0x120 >> > > >> > > As near as I can tell, idle_task_exit() is running on an offline CPU, >> > > then calling switch_mm() which contains trace_tlb_flush(), which uses >> > > RCU. >> > > And RCU is objecting to being used from a CPU that it is ignoring. >> > > >> > > One approach would be to push RCU's idea of when the CPU goes offline >> > > down into arch code in this case, using some Kconfig symbol and >> > > the usual conditional compilation. Another approach would be to >> > > invoke the trace calls under cpu_online(), for example, for the >> > > first such call in switch_mm(): >> > > >> > > if (cpu_online(smp_processor_id())) >> > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); >> > > >> > > The compiler would discard this if tracing was disabled. >> > >> > That looks like less intrusive to me. >> >> One possible concern is increased context-switch path length, but that >> would only be the case where tracing is enabled by default. > > Nevertheless, here is an untested patch. Does it help? No bedtime :-) I tried with a revert of... commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b rcu: Handle outgoing CPUs on exit from idle loop ...and offlining cpu1 seems not to produce the trace... [ 115.280244] PPP BSD Compression module registered [ 115.288761] PPP Deflate Compression module registered [ 162.935524] intel_pstate CPU 1 exiting [ 162.949729] smpboot: CPU 1 is now offline Will try the patch. 
- Sedat - > > Thanx, Paul > > > > x86: Omit switch_mm() tracing for offline CPUs > > The architecture-specific switch_mm() function can be called by offline > CPUs, but includes event tracing, which cannot be legally carried out > on offline CPUs. This results in a lockdep-RCU splat. This commit fixes > this splat by omitting the tracing when the CPU is offline. > > Reported-by: Sedat Dilek > Signed-off-by: Paul E. McKenney > > diff --git a/arch/x86/include/asm/mmu_context.h > b/arch/x86/include/asm/mmu_context.h > index 40269a2bf6f9..7e7f2445fbc9 100644 > --- a/arch/x86/include/asm/mmu_context.h > +++ b/arch/x86/include/asm/mmu_context.h > @@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct > mm_struct *next, > > /* Re-load page tables */ > load_cr3(next->pgd); > - trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); > + if (cpu_online(smp_processor_id())) > + trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, > TLB_FLUSH_ALL); > > /* Stop flush ipis for the previous mm */ > cpumask_clear_cpu(cpu, mm_cpumask(prev)); > @@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct > mm_struct *next, >
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: > On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: > > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: > > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: > > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: [ . . . ] > > > > > [ 1144.482666] Disabling non-boot CPUs ... > > > > > [ 1144.483000] intel_pstate CPU 1 exiting > > > > > [ 1144.486064] > > > > > [ 1144.486065] === > > > > > [ 1144.486067] smpboot: CPU 1 didn't die... > > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > > > > > [ 1144.486070] --- > > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > > > > > rcu_dereference_check() usage! > > > > > [ 1144.486073] > > > > > [ 1144.486073] other info that might help us debug this: > > > > > [ 1144.486073] > > > > > [ 1144.486074] > > > > > [ 1144.486074] RCU used illegally from offline CPU! > > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > > > > > [ 1144.486076] no locks held by swapper/1/0. > > > > > [ 1144.486076] > > > > > [ 1144.486076] stack backtrace: > > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > > > > > [ 1144.486085] 0001 88011a44fe18 817e370d > > > > > 0011 > > > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > > > > > 8800c66b9600 > > > > > [ 1144.486091] 0001 88011a44c000 81cb3900 > > > > > 88011a44fe78 > > > > > [ 1144.486092] Call Trace: > > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 > > > > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 > > > > > > As near as I can tell, idle_task_exit() is running on an offline CPU, > > > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. > > > And RCU is objecting to being used from a CPU that it is ignoring. > > > > > > One approach would be to push RCU's idea of when the CPU goes offline > > > down into arch code in this case, using some Kconfig symbol and > > > the usual conditional compilation. Another approach would be to > > > invoke the trace calls under cpu_online(), for example, for the > > > first such call in switch_mm(): > > > > > > if (cpu_online(smp_processor_id())) > > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); > > > > > > The compiler would discard this if tracing was disabled. > > > > That looks like less intrusive to me. > > One possible concern is increased context-switch path length, but that > would only be the case where tracing is enabled by default. Nevertheless, here is an untested patch. Does it help? Thanx, Paul x86: Omit switch_mm() tracing for offline CPUs The architecture-specific switch_mm() function can be called by offline CPUs, but includes event tracing, which cannot be legally carried out on offline CPUs. This results in a lockdep-RCU splat. This commit fixes this splat by omitting the tracing when the CPU is offline. Reported-by: Sedat Dilek Signed-off-by: Paul E. 
McKenney

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..7e7f2445fbc9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		if (cpu_online(smp_processor_id()))
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			if (cpu_online(smp_processor_id()))
+				trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at
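For readers who want to see the effect of the cpu_online() guard in isolation, here is a minimal user-space sketch. It is not kernel code: every model_* name, the four-CPU bitmask and the event counter are hypothetical stand-ins, on the assumption that the only behaviour of interest is "emit the trace event only when the current CPU is online".

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
#define MODEL_NR_CPUS 4

static unsigned int model_online_mask = 0xF; /* bit n set => CPU n is online */
static unsigned int model_trace_events;      /* number of events "emitted" */

/* Stand-in for cpu_online(): tests the CPU's bit in the mask. */
static bool model_cpu_online(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS && (model_online_mask & (1u << cpu));
}

/* Marks a CPU online or offline in the model; out-of-range CPUs are ignored. */
static void model_set_cpu_online(unsigned int cpu, bool online)
{
	if (cpu >= MODEL_NR_CPUS)
		return;
	if (online)
		model_online_mask |= 1u << cpu;
	else
		model_online_mask &= ~(1u << cpu);
}

/* Stand-in for trace_tlb_flush(): it only counts how often it is called. */
static void model_trace_tlb_flush(void)
{
	model_trace_events++;
}

/* Stand-in for the patched switch_mm() path running on the given CPU. */
static void model_switch_mm(unsigned int cpu)
{
	/* The page-table reload would happen here in the real function. */
	if (model_cpu_online(cpu))
		model_trace_tlb_flush();
}
```

Calling model_switch_mm(1) before and after model_set_cpu_online(1, false) shows the counter advancing only in the first case, which is the behaviour the patch is after. The model says nothing about the context-switch cost raised in the thread.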
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:51 AM, Paul E. McKenney wrote: > On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: >> On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: >> > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: >> > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell >> > > > wrote: >> > > > > Hi all, >> > > > > >> > > > > The next release I will be making will be next-20150209 - which will >> > > > > probably be after the v3.19 release. >> > > > > >> > > > > Changes since 20150203: >> > > > > >> > > > > The sound-asoc tree gained a conflict against the sound tree. >> > > > > >> > > > > The scsi tree gained a build failure caused by an interaction with >> > > > > the >> > > > > driver-core tree. I applied a merge fix patch. >> > > > > >> > > > > The akpm-current tree gained a build failure for which I disabled >> > > > > CONFIG_KASAN. >> > > > > >> > > > > Non-merge commits (relative to Linus' tree): 7461 >> > > > > 7314 files changed, 309736 insertions(+), 172363 deletions(-) >> > > > > >> > > > > >> > > > > >> > > > >> > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ] >> > > >> > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. >> > > >> > > > Hi, >> > > > >> > > > after suspend-and-resume I see the following call-trace: >> > > >> > > Do you see that after CPU1 offline too? >> > > >> > > > ... >> > > > [ 1144.482666] Disabling non-boot CPUs ... >> > > > [ 1144.483000] intel_pstate CPU 1 exiting >> > > > [ 1144.486064] >> > > > [ 1144.486065] === >> > > > [ 1144.486067] smpboot: CPU 1 didn't die... >> > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] >> > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted >> > > > [ 1144.486070] --- >> > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> > > > rcu_dereference_check() usage! 
>> > > > [ 1144.486073] >> > > > [ 1144.486073] other info that might help us debug this: >> > > > [ 1144.486073] >> > > > [ 1144.486074] >> > > > [ 1144.486074] RCU used illegally from offline CPU! >> > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> > > > [ 1144.486076] no locks held by swapper/1/0. >> > > > [ 1144.486076] >> > > > [ 1144.486076] stack backtrace: >> > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 >> > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> > > > [ 1144.486085] 0001 88011a44fe18 817e370d >> > > > 0011 >> > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> > > > 8800c66b9600 >> > > > [ 1144.486091] 0001 88011a44c000 81cb3900 >> > > > 88011a44fe78 >> > > > [ 1144.486092] Call Trace: >> > > > [ 1144.486099] [] dump_stack+0x4c/0x65 >> > > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 >> > >> > As near as I can tell, idle_task_exit() is running on an offline CPU, >> > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. >> > And RCU is objecting to being used from a CPU that it is ignoring. >> > >> > One approach would be to push RCU's idea of when the CPU goes offline >> > down into arch code in this case, using some Kconfig symbol and >> > the usual conditional compilation. Another approach would be to >> > invoke the trace calls under cpu_online(), for example, for the >> > first such call in switch_mm(): >> > >> > if (cpu_online(smp_processor_id())) >> > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); >> > >> > The compiler would discard this if tracing was disabled. >> >> That looks like less intrusive to me. > > One possible concern is increased context-switch path length, but that > would only be the case where tracing is enabled by default. > Hmmm, which kernel-config "trace" options do you mean in particular? 
>> > Other thoughts? >> >> Well, the whole issue here seems to be that common code using RCU is also >> useful in places where RCU doesn't want to be used. Arguably, we can deal >> with all of those cases in a whack-a-mole manner, but that doesn't seem to >> scale too well. > > Well, I did put a change into -next that makes these particular moles > stick their heads up farther, so this is not a random event. And in > this particular case, we do have the option of extending RCU's reach to > cover this operation, at the expense of a bit more intrusion by RCU into > arch-specific code. If tracing is enabled by default by major distros, > that might be the right thing to do, unappealing though it might be. > Can you point me to that change in rcu-next? > But yes, it would have been far
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:25 AM, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote: >> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki >> wrote: >> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell >> >> wrote: >> >> > Hi all, >> >> > >> >> > The next release I will be making will be next-20150209 - which will >> >> > probably be after the v3.19 release. >> >> > >> >> > Changes since 20150203: >> >> > >> >> > The sound-asoc tree gained a conflict against the sound tree. >> >> > >> >> > The scsi tree gained a build failure caused by an interaction with the >> >> > driver-core tree. I applied a merge fix patch. >> >> > >> >> > The akpm-current tree gained a build failure for which I disabled >> >> > CONFIG_KASAN. >> >> > >> >> > Non-merge commits (relative to Linus' tree): 7461 >> >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) >> >> > >> >> > >> >> > >> >> >> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] >> > >> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. >> > >> >> Yupp, I forwarded my original posting before you answered me. >> >> >> Hi, >> >> >> >> after suspend-and-resume I see the following call-trace: >> > >> > Do you see that after CPU1 offline too? >> > >> >> Did not check yet. >> >> >> ... >> >> [ 1144.482666] Disabling non-boot CPUs ... >> >> [ 1144.483000] intel_pstate CPU 1 exiting >> >> [ 1144.486064] >> >> [ 1144.486065] === >> >> [ 1144.486067] smpboot: CPU 1 didn't die... >> >> [ 1144.486067] [ INFO: suspicious RCU usage. ] >> >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted >> >> [ 1144.486070] --- >> >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> >> rcu_dereference_check() usage! 
>> >> [ 1144.486073] >> >> [ 1144.486073] other info that might help us debug this: >> >> [ 1144.486073] >> >> [ 1144.486074] >> >> [ 1144.486074] RCU used illegally from offline CPU! >> >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> >> [ 1144.486076] no locks held by swapper/1/0. >> >> [ 1144.486076] >> >> [ 1144.486076] stack backtrace: >> >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> >> 3.19.0-rc7-next-20150204.1-iniza-small #1 >> >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> >> [ 1144.486085] 0001 88011a44fe18 817e370d >> >> 0011 >> >> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> >> 8800c66b9600 >> >> [ 1144.486091] 0001 88011a44c000 81cb3900 >> >> 88011a44fe78 >> >> [ 1144.486092] Call Trace: >> >> [ 1144.486099] [] dump_stack+0x4c/0x65 >> >> [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 >> >> [ 1144.486109] [] idle_task_exit+0x205/0x2c0 >> >> [ 1144.486113] [] play_dead_common+0xe/0x50 >> >> [ 1144.486116] [] native_play_dead+0x15/0x140 >> >> [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 >> >> [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 >> >> [ 1144.486126] [] start_secondary+0x140/0x150 >> >> [ 1144.502920] intel_pstate CPU 2 exiting >> >> ... >> >> >> >> Not sure if this comes from the rcu or pm/intel_pstate area. >> > >> > New intel_pstate commits in linux-next are between 7ab0256e57ae and >> > a04759924e25 inclusive. Please check that range first. >> > >> >> Not sure if I am willing to test with reverted patches. >> ( /me was updating Linux graphic driver stack today built with >> upcomming llvm-toolchain v3.6.0. ) >> >> > If that doesn't point you to the offender, you can pull the linux-next >> > branch of the linux-pm.git tree at: >> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git >> > linux-next >> > >> > and see if that alone triggers the issue for you. 
If not, the offender is >> > not there. Otherwise, and if you use the ACPI cpuidle driver, you can >> > check the acpi-processor merge point too. >> > >> >> I pulled in pm-next-20150204 on top of next-20150204, but that did not help. > > What I was asking about was to test linux-pm.git/linux-next *instead* *of* > full > linux-next and not on top of it. That would tell you whether or not the new > trace > was introduced by one of the PM commits or elsewhere. > No, I did not test this. > But this most likely is what Paul said anyway. > Not sure what you mean by this statement. I tried -3 kernel with... f64b348810c2 Revert "intel_pstate: Add support for SkyLake" a0d825a39848 Revert "intel_pstate: expose turbo range to sysfs" 847153608ecf Revert "intel_pstate: Add num_pstates to sysfs" 412a6770cde4 Revert "intel_pstate: respect cpufreq policy request" e2a6685023ed Revert "intel_pstate: honor user space min_perf_pct override on resume"
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > > > > wrote: > > > > > Hi all, > > > > > > > > > > The next release I will be making will be next-20150209 - which will > > > > > probably be after the v3.19 release. > > > > > > > > > > Changes since 20150203: > > > > > > > > > > The sound-asoc tree gained a conflict against the sound tree. > > > > > > > > > > The scsi tree gained a build failure caused by an interaction with the > > > > > driver-core tree. I applied a merge fix patch. > > > > > > > > > > The akpm-current tree gained a build failure for which I disabled > > > > > CONFIG_KASAN. > > > > > > > > > > Non-merge commits (relative to Linus' tree): 7461 > > > > > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > > > > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > > > > > > > Hi, > > > > > > > > after suspend-and-resume I see the following call-trace: > > > > > > Do you see that after CPU1 offline too? > > > > > > > ... > > > > [ 1144.482666] Disabling non-boot CPUs ... > > > > [ 1144.483000] intel_pstate CPU 1 exiting > > > > [ 1144.486064] > > > > [ 1144.486065] === > > > > [ 1144.486067] smpboot: CPU 1 didn't die... > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > > > > [ 1144.486070] --- > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > > > > rcu_dereference_check() usage! 
> > > > [ 1144.486073] > > > > [ 1144.486073] other info that might help us debug this: > > > > [ 1144.486073] > > > > [ 1144.486074] > > > > [ 1144.486074] RCU used illegally from offline CPU! > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > > > > [ 1144.486076] no locks held by swapper/1/0. > > > > [ 1144.486076] > > > > [ 1144.486076] stack backtrace: > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > > > > [ 1144.486085] 0001 88011a44fe18 817e370d > > > > 0011 > > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > > > > 8800c66b9600 > > > > [ 1144.486091] 0001 88011a44c000 81cb3900 > > > > 88011a44fe78 > > > > [ 1144.486092] Call Trace: > > > > [ 1144.486099] [] dump_stack+0x4c/0x65 > > > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 > > > > As near as I can tell, idle_task_exit() is running on an offline CPU, > > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. > > And RCU is objecting to being used from a CPU that it is ignoring. > > > > One approach would be to push RCU's idea of when the CPU goes offline > > down into arch code in this case, using some Kconfig symbol and > > the usual conditional compilation. Another approach would be to > > invoke the trace calls under cpu_online(), for example, for the > > first such call in switch_mm(): > > > > if (cpu_online(smp_processor_id())) > > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); > > > > The compiler would discard this if tracing was disabled. > > That looks like less intrusive to me. One possible concern is increased context-switch path length, but that would only be the case where tracing is enabled by default. > > Other thoughts? 
>
> Well, the whole issue here seems to be that common code using RCU is also
> useful in places where RCU doesn't want to be used. Arguably, we can deal
> with all of those cases in a whack-a-mole manner, but that doesn't seem to
> scale too well.

Well, I did put a change into -next that makes these particular moles stick their heads up farther, so this is not a random event. And in this particular case, we do have the option of extending RCU's reach to cover this operation, at the expense of a bit more intrusion by RCU into arch-specific code. If tracing is enabled by default by major distros, that might be the right thing to do, unappealing though it might be.

But yes, it would have been far better for RCU to have been picky to begin with, so that these issues could have been addressed as they were added to the kernel. I guess one possible source of comfort is that once this is in place, future issues will make themselves immediately apparent.
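The "picky RCU" idea can be illustrated with a small user-space sketch. It is only a model under stated assumptions: the sketch_* names, the fixed four-CPU array and the splat counter are all hypothetical, and the real check lives in the kernel's lockdep-RCU machinery, which this does not reproduce.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model; these names do not exist in the kernel. */
#define SKETCH_NR_CPUS 4

/* true while the model RCU is still paying attention to the CPU */
static bool sketch_rcu_watching[SKETCH_NR_CPUS] = { true, true, true, true };
static unsigned int sketch_splats; /* number of complaints recorded */

/* The model RCU stops watching a CPU once it is considered offline. */
static void sketch_rcu_cpu_offline(unsigned int cpu)
{
	if (cpu < SKETCH_NR_CPUS)
		sketch_rcu_watching[cpu] = false;
}

/*
 * Returns true if a read-side use on this CPU is legal in the model.
 * Otherwise it records a splat and reports where the use came from,
 * so that the offending call site shows up on first use.
 */
static bool sketch_rcu_read_check(unsigned int cpu, const char *where)
{
	if (cpu < SKETCH_NR_CPUS && sketch_rcu_watching[cpu])
		return true;
	sketch_splats++;
	fprintf(stderr, "%s: RCU used illegally from offline CPU %u\n",
		where, cpu);
	return false;
}
```

With a check of this kind in place, a caller such as the tracing hook in switch_mm() is flagged the first time it runs on a CPU that has been marked offline. The caller can then either be guarded, or the point at which RCU stops watching can be moved later, which are the two options weighed in this thread.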
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:30 AM, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote: >> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki >> wrote: >> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell >> >> wrote: >> >> > Hi all, >> >> > >> >> > The next release I will be making will be next-20150209 - which will >> >> > probably be after the v3.19 release. >> >> > >> >> > Changes since 20150203: >> >> > >> >> > The sound-asoc tree gained a conflict against the sound tree. >> >> > >> >> > The scsi tree gained a build failure caused by an interaction with the >> >> > driver-core tree. I applied a merge fix patch. >> >> > >> >> > The akpm-current tree gained a build failure for which I disabled >> >> > CONFIG_KASAN. >> >> > >> >> > Non-merge commits (relative to Linus' tree): 7461 >> >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) >> >> > >> >> > >> >> > >> >> >> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] >> > >> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. >> > >> >> Hi, >> >> >> >> after suspend-and-resume I see the following call-trace: >> > >> > Do you see that after CPU1 offline too? >> > >> >> NO. >> >> After... >> >> root# echo 0 > /sys/devices/system/cpu/cpu1/online >> >> ...I see this: >> >> +[ 707.936668] PM: Saving platform NVS memory >> +[ 707.936674] Disabling non-boot CPUs ... >> +[ 707.936712] intel_pstate CPU 2 exiting >> +[ 707.938024] smpboot: CPU 2 didn't die... >> +[ 707.949128] intel_pstate CPU 3 exiting >> +[ 707.950369] smpboot: CPU 3 didn't die... >> +[ 707.966248] ACPI: Low-level resume complete >> +[ 707.966302] PM: Restoring platform NVS memory >> >> Full dmesg attached. > > The dmesg doesn't match what you said above. > > Anyway, that's not what I meant. Does the CPU1 offlining alone: > > # echo 0 > /sys/devices/system/cpu/cpu1/online > > trigger the trace? It should. 
> YES, I see this... ... [ 84.668616] PPP BSD Compression module registered [ 84.678072] PPP Deflate Compression module registered [ 101.143582] intel_pstate CPU 1 exiting [ 101.157134] [ 101.157135] === [ 101.157136] [ INFO: suspicious RCU usage. ] [ 101.157139] 3.19.0-rc7-next-20150204.3-iniza-small #1 Not tainted [ 101.157140] --- [ 101.157142] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 101.157142] [ 101.157142] other info that might help us debug this: [ 101.157142] [ 101.157143] [ 101.157143] RCU used illegally from offline CPU! [ 101.157143] rcu_scheduler_active = 1, debug_locks = 0 [ 101.157144] no locks held by swapper/1/0. [ 101.157144] [ 101.157144] stack backtrace: [ 101.157146] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.3-iniza-small #1 [ 101.157147] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 101.157151] 0001 88011a44fe18 817e35fd 0011 [ 101.157153] 88011a448290 88011a44fe48 810d6847 8800d3b96100 [ 101.157155] 0001 88011a44c000 0005 88011a44fe78 [ 101.157156] Call Trace: [ 101.157162] [] dump_stack+0x4c/0x65 [ 101.157166] [] lockdep_rcu_suspicious+0xe7/0x120 [ 101.157170] [] idle_task_exit+0x205/0x2c0 [ 101.157173] [] play_dead_common+0xe/0x50 [ 101.157175] [] native_play_dead+0x15/0x140 [ 101.157179] [] arch_cpu_idle_dead+0xf/0x20 [ 101.157181] [] cpu_startup_entry+0x37e/0x580 [ 101.157183] [] start_secondary+0x140/0x150 [ 101.157228] smpboot: CPU 1 is now offline - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote: > On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki wrote: > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > >> wrote: > >> > Hi all, > >> > > >> > The next release I will be making will be next-20150209 - which will > >> > probably be after the v3.19 release. > >> > > >> > Changes since 20150203: > >> > > >> > The sound-asoc tree gained a conflict against the sound tree. > >> > > >> > The scsi tree gained a build failure caused by an interaction with the > >> > driver-core tree. I applied a merge fix patch. > >> > > >> > The akpm-current tree gained a build failure for which I disabled > >> > CONFIG_KASAN. > >> > > >> > Non-merge commits (relative to Linus' tree): 7461 > >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > >> > > >> > > >> > > >> > >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > > > >> Hi, > >> > >> after suspend-and-resume I see the following call-trace: > > > > Do you see that after CPU1 offline too? > > > > NO. > > After... > > root# echo 0 > /sys/devices/system/cpu/cpu1/online > > ...I see this: > > +[ 707.936668] PM: Saving platform NVS memory > +[ 707.936674] Disabling non-boot CPUs ... > +[ 707.936712] intel_pstate CPU 2 exiting > +[ 707.938024] smpboot: CPU 2 didn't die... > +[ 707.949128] intel_pstate CPU 3 exiting > +[ 707.950369] smpboot: CPU 3 didn't die... > +[ 707.966248] ACPI: Low-level resume complete > +[ 707.966302] PM: Restoring platform NVS memory > > Full dmesg attached. The dmesg doesn't match what you said above. Anyway, that's not what I meant. Does the CPU1 offlining alone: # echo 0 > /sys/devices/system/cpu/cpu1/online trigger the trace? It should. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. 
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote: > On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki wrote: > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > >> wrote: > >> > Hi all, > >> > > >> > The next release I will be making will be next-20150209 - which will > >> > probably be after the v3.19 release. > >> > > >> > Changes since 20150203: > >> > > >> > The sound-asoc tree gained a conflict against the sound tree. > >> > > >> > The scsi tree gained a build failure caused by an interaction with the > >> > driver-core tree. I applied a merge fix patch. > >> > > >> > The akpm-current tree gained a build failure for which I disabled > >> > CONFIG_KASAN. > >> > > >> > Non-merge commits (relative to Linus' tree): 7461 > >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > >> > > >> > > >> > > >> > >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > > > > Yupp, I forwarded my original posting before you answered me. > > >> Hi, > >> > >> after suspend-and-resume I see the following call-trace: > > > > Do you see that after CPU1 offline too? > > > > Did not check yet. > > >> ... > >> [ 1144.482666] Disabling non-boot CPUs ... > >> [ 1144.483000] intel_pstate CPU 1 exiting > >> [ 1144.486064] > >> [ 1144.486065] === > >> [ 1144.486067] smpboot: CPU 1 didn't die... > >> [ 1144.486067] [ INFO: suspicious RCU usage. ] > >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > >> [ 1144.486070] --- > >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious > >> rcu_dereference_check() usage! > >> [ 1144.486073] > >> [ 1144.486073] other info that might help us debug this: > >> [ 1144.486073] > >> [ 1144.486074] > >> [ 1144.486074] RCU used illegally from offline CPU! 
> >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > >> [ 1144.486076] no locks held by swapper/1/0. > >> [ 1144.486076] > >> [ 1144.486076] stack backtrace: > >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > >> 3.19.0-rc7-next-20150204.1-iniza-small #1 > >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > >> [ 1144.486085] 0001 88011a44fe18 817e370d > >> 0011 > >> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > >> 8800c66b9600 > >> [ 1144.486091] 0001 88011a44c000 81cb3900 > >> 88011a44fe78 > >> [ 1144.486092] Call Trace: > >> [ 1144.486099] [] dump_stack+0x4c/0x65 > >> [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 > >> [ 1144.486109] [] idle_task_exit+0x205/0x2c0 > >> [ 1144.486113] [] play_dead_common+0xe/0x50 > >> [ 1144.486116] [] native_play_dead+0x15/0x140 > >> [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 > >> [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 > >> [ 1144.486126] [] start_secondary+0x140/0x150 > >> [ 1144.502920] intel_pstate CPU 2 exiting > >> ... > >> > >> Not sure if this comes from the rcu or pm/intel_pstate area. > > > > New intel_pstate commits in linux-next are between 7ab0256e57ae and > > a04759924e25 inclusive. Please check that range first. > > > > Not sure if I am willing to test with reverted patches. > ( /me was updating Linux graphic driver stack today built with > upcomming llvm-toolchain v3.6.0. ) > > > If that doesn't point you to the offender, you can pull the linux-next > > branch of the linux-pm.git tree at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next > > > > and see if that alone triggers the issue for you. If not, the offender is > > not there. Otherwise, and if you use the ACPI cpuidle driver, you can > > check the acpi-processor merge point too. > > > > I pulled in pm-next-20150204 on top of next-20150204, but that did not help. 
What I was asking about was to test linux-pm.git/linux-next *instead* *of*
full linux-next and not on top of it. That would tell you whether or not the
new trace was introduced by one of the PM commits or elsewhere.

But this most likely is what Paul said anyway.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell >> wrote: >> > Hi all, >> > >> > The next release I will be making will be next-20150209 - which will >> > probably be after the v3.19 release. >> > >> > Changes since 20150203: >> > >> > The sound-asoc tree gained a conflict against the sound tree. >> > >> > The scsi tree gained a build failure caused by an interaction with the >> > driver-core tree. I applied a merge fix patch. >> > >> > The akpm-current tree gained a build failure for which I disabled >> > CONFIG_KASAN. >> > >> > Non-merge commits (relative to Linus' tree): 7461 >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) >> > >> > >> > >> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > >> Hi, >> >> after suspend-and-resume I see the following call-trace: > > Do you see that after CPU1 offline too? > NO. After... root# echo 0 > /sys/devices/system/cpu/cpu1/online ...I see this: +[ 707.936668] PM: Saving platform NVS memory +[ 707.936674] Disabling non-boot CPUs ... +[ 707.936712] intel_pstate CPU 2 exiting +[ 707.938024] smpboot: CPU 2 didn't die... +[ 707.949128] intel_pstate CPU 3 exiting +[ 707.950369] smpboot: CPU 3 didn't die... +[ 707.966248] ACPI: Low-level resume complete +[ 707.966302] PM: Restoring platform NVS memory Full dmesg attached. I hope this helps. - Sedat - >> ... >> [ 1144.482666] Disabling non-boot CPUs ... >> [ 1144.483000] intel_pstate CPU 1 exiting >> [ 1144.486064] >> [ 1144.486065] === >> [ 1144.486067] smpboot: CPU 1 didn't die... >> [ 1144.486067] [ INFO: suspicious RCU usage. ] >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted >> [ 1144.486070] --- >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> rcu_dereference_check() usage! 
>> [ 1144.486073] >> [ 1144.486073] other info that might help us debug this: >> [ 1144.486073] >> [ 1144.486074] >> [ 1144.486074] RCU used illegally from offline CPU! >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> [ 1144.486076] no locks held by swapper/1/0. >> [ 1144.486076] >> [ 1144.486076] stack backtrace: >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> 3.19.0-rc7-next-20150204.1-iniza-small #1 >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> [ 1144.486085] 0001 88011a44fe18 817e370d >> 0011 >> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> 8800c66b9600 >> [ 1144.486091] 0001 88011a44c000 81cb3900 >> 88011a44fe78 >> [ 1144.486092] Call Trace: >> [ 1144.486099] [] dump_stack+0x4c/0x65 >> [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 >> [ 1144.486109] [] idle_task_exit+0x205/0x2c0 >> [ 1144.486113] [] play_dead_common+0xe/0x50 >> [ 1144.486116] [] native_play_dead+0x15/0x140 >> [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 >> [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 >> [ 1144.486126] [] start_secondary+0x140/0x150 >> [ 1144.502920] intel_pstate CPU 2 exiting >> ... >> >> Not sure if this comes from the rcu or pm/intel_pstate area. > > New intel_pstate commits in linux-next are between 7ab0256e57ae and > a04759924e25 inclusive. Please check that range first. > > If that doesn't point you to the offender, you can pull the linux-next > branch of the linux-pm.git tree at: > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next > > and see if that alone triggers the issue for you. If not, the offender is > not there. Otherwise, and if you use the ACPI cpuidle driver, you can > check the acpi-processor merge point too. > > > -- > I speak only for myself. > Rafael J. Wysocki, Intel Open Source Technology Center. 
[0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.19.0-rc7-next-20150204.2-iniza-small (sedat.di...@gmail.com@fambox) (gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~12.04) ) #1 SMP Wed Feb 4 23:25:30 CET 2015 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.19.0-rc7-next-20150204.2-iniza-small root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] Disabled fast string operations [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell >> wrote: >> > Hi all, >> > >> > The next release I will be making will be next-20150209 - which will >> > probably be after the v3.19 release. >> > >> > Changes since 20150203: >> > >> > The sound-asoc tree gained a conflict against the sound tree. >> > >> > The scsi tree gained a build failure caused by an interaction with the >> > driver-core tree. I applied a merge fix patch. >> > >> > The akpm-current tree gained a build failure for which I disabled >> > CONFIG_KASAN. >> > >> > Non-merge commits (relative to Linus' tree): 7461 >> > 7314 files changed, 309736 insertions(+), 172363 deletions(-) >> > >> > >> > >> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > Yupp, I forwarded my original posting before you answered me. >> Hi, >> >> after suspend-and-resume I see the following call-trace: > > Do you see that after CPU1 offline too? > Did not check yet. >> ... >> [ 1144.482666] Disabling non-boot CPUs ... >> [ 1144.483000] intel_pstate CPU 1 exiting >> [ 1144.486064] >> [ 1144.486065] === >> [ 1144.486067] smpboot: CPU 1 didn't die... >> [ 1144.486067] [ INFO: suspicious RCU usage. ] >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted >> [ 1144.486070] --- >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious >> rcu_dereference_check() usage! >> [ 1144.486073] >> [ 1144.486073] other info that might help us debug this: >> [ 1144.486073] >> [ 1144.486074] >> [ 1144.486074] RCU used illegally from offline CPU! >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 >> [ 1144.486076] no locks held by swapper/1/0. 
>> [ 1144.486076] >> [ 1144.486076] stack backtrace: >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted >> 3.19.0-rc7-next-20150204.1-iniza-small #1 >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 >> [ 1144.486085] 0001 88011a44fe18 817e370d >> 0011 >> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 >> 8800c66b9600 >> [ 1144.486091] 0001 88011a44c000 81cb3900 >> 88011a44fe78 >> [ 1144.486092] Call Trace: >> [ 1144.486099] [] dump_stack+0x4c/0x65 >> [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 >> [ 1144.486109] [] idle_task_exit+0x205/0x2c0 >> [ 1144.486113] [] play_dead_common+0xe/0x50 >> [ 1144.486116] [] native_play_dead+0x15/0x140 >> [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 >> [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 >> [ 1144.486126] [] start_secondary+0x140/0x150 >> [ 1144.502920] intel_pstate CPU 2 exiting >> ... >> >> Not sure if this comes from the rcu or pm/intel_pstate area. > > New intel_pstate commits in linux-next are between 7ab0256e57ae and > a04759924e25 inclusive. Please check that range first. > Not sure if I am willing to test with reverted patches. ( /me was updating Linux graphic driver stack today built with upcomming llvm-toolchain v3.6.0. ) > If that doesn't point you to the offender, you can pull the linux-next > branch of the linux-pm.git tree at: > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next > > and see if that alone triggers the issue for you. If not, the offender is > not there. Otherwise, and if you use the ACPI cpuidle driver, you can > check the acpi-processor merge point too. > I pulled in pm-next-20150204 on top of next-20150204, but that did not help. 
- Sedat -

Jiang Liu (8):
      ACPI: Fix a bug in parsing ACPI Memory24 resource
      ACPI: Normalize return value of resource parser functions
      ACPI: Set flag IORESOURCE_UNSET for unassigned resources
      ACPI: Enforce stricter checks for address space descriptors
      ACPI: Return translation offset when parsing ACPI address space resources
      ACPI: Translate resource into master side address for bridge window resources
      ACPI: Add field offset to struct resource_list_entry
      ACPI: Introduce helper function acpi_dev_filter_resource_type()

Markus Elfring (1):
      cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation

Rafael J. Wysocki (14):
      ACPI / cpuidle: Drop unnecessary calls from acpi_idle_do_entry()
      ACPI / cpuidle: Drop unnecessary calls from ->enter callback routines
      ACPI / cpuidle: Clean up fallback to C1 checks
      ACPI / cpuidle: Drop irrelevant comment from acpi_idle_enter_simple()
      ACPI / cpuidle: Clean up white space in a switch statement
      ACPI / cpuidle: Drop flags.bm_check tests from acpi_idle_enter_bm()
      ACPI / cpuidle: Merge acpi_idle_enter_c1()
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > > > wrote: > > > > Hi all, > > > > > > > > The next release I will be making will be next-20150209 - which will > > > > probably be after the v3.19 release. > > > > > > > > Changes since 20150203: > > > > > > > > The sound-asoc tree gained a conflict against the sound tree. > > > > > > > > The scsi tree gained a build failure caused by an interaction with the > > > > driver-core tree. I applied a merge fix patch. > > > > > > > > The akpm-current tree gained a build failure for which I disabled > > > > CONFIG_KASAN. > > > > > > > > Non-merge commits (relative to Linus' tree): 7461 > > > > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > > > > > > > > > > > > > > > > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > > > > > Hi, > > > > > > after suspend-and-resume I see the following call-trace: > > > > Do you see that after CPU1 offline too? > > > > > ... > > > [ 1144.482666] Disabling non-boot CPUs ... > > > [ 1144.483000] intel_pstate CPU 1 exiting > > > [ 1144.486064] > > > [ 1144.486065] === > > > [ 1144.486067] smpboot: CPU 1 didn't die... > > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > > > [ 1144.486070] --- > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > > > rcu_dereference_check() usage! > > > [ 1144.486073] > > > [ 1144.486073] other info that might help us debug this: > > > [ 1144.486073] > > > [ 1144.486074] > > > [ 1144.486074] RCU used illegally from offline CPU! > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > > > [ 1144.486076] no locks held by swapper/1/0. 
> > > [ 1144.486076] > > > [ 1144.486076] stack backtrace: > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > > > [ 1144.486085] 0001 88011a44fe18 817e370d > > > 0011 > > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > > > 8800c66b9600 > > > [ 1144.486091] 0001 88011a44c000 81cb3900 > > > 88011a44fe78 > > > [ 1144.486092] Call Trace: > > > [ 1144.486099] [] dump_stack+0x4c/0x65 > > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 > > As near as I can tell, idle_task_exit() is running on an offline CPU, > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. > And RCU is objecting to being used from a CPU that it is ignoring. > > One approach would be to push RCU's idea of when the CPU goes offline > down into arch code in this case, using some Kconfig symbol and > the usual conditional compilation. Another approach would be to > invoke the trace calls under cpu_online(), for example, for the > first such call in switch_mm(): > > if (cpu_online(smp_processor_id())) > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); > > The compiler would discard this if tracing was disabled. That looks like less intrusive to me. > Other thoughts? Well, the whole issue here seems to be that common code using RCU is also useful in places where RCU doesn't want to be used. Arguably, we can deal with all of those cases in a whack-a-mole manner, but that doesn't seem to scale too well. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > > wrote: > > > Hi all, > > > > > > The next release I will be making will be next-20150209 - which will > > > probably be after the v3.19 release. > > > > > > Changes since 20150203: > > > > > > The sound-asoc tree gained a conflict against the sound tree. > > > > > > The scsi tree gained a build failure caused by an interaction with the > > > driver-core tree. I applied a merge fix patch. > > > > > > The akpm-current tree gained a build failure for which I disabled > > > CONFIG_KASAN. > > > > > > Non-merge commits (relative to Linus' tree): 7461 > > > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > > > > > > > > > > > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ] > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > > > Hi, > > > > after suspend-and-resume I see the following call-trace: > > Do you see that after CPU1 offline too? > > > ... > > [ 1144.482666] Disabling non-boot CPUs ... > > [ 1144.483000] intel_pstate CPU 1 exiting > > [ 1144.486064] > > [ 1144.486065] === > > [ 1144.486067] smpboot: CPU 1 didn't die... > > [ 1144.486067] [ INFO: suspicious RCU usage. ] > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > > [ 1144.486070] --- > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > > rcu_dereference_check() usage! > > [ 1144.486073] > > [ 1144.486073] other info that might help us debug this: > > [ 1144.486073] > > [ 1144.486074] > > [ 1144.486074] RCU used illegally from offline CPU! > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > > [ 1144.486076] no locks held by swapper/1/0. 
> > [ 1144.486076] > > [ 1144.486076] stack backtrace: > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > 3.19.0-rc7-next-20150204.1-iniza-small #1 > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > > [ 1144.486085] 0001 88011a44fe18 817e370d > > 0011 > > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > > 8800c66b9600 > > [ 1144.486091] 0001 88011a44c000 81cb3900 > > 88011a44fe78 > > [ 1144.486092] Call Trace: > > [ 1144.486099] [] dump_stack+0x4c/0x65 > > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring. One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm(): if (cpu_online(smp_processor_id())) trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); The compiler would discard this if tracing was disabled. Other thoughts? Note that this use of RCU from an offline CPU is currently tolerated, but is vulnerable to delays, for example, due to virtualization. If a CPU takes more than one jiffy to get from _stop_machine() state to fully offlined, life can be very hard. > > [ 1144.486109] [] idle_task_exit+0x205/0x2c0 > > [ 1144.486113] [] play_dead_common+0xe/0x50 > > [ 1144.486116] [] native_play_dead+0x15/0x140 > > [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 > > [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 > > [ 1144.486126] [] start_secondary+0x140/0x150 > > [ 1144.502920] intel_pstate CPU 2 exiting > > ... > > > > Not sure if this comes from the rcu or pm/intel_pstate area. 
>
> New intel_pstate commits in linux-next are between 7ab0256e57ae and
> a04759924e25 inclusive. Please check that range first.
>
> If that doesn't point you to the offender, you can pull the linux-next
> branch of the linux-pm.git tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> and see if that alone triggers the issue for you. If not, the offender is
> not there. Otherwise, and if you use the ACPI cpuidle driver, you can
> check the acpi-processor merge point too.

This is almost certainly RCU getting more strict about CPUs using RCU
while offline.

Thanx, Paul
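[ Archive note: the cpu_online() guard Paul proposes above can be modelled
in plain user-space C. This is only a sketch under stated assumptions: every
name prefixed with mock_ is a hypothetical stand-in, not a kernel API, and
the "tracepoint" is just a counter. It shows one thing: once a CPU has been
marked offline, the guarded call site no longer reaches the trace hook, so
the RCU-using code is never entered from that CPU. ]

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NR_CPUS 4

/* Hypothetical stand-ins for kernel state; nothing here is real kernel API. */
bool mock_online_map[MOCK_NR_CPUS] = { true, true, true, true };
int mock_current_cpu;          /* plays the role of smp_processor_id() */
int mock_trace_events_emitted; /* counts calls that reached the "tracepoint" */

/* Plays the role of cpu_online(). */
bool mock_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MOCK_NR_CPUS);
	return mock_online_map[cpu];
}

/* Plays the role of trace_tlb_flush(); in the kernel this is where RCU is used. */
void mock_trace_tlb_flush(void)
{
	mock_trace_events_emitted++;
}

/* Models the guarded call site suggested for switch_mm(). */
void mock_switch_mm(void)
{
	if (mock_cpu_online(mock_current_cpu))
		mock_trace_tlb_flush();
	/* the actual address-space switch would happen here regardless */
}

/*
 * Models the path from the trace: the CPU is marked offline first, then the
 * idle task exits and switches mm. Returns how many trace events were emitted.
 */
int mock_offline_and_switch(int cpu)
{
	int before = mock_trace_events_emitted;

	mock_online_map[cpu] = false;
	mock_current_cpu = cpu;
	mock_switch_mm();
	return mock_trace_events_emitted - before;
}
```

[ Calling mock_switch_mm() on an online CPU bumps the counter, while
mock_offline_and_switch(1) leaves it untouched. As Rafael notes elsewhere in
the thread, a guard like this has to be repeated at every affected call site,
which is the scaling concern with this approach. ]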
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell > wrote: > > Hi all, > > > > The next release I will be making will be next-20150209 - which will > > probably be after the v3.19 release. > > > > Changes since 20150203: > > > > The sound-asoc tree gained a conflict against the sound tree. > > > > The scsi tree gained a build failure caused by an interaction with the > > driver-core tree. I applied a merge fix patch. > > > > The akpm-current tree gained a build failure for which I disabled > > CONFIG_KASAN. > > > > Non-merge commits (relative to Linus' tree): 7461 > > 7314 files changed, 309736 insertions(+), 172363 deletions(-) > > > > > > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. > Hi, > > after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? > ... > [ 1144.482666] Disabling non-boot CPUs ... > [ 1144.483000] intel_pstate CPU 1 exiting > [ 1144.486064] > [ 1144.486065] === > [ 1144.486067] smpboot: CPU 1 didn't die... > [ 1144.486067] [ INFO: suspicious RCU usage. ] > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted > [ 1144.486070] --- > [ 1144.486072] include/trace/events/tlb.h:35 suspicious > rcu_dereference_check() usage! > [ 1144.486073] > [ 1144.486073] other info that might help us debug this: > [ 1144.486073] > [ 1144.486074] > [ 1144.486074] RCU used illegally from offline CPU! > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 > [ 1144.486076] no locks held by swapper/1/0. > [ 1144.486076] > [ 1144.486076] stack backtrace: > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > 3.19.0-rc7-next-20150204.1-iniza-small #1 > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > [ 1144.486085] 0001 88011a44fe18 817e370d > 0011 > [ 1144.486088] 88011a448290 88011a44fe48 810d6847 > 8800c66b9600 > [ 1144.486091] 0001 88011a44c000 81cb3900 > 88011a44fe78 > [ 1144.486092] Call Trace: > [ 1144.486099] [] dump_stack+0x4c/0x65 > [ 1144.486104] [] lockdep_rcu_suspicious+0xe7/0x120 > [ 1144.486109] [] idle_task_exit+0x205/0x2c0 > [ 1144.486113] [] play_dead_common+0xe/0x50 > [ 1144.486116] [] native_play_dead+0x15/0x140 > [ 1144.486121] [] arch_cpu_idle_dead+0xf/0x20 > [ 1144.486123] [] cpu_startup_entry+0x37e/0x580 > [ 1144.486126] [] start_secondary+0x140/0x150 > [ 1144.502920] intel_pstate CPU 2 exiting > ... > > Not sure if this comes from the rcu or pm/intel_pstate area. New intel_pstate commits in linux-next are between 7ab0256e57ae and a04759924e25 inclusive. Please check that range first. If that doesn't point you to the offender, you can pull the linux-next branch of the linux-pm.git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next and see if that alone triggers the issue for you. If not, the offender is not there. Otherwise, and if you use the ACPI cpuidle driver, you can check the acpi-processor merge point too. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 4:58 PM, Martin K. Petersen wrote:
>> "Sedat" == Sedat Dilek writes:
>
>> I am seeing the following in my logs several times...
>>
>> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request:
>> I/O error, dev loop0, sector 21261344 Feb 4 02:53:13 fambox
>> kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.
>
> Sedat> What's the plan... s/pr_warn/pr_debug ?
>
> The rationale here is that we'd like to log (once) if discard or write
> same fail on a given device.
>
> In SCSI we disable these commands if they get failed by the storage. But
> it looks like loop keeps advertising discard support after a failure.
>
> Is your loop device encrypted? Do you know why the discard is failing?
>

No, but I am here on a so-called WUBI installation which triggered
some bugs being an exotic installation.
My Ubuntu/precise is a 18GiB image laying on my Win7 partition (/dev/sda2).

How can I check or debug the discard failing?

- Sedat -

P.S.: Some diagnostics

$ LC_ALL=C df -T
Filesystem     Type     1K-blocks      Used Available Use% Mounted on
rootfs         rootfs    17753424  15663216   1165332  94% /
udev           devtmpfs   1959324         4   1959320   1% /dev
tmpfs          tmpfs       393888       904    392984   1% /run
/dev/sda2      fuseblk  465546236 161295260 304250976  35% /host
/dev/loop0     ext4      17753424  15663216   1165332  94% /
none           tmpfs         5120         4      5116   1% /run/lock
none           tmpfs      1969428       176   1969252   1% /run/shm

$ cat /etc/fstab
# /etc/fstab: static file system information.
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
proc                          /proc  proc  nodev,noexec,nosuid     0  0
/host/ubuntu/disks/root.disk  /      ext4  loop,errors=remount-ro  0  1
/host/ubuntu/disks/swap.disk  none   swap  loop,sw                 0  0

$ LC_ALL=C sudo fdisk -l /dev/sda
[sudo] password for wearefam:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcb9885ab

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sda2          206848   931299327   465546240    7  HPFS/NTFS/exFAT
/dev/sda3       931299328   976773119    22736896   27  Hidden NTFS WinRE

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.13.0-45-generic root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
Re: linux-next: Tree for Feb 4
> "Sedat" == Sedat Dilek writes:

> I am seeing the following in my logs several times...
>
> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request:
> I/O error, dev loop0, sector 21261344 Feb 4 02:53:13 fambox
> kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.

Sedat> What's the plan... s/pr_warn/pr_debug ?

The rationale here is that we'd like to log (once) if discard or write
same fail on a given device.

In SCSI we disable these commands if they get failed by the storage. But
it looks like loop keeps advertising discard support after a failure.

Is your loop device encrypted? Do you know why the discard is failing?

--
Martin K. Petersen
Oracle Linux Engineering
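[ Archive note: the two behaviours Martin contrasts above, logging a discard
failure once per device versus switching the feature off after a failure, can
be illustrated with a small user-space C model. This is a hedged sketch, not
the block layer's code: struct mock_dev, its fields, and both functions are
hypothetical names invented for the example, and falling back to manual
zeroing is reduced to a counter. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device state; not a real kernel structure. */
struct mock_dev {
	const char *name;
	bool advertises_discard; /* what the device claims to support */
	bool discard_works;      /* whether a discard actually succeeds */
	bool disable_on_failure; /* true: SCSI-like, stop advertising after a failure */
	bool warned;             /* set after the first logged failure */
	int discard_attempts;
	int warnings_logged;
	int manual_zeroes;
};

/* Pretend to issue a discard; returns true on success. */
bool mock_issue_discard(struct mock_dev *dev)
{
	dev->discard_attempts++;
	return dev->discard_works;
}

/* Zero a range: try discard first, fall back to writing zeroes by hand. */
void mock_zeroout(struct mock_dev *dev)
{
	assert(dev != NULL);

	if (dev->advertises_discard) {
		if (mock_issue_discard(dev))
			return; /* simplification: assume the discard zeroed the range */

		/* Log only the first failure seen on this device. */
		if (!dev->warned) {
			fprintf(stderr, "%s: DISCARD failed. Manually zeroing.\n",
				dev->name);
			dev->warned = true;
			dev->warnings_logged++;
		}

		/* SCSI-like devices stop offering the command after a failure. */
		if (dev->disable_on_failure)
			dev->advertises_discard = false;
	}

	dev->manual_zeroes++;
}
```

[ With disable_on_failure left false, which is the loop-like case described
above, every call still attempts the discard, but only the first failure is
logged. With it set to true, the second and later calls skip the discard
attempt entirely. ]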
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 4:31 PM, Jens Axboe wrote: > On 02/04/2015 08:21 AM, Sedat Dilek wrote: >> >> On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe wrote: >>> >>> On 02/04/2015 05:26 AM, Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell wrote: > > > Hi all, > > The next release I will be making will be next-20150209 - which will > probably be after the v3.19 release. > > Changes since 20150203: > > The sound-asoc tree gained a conflict against the sound tree. > > The scsi tree gained a build failure caused by an interaction with the > driver-core tree. I applied a merge fix patch. > > The akpm-current tree gained a build failure for which I disabled > CONFIG_KASAN. > > Non-merge commits (relative to Linus' tree): 7461 >7314 files changed, 309736 insertions(+), 172363 deletions(-) > > > > > [ CC Jens ] Hi, I am seeing the following in my logs several times... Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344 Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing. >>> >>> >>> >>> This is from Martin's commit (CC'ed). Martin, there are various ways we >>> can >>> end up "failing" from blkdev_issue_discard(), I'm going to kill those >>> debug >>> warnings. >>> >> >> [ Really CC Martin :-) ] > > > Ooops, thanks :-) > >> Caused by this one...? >> >> commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396 >> "block: Add discard flag to blkdev_issue_zeroout() function" > > > That's the one. > What's the plan... s/pr_warn/pr_debug ? - Sedat - -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On 02/04/2015 08:21 AM, Sedat Dilek wrote: On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe wrote: On 02/04/2015 05:26 AM, Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC Jens ] Hi, I am seeing the following in my logs several times... Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344 Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing. This is from Martin's commit (CC'ed). Martin, there are various ways we can end up "failing" from blkdev_issue_discard(), I'm going to kill those debug warnings. [ Really CC Martin :-) ] Ooops, thanks :-) Caused by this one...? commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396 "block: Add discard flag to blkdev_issue_zeroout() function" That's the one. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe wrote:
> On 02/04/2015 05:26 AM, Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> The next release I will be making will be next-20150209 - which will
>>> probably be after the v3.19 release.
>>>
>>> Changes since 20150203:
>>>
>>> The sound-asoc tree gained a conflict against the sound tree.
>>>
>>> The scsi tree gained a build failure caused by an interaction with the
>>> driver-core tree. I applied a merge fix patch.
>>>
>>> The akpm-current tree gained a build failure for which I disabled
>>> CONFIG_KASAN.
>>>
>>> Non-merge commits (relative to Linus' tree): 7461
>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>
>> [ CC Jens ]
>>
>> Hi,
>>
>> I am seeing the following in my logs several times...
>>
>> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
>> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.
>
> This is from Martin's commit (CC'ed). Martin, there are various ways we can
> end up "failing" from blkdev_issue_discard(), I'm going to kill those debug
> warnings.

[ Really CC Martin :-) ]

Caused by this one...?

commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396
"block: Add discard flag to blkdev_issue_zeroout() function"

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next&id=d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396
Re: linux-next: Tree for Feb 4
On 02/04/2015 05:26 AM, Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> The next release I will be making will be next-20150209 - which will
>> probably be after the v3.19 release.
>>
>> Changes since 20150203:
>>
>> The sound-asoc tree gained a conflict against the sound tree.
>>
>> The scsi tree gained a build failure caused by an interaction with the
>> driver-core tree. I applied a merge fix patch.
>>
>> The akpm-current tree gained a build failure for which I disabled
>> CONFIG_KASAN.
>>
>> Non-merge commits (relative to Linus' tree): 7461
>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>
> [ CC Jens ]
>
> Hi,
>
> I am seeing the following in my logs several times...
>
> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.

This is from Martin's commit (CC'ed). Martin, there are various ways we can
end up "failing" from blkdev_issue_discard(), I'm going to kill those debug
warnings.

-- 
Jens Axboe
linux-next: Tree for Feb 4
Hi all,

The next release I will be making will be next-20150209 - which will
probably be after the v3.19 release.

Changes since 20150203:

The sound-asoc tree gained a conflict against the sound tree.

The scsi tree gained a build failure caused by an interaction with the
driver-core tree. I applied a merge fix patch.

The akpm-current tree gained a build failure for which I disabled
CONFIG_KASAN.

Non-merge commits (relative to Linus' tree): 7461
 7314 files changed, 309736 insertions(+), 172363 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log files
in the Next directory. Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 206 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (0f98c38d725f Merge branch 'for-linus' of git://git.kernel.dk/linux-block)
Merging fixes/master (b94d525e58dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging kbuild-current/rc-fixes (a16c5f99a28c kbuild: Fix removal of the debian/ directory)
Merging arc-current/for-curr (2ce7598c9a45 Linux 3.17-rc4)
Merging arm-current/fixes (8e6480667246 ARM: 8299/1: mm: ensure local active ASID is marked as allocated on rollover)
CONFLICT (content): Merge conflict in arch/arm/mm/dma-mapping.c
Merging m68k-current/for-linus (f27bd5bfeda5 m68k: Wire up execveat)
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-merge/merge (31345e1a071e powerpc/pci: Remove unused force_32bit_msi quirk)
Merging powerpc-merge-mpe/fixes (c59c961ca511 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux)
Merging sparc/master (66d0f7ec9f10 sparc32: destroy_context() and switch_mm() needs to disable interrupts.)
Merging net/master (42b5212fee4f xen-netback: stop the guest rx thread after a fatal error)
Merging ipsec/master (59343cd7c480 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging sound-current/for-linus (4161b4505f16 ALSA: ak411x: Fix stall in work callback)
Merging pci-current/for-linus (51ac3d2f0c50 PCI: Add NEC variants to Stratus ftServer PCIe DMI check)
Merging wireless-drivers/master (e3f31175a3ee ath9k: fix race condition in irq processing during hardware reset)
Merging driver-core.current/driver-core-linus (26bc420b59a3 Linux 3.19-rc6)
Merging tty.current/tty-linus (ec6f34e5b552 Linux 3.19-rc5)
Merging usb.current/usb-linus (e36f014edff7 Linux 3.19-rc7)
Merging usb-gadget-fixes/fixes (0df8fc37f6e4 usb: phy: never defer probe in non-OF case)
Merging usb-serial-fixes/usb-linus (a6f0331236fa USB: cp210x: add ID for RUGGEDCOM USB Serial Console)
Merging staging.current/staging-linus (e36f014edff7 Linux 3.19-rc7)
Merging char-misc.current/char-misc-linus (e36f014edff7 Linux 3.19-rc7)
Merging input-current/for-linus (47c1ffb2b6b6 Input: elantech - add more Fujtisu notebooks to force crc_enabled)
Merging crypto-current/master (3e14dcf7cb80 crypto: add missing crypto module aliases)
Merging ide/master (f96fe225677b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging devicetree-current/devicetree/merge (6b1271de3723 of/unittest: Overlays with sub-devices tests)
Merging rr-fixes/fixes (d5db139ab376 module: make module_refcount() a signed integer.)
Merging vfio-fixes/for-linus (7c2e211f3c95 vfio-pci: Fix the check on pci device type in vfio_pci_probe())
Merging kselftest-fixes/fixes (f5db310d77ef selftests/vm: fix link error for transhuge-stress test)
Merging
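The fetch-and-reset advice in the announcement can be sketched as a small shell function. This is only an illustration under stated assumptions: the remote name "linux-next" and the function name update_linux_next are hypothetical, and the remote is assumed to point at the linux-next.git URL given in the announcement.

```shell
#!/bin/sh
# Sketch: follow linux-next without merging the old release into the new one.
# Assumes a remote called "linux-next" (hypothetical name) was added with:
#   git remote add linux-next \
#       git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
update_linux_next() {
    remote="${1:-linux-next}"
    # Fetch the new release; deliberately not "git pull", which would merge.
    git fetch "$remote" || return 1
    # linux-next is recreated for every release, so move the checked-out
    # branch to the new master instead of merging.
    # WARNING: this throws away local commits and uncommitted changes.
    git reset --hard "$remote/master"
}
```

Calling update_linux_next from inside a kernel checkout would then replace the current branch contents with the latest release; anything worth keeping should live on a separate branch first.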
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>>> Hi all,
>>>
>>> The next release I will be making will be next-20150209 - which will
>>> probably be after the v3.19 release.
>>>
>>> Changes since 20150203:
>>>
>>> The sound-asoc tree gained a conflict against the sound tree.
>>>
>>> The scsi tree gained a build failure caused by an interaction with the
>>> driver-core tree. I applied a merge fix patch.
>>>
>>> The akpm-current tree gained a build failure for which I disabled
>>> CONFIG_KASAN.
>>>
>>> Non-merge commits (relative to Linus' tree): 7461
>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>
>> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>
> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>
>> Hi,
>>
>> after suspend-and-resume I see the following call-trace:
>
> Do you see that after CPU1 offline too?
>
>> ...
>> [ 1144.482666] Disabling non-boot CPUs ...
>> [ 1144.483000] intel_pstate CPU 1 exiting
>> [ 1144.486064]
>> [ 1144.486065] ===
>> [ 1144.486067] smpboot: CPU 1 didn't die...
>> [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> [ 1144.486070] ---
>> [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
>> [ 1144.486073]
>> [ 1144.486073] other info that might help us debug this:
>> [ 1144.486073]
>> [ 1144.486074]
>> [ 1144.486074] RCU used illegally from offline CPU!
>> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> [ 1144.486076] no locks held by swapper/1/0.
>> [ 1144.486076]
>> [ 1144.486076] stack backtrace:
>> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
>> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [ 1144.486085] 0001 88011a44fe18 817e370d 0011
>> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600
>> [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78
>> [ 1144.486092] Call Trace:
>> [ 1144.486099] [817e370d] dump_stack+0x4c/0x65
>> [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120

As near as I can tell, idle_task_exit() is running on an offline CPU,
then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
And RCU is objecting to being used from a CPU that it is ignoring.

One approach would be to push RCU's idea of when the CPU goes offline
down into arch code in this case, using some Kconfig symbol and the
usual conditional compilation. Another approach would be to invoke the
trace calls under cpu_online(), for example, for the first such call
in switch_mm():

	if (cpu_online(smp_processor_id()))
		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

The compiler would discard this if tracing was disabled.

Other thoughts?

Note that this use of RCU from an offline CPU is currently tolerated,
but is vulnerable to delays, for example, due to virtualization. If a
CPU takes more than one jiffy to get from _stop_machine() state to fully
offlined, life can be very hard.

>> [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0
>> [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50
>> [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140
>> [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20
>> [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580
>> [ 1144.486126] [81053e20] start_secondary+0x140/0x150
>> [ 1144.502920] intel_pstate CPU 2 exiting
>> ...
>>
>> Not sure if this comes from the rcu or pm/intel_pstate area.
>
> New intel_pstate commits in linux-next are between 7ab0256e57ae and
> a04759924e25 inclusive. Please check that range first.
>
> If that doesn't point you to the offender, you can pull the linux-next
> branch of the linux-pm.git tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> and see if that alone triggers the issue for you. If not, the offender
> is not there. Otherwise, and if you use the ACPI cpuidle driver, you can
> check the acpi-processor merge point too.

This is almost certainly RCU getting more strict about CPUs using RCU
while offline.

							Thanx, Paul
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>> Hi all,
>>
>> The next release I will be making will be next-20150209 - which will
>> probably be after the v3.19 release.
>>
>> Changes since 20150203:
>>
>> The sound-asoc tree gained a conflict against the sound tree.
>>
>> The scsi tree gained a build failure caused by an interaction with the
>> driver-core tree. I applied a merge fix patch.
>>
>> The akpm-current tree gained a build failure for which I disabled
>> CONFIG_KASAN.
>>
>> Non-merge commits (relative to Linus' tree): 7461
>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>
> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]

Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

> Hi,
>
> after suspend-and-resume I see the following call-trace:

Do you see that after CPU1 offline too?

> ...
> [ 1144.482666] Disabling non-boot CPUs ...
> [ 1144.483000] intel_pstate CPU 1 exiting
> [ 1144.486064]
> [ 1144.486065] ===
> [ 1144.486067] smpboot: CPU 1 didn't die...
> [ 1144.486067] [ INFO: suspicious RCU usage. ]
> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> [ 1144.486070] ---
> [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
> [ 1144.486073]
> [ 1144.486073] other info that might help us debug this:
> [ 1144.486073]
> [ 1144.486074]
> [ 1144.486074] RCU used illegally from offline CPU!
> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> [ 1144.486076] no locks held by swapper/1/0.
> [ 1144.486076]
> [ 1144.486076] stack backtrace:
> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 1144.486085] 0001 88011a44fe18 817e370d 0011
> [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600
> [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78
> [ 1144.486092] Call Trace:
> [ 1144.486099] [817e370d] dump_stack+0x4c/0x65
> [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120
> [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0
> [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50
> [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140
> [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20
> [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580
> [ 1144.486126] [81053e20] start_secondary+0x140/0x150
> [ 1144.502920] intel_pstate CPU 2 exiting
> ...
>
> Not sure if this comes from the rcu or pm/intel_pstate area.

New intel_pstate commits in linux-next are between 7ab0256e57ae and
a04759924e25 inclusive. Please check that range first.

If that doesn't point you to the offender, you can pull the linux-next
branch of the linux-pm.git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

and see if that alone triggers the issue for you. If not, the offender
is not there. Otherwise, and if you use the ACPI cpuidle driver, you can
check the acpi-processor merge point too.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
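Checking the suggested commit range could look roughly like the sketch below. The helper name list_inclusive_range is hypothetical; the only git feature relied on is the "A^..B" revision range, which starts at the (first) parent of A so that A itself is listed.

```shell
#!/bin/sh
# Sketch: list the commits between two commits, both ends included.
# "first^..last" excludes everything reachable from the parent of "first",
# which keeps "first" itself in the output.
list_inclusive_range() {
    first="$1"
    last="$2"
    shift 2
    # Any extra arguments (for example "-- some/path") go straight to git log.
    git log --oneline "${first}^..${last}" "$@"
}
```

With the range from the message above this would be called as "list_inclusive_range 7ab0256e57ae a04759924e25" inside a linux-next checkout. If the list is too long to read through, "git bisect start a04759924e25 7ab0256e57ae^" would be one way to narrow it down, assuming the parent of the first commit is known to be good.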
Re: linux-next: Tree for Feb 4
>>>>> "Sedat" == Sedat Dilek <sedat.di...@gmail.com> writes:

>> I am seeing the following in my logs several times...
>>
>> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
>> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.

Sedat> What's the plan... s/pr_warn/pr_debug ?

The rationale here is that we'd like to log (once) if discard or write
same fail on a given device. In SCSI we disable these commands if they
get failed by the storage. But it looks like loop keeps advertising
discard support after a failure.

Is your loop device encrypted? Do you know why the discard is failing?

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 4:58 PM, Martin K. Petersen <martin.peter...@oracle.com> wrote:
>>>>>> "Sedat" == Sedat Dilek <sedat.di...@gmail.com> writes:
>
>>> I am seeing the following in my logs several times...
>>>
>>> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
>>> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed. Manually zeroing.
>
> Sedat> What's the plan... s/pr_warn/pr_debug ?
>
> The rationale here is that we'd like to log (once) if discard or write
> same fail on a given device. In SCSI we disable these commands if they
> get failed by the storage. But it looks like loop keeps advertising
> discard support after a failure.
>
> Is your loop device encrypted? Do you know why the discard is failing?

No, but I am here on a so-called WUBI installation which triggered
some bugs being an exotic installation.
My Ubuntu/precise is a 18GiB image laying on my Win7 partition (/dev/sda2).

How can I check or debug the discard failing?

- Sedat -

P.S.: Some diagnostics

$ LC_ALL=C df -T
Filesystem     Type     1K-blocks      Used Available Use% Mounted on
rootfs         rootfs    17753424  15663216   1165332  94% /
udev           devtmpfs   1959324         4   1959320   1% /dev
tmpfs          tmpfs       393888       904    392984   1% /run
/dev/sda2      fuseblk  465546236 161295260 304250976  35% /host
/dev/loop0     ext4      17753424  15663216   1165332  94% /
none           tmpfs         5120         4      5116   1% /run/lock
none           tmpfs      1969428       176   1969252   1% /run/shm

$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system>                <mount point>  <type>  <options>               <dump>  <pass>
proc                           /proc          proc    nodev,noexec,nosuid     0       0
/host/ubuntu/disks/root.disk   /              ext4    loop,errors=remount-ro  0       1
/host/ubuntu/disks/swap.disk   none           swap    loop,sw                 0       0

$ LC_ALL=C sudo fdisk -l /dev/sda
[sudo] password for wearefam:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcb9885ab

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sda2          206848   931299327   465546240    7  HPFS/NTFS/exFAT
/dev/sda3       931299328   976773119    22736896   27  Hidden NTFS WinRE

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.13.0-45-generic root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
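One way to start on the "how can I check" question is to look at what the block layer advertises for the device in sysfs. The sketch below is not from the thread: the function name discard_info and its optional sysfs-root argument are hypothetical (the argument only makes the function easy to try against a scratch directory), while discard_max_bytes and discard_granularity are the standard queue attributes.

```shell
#!/bin/sh
# Sketch: report whether a block device currently advertises discard.
# A discard_max_bytes value of 0 means discard is not advertised.
discard_info() {
    dev="$1"
    sysfs="${2:-/sys}"          # hypothetical override, useful for testing
    q="$sysfs/block/$dev/queue"
    if [ ! -r "$q/discard_max_bytes" ]; then
        echo "$dev: no discard attributes found"
        return 1
    fi
    max=$(cat "$q/discard_max_bytes")
    gran=$(cat "$q/discard_granularity" 2>/dev/null)
    if [ "$max" != "0" ]; then
        echo "$dev: discard advertised (max_bytes=$max granularity=$gran)"
    else
        echo "$dev: discard not advertised"
    fi
}
```

Running "discard_info loop0" before and after the "DISCARD failed" message appears would show whether loop keeps advertising discard after a failure, as Martin suspects. It does not say why the discard fails.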
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
>> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>>>> Hi all,
>>>>
>>>> The next release I will be making will be next-20150209 - which will
>>>> probably be after the v3.19 release.
>>>>
>>>> Changes since 20150203:
>>>>
>>>> The sound-asoc tree gained a conflict against the sound tree.
>>>>
>>>> The scsi tree gained a build failure caused by an interaction with the
>>>> driver-core tree. I applied a merge fix patch.
>>>>
>>>> The akpm-current tree gained a build failure for which I disabled
>>>> CONFIG_KASAN.
>>>>
>>>> Non-merge commits (relative to Linus' tree): 7461
>>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>>
>>> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>>
>> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>>
>>> Hi,
>>>
>>> after suspend-and-resume I see the following call-trace:
>>
>> Do you see that after CPU1 offline too?
>
> NO.
>
> After...
>
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>
> ...I see this:
>
> +[ 707.936668] PM: Saving platform NVS memory
> +[ 707.936674] Disabling non-boot CPUs ...
> +[ 707.936712] intel_pstate CPU 2 exiting
> +[ 707.938024] smpboot: CPU 2 didn't die...
> +[ 707.949128] intel_pstate CPU 3 exiting
> +[ 707.950369] smpboot: CPU 3 didn't die...
> +[ 707.966248] ACPI: Low-level resume complete
> +[ 707.966302] PM: Restoring platform NVS memory
>
> Full dmesg attached.

The dmesg doesn't match what you said above.

Anyway, that's not what I meant.

Does the CPU1 offlining alone:

# echo 0 > /sys/devices/system/cpu/cpu1/online

trigger the trace? It should.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:30 AM, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
>>> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>>>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>>>>> Hi all,
>>>>>
>>>>> The next release I will be making will be next-20150209 - which will
>>>>> probably be after the v3.19 release.
>>>>>
>>>>> Changes since 20150203:
>>>>>
>>>>> The sound-asoc tree gained a conflict against the sound tree.
>>>>>
>>>>> The scsi tree gained a build failure caused by an interaction with the
>>>>> driver-core tree. I applied a merge fix patch.
>>>>>
>>>>> The akpm-current tree gained a build failure for which I disabled
>>>>> CONFIG_KASAN.
>>>>>
>>>>> Non-merge commits (relative to Linus' tree): 7461
>>>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>>>
>>>> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>>>
>>> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>>>
>>>> Hi,
>>>>
>>>> after suspend-and-resume I see the following call-trace:
>>>
>>> Do you see that after CPU1 offline too?
>>
>> NO.
>>
>> After...
>>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> ...I see this:
>>
>> +[ 707.936668] PM: Saving platform NVS memory
>> +[ 707.936674] Disabling non-boot CPUs ...
>> +[ 707.936712] intel_pstate CPU 2 exiting
>> +[ 707.938024] smpboot: CPU 2 didn't die...
>> +[ 707.949128] intel_pstate CPU 3 exiting
>> +[ 707.950369] smpboot: CPU 3 didn't die...
>> +[ 707.966248] ACPI: Low-level resume complete
>> +[ 707.966302] PM: Restoring platform NVS memory
>>
>> Full dmesg attached.
>
> The dmesg doesn't match what you said above.
>
> Anyway, that's not what I meant.
>
> Does the CPU1 offlining alone:
>
> # echo 0 > /sys/devices/system/cpu/cpu1/online
>
> trigger the trace? It should.

YES, I see this...

...
[ 84.668616] PPP BSD Compression module registered
[ 84.678072] PPP Deflate Compression module registered
[ 101.143582] intel_pstate CPU 1 exiting
[ 101.157134]
[ 101.157135] ===
[ 101.157136] [ INFO: suspicious RCU usage. ]
[ 101.157139] 3.19.0-rc7-next-20150204.3-iniza-small #1 Not tainted
[ 101.157140] ---
[ 101.157142] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
[ 101.157142]
[ 101.157142] other info that might help us debug this:
[ 101.157142]
[ 101.157143]
[ 101.157143] RCU used illegally from offline CPU!
[ 101.157143] rcu_scheduler_active = 1, debug_locks = 0
[ 101.157144] no locks held by swapper/1/0.
[ 101.157144]
[ 101.157144] stack backtrace:
[ 101.157146] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.3-iniza-small #1
[ 101.157147] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 101.157151] 0001 88011a44fe18 817e35fd 0011
[ 101.157153] 88011a448290 88011a44fe48 810d6847 8800d3b96100
[ 101.157155] 0001 88011a44c000 0005 88011a44fe78
[ 101.157156] Call Trace:
[ 101.157162] [817e35fd] dump_stack+0x4c/0x65
[ 101.157166] [810d6847] lockdep_rcu_suspicious+0xe7/0x120
[ 101.157170] [810b71a5] idle_task_exit+0x205/0x2c0
[ 101.157173] [81054c4e] play_dead_common+0xe/0x50
[ 101.157175] [81054ca5] native_play_dead+0x15/0x140
[ 101.157179] [8102963f] arch_cpu_idle_dead+0xf/0x20
[ 101.157181] [810cd89e] cpu_startup_entry+0x37e/0x580
[ 101.157183] [81053e20] start_secondary+0x140/0x150
[ 101.157228] smpboot: CPU 1 is now offline

- Sedat -
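The reproduction step used above (offline one CPU, then read the kernel log) can be wrapped in a small helper. This is a sketch only: the function name set_cpu_online and its optional sysfs-root argument are hypothetical, the latter added so the function can be tried on a scratch directory; on a real system it has to run as root.

```shell
#!/bin/sh
# Sketch: switch a CPU offline (state 0) or back online (state 1) via sysfs.
# Note the redirection: "echo 0 > .../online" writes to the file, whereas
# "echo 0 .../online" would only print its arguments.
set_cpu_online() {
    cpu="$1"
    state="$2"
    sysfs="${3:-/sys}"          # hypothetical override, useful for testing
    f="$sysfs/devices/system/cpu/cpu$cpu/online"
    if [ ! -w "$f" ]; then
        echo "cannot write $f (not root, or CPU cannot be offlined?)" >&2
        return 1
    fi
    echo "$state" > "$f"
}
```

A reproduction run would then be "set_cpu_online 1 0", followed by "dmesg | tail -n 40" to look for the "suspicious RCU usage" splat, and "set_cpu_online 1 1" to bring the CPU back.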
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: [ . . . ] [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring. One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. 
Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm():

	if (cpu_online(smp_processor_id()))
		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

The compiler would discard this if tracing was disabled.

That looks like less intrusive to me.

One possible concern is increased context-switch path length, but that would only be the case where tracing is enabled by default.

Nevertheless, here is an untested patch. Does it help?

Thanx, Paul

x86: Omit switch_mm() tracing for offline CPUs

The architecture-specific switch_mm() function can be called by offline CPUs, but includes event tracing, which cannot be legally carried out on offline CPUs. This results in a lockdep-RCU splat. This commit fixes this splat by omitting the tracing when the CPU is offline.

Reported-by: Sedat Dilek sedat.di...@gmail.com
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..7e7f2445fbc9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,

 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		if (cpu_online(smp_processor_id()))
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			if (cpu_online(smp_processor_id()))
+				trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_LDT_nolock(next->context);
 		}
 	}

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
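For readers following the thread, the effect of the cpu_online() guard can be modelled in user space. The sketch below is not kernel code and is not taken from the patch: model_cpu_online(), model_set_cpu_online(), model_trace_tlb_flush() and model_switch_mm() are hypothetical stand-ins, and the four-CPU bitmask is an assumption made only for illustration. It shows the one property the patch relies on: the page-table switch still runs on every call, while the tracepoint is skipped once the CPU has been marked offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4U                 /* hypothetical size of the toy machine */

static unsigned long online_mask = 0xfUL; /* all four model CPUs start online */
static unsigned int trace_hits;           /* how often the model tracepoint fired */
static unsigned int cr3_loads;            /* how often the model page tables were reloaded */

/* Stand-in for the kernel's cpu_online(); only a bitmask test in this model. */
static bool model_cpu_online(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS && (online_mask & (1UL << cpu)) != 0;
}

/* Mark a model CPU online or offline, roughly what CPU hotplug would do. */
static void model_set_cpu_online(unsigned int cpu, bool online)
{
	if (cpu >= MODEL_NR_CPUS)
		return;
	if (online)
		online_mask |= 1UL << cpu;
	else
		online_mask &= ~(1UL << cpu);
}

/* Stand-in for trace_tlb_flush(); the real tracepoint is what uses RCU. */
static void model_trace_tlb_flush(unsigned int cpu)
{
	trace_hits++;
	printf("tlb_flush traced on cpu %u\n", cpu);
}

/* Model of the guarded sequence in switch_mm(). */
static void model_switch_mm(unsigned int cpu)
{
	cr3_loads++;                      /* load_cr3(next->pgd) is unconditional */
	if (model_cpu_online(cpu))        /* the guard added by the patch */
		model_trace_tlb_flush(cpu);
}
```

Calling model_switch_mm(1), then model_set_cpu_online(1, false), then model_switch_mm(1) again leaves trace_hits at 1 while cr3_loads reaches 2, which is the behaviour the untested patch is aiming for on an outgoing CPU.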
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: [ . . . ] [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring. 
One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm(): if (cpu_online(smp_processor_id())) trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); The compiler would discard this if tracing was disabled. That looks like less intrusive to me. One possible concern is increased context-switch path length, but that would only be the case where tracing is enabled by default. Nevertheless, here is an untested patch. Does it help? No bedtime :-) I tried with a revert of... commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b rcu: Handle outgoing CPUs on exit from idle loop ...and offlining cpu1 seems not to produce the trace... [ 115.280244] PPP BSD Compression module registered [ 115.288761] PPP Deflate Compression module registered [ 162.935524] intel_pstate CPU 1 exiting [ 162.949729] smpboot: CPU 1 is now offline Will try the patch. - Sedat - Thanx, Paul x86: Omit switch_mm() tracing for offline CPUs The architecture-specific switch_mm() function can be called by offline CPUs, but includes event tracing, which cannot be legally carried out on offline CPUs. This results in a lockdep-RCU splat. This commit fixes this splat by omitting the tracing when the CPU is offline. Reported-by: Sedat Dilek sedat.di...@gmail.com Signed-off-by: Paul E. 
McKenney paul...@linux.vnet.ibm.com

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..7e7f2445fbc9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,

 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		if (cpu_online(smp_processor_id()))
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. Hi, after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? ... [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring. One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm(): if (cpu_online(smp_processor_id())) trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); The compiler would discard this if tracing was disabled. That looks like less intrusive to me. Other thoughts? Well, the whole issue here seems to be that common code using RCU is also useful in places where RCU doesn't want to be used. Arguably, we can deal with all of those cases in a whack-a-mole manner, but that doesn't seem to scale too well. Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:25 AM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. Yupp, I forwarded my original posting before you answered me. Hi, after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? Did not check yet. ... [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. 
[ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0 [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50 [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140 [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20 [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580 [ 1144.486126] [81053e20] start_secondary+0x140/0x150 [ 1144.502920] intel_pstate CPU 2 exiting ... Not sure if this comes from the rcu or pm/intel_pstate area. New intel_pstate commits in linux-next are between 7ab0256e57ae and a04759924e25 inclusive. Please check that range first. Not sure if I am willing to test with reverted patches. ( /me was updating Linux graphic driver stack today built with upcomming llvm-toolchain v3.6.0. ) If that doesn't point you to the offender, you can pull the linux-next branch of the linux-pm.git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next and see if that alone triggers the issue for you. If not, the offender is not there. Otherwise, and if you use the ACPI cpuidle driver, you can check the acpi-processor merge point too. I pulled in pm-next-20150204 on top of next-20150204, but that did not help. What I was asking about was to test linux-pm.git/linux-next *instead* *of* full linux-next and not on top of it. That would tell you whether or not the new trace was introduced by one of the PM commits or elsewhere. No, I did not test this. 
But this most likely is what Paul said anyway.

Not sure what you mean by this statement.

I tried -3 kernel with...

f64b348810c2 Revert intel_pstate: Add support for SkyLake
a0d825a39848 Revert intel_pstate: expose turbo range to sysfs
847153608ecf Revert intel_pstate: Add num_pstates to sysfs
412a6770cde4 Revert intel_pstate: respect cpufreq policy request
e2a6685023ed Revert intel_pstate: honor user space min_perf_pct override on resume

...shows the trace when offlining cpu1 (w/o doing a s/r).

- Sedat -
Re: linux-next: Tree for Feb 4
On Thu, Feb 5, 2015 at 12:51 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote: On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. Hi, after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? ... [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. 
[ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring. One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm(): if (cpu_online(smp_processor_id())) trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); The compiler would discard this if tracing was disabled. That looks like less intrusive to me. One possible concern is increased context-switch path length, but that would only be the case where tracing is enabled by default. Hmmm, which kernel-config trace options do you mean in particular? Other thoughts? Well, the whole issue here seems to be that common code using RCU is also useful in places where RCU doesn't want to be used. Arguably, we can deal with all of those cases in a whack-a-mole manner, but that doesn't seem to scale too well. Well, I did put a change into -next that makes these particular moles stick their heads up farther, so this is not a random event. 
And in this particular case, we do have the option of extending RCU's reach to cover this operation, at the expense of a bit more intrusion by RCU into arch-specific code. If tracing is enabled by default by major distros, that might be the right thing to do, unappealing though it might be.

Can you point me to that change in rcu-next?

But yes, it would have been far better for RCU to have been picky to begin with, so that these issues could have been addressed as they were added to the kernel. I guess one possible source of comfort is that once this is in place, future issues will make themselves immediately apparent.

Not sure what I can do now to help track this down. It is 01:00 a.m. here - bedtime :-).

- Sedat
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. Yupp, I forwarded my original posting before you answered me. Hi, after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? Did not check yet. ... [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0 [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50 [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140 [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20 [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580 [ 1144.486126] [81053e20] start_secondary+0x140/0x150 [ 1144.502920] intel_pstate CPU 2 exiting ... Not sure if this comes from the rcu or pm/intel_pstate area. New intel_pstate commits in linux-next are between 7ab0256e57ae and a04759924e25 inclusive. Please check that range first. Not sure if I am willing to test with reverted patches. ( /me was updating Linux graphic driver stack today built with upcomming llvm-toolchain v3.6.0. ) If that doesn't point you to the offender, you can pull the linux-next branch of the linux-pm.git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next and see if that alone triggers the issue for you. If not, the offender is not there. Otherwise, and if you use the ACPI cpuidle driver, you can check the acpi-processor merge point too. I pulled in pm-next-20150204 on top of next-20150204, but that did not help. 
- Sedat -

Jiang Liu (8):
      ACPI: Fix a bug in parsing ACPI Memory24 resource
      ACPI: Normalize return value of resource parser functions
      ACPI: Set flag IORESOURCE_UNSET for unassigned resources
      ACPI: Enforce stricter checks for address space descriptors
      ACPI: Return translation offset when parsing ACPI address space resources
      ACPI: Translate resource into master side address for bridge window resources
      ACPI: Add field offset to struct resource_list_entry
      ACPI: Introduce helper function acpi_dev_filter_resource_type()

Markus Elfring (1):
      cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation

Rafael J. Wysocki (14):
      ACPI / cpuidle: Drop unnecessary calls from acpi_idle_do_entry()
      ACPI / cpuidle: Drop unnecessary calls from ->enter callback routines
      ACPI / cpuidle: Clean up fallback to C1 checks
      ACPI / cpuidle: Drop irrelevant comment from acpi_idle_enter_simple()
      ACPI / cpuidle: Clean up white space in a switch statement
      ACPI / cpuidle: Drop flags.bm_check tests from acpi_idle_enter_bm()
      ACPI / cpuidle: Merge acpi_idle_enter_c1() and
Re: linux-next: Tree for Feb 4
On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote:

Hi all,

The next release I will be making will be next-20150209 - which will probably be after the v3.19 release.

Changes since 20150203:

The sound-asoc tree gained a conflict against the sound tree.

The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch.

The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN.

Non-merge commits (relative to Linus' tree): 7461
 7314 files changed, 309736 insertions(+), 172363 deletions(-)

[ CC linux-rcu | linux-pm | intel_pstate maintainers ]

Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

Hi, after suspend-and-resume I see the following call-trace:

Do you see that after CPU1 offline too?

NO.

After...

root# echo 0 > /sys/devices/system/cpu/cpu1/online

...I see this:

+[ 707.936668] PM: Saving platform NVS memory
+[ 707.936674] Disabling non-boot CPUs ...
+[ 707.936712] intel_pstate CPU 2 exiting
+[ 707.938024] smpboot: CPU 2 didn't die...
+[ 707.949128] intel_pstate CPU 3 exiting
+[ 707.950369] smpboot: CPU 3 didn't die...
+[ 707.966248] ACPI: Low-level resume complete
+[ 707.966302] PM: Restoring platform NVS memory

Full dmesg attached.

I hope this helps.

- Sedat -

...
[ 1144.482666] Disabling non-boot CPUs ...
[ 1144.483000] intel_pstate CPU 1 exiting
[ 1144.486064]
[ 1144.486065] ===
[ 1144.486067] smpboot: CPU 1 didn't die...
[ 1144.486067] [ INFO: suspicious RCU usage. ]
[ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
[ 1144.486070] ---
[ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
[ 1144.486073]
[ 1144.486073] other info that might help us debug this:
[ 1144.486073]
[ 1144.486074]
[ 1144.486074] RCU used illegally from offline CPU!
[ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0 [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50 [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140 [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20 [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580 [ 1144.486126] [81053e20] start_secondary+0x140/0x150 [ 1144.502920] intel_pstate CPU 2 exiting ... Not sure if this comes from the rcu or pm/intel_pstate area. New intel_pstate commits in linux-next are between 7ab0256e57ae and a04759924e25 inclusive. Please check that range first. If that doesn't point you to the offender, you can pull the linux-next branch of the linux-pm.git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next and see if that alone triggers the issue for you. If not, the offender is not there. Otherwise, and if you use the ACPI cpuidle driver, you can check the acpi-processor merge point too. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. 
[0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.19.0-rc7-next-20150204.2-iniza-small (sedat.di...@gmail.com@fambox) (gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~12.04) ) #1 SMP Wed Feb 4 23:25:30 CET 2015 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.19.0-rc7-next-20150204.2-iniza-small root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] Disabled fast string operations [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00]
Re: linux-next: Tree for Feb 4
On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote: On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote: Hi all, The next release I will be making will be next-20150209 - which will probably be after the v3.19 release. Changes since 20150203: The sound-asoc tree gained a conflict against the sound tree. The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch. The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN. Non-merge commits (relative to Linus' tree): 7461 7314 files changed, 309736 insertions(+), 172363 deletions(-) [ CC linux-rcu | linux-pm | intel_pstate maintainers ] Dirk is not the maintainer of intel_pstate any more, CC: Kristen. Yupp, I forwarded my original posting before you answered me. Hi, after suspend-and-resume I see the following call-trace: Do you see that after CPU1 offline too? Did not check yet. ... [ 1144.482666] Disabling non-boot CPUs ... [ 1144.483000] intel_pstate CPU 1 exiting [ 1144.486064] [ 1144.486065] === [ 1144.486067] smpboot: CPU 1 didn't die... [ 1144.486067] [ INFO: suspicious RCU usage. ] [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted [ 1144.486070] --- [ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! [ 1144.486073] [ 1144.486073] other info that might help us debug this: [ 1144.486073] [ 1144.486074] [ 1144.486074] RCU used illegally from offline CPU! [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0 [ 1144.486076] no locks held by swapper/1/0. [ 1144.486076] [ 1144.486076] stack backtrace: [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 [ 1144.486085] 0001 88011a44fe18 817e370d 0011 [ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600 [ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78 [ 1144.486092] Call Trace: [ 1144.486099] [817e370d] dump_stack+0x4c/0x65 [ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120 [ 1144.486109] [810b71a5] idle_task_exit+0x205/0x2c0 [ 1144.486113] [81054c4e] play_dead_common+0xe/0x50 [ 1144.486116] [81054ca5] native_play_dead+0x15/0x140 [ 1144.486121] [8102963f] arch_cpu_idle_dead+0xf/0x20 [ 1144.486123] [810cd89e] cpu_startup_entry+0x37e/0x580 [ 1144.486126] [81053e20] start_secondary+0x140/0x150 [ 1144.502920] intel_pstate CPU 2 exiting ... Not sure if this comes from the rcu or pm/intel_pstate area. New intel_pstate commits in linux-next are between 7ab0256e57ae and a04759924e25 inclusive. Please check that range first. Not sure if I am willing to test with reverted patches. ( /me was updating Linux graphic driver stack today built with upcomming llvm-toolchain v3.6.0. ) If that doesn't point you to the offender, you can pull the linux-next branch of the linux-pm.git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next and see if that alone triggers the issue for you. If not, the offender is not there. Otherwise, and if you use the ACPI cpuidle driver, you can check the acpi-processor merge point too. I pulled in pm-next-20150204 on top of next-20150204, but that did not help. What I was asking about was to test linux-pm.git/linux-next *instead* *of* full linux-next and not on top of it. That would tell you whether or not the new trace was introduced by one of the PM commits or elsewhere. But this most likely is what Paul said anyway. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. 
Re: linux-next: Tree for Feb 4
On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au wrote:

Hi all,

The next release I will be making will be next-20150209 - which will probably be after the v3.19 release.

Changes since 20150203:

The sound-asoc tree gained a conflict against the sound tree.

The scsi tree gained a build failure caused by an interaction with the driver-core tree. I applied a merge fix patch.

The akpm-current tree gained a build failure for which I disabled CONFIG_KASAN.

Non-merge commits (relative to Linus' tree): 7461
 7314 files changed, 309736 insertions(+), 172363 deletions(-)

[ CC linux-rcu | linux-pm | intel_pstate maintainers ]

Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

Hi,

after suspend-and-resume I see the following call-trace:

Do you see that after CPU1 offline too?

...
[ 1144.482666] Disabling non-boot CPUs ...
[ 1144.483000] intel_pstate CPU 1 exiting
[ 1144.486064]
[ 1144.486065] ===
[ 1144.486067] smpboot: CPU 1 didn't die...
[ 1144.486067] [ INFO: suspicious RCU usage. ]
[ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
[ 1144.486070] ---
[ 1144.486072] include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
[ 1144.486073]
[ 1144.486073] other info that might help us debug this:
[ 1144.486073]
[ 1144.486074]
[ 1144.486074] RCU used illegally from offline CPU!
[ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
[ 1144.486076] no locks held by swapper/1/0.
[ 1144.486076]
[ 1144.486076] stack backtrace:
[ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
[ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 1144.486085] 0001 88011a44fe18 817e370d 0011
[ 1144.486088] 88011a448290 88011a44fe48 810d6847 8800c66b9600
[ 1144.486091] 0001 88011a44c000 81cb3900 88011a44fe78
[ 1144.486092] Call Trace:
[ 1144.486099] [817e370d] dump_stack+0x4c/0x65
[ 1144.486104] [810d6847] lockdep_rcu_suspicious+0xe7/0x120

As near as I can tell, idle_task_exit() is running on an offline CPU, then calling switch_mm() which contains trace_tlb_flush(), which uses RCU. And RCU is objecting to being used from a CPU that it is ignoring.

One approach would be to push RCU's idea of when the CPU goes offline down into arch code in this case, using some Kconfig symbol and the usual conditional compilation. Another approach would be to invoke the trace calls under cpu_online(), for example, for the first such call in switch_mm():

	if (cpu_online(smp_processor_id()))
		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

The compiler would discard this if tracing was disabled.

That looks less intrusive to me.

One possible concern is increased context-switch path length, but that would only be the case where tracing is enabled by default.

Other thoughts?

Well, the whole issue here seems to be that common code using RCU is also useful in places where RCU doesn't want to be used. Arguably, we can deal with all of those cases in a whack-a-mole manner, but that doesn't seem to scale too well.

Well, I did put a change into -next that makes these particular moles stick their heads up farther, so this is not a random event. And in this particular case, we do have the option of extending RCU's reach to cover this operation, at the expense of a bit more intrusion by RCU into arch-specific code. If tracing is enabled by default by major distros, that might be the right thing to do, unappealing though it might be.

But yes, it would have been far better for RCU to have been picky to begin with, so that these issues could have been addressed as they were added to the kernel. I guess one possible source of comfort is that once this is in place, future issues will make themselves immediately apparent.

Thanx, Paul