[tip: x86/cleanups] x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC
The following commit has been merged into the x86/cleanups branch of tip: Commit-ID: 3228e1dc80983ee1f5d2e533d010b3bd8b50f0e2 Gitweb: https://git.kernel.org/tip/3228e1dc80983ee1f5d2e533d010b3bd8b50f0e2 Author:Anand K Mistry AuthorDate:Thu, 04 Feb 2021 18:32:32 +11:00 Committer: Thomas Gleixner CommitterDate: Fri, 05 Feb 2021 19:56:35 +01:00 x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC The RTC config option was removed in commit f52ef24be21a ("rtc/alpha: remove legacy rtc driver") Signed-off-by: Anand K Mistry Signed-off-by: Thomas Gleixner Reviewed-by: Randy Dunlap Link: https://lore.kernel.org/r/20210204183205.1.If5c6ded53a00ecad6a02a1e974316291cc0239d1@changeid --- arch/x86/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7b6dd10..865c1e7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -889,7 +889,7 @@ config HPET_TIMER config HPET_EMULATE_RTC def_bool y - depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y) + depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y) config APB_TIMER def_bool y if X86_INTEL_MID
Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC
> > > > Hi, > > > > On plain vanilla 5.11-rc6, when I take this (partial) .config file > > which contains: > > > > CONFIG_HPET=y > > # CONFIG_HPET_EMULATE_RTC is not set > > # CONFIG_HPET_MMAP is not set > > > > and run > > $ make olddefconfig > > > > my new .config file contains > > > > CONFIG_HPET_TIMER=y > > CONFIG_HPET_EMULATE_RTC=y > > CONFIG_HPET=y > > # CONFIG_HPET_MMAP is not set > > > > > > Isn't that what you are expecting? > > or are you trying to keep the opposite? > > > > # CONFIG_HPET_EMULATE_RTC is not set > > > > Hm, I bet that I misunderstood you and you want to keep > > it disabled. Is that right? Yes, I want to keep it disabled. My understanding (from the behaviour I've seen with other settings, as well as reading the sources) is that "is not set" is treated the same as "n". In fact, if you change the old config to CONFIG_HPET_EMULATE_RTC=n, you get the same before/after behaviour of this patch. Before, olddefconfig will ignore the old setting and change it to "y". With this patch, olddefconfig keeps the old setting, but re-writes it to "is not set". This is even more surprising because the user is explicit in what the old config setting is, but olddefconfig still ignores it. > > If that's the case, then I agree that your original patch to > make HPET_EMULATE_RTC user-visible is needed. > > Sorry to be so slow about understanding your goal (if I do > understand it now). > > -- > ~Randy > -- Anand K. Mistry Software Engineer Google Australia
[PATCH] x86: Remove HPET_EMULATE_RTC depends on RTC
The RTC config option was removed in commit f52ef24be21a ("rtc/alpha: remove legacy rtc driver") Signed-off-by: Anand K Mistry --- arch/x86/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 21f851179ff0..37775478d86f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -890,7 +890,7 @@ config HPET_TIMER config HPET_EMULATE_RTC def_bool y - depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y) + depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y) config APB_TIMER def_bool y if X86_INTEL_MID -- 2.30.0.365.g02bc693789-goog
Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC
On Thu, 4 Feb 2021 at 17:30, Randy Dunlap wrote: > > On 2/3/21 10:13 PM, Anand K. Mistry wrote: > >> Hi, > >> > >> When you run "make olddefconfig", should this "depends on" > >> line evaluate to true or false? > > > > True. RTC_DRV_CMOS=y by default on x86 systems and HPET_TIMER also > > appears to default yes (on x86-64 if I'm reading this right). > > > > Oddly, the RTC config option doesn't appear to exist. Probably a separate > > issue. > > Yes, just a separate simple patch. > > >> I.e., what are the settings of these symbols in the old .config file? > >> > >> > >> depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || > >> RTC_DRV_CMOS=y) > > > > Actually, none of these options are set in the old config. > > RTC_DRV_CMOS and HPET_TIMER appear to default to yes. > > > > % grep HPET .config > > CONFIG_HPET=y > > # CONFIG_HPET_EMULATE_RTC is not set > > # CONFIG_HPET_MMAP is not set > > % grep RTC .config > > CONFIG_RTC_CLASS=y > > # CONFIG_HPET_EMULATE_RTC is not set > > CONFIG_PM_TRACE_RTC=y > > # CONFIG_RTC_HCTOSYS is not set > > > >> > >> If not, is there some out-of-tree driver involved? > > > > No out of tree drivers involved. I tried this on a vanilla 5.11-rc6. > > > >> I'm having a little trouble seeing why this is needed. > > > > So am I. But this is the magic that lets me keep > > CONFIG_HPET_EMULATE_RTC from the old config. I did manage to trace > > where the option is being overridden in the conf tool, but the > > reasoning why is beyond my knowledge. > > Can you post the .config that you are feeding to 'make olddefconfig', please. > I'll take a look (or the x86 guys can do so). Sure. Here it is: # # Config options generated by splitconfig # CONFIG_ANDROID=y CONFIG_ANDROID_BINDER_IPC=y CONFIG_ASHMEM=y CONFIG_AUDIT=y CONFIG_BFQ_GROUP_IOSCHED=y CONFIG_BINFMT_MISC=y CONFIG_BLK_CGROUP=y # CONFIG_BLK_DEBUG_FS is not set CONFIG_BLK_DEV_DM=y CONFIG_BLK_DEV_INITRD=y CONFIG_BLK_DEV_IO_TRACE=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_SD=y CONFIG_BLK_DEV_SR=m CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y CONFIG_BPF_JIT=y CONFIG_BPF_JIT_ALWAYS_ON=y CONFIG_BPF_SYSCALL=y CONFIG_BRIDGE=m CONFIG_BT=m CONFIG_BT_FEATURE_DEBUG=y CONFIG_BT_FEATURE_DEBUG_FUNC_NAMES=y CONFIG_BT_HCIBFUSB=m CONFIG_BT_HCIBTUSB=m CONFIG_BT_HCIBTUSB_AUTOSUSPEND=y CONFIG_BT_HCIBTUSB_INTERVAL=y CONFIG_BT_HCIVHCI=m CONFIG_BT_HIDP=m CONFIG_BT_RFCOMM=m CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_CFG80211=m CONFIG_CFG80211_CERTIFICATION_ONUS=y CONFIG_CFG80211_DEBUGFS=y # CONFIG_CFG80211_REQUIRE_SIGNED_REGDB is not set CONFIG_CFG80211_WEXT=y CONFIG_CFS_BANDWIDTH=y CONFIG_CGROUPS=y CONFIG_CGROUP_BPF=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_SCHED=y CONFIG_CHROME_PLATFORMS=y CONFIG_CLS_U32_MARK=y # CONFIG_COMPAT_BRK is not set CONFIG_CONFIGFS_FS=y CONFIG_CONNECTOR=y CONFIG_CPUSETS=y CONFIG_CPU_FREQ_GOV_USERSPACE=y CONFIG_CPU_FREQ_STAT=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_TEO=y CONFIG_CRC7=m CONFIG_CROS_EC=y CONFIG_CROS_EC_PD_UPDATE=y CONFIG_CROS_EC_SENSORHUB=m CONFIG_CROS_EC_SPI=y CONFIG_CRYPTO_ARC4=y CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_USER_API_HASH=m CONFIG_CRYPTO_USER_API_SKCIPHER=m CONFIG_DAX=y CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_DEBUG_CREDENTIALS=y CONFIG_DEBUG_DEVRES=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_DWARF4=y # CONFIG_DEBUG_PREEMPT is not set CONFIG_DEBUG_SPINLOCK=y CONFIG_DEFAULT_HOSTNAME="localhost" CONFIG_DEFAULT_MMAP_MIN_ADDR=32768 CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y CONFIG_DM_CRYPT=y CONFIG_DM_FLAKEY=m CONFIG_DM_INIT=y CONFIG_DM_INTEGRITY=m CONFIG_DM_THIN_PROVISIONING=y CONFIG_DM_VERITY=y CONFIG_DM_VERITY_CHROMEOS=y # CONFIG_DNOTIFY is not set CONFIG_DRM=y CONFIG_DRM_DP_AUX_CHARDEV=y CONFIG_DRM_EVDI=m # CONFIG_DRM_FBDEV_EMULATION is not set CONFIG_DRM_UDL=y CONFIG_DRM_VGEM=y CONFIG_ECRYPT_FS=y CONFIG_EMBEDDED=y CONFIG_ENCRYPTED_KEYS=y CONFIG_ERROR_ON_WARNING=y CONFIG_ESD_FS=y CONFIG_EXT4_FS=y CONFIG_EXT4_FS_POSIX_ACL=y CONFIG_EXT4_FS_SECURITY=y CONFIG_FANOTIFY=y CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y CONFIG_FORTIFY_SOURCE=y CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y CONFIG_FS_ENCRYPTION=y CONFIG_FS_VERITY=y CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y CONFIG_FUNCTION_TRACER=y CONFIG_FUSE_FS=m CONFIG_GOOGLE_COREBOOT_TABLE=y CONFIG_GOOGLE_FIRMWARE=y CONFIG_GOOGLE_MEMCONSOLE_COREBOOT=y CONFIG_GOOGLE_VPD=y CONFIG_HARDENED_USERCOPY=y CONFIG_HFSPLUS_FS=m CONFIG_HIDRAW=y CONFIG_HID_APPLE=m CONFIG_HID_BATTERY_STRENGTH=y CONFIG_HID_CHERRY=m CONFIG_HID_CHICONY=y CONFIG_HID_HOLTEK=y CONFIG_HID_KENSINGTON=y CONFIG_HID_LOGITECH=m CONFIG_HID_LOGITECH
Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC
> Hi, > > When you run "make olddefconfig", should this "depends on" > line evaluate to true or false? True. RTC_DRV_CMOS=y by default on x86 systems and HPET_TIMER also appears to default yes (on x86-64 if I'm reading this right). Oddly, the RTC config option doesn't appear to exist. Probably a separate issue. > I.e., what are the settings of these symbols in the old .config file? > > > depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || > RTC_DRV_CMOS=y) Actually, none of these options are set in the old config. RTC_DRV_CMOS and HPET_TIMER appear to default to yes. % grep HPET .config CONFIG_HPET=y # CONFIG_HPET_EMULATE_RTC is not set # CONFIG_HPET_MMAP is not set % grep RTC .config CONFIG_RTC_CLASS=y # CONFIG_HPET_EMULATE_RTC is not set CONFIG_PM_TRACE_RTC=y # CONFIG_RTC_HCTOSYS is not set > > If not, is there some out-of-tree driver involved? No out of tree drivers involved. I tried this on a vanilla 5.11-rc6. > I'm having a little trouble seeing why this is needed. So am I. But this is the magic that lets me keep CONFIG_HPET_EMULATE_RTC from the old config. I did manage to trace where the option is being overridden in the conf tool, but the reasoning why is beyond my knowledge. -- Anand K. Mistry Software Engineer Google Australia
[PATCH] x86: Add a prompt for HPET_EMULATE_RTC
This does two things: 1. Makes the option visible in menuconfig, allowing the user to easily disable this option 2. Allows olddefconfig to respoct the option if it is set in the old .config file It's not clear exactly why the second consequence is true, but it appears to be because when the conf tool reads the config file, it only respects the existing setting if the option is "visible" (see scripts/kconfig/symbol.c:381). Signed-off-by: Anand K Mistry --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 21f851179ff0..28f814493c7b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -890,6 +890,7 @@ config HPET_TIMER config HPET_EMULATE_RTC def_bool y + prompt "HPET RTC emulation" depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y) config APB_TIMER -- 2.30.0.365.g02bc693789-goog
[PATCH] ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
ath10k_debug_fw_stats_request just be called with conf_mutex held, otherwise the following warning is seen when lock debugging is enabled: WARNING: CPU: 0 PID: 793 at drivers/net/wireless/ath/ath10k/debug.c:357 ath10k_debug_fw_stats_request+0x12c/0x133 [ath10k_core] Modules linked in: snd_hda_codec_hdmi designware_i2s snd_hda_intel snd_intel_dspcfg snd_hda_codec i2c_piix4 snd_hwdep snd_hda_core acpi_als kfifo_buf industrialio snd_soc_max98357a snd_soc_adau7002 snd_soc_acp_da7219mx98357_mach snd_soc_da7219 acp_audio_dma ccm xt_MASQUERADE fuse ath10k_pci ath10k_core lzo_rle ath lzo_compress mac80211 zram cfg80211 r8152 mii joydev CPU: 0 PID: 793 Comm: wpa_supplicant Tainted: GW 5.10.9 #5 Hardware name: HP Grunt/Grunt, BIOS Google_Grunt.11031.104.0 09/05/2019 RIP: 0010:ath10k_debug_fw_stats_request+0x12c/0x133 [ath10k_core] Code: 1e bb a1 ff ff ff 4c 89 ef 48 c7 c6 d3 31 2e c0 89 da 31 c0 e8 bd f8 ff ff 89 d8 eb 02 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 04 ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 56 53 48 89 fb RSP: 0018:b2478099f7d0 EFLAGS: 00010246 RAX: RBX: 9e432700cce0 RCX: 11c85cfd6b8e3b00 RDX: 9e432700cce0 RSI: 9e43127c5668 RDI: 9e4318deddf0 RBP: b2478099f7f8 R08: 0002 R09: 0003fd7068cc R10: c01b2749 R11: c029efaf R12: 9e432700c000 R13: 9e43127c33e0 R14: b2478099f918 R15: 9e43127c33e0 FS: 7f7ea48e2740() GS:9e432aa0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 59aa799ddf38 CR3: 000118de2000 CR4: 001506f0 Call Trace: ath10k_sta_statistics+0x4d/0x270 [ath10k_core] sta_set_sinfo+0x1be/0xaec [mac80211] ieee80211_get_station+0x58/0x76 [mac80211] rdev_get_station+0xf1/0x11e [cfg80211] nl80211_get_station+0x7f/0x146 [cfg80211] genl_rcv_msg+0x32e/0x35e ? nl80211_stop_ap+0x19/0x19 [cfg80211] ? nl80211_get_station+0x146/0x146 [cfg80211] ? genl_rcv+0x19/0x36 ? genl_rcv+0x36/0x36 netlink_rcv_skb+0x89/0xfb genl_rcv+0x28/0x36 netlink_unicast+0x169/0x23b netlink_sendmsg+0x38a/0x402 sock_sendmsg+0x72/0x76 sys_sendmsg+0x153/0x1cc ? copy_msghdr_from_user+0x5d/0x85 ___sys_sendmsg+0x7c/0xb5 ? lock_acquire+0x181/0x23d ? syscall_trace_enter+0x15e/0x160 ? find_held_lock+0x3d/0xb2 ? syscall_trace_enter+0x15e/0x160 ? sched_clock_cpu+0x15/0xc6 __sys_sendmsg+0x62/0x9a do_syscall_64+0x43/0x55 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 4913e675630e ("ath10k: enable rx duration report default for wmi tlv") Signed-off-by: Anand K Mistry --- drivers/net/wireless/ath/ath10k/mac.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 7d98250380ec..e815aab412d7 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -9117,7 +9117,9 @@ static void ath10k_sta_statistics(struct ieee80211_hw *hw, if (!ath10k_peer_stats_enabled(ar)) return; + mutex_lock(>conf_mutex); ath10k_debug_fw_stats_request(ar); + mutex_unlock(>conf_mutex); sinfo->rx_duration = arsta->rx_duration; sinfo->filled |= BIT_ULL(NL80211_STA_INFO_RX_DURATION); -- 2.30.0.365.g02bc693789-goog
[PATCH] ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
The ieee80211_find_sta_by_ifaddr call in ath10k_wmi_tlv_parse_peer_stats_info must be called while holding the RCU read lock. Otherwise, the following warning will be seen when RCU usage checking is enabled: = WARNING: suspicious RCU usage 5.10.3 #8 Tainted: GW - include/linux/rhashtable.h:594 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by ksoftirqd/1/16. stack backtrace: CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: GW 5.10.3 #8 Hardware name: HP Grunt/Grunt, BIOS Google_Grunt.11031.104.0 09/05/2019 Call Trace: dump_stack+0xab/0x115 sta_info_hash_lookup+0x71/0x1e9 [mac80211] ? lock_is_held_type+0xe6/0x12f ? __kasan_kmalloc+0xfb/0x112 ieee80211_find_sta_by_ifaddr+0x12/0x61 [mac80211] ath10k_wmi_tlv_parse_peer_stats_info+0xbd/0x10b [ath10k_core] ath10k_wmi_tlv_iter+0x8b/0x1a1 [ath10k_core] ? ath10k_wmi_tlv_iter+0x1a1/0x1a1 [ath10k_core] ath10k_wmi_tlv_event_peer_stats_info+0x103/0x13b [ath10k_core] ath10k_wmi_tlv_op_rx+0x722/0x80d [ath10k_core] ath10k_htc_rx_completion_handler+0x16e/0x1d7 [ath10k_core] ath10k_pci_process_rx_cb+0x116/0x22c [ath10k_pci] ? ath10k_htc_process_trailer+0x332/0x332 [ath10k_core] ? _raw_spin_unlock_irqrestore+0x34/0x61 ? lockdep_hardirqs_on+0x8e/0x12e ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core] ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core] ath10k_pci_napi_poll+0x49/0x141 [ath10k_pci] net_rx_action+0x11a/0x347 __do_softirq+0x2d3/0x539 run_ksoftirqd+0x4b/0x86 smpboot_thread_fn+0x1d0/0x2ab ? cpu_report_death+0x7f/0x7f kthread+0x189/0x191 ? cpu_report_death+0x7f/0x7f ? kthread_blkcg+0x31/0x31 ret_from_fork+0x22/0x30 Fixes: 0f7cb26830a6e ("ath10k: add rx bitrate report for SDIO") Signed-off-by: Anand K Mistry --- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c index 7b5834157fe5..e6135795719a 100644 --- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c +++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c @@ -240,8 +240,10 @@ static int ath10k_wmi_tlv_parse_peer_stats_info(struct ath10k *ar, u16 tag, u16 __le32_to_cpu(stat->last_tx_rate_code), __le32_to_cpu(stat->last_tx_bitrate_kbps)); + rcu_read_lock(); sta = ieee80211_find_sta_by_ifaddr(ar->hw, stat->peer_macaddr.addr, NULL); if (!sta) { + rcu_read_unlock(); ath10k_warn(ar, "not found station for peer stats\n"); return -EINVAL; } @@ -251,6 +253,7 @@ static int ath10k_wmi_tlv_parse_peer_stats_info(struct ath10k *ar, u16 tag, u16 arsta->rx_bitrate_kbps = __le32_to_cpu(stat->last_rx_bitrate_kbps); arsta->tx_rate_code = __le32_to_cpu(stat->last_tx_rate_code); arsta->tx_bitrate_kbps = __le32_to_cpu(stat->last_tx_bitrate_kbps); + rcu_read_unlock(); return 0; } -- 2.30.0.365.g02bc693789-goog
Re: [RFC PATCH] x86/speculation: Add finer control for when to issue IBPB
> > This proposal attempts to reduce that cost by letting the system > > developer choose whether to issue the IBPB on entry or exit of an IB > > speculation disabled process (default is both, which is current > > behaviour). Documentation/admin-guide/hw-vuln/spectre.rst documents two > > mitigation strategies that use conditional IBPB; > > "Protect sensitive programs", and "Sandbox untrusted programs". > > Why make the setting system-wide? Shouldn't this decision be made on a > per-task basis, depending on whether the task is sensitive or untrusted? It definitely could be. I didn't give it as much thought since for me, the entire system uses a "sandbox" approach, so the behaviour would apply to any IB spec disabled process. And conversely, any system taking the "sensitive programs" approach would also expect the same behaviour from all processes. I'm open to making it per-process. It's just that making it system-wide seemed to "fit" with the documented mitigation strategies, and it's what I would use in production.
Re: [RFC PATCH] x86/speculation: Add finer control for when to issue IBPB
> > > > Signed-off-by: Anand K Mistry > > Signed-off-by: Anand K Mistry > > Two SoBs by you, why? Tooling issues probably. Not intentional. > > > --- > > Background: > > IBPB is slow on some CPUs. > > > > More detailed background: > > On some CPUs, issuing an IBPB can cause the address space switch to be > > 10x more expensive (yes, 10x, not 10%). > > Which CPUs are those?! AMD A4-9120C. Probably the A6-9220C too, but I don't have one of those machines to test with, > > > On a system that makes heavy use of processes, this can cause a very > > significant performance hit. > > You're not really trying to convince reviewers for why you need to add > more complexity to an already too complex and confusing code. "some > CPUs" and "can cause" is not good enough. On a simple ping-ping test between two processes (using a pair of pipes), a process switch is ~7us with IBPB disabled. But with it enabled, it increases to around 80us (tested with the powersave CPU governor). On Chrome's IPC system, a perftest running 50,000 ping-pong messages: without IBPB5579.49 ms with IBPB21396 ms (~4x difference) And, doing video playback in the browser (which is already very optimised), the IBPB hit turns out to be ~2.5% of CPU cycles. Doing a webrtc video call (tested using http://appr.tc), it's ~9% of CPU cycles. I don't have exact numbers, but it's worse on some real VC apps. > > > I understand this is likely to be very contentious. Obviously, this > > isn't ready for code review, but I'm hoping to get some thoughts on the > > problem and this approach. > > Yes, in the absence of hard performance data, I'm not convinced at all. With this change, I can get a >80% reduction in CPU cycles consumed by IBPB. A video call on my test device goes from ~9% to ~0.80% cycles used by IBPB. It doesn't sound like much, but it's a significant difference on these devices.
[RFC PATCH] x86/speculation: Add finer control for when to issue IBPB
When IB speculation is conditionally disabled for a process (via prctl() or seccomp), IBPB is issued whenever that process is switched to/from. However, this results more IBPBs than necessary. The goal is to protect a victim process from an attacker poisoning the BTB by issuing IBPB in the attacker->victim switch. However, the current logic will also issue IBPB in the victim->attacker switch, because there's no notion of whether the attacker or victim has IB speculation disabled. Instead of always issuing IBPB when either the previous or next process has IB speculation disabled, add a boot flag to explicitly choose to issue IBPB when the IB spec disabled process is entered or left. Signed-off-by: Anand K Mistry Signed-off-by: Anand K Mistry --- Background: IBPB is slow on some CPUs. More detailed background: On some CPUs, issuing an IBPB can cause the address space switch to be 10x more expensive (yes, 10x, not 10%). On a system that makes heavy use of processes, this can cause a very significant performance hit. Although we can choose which processes will pay the IBPB cost by using prctl(), the performance hit is often still too high because IBPB is being issued more often than necessary. This proposal attempts to reduce that cost by letting the system developer choose whether to issue the IBPB on entry or exit of an IB speculation disabled process (default is both, which is current behaviour). Documentation/admin-guide/hw-vuln/spectre.rst documents two mitigation strategies that use conditional IBPB; "Protect sensitive programs", and "Sandbox untrusted programs". In the first case of protecting sensitive programs, the victim process has IB spec disabled. So the attacker->victim switch is an _entry_ of an IB spec disabled process. Conversly, the second case of sandboxing and untrusted process, the attacker has IB spec disabled and so we want to issue of IBPB on _exit_ of the IB spec disabled process. I understand this is likely to be very contentious. Obviously, this isn't ready for code review, but I'm hoping to get some thoughts on the problem and this approach. arch/x86/include/asm/nospec-branch.h | 3 ++ arch/x86/kernel/cpu/bugs.c | 42 arch/x86/mm/tlb.c| 11 ++-- 3 files changed, 53 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index cb9ad6b73973..b153af75 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -250,6 +250,9 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb); +DECLARE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_enter); +DECLARE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_leave); + DECLARE_STATIC_KEY_FALSE(mds_user_clear); DECLARE_STATIC_KEY_FALSE(mds_idle_clear); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d41b70fe4918..a87200db7786 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -69,6 +69,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); /* Control unconditional IBPB in switch_mm() */ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb); +DEFINE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_enter); +DEFINE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_leave); + /* Control MDS CPU buffer clear before returning to user space */ DEFINE_STATIC_KEY_FALSE(mds_user_clear); EXPORT_SYMBOL_GPL(mds_user_clear); @@ -640,6 +643,12 @@ enum spectre_v2_user_cmd { SPECTRE_V2_USER_CMD_SECCOMP_IBPB, }; +enum spectre_v2_user_ibpb_mode { + SPECTRE_V2_USER_IBPB_BOTH, + SPECTRE_V2_USER_IBPB_ENTER, + SPECTRE_V2_USER_IBPB_LEAVE, +}; + static const char * const spectre_v2_user_strings[] = { [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", [SPECTRE_V2_USER_STRICT]= "User space: Mitigation: STIBP protection", @@ -700,12 +709,31 @@ spectre_v2_parse_user_cmdline(enum spectre_v2_mitigation_cmd v2_cmd) return SPECTRE_V2_USER_CMD_AUTO; } +static enum spectre_v2_user_ibpb_mode __init +spectre_v2_parse_user_ibpb_mode(void) +{ + char arg[8]; + int ret; + + ret = cmdline_find_option(boot_command_line, + "spectre_v2_user_ibpb_mode", + arg, sizeof(arg)); + + if (ret == 5 && !strncmp(arg, "enter", 5)) + return SPECTRE_V2_USER_IBPB_ENTER; + if (ret == 5 && !strncmp(arg, "leave", 5)) + return SPECTRE_V2_USER_IBPB_LEAVE; + + return SPECTRE_V2_USER_IBPB_BOTH; +} + static void __init spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) { enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE; bool smt_possible = IS_ENABLED(CONFIG_SMP); enum spectre
[tip: x86/urgent] x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 33fc379df76b4991e5ae312f07bcd6820811971e Gitweb: https://git.kernel.org/tip/33fc379df76b4991e5ae312f07bcd6820811971e Author:Anand K Mistry AuthorDate:Tue, 10 Nov 2020 12:33:53 +11:00 Committer: Borislav Petkov CommitterDate: Wed, 25 Nov 2020 20:17:09 +01:00 x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb When spectre_v2_user={seccomp,prctl},ibpb is specified on the command line, IBPB is force-enabled and STIPB is conditionally-enabled (or not available). However, since 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour. Because the issuing of IBPB relies on the switch_mm_*_ibpb static branches, the mitigations behave as expected. Since 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") this discrepency caused the misreporting of IB speculation via prctl(). On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb, prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL | PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB speculation mode, even though the flag is ignored. Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not available. [ bp: Massage commit message. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Cc: Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid --- arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 581fb72..d41b70f 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + spectre_v2_user_ibpb = mode; switch (cmd) { case SPECTRE_V2_USER_CMD_FORCE: case SPECTRE_V2_USER_CMD_PRCTL_IBPB: case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: static_branch_enable(_mm_always_ibpb); + spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT; break; case SPECTRE_V2_USER_CMD_PRCTL: case SPECTRE_V2_USER_CMD_AUTO: @@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", static_key_enabled(_mm_always_ibpb) ? "always-on" : "conditional"); - - spectre_v2_user_ibpb = mode; } /*
[PATCH] x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
When spectre_v2_user={seccomp,prctl},ibpb, IBPB is force-enabled and STIPB is conditionally-enabled (or not available). However, since commit 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour. Because the issuing of IBPB relies on the switch_mm_*_ibpb static branches, the mitigations behave as expected. Since commit 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") this discrepency caused the misreporting of IB speculation via prctl(). On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb, prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL | PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB speculation mode, even though the flag is ignored. Similarly for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not available. Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") Signed-off-by: Anand K Mistry --- This is a follow up to my last patch (https://lore.kernel.org/lkml/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid/). Unlike that one, this one does not change the behaviour or allow enabling/disabling of IB speculation. Rather, it fixes the way the prctl() responds. The last patch exposed an existing oversight in way migitations were being tracked. Basically, the switch_mm_*_ibpb static branches and the spectre_v2_user_ibpb variable were not consistent. This patch tried to line those ducks in a row, as well as you can wrangle ducks. Unfortunately, duck-wrangling is not one of my areas of expertise. arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 581fb7223ad0..d41b70fe4918 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + spectre_v2_user_ibpb = mode; switch (cmd) { case SPECTRE_V2_USER_CMD_FORCE: case SPECTRE_V2_USER_CMD_PRCTL_IBPB: case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: static_branch_enable(_mm_always_ibpb); + spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT; break; case SPECTRE_V2_USER_CMD_PRCTL: case SPECTRE_V2_USER_CMD_AUTO: @@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", static_key_enabled(_mm_always_ibpb) ? "always-on" : "conditional"); - - spectre_v2_user_ibpb = mode; } /* -- 2.29.2.222.g5d2a92d10f8-goog
[tip: x86/urgent] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 1978b3a53a74e3230cd46932b149c6e62e832e9a Gitweb: https://git.kernel.org/tip/1978b3a53a74e3230cd46932b149c6e62e832e9a Author:Anand K Mistry AuthorDate:Thu, 05 Nov 2020 16:33:04 +11:00 Committer: Borislav Petkov CommitterDate: Thu, 05 Nov 2020 21:43:34 +01:00 x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Acked-by: Thomas Gleixner Acked-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid --- arch/x86/kernel/cpu/bugs.c | 51 +++-- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d3f0db4..581fb72 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1254,6 +1254,14 @@ static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) return 0; } +static bool is_spec_ib_user_controlled(void) +{ + return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || + spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP; +} + static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) { switch (ctrl) { @@ -1261,16 +1269,26 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return 0; + /* -* Indirect branch speculation is always disabled in strict -* mode. It can neither be enabled if it was force-disabled -* by a previous prctl call. +* With strict mode for both IBPB and STIBP, the instruction +* code paths avoid checking this task flag and instead, +* unconditionally run the instruction. However, STIBP and IBPB +* are independent and either can be set to conditionally +* enabled regardless of the mode of the other. +* +* If either is set to conditional, allow the task flag to be +* updated, unless it was force-disabled by a previous prctl +* call. Currently, this is possible on an AMD CPU which has the +* feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the +* kernel is booted with 'spectre_v2_user=seccomp', then +* spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and +* spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. */ - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED || + if (!is_spec_ib_user_controlled() || task_spec_ib_force_disable(task)) return -EPERM; + task_clear_spec_ib_disable(task); task_update_spec_tif(task); break; @@ -1283,10 +1301,10 @@ static int ib_prctl_set(struct task
[PATCH v2] proc: Provide details on indirect branch speculation
Similar to speculation store bypass, show information about the indirect branch speculation mode of a task in /proc/$pid/status. Signed-off-by: Anand K Mistry --- Changes in v2: - Remove underscores from field name to workaround documentation issue Documentation/filesystems/proc.rst | 2 ++ fs/proc/array.c| 28 2 files changed, 30 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 533c79e8d2cd..531edaf07924 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -210,6 +210,7 @@ read the file /proc/PID/status:: NoNewPrivs: 0 Seccomp:0 Speculation_Store_Bypass: thread vulnerable + SpeculationIndirectBranch: conditional enabled voluntary_ctxt_switches:0 nonvoluntary_ctxt_switches: 1 @@ -292,6 +293,7 @@ It's slow but very precise. NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...) Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...) Speculation_Store_Bypassspeculative store bypass mitigation status + SpeculationIndirectBranch indirect branch speculation mode Cpus_allowedmask of CPUs on which this process may run Cpus_allowed_list Same as previous, but in "list format" Mems_allowedmask of memory nodes allowed to this process diff --git a/fs/proc/array.c b/fs/proc/array.c index 65ec2029fa80..014c1859554d 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -368,6 +368,34 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p) seq_puts(m, "vulnerable"); break; } + + seq_puts(m, "\nSpeculationIndirectBranch:\t"); + switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_INDIRECT_BRANCH)) { + case -EINVAL: + seq_puts(m, "unsupported"); + break; + case PR_SPEC_NOT_AFFECTED: + seq_puts(m, "not affected"); + break; + case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE: + seq_puts(m, "conditional force disabled"); + break; + case PR_SPEC_PRCTL | PR_SPEC_DISABLE: + seq_puts(m, "conditional disabled"); + break; + case PR_SPEC_PRCTL | PR_SPEC_ENABLE: + seq_puts(m, "conditional enabled"); + break; + case PR_SPEC_ENABLE: + seq_puts(m, "always enabled"); + break; + case PR_SPEC_DISABLE: + seq_puts(m, "always disabled"); + break; + default: + seq_puts(m, "unknown"); + break; + } seq_putc(m, '\n'); } -- 2.29.1.341.ge80a0c044ae-goog
Re: linux-next: build warning after merge of the akpm-current tree
SNIPPED > > > > Looks like left column became too wide, so rather than shift the right > > column to the right, I'd suggest to drop underscores in Speculation*. > > Hm. That makes it inconsistent with Speculation_Store_Bypass. I guess > it's the lesser of two evils. Oh, do you mean renaming the existing Speculation_Store_Bypass? I thought that was a big no-no for the kernel?
Re: linux-next: build warning after merge of the akpm-current tree
On Thu, 5 Nov 2020 at 18:03, Mike Rapoport wrote: > > On Thu, Nov 05, 2020 at 05:45:49PM +1100, Stephen Rothwell wrote: > > Hi all, > > > > After merging the akpm-current tree, today's linux-next build (htmldocs) > > produced this warning: > > > > Documentation/filesystems/proc.rst:296: WARNING: Malformed table. > > Text in column margin in table line 61. > > > > == > > === > > Field Content > > == > > === > ... > > Speculation_Store_Bypassspeculative store bypass mitigation status > > Speculation_Indirect_Branch indirect branch speculation mode > ... > > == > > === > > Documentation/filesystems/proc.rst:234: WARNING: Error parsing content > > block for the "table" directive: exactly one table expected. > > Looks like left column became too wide, so rather than shift the right > column to the right, I'd suggest to drop underscores in Speculation*. Hm. That makes it inconsistent with Speculation_Store_Bypass. I guess it's the lesser of two evils. How would I go about fixing this? Send a new (v2), fixed patch to the mailing list? I'm not that familiar with how patches get merged through the branches. > > > > > .. table:: Table 1-2: Contents of the status files (as of 4.19) > > > > == > > === > > Field Content > > == > > === >... > > Speculation_Store_Bypassspeculative store bypass mitigation status > > Speculation_Indirect_Branch indirect branch speculation mode > > Cpus_allowedmask of CPUs on which this process may run > > Cpus_allowed_list Same as previous, but in "list format" > > Mems_allowedmask of memory nodes allowed to this process > > Mems_allowed_list Same as previous, but in "list format" > > voluntary_ctxt_switches number of voluntary context switches > > nonvoluntary_ctxt_switches number of non voluntary context switches > > ====== > > === > > Same here. > > > Introduced by commit > > > > 60b492d745d5 ("proc: provide details on indirect branch speculation") > > > > -- > > Cheers, > > Stephen Rothwell > > > > -- > Sincerely yours, > Mike. -- Anand K. Mistry Software Engineer Google Australia
Re: [PATCH] proc: Provide details on indirect branch speculation
On Sun, 1 Nov 2020 at 07:05, Andrew Morton wrote: > > On Fri, 30 Oct 2020 17:27:54 +1100 Anand K Mistry wrote: > > > Similar to speculation store bypass, show information about the indirect > > branch speculation mode of a task in /proc/$pid/status. > > Why is this considered useful? For testing/benchmarking, I needed to see whether IB (Indirect Branch) speculation (see Spectre-v2) is enabled on a task, to see whether an IBPB instruction should be executed on an address space switch. Unfortunately, this information isn't available anywhere else and currently the only way to get it is to hack the kernel to expose it (like this change). It also helped expose a bug with conditional IB speculation on certain CPUs. Another place this could be useful is to audit the system when using sanboxing. With this change, I can confirm that seccomp-enabled process have IB speculation force disabled as expected when the kernel command line parameter `spectre_v2_user=seccomp`. Since there's already a 'Speculation_Store_Bypass' field, I used that as precedence for adding this one. -- Anand K. Mistry Software Engineer Google Australia
[PATCH v2] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and 'spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set, the (spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. Signed-off-by: Anand K Mistry --- Changes in v2: - Fix typo in commit message - s/is_spec_ib_user/is_spec_ib_user_controlled - Update comment in ib_prctl_set() to reference X86_FEATURE_AMD_STIBP_ALWAYS_ON - Have is_spec_ib_user_controlled() check both IBPB and STIBP modes arch/x86/kernel/cpu/bugs.c | 46 +++--- 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d3f0db463f96..534225afe832 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1254,6 +1254,14 @@ static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) return 0; } +static bool is_spec_ib_user_controlled(void) +{ + return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || + spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || + spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP; +} + static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) { switch (ctrl) { @@ -1262,13 +1270,20 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return 0; /* -* Indirect branch speculation is always disabled in strict -* mode. It can neither be enabled if it was force-disabled -* by a previous prctl call. +* With strict mode for both IBPB and STIBP, the instruction +* code paths avoid checking this task flag and instead, +* unconditionally run the instruction. However, STIBP and IBPB +* are independent and either can be set to conditionally +* enabled regardless of the mode of the other. If either is set +* to conditional, allow the task flag to be updated, unless it +* was force-disabled by a previous prctl call. +* Currently, this is possible on an AMD CPU which has the +* feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the +* kernel is booted with 'spectre_v2_user=seccomp', then +* spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and +* spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. */ - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED || + if (!is_spec_ib_user_controlled() || task_spec_ib_force_disable(task)) return -EPERM; task_clear_spec_ib_disable(task); @@ -1283,9 +1298,7 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return -EPERM; - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + if (!is_spec_ib_user_controlled()) return 0; task_set_spec_ib_disable(task); if (ctrl == PR_SPEC_FORCE_DISABLE) @@ -1351,20 +1364,17 @@ static int ib_prctl_get(struct task_struct *task) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
Re: [PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On Tue, 3 Nov 2020 at 21:58, Borislav Petkov wrote: > > On Mon, Nov 02, 2020 at 11:02:10AM +1100, Anand K. Mistry wrote: > > > I like the idea of passing in the mode you want to check, but it appears > > > they are never used independently. The ibpb and stibp modes are always > > > checked together in one of the if statements below, so you could make this > > > a function that checks both modes and just have a single call. I'll leave > > > that up to the maintainers to see what is preferred. > > > > I can see both sides to this. Personally, I think I prefer it as-is > > since I think it improves readability a bit by making the conditions > > less complicated whilst not hiding too many details. I'll wait to see > > what others say before changing this one. > > Yes, but if you make it a single function with a descriptive name, you'd > make the call sites even more readable: > > if (!is_spec_ib_conditional(..)) > bla; > > or > > if (!is_spec_ib_user_controlled(..)) > blu; > > and that function should simply check both spectre_v2_user_ibpb *and* > spectre_v2_user_stibp in one go. > > Why should we do that? > > Exactly because you both got your brains twisted just from looking at > this. Because this mitigation crap is such an ugly and complex maze that > we would take even the smallest simplification any day of the week! Ok then, two votes for. I'll make the change in v2 and post that up today. > > Welcome to my life since meltdown. Brain twist feels good, doesn't it? I don't think "feels good" are the words I'd use. > > :-))) > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette -- Anand K. Mistry Software Engineer Google Australia
Re: [PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On Sun, 1 Nov 2020 at 02:05, Tom Lendacky wrote: > > On 10/29/20 1:51 AM, Anand K Mistry wrote: > > On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, > > STIBP is set to on and 'spectre_v2_user_stibp == > > SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to > > conditional. However, this leads to the case where it's impossible to > > turn on IBPB for a process because in the PR_SPEC_DISABLE case in > > ib_prctl_set, the (spectre_v2_user_stibp == > > SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the > > task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE > > even though IBPB is set to conditional. > > > > More generally, the following cases are possible: > > 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb > > 2. STIBP = on && IBPB = conditional for AMD CPUs with > > X86_FEATURE_AMD_STIBP_ALWAYS_ON > > > > The first case functions correctly today, but only because > > spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. > > > > At a high level, this change does one thing. If either STIBP is IBPB is > > s/STIBP is IBPB/STIBP or IBPB/ Oops. Will be fixed in v2. > > > set to conditional, allow the prctl to change the task flag. Also, > > reflect that capability when querying the state. This isn't perfect > > since it doesn't take into account if only STIBP or IBPB is > > unconditionally on. But it allows the conditional feature to work as > > expected, without affecting the unconditional one. > > > > Signed-off-by: Anand K Mistry > > > > --- > > > > arch/x86/kernel/cpu/bugs.c | 41 +- > > 1 file changed, 23 insertions(+), 18 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c > > index d3f0db463f96..fb64e02eed6f 100644 > > --- a/arch/x86/kernel/cpu/bugs.c > > +++ b/arch/x86/kernel/cpu/bugs.c > > @@ -1254,6 +1254,11 @@ static int ssb_prctl_set(struct task_struct *task, > > unsigned long ctrl) > > return 0; > > } > > > > +static bool is_spec_ib_user(enum spectre_v2_user_mitigation mode) > > Maybe something like is_spec_ib_user_controlled() would be a better name. Changed in v2. > > > +{ > > + return mode == SPECTRE_V2_USER_PRCTL || mode == > > SPECTRE_V2_USER_SECCOMP; > > +} > > + > > I like the idea of passing in the mode you want to check, but it appears > they are never used independently. The ibpb and stibp modes are always > checked together in one of the if statements below, so you could make this > a function that checks both modes and just have a single call. I'll leave > that up to the maintainers to see what is preferred. I can see both sides to this. Personally, I think I prefer it as-is since I think it improves readability a bit by making the conditions less complicated whilst not hiding too many details. I'll wait to see what others say before changing this one. > > > static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) > > { > > switch (ctrl) { > > @@ -1262,13 +1267,16 @@ static int ib_prctl_set(struct task_struct *task, > > unsigned long ctrl) > > spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) > > return 0; > > /* > > - * Indirect branch speculation is always disabled in strict > > - * mode. It can neither be enabled if it was force-disabled > > - * by a previous prctl call. > > + * With strict mode for both IBPB and STIBP, the instruction > > + * code paths avoid checking this task flag and instead, > > + * unconditionally run the instruction. However, STIBP and > > IBPB > > + * are independent and either can be set to conditionally > > + * enabled regardless of the mode of the other. If either is > > set > > + * to conditional, allow the task flag to be updated, unless > > it > > + * was force-disabled by a previous prctl call. > > You probably want to reference the STIBP always on mode that allows this > situation. Updated comment in v2 to mention the X86_FEATURE_AMD_STIBP_ALWAYS_ON case. > > >*/ > > - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || > > - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || > > - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED > > || > >
Re: [PATCH 0/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On Sun, 1 Nov 2020 at 01:50, Tom Lendacky wrote: > > On 10/29/20 1:51 AM, Anand K Mistry wrote: > > When attempting to do some performance testing of IBPB on and AMD > > platform, I noticed the IBPB instruction was never being issued, even > > though it was conditionally on and various seccomp protected processes > > were force enabling it. Turns out, on those AMD CPUs, STIBP is set to > > always-on and this was causing an early-out on the prctl() which turns > > off IB speculation. Here is my attempt to fix it. > > > > I'm hoping someone that understands this better than me can explain why > > I'm wrong. > > It all looks reasonable to me (some comments in the patch to follow). The > thing that makes this tough is the command line option of being able to > force IBPB using the "prctl,ibpb" or "seccomp,ibpb" while STIBP is prctl > or seccomp controlled. There's an inherent quality that is assumed that if > STIBP is forced then IBPB must be forced and it looks like 21998a351512 > ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced > IBRS.") used that. However, with the STIBP always on support, that doesn't > hold true. Yeah, and this is what I found confusing. With that commit, the number of combinations of IBPB and STIBP is 25, but only a small subset is possible. For example: - (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT && spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT) - (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) are the only possible combinations of STRICT. But also, if 'spectre_v2_user=seccomp,ibpb' (or prctl,ibpb), then spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP even though it is logically SPECTRE_V2_USER_STRICT. It took a bit of time to wrap my head around this, hence I'm a bit hesitant about this change (even though I think it's right). > > Thanks, > Tom > > > > > > > Anand K Mistry (1): > >x86/speculation: Allow IBPB to be conditionally enabled on CPUs with > > always-on STIBP > > > > arch/x86/kernel/cpu/bugs.c | 41 +- > > 1 file changed, 23 insertions(+), 18 deletions(-) > >
Re: [PATCH 1/1] debugfs: Add a helper to export atomic64_t values
On Fri, 30 Oct 2020 at 18:20, Greg Kroah-Hartman wrote: > > On Fri, Oct 30, 2020 at 06:04:42PM +1100, Anand K Mistry wrote: > > This mirrors support for exporting atomic_t values. > > > > Signed-off-by: Anand K Mistry > > > > --- > > > > fs/debugfs/file.c | 37 + > > include/linux/debugfs.h | 6 ++ > > 2 files changed, 43 insertions(+) > > > > diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c > > index a768a09430c3..798bd3bdedec 100644 > > --- a/fs/debugfs/file.c > > +++ b/fs/debugfs/file.c > > @@ -770,6 +770,43 @@ void debugfs_create_atomic_t(const char *name, umode_t > > mode, > > } > > EXPORT_SYMBOL_GPL(debugfs_create_atomic_t); > > > > +static int debugfs_atomic64_t_set(void *data, u64 val) > > +{ > > + atomic64_set((atomic64_t *)data, val); > > + return 0; > > +} > > +static int debugfs_atomic64_t_get(void *data, u64 *val) > > +{ > > + *val = atomic64_read((atomic64_t *)data); > > + return 0; > > +} > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t, debugfs_atomic64_t_get, > > + debugfs_atomic64_t_set, "%lld\n"); > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_ro, debugfs_atomic64_t_get, NULL, > > + "%lld\n"); > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_wo, NULL, debugfs_atomic64_t_set, > > + "%lld\n"); > > + > > +/** > > + * debugfs_create_atomic64_t - create a debugfs file that is used to read > > and > > + * write an atomic64_t value > > + * @name: a pointer to a string containing the name of the file to create. > > + * @mode: the permission that the file should have > > + * @parent: a pointer to the parent dentry for this file. This should be a > > + * directory dentry if set. If this parameter is %NULL, then the > > + * file will be created in the root of the debugfs filesystem. > > + * @value: a pointer to the variable that the file should read to and write > > + * from. > > + */ > > +void debugfs_create_atomic64_t(const char *name, umode_t mode, > > +struct dentry *parent, atomic64_t *value) > > +{ > > + debugfs_create_mode_unsafe(name, mode, parent, value, > > +_atomic64_t, _atomic64_t_ro, > > +_atomic64_t_wo); > > +} > > +EXPORT_SYMBOL_GPL(debugfs_create_atomic64_t); > > + > > ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf, > > size_t count, loff_t *ppos) > > { > > diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h > > index 851dd1f9a8a5..0fac84c53eab 100644 > > --- a/include/linux/debugfs.h > > +++ b/include/linux/debugfs.h > > @@ -126,6 +126,8 @@ void debugfs_create_size_t(const char *name, umode_t > > mode, > > struct dentry *parent, size_t *value); > > void debugfs_create_atomic_t(const char *name, umode_t mode, > >struct dentry *parent, atomic_t *value); > > +void debugfs_create_atomic64_t(const char *name, umode_t mode, > > + struct dentry *parent, atomic64_t > > *value); > > struct dentry *debugfs_create_bool(const char *name, umode_t mode, > > struct dentry *parent, bool *value); > > > > @@ -291,6 +293,10 @@ static inline void debugfs_create_atomic_t(const char > > *name, umode_t mode, > > atomic_t *value) > > { } > > > > +static inline void debugfs_create_atomic64_t(const char *name, umode_t > > mode, > > + struct dentry *parent, > > atomic64_t *value) > > +{ } > > + > > static inline struct dentry *debugfs_create_bool(const char *name, umode_t > > mode, > >struct dentry *parent, > >bool *value) > > Looks good, but where is the user of this code? I can't add new apis > without a user. Fair enough. Right now, the user is just some local debugging/performance measuring which will never be upstreamed. Happy to let this drop. > > And are you _SURE_ you want to be using an atomic64_t in the first > place? We are starting to reduce the "raw" usage of atomic variables as > almost no one needs them, they should be using something else instead, > or just a u64 as atomics are not needed for simple statistics. I understand, and would generally never use atomics in real code. I used an atomic since I wanted accuracy (for some of the benchmarks I want to run) but can't use anything that blocks (spinlock/mutex) since the code is somewhere inside the scheduler. > > thanks, > > greg k-h -- Anand K. Mistry Software Engineer Google Australia
[PATCH 1/1] debugfs: Add a helper to export atomic64_t values
This mirrors support for exporting atomic_t values. Signed-off-by: Anand K Mistry --- fs/debugfs/file.c | 37 + include/linux/debugfs.h | 6 ++ 2 files changed, 43 insertions(+) diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index a768a09430c3..798bd3bdedec 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c @@ -770,6 +770,43 @@ void debugfs_create_atomic_t(const char *name, umode_t mode, } EXPORT_SYMBOL_GPL(debugfs_create_atomic_t); +static int debugfs_atomic64_t_set(void *data, u64 val) +{ + atomic64_set((atomic64_t *)data, val); + return 0; +} +static int debugfs_atomic64_t_get(void *data, u64 *val) +{ + *val = atomic64_read((atomic64_t *)data); + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t, debugfs_atomic64_t_get, + debugfs_atomic64_t_set, "%lld\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_ro, debugfs_atomic64_t_get, NULL, + "%lld\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_wo, NULL, debugfs_atomic64_t_set, + "%lld\n"); + +/** + * debugfs_create_atomic64_t - create a debugfs file that is used to read and + * write an atomic64_t value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ +void debugfs_create_atomic64_t(const char *name, umode_t mode, + struct dentry *parent, atomic64_t *value) +{ + debugfs_create_mode_unsafe(name, mode, parent, value, + _atomic64_t, _atomic64_t_ro, + _atomic64_t_wo); +} +EXPORT_SYMBOL_GPL(debugfs_create_atomic64_t); + ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf, size_t count, loff_t *ppos) { diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index 851dd1f9a8a5..0fac84c53eab 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -126,6 +126,8 @@ void debugfs_create_size_t(const char *name, umode_t mode, struct dentry *parent, size_t *value); void debugfs_create_atomic_t(const char *name, umode_t mode, struct dentry *parent, atomic_t *value); +void debugfs_create_atomic64_t(const char *name, umode_t mode, +struct dentry *parent, atomic64_t *value); struct dentry *debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, bool *value); @@ -291,6 +293,10 @@ static inline void debugfs_create_atomic_t(const char *name, umode_t mode, atomic_t *value) { } +static inline void debugfs_create_atomic64_t(const char *name, umode_t mode, +struct dentry *parent, atomic64_t *value) +{ } + static inline struct dentry *debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, bool *value) -- 2.29.1.341.ge80a0c044ae-goog
[PATCH 0/1] debugfs: Add a helper to export atomic64_t values
Here's my story: Once upon a time, there lived a developer that wanted to export an atomic64_t value to userspace using debugfs to help with debugging. But that developer found there was no helper function to do so and was very sad. The End. Yeah, it's a sad story. Here's my patch, merge me maybe? Anand K Mistry (1): debugfs: Add a helper to export atomic64_t values fs/debugfs/file.c | 37 + include/linux/debugfs.h | 6 ++ 2 files changed, 43 insertions(+) -- 2.29.1.341.ge80a0c044ae-goog
[PATCH] proc: Provide details on indirect branch speculation
Similar to speculation store bypass, show information about the indirect branch speculation mode of a task in /proc/$pid/status. Signed-off-by: Anand K Mistry --- Documentation/filesystems/proc.rst | 2 ++ fs/proc/array.c| 28 2 files changed, 30 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 533c79e8d2cd..710dd69614b9 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -210,6 +210,7 @@ read the file /proc/PID/status:: NoNewPrivs: 0 Seccomp:0 Speculation_Store_Bypass: thread vulnerable + Speculation_Indirect_Branch:conditional enabled voluntary_ctxt_switches:0 nonvoluntary_ctxt_switches: 1 @@ -292,6 +293,7 @@ It's slow but very precise. NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...) Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...) Speculation_Store_Bypassspeculative store bypass mitigation status + Speculation_Indirect_Branch indirect branch speculation mode Cpus_allowedmask of CPUs on which this process may run Cpus_allowed_list Same as previous, but in "list format" Mems_allowedmask of memory nodes allowed to this process diff --git a/fs/proc/array.c b/fs/proc/array.c index 65ec2029fa80..ce4fa948c9dd 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -368,6 +368,34 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p) seq_puts(m, "vulnerable"); break; } + + seq_puts(m, "\nSpeculation_Indirect_Branch:\t"); + switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_INDIRECT_BRANCH)) { + case -EINVAL: + seq_puts(m, "unsupported"); + break; + case PR_SPEC_NOT_AFFECTED: + seq_puts(m, "not affected"); + break; + case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE: + seq_puts(m, "conditional force disabled"); + break; + case PR_SPEC_PRCTL | PR_SPEC_DISABLE: + seq_puts(m, "conditional disabled"); + break; + case PR_SPEC_PRCTL | PR_SPEC_ENABLE: + seq_puts(m, "conditional enabled"); + break; + case PR_SPEC_ENABLE: + seq_puts(m, "always enabled"); + break; + case PR_SPEC_DISABLE: + seq_puts(m, "always disabled"); + break; + default: + seq_puts(m, "unknown"); + break; + } seq_putc(m, '\n'); } -- 2.29.1.341.ge80a0c044ae-goog
[PATCH 0/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
When attempting to do some performance testing of IBPB on and AMD platform, I noticed the IBPB instruction was never being issued, even though it was conditionally on and various seccomp protected processes were force enabling it. Turns out, on those AMD CPUs, STIBP is set to always-on and this was causing an early-out on the prctl() which turns off IB speculation. Here is my attempt to fix it. I'm hoping someone that understands this better than me can explain why I'm wrong. Anand K Mistry (1): x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP arch/x86/kernel/cpu/bugs.c | 41 +- 1 file changed, 23 insertions(+), 18 deletions(-) -- 2.29.1.341.ge80a0c044ae-goog
[PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and 'spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set, the (spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP is IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. Signed-off-by: Anand K Mistry --- arch/x86/kernel/cpu/bugs.c | 41 +- 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d3f0db463f96..fb64e02eed6f 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1254,6 +1254,11 @@ static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) return 0; } +static bool is_spec_ib_user(enum spectre_v2_user_mitigation mode) +{ + return mode == SPECTRE_V2_USER_PRCTL || mode == SPECTRE_V2_USER_SECCOMP; +} + static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) { switch (ctrl) { @@ -1262,13 +1267,16 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return 0; /* -* Indirect branch speculation is always disabled in strict -* mode. It can neither be enabled if it was force-disabled -* by a previous prctl call. +* With strict mode for both IBPB and STIBP, the instruction +* code paths avoid checking this task flag and instead, +* unconditionally run the instruction. However, STIBP and IBPB +* are independent and either can be set to conditionally +* enabled regardless of the mode of the other. If either is set +* to conditional, allow the task flag to be updated, unless it +* was force-disabled by a previous prctl call. */ - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED || + if ((!is_spec_ib_user(spectre_v2_user_ibpb) && +!is_spec_ib_user(spectre_v2_user_stibp)) || task_spec_ib_force_disable(task)) return -EPERM; task_clear_spec_ib_disable(task); @@ -1283,9 +1291,8 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return -EPERM; - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) + if (!is_spec_ib_user(spectre_v2_user_ibpb) && + !is_spec_ib_user(spectre_v2_user_stibp)) return 0; task_set_spec_ib_disable(task); if (ctrl == PR_SPEC_FORCE_DISABLE) @@ -1351,20 +1358,18 @@ static int ib_prctl_get(struct task_struct *task) if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE && spectre_v2_user_stibp == SPECTRE_V2_USER_NONE) return PR_SPEC_ENABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT || - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED) - return PR_SPEC_DISABLE; - else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP || - spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL || - spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) { + else if (is_spec_ib_user(spectre_v
[PATCH v2] arm64: dts: mt8173-elm: Set GPU power regulator to always on
Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana devices where sometimes, the regulator is disabled before scpsys is suspended, causing the suspension of scpsys to fail. Usually, the GPU and scpsys are suspended by the runtime PM before the system is suspended, due to the GPU being idle. In this case, scpsys is suspended inline with the GPU suspend, which then disables the regulator. However, if the GPU is still active when system is suspended, GPU suspend occurs but defers suspending scpsys to the PM's noirq phase. Since GPU suspend disables the regulator, scpsys isn't powered and suspending it fails with the following error: [ 523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d On resume, scpsys is resumed in the noirq phase. Since scpsys requires power from the regulator, which is still disabled at this point, attempting to turn it on will hang the CPU. A HW watchdog eventually reboots the system. The obvious solution would be to add a link to the regulator from scpsys in the devicetree. This would prevent the regulator from being disabled until scpsys is suspended. However, in the case where suspending scpsys is deferred to the noirq phase, disabling the regulator will fail since it is connected over I2C which requires IRQs to be enabled. Even in the usual case where scpsys is suspended inline with the GPU, PM will always attempt to resume scpsys in noirq. This will attempt to enable the regulator, which will also fail due to being unable to communicate over I2C. Since I2C can't be using with IRQs disabled, a workaround is to never turn off the regulator. Measuring power on the GPU rail on an Elm DVT device shows that the change in power usage is negligible. The two relavent cases are S0 with an idle GPU, and S3 (system suspended). Measurements taken using a debug board every 100ms for 1 minute. In S0 with an idle GPU, current behaviour with the regulator off: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 1.741.31 6.75 0.00 ... and with the regulator on, but no load: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 1.681.257.13 0.00 The difference being well within the margin of error. In S3, current behaviour with the regulator off: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 0.940.743.25 0.00 ... and with the regulator on: @@ NAME COUNT AVERAGE STDDEV MAX MIN @@ gpu_mw600 0.830.663.250.00 Signed-off-by: Anand K Mistry --- Changes in v2: - Remove CHROMIUM from subject line - Correct device in subject line (8183 -> 8173) - Grammar/clarity changes in description arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi index a5a12b2599a4..1294f27b21c1 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi @@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB { regulator-min-microamp = <200>; regulator-max-microamp = <300>; regulator-ramp-delay = <1>; + regulator-always-on; }; }; }; -- 2.28.0.297.g1956fa8f8d-goog
Re: [PATCH] CHROMIUM: arm64: dts: mt8183-elm: Set GPU power regulator to always on
:facepalm: sorry about the subject line. I'll fix it up in the next revision. On Tue, 25 Aug 2020 at 15:26, Anand K Mistry wrote: > > Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana > devices where sometimes, the regulator is disabled before scpsys is > suspended, causing the suspension of scpsys to fail. > > Usually, the GPU and scpsys are suspended by the runtime PM before the > system is suspended, due to the GPU being idle. In this case, scpsys is > suspended inline with the GPU suspend, which then disables the > regulator. However, if the GPU is still active when system is suspended, > GPU suspend occurs but defers suspending scpsys to the PM's noirq phase. > Since GPU suspend disables the regulator, scpsys isn't powered and > suspending it fails with the following error: > [ 523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d > > On resume, scpsys is resumed in the noirq phase. Since scpsys requires > power from the regulator, which is still disabled at this point, > attempting to turn it on will hang the CPU. A HW watchdog eventually > reboots the system. > > The obvious solution would be to add a link to the regulator from scpsys > in the devicetree. This would prevent the regulator from being disabled > until scpsys is suspended. However, in the case where suspending scpsys > is deferred to the noirq phase, disabling the regulator will fail since > it is connected over I2C which requires IRQs to be enabled. Even in the > usual case where scpsys is suspended inline with the GPU, PM will always > attempt to resume scpsys in noirq. This will attempt to enable the > regulator, which will also fail due to being unable to communicate over > I2C. > > Since I2C can't be using with IRQs disabled, a workaround is to never > turn off the regulator. > > Measuring power on the GPU rail on a Elm DVT shows that the change in > power usage is negligible. The two relavent cases are S0 with an idle > GPU, and S3. > > In S0 with an idle GPU, current behaviour with the regulator off: > @@ NAME COUNT AVERAGE STDDEV MAXMIN > @@ gpu_mw600 1.741.31 6.75 0.00 > ... and with the regulator on, but no load: > @@ NAME COUNT AVERAGE STDDEV MAXMIN > @@ gpu_mw600 1.681.257.13 0.00 > The difference being well within the margin of error. > > In S3, current behaviour with the regulator off: > @@ NAME COUNT AVERAGE STDDEV MAXMIN > @@ gpu_mw600 0.940.743.25 0.00 > ... and with the regulator on: > @@ NAME COUNT AVERAGE STDDEV MAX MIN > @@ gpu_mw600 0.830.663.250.00 > > Signed-off-by: Anand K Mistry > > --- > > arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi > b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi > index a5a12b2599a4..1294f27b21c1 100644 > --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi > +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi > @@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB { > regulator-min-microamp = <200>; > regulator-max-microamp = <300>; > regulator-ramp-delay = <1>; > + regulator-always-on; > }; > }; > }; > -- > 2.28.0.297.g1956fa8f8d-goog > -- Anand K. Mistry Software Engineer Google Australia
[PATCH] CHROMIUM: arm64: dts: mt8183-elm: Set GPU power regulator to always on
Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana devices where sometimes, the regulator is disabled before scpsys is suspended, causing the suspension of scpsys to fail. Usually, the GPU and scpsys are suspended by the runtime PM before the system is suspended, due to the GPU being idle. In this case, scpsys is suspended inline with the GPU suspend, which then disables the regulator. However, if the GPU is still active when system is suspended, GPU suspend occurs but defers suspending scpsys to the PM's noirq phase. Since GPU suspend disables the regulator, scpsys isn't powered and suspending it fails with the following error: [ 523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d On resume, scpsys is resumed in the noirq phase. Since scpsys requires power from the regulator, which is still disabled at this point, attempting to turn it on will hang the CPU. A HW watchdog eventually reboots the system. The obvious solution would be to add a link to the regulator from scpsys in the devicetree. This would prevent the regulator from being disabled until scpsys is suspended. However, in the case where suspending scpsys is deferred to the noirq phase, disabling the regulator will fail since it is connected over I2C which requires IRQs to be enabled. Even in the usual case where scpsys is suspended inline with the GPU, PM will always attempt to resume scpsys in noirq. This will attempt to enable the regulator, which will also fail due to being unable to communicate over I2C. Since I2C can't be using with IRQs disabled, a workaround is to never turn off the regulator. Measuring power on the GPU rail on a Elm DVT shows that the change in power usage is negligible. The two relavent cases are S0 with an idle GPU, and S3. In S0 with an idle GPU, current behaviour with the regulator off: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 1.741.31 6.75 0.00 ... and with the regulator on, but no load: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 1.681.257.13 0.00 The difference being well within the margin of error. In S3, current behaviour with the regulator off: @@ NAME COUNT AVERAGE STDDEV MAXMIN @@ gpu_mw600 0.940.743.25 0.00 ... and with the regulator on: @@ NAME COUNT AVERAGE STDDEV MAX MIN @@ gpu_mw600 0.830.663.250.00 Signed-off-by: Anand K Mistry --- arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi index a5a12b2599a4..1294f27b21c1 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi @@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB { regulator-min-microamp = <200>; regulator-max-microamp = <300>; regulator-ramp-delay = <1>; + regulator-always-on; }; }; }; -- 2.28.0.297.g1956fa8f8d-goog
[PATCH v2 2/4] dt-bindings: regulator: mt6397: Document valid modes
Document valid mode values for BUCK regulators. Signed-off-by: Anand K Mistry --- Changes in v2: None .../devicetree/bindings/regulator/mt6397-regulator.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt b/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt index 01141fb00875..c080086d3e62 100644 --- a/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt @@ -16,6 +16,9 @@ LDO: ldo_vemc3v3, ldo_vgp1, ldo_vgp2, ldo_vgp3, ldo_vgp4, ldo_vgp5, ldo_vgp6, ldo_vibr +BUCK regulators can set regulator-initial-mode and regulator-allowed-modes to +values specified in dt-bindings/regulator/mediatek,mt6397-regulator.h + Example: pmic { compatible = "mediatek,mt6397"; -- 2.27.0.212.ge8ba1cc988-goog
[PATCH v2 4/4] arm64: dts: mediatek: Update allowed mt6397 regulator modes for elm boards
This updates the allowed mt6397 regulator modes for elm (and derivative) boards to use named constants. Signed-off-by: Anand K Mistry --- Changes in v2: - Introduce constants in dt-bindings - Improve conditional readability arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi index a5a12b2599a4..e9cfded307b3 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi @@ -6,6 +6,7 @@ #include #include #include +#include #include "mt8173.dtsi" / { @@ -926,7 +927,8 @@ mt6397_vpca15_reg: buck_vpca15 { regulator-max-microvolt = <135>; regulator-ramp-delay = <12500>; regulator-always-on; - regulator-allowed-modes = <0 1>; + regulator-allowed-modes = ; }; mt6397_vpca7_reg: buck_vpca7 { -- 2.27.0.212.ge8ba1cc988-goog
[PATCH v2 1/4] regulator: mt6397: Move buck modes into header file
This will allow device trees to make use of these constants. Signed-off-by: Anand K Mistry --- Changes in v2: None drivers/regulator/mt6397-regulator.c | 4 +--- .../regulator/mediatek,mt6397-regulator.h | 15 +++ 2 files changed, 16 insertions(+), 3 deletions(-) create mode 100644 include/dt-bindings/regulator/mediatek,mt6397-regulator.h diff --git a/drivers/regulator/mt6397-regulator.c b/drivers/regulator/mt6397-regulator.c index 269c2a6028e8..d51e98ce1138 100644 --- a/drivers/regulator/mt6397-regulator.c +++ b/drivers/regulator/mt6397-regulator.c @@ -13,9 +13,7 @@ #include #include #include - -#define MT6397_BUCK_MODE_AUTO 0 -#define MT6397_BUCK_MODE_FORCE_PWM 1 +#include /* * MT6397 regulators' information diff --git a/include/dt-bindings/regulator/mediatek,mt6397-regulator.h b/include/dt-bindings/regulator/mediatek,mt6397-regulator.h new file mode 100644 index ..99869a8665cf --- /dev/null +++ b/include/dt-bindings/regulator/mediatek,mt6397-regulator.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _DT_BINDINGS_REGULATOR_MEDIATEK_MT6397_H_ +#define _DT_BINDINGS_REGULATOR_MEDIATEK_MT6397_H_ + +/* + * Buck mode constants which may be used in devicetree properties (eg. + * regulator-initial-mode, regulator-allowed-modes). + * See the manufacturer's datasheet for more information on these modes. + */ + +#define MT6397_BUCK_MODE_AUTO 0 +#define MT6397_BUCK_MODE_FORCE_PWM 1 + +#endif -- 2.27.0.212.ge8ba1cc988-goog
[PATCH v2 3/4] regulator: mt6397: Implement of_map_mode
Implementing of_map_mode is necessary to be able to specify operating modes in the devicetree using 'regulator-allowed-modes', and to change regulator modes. Signed-off-by: Anand K Mistry --- Changes in v2: None drivers/regulator/mt6397-regulator.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/regulator/mt6397-regulator.c b/drivers/regulator/mt6397-regulator.c index d51e98ce1138..0a30df5e414f 100644 --- a/drivers/regulator/mt6397-regulator.c +++ b/drivers/regulator/mt6397-regulator.c @@ -53,6 +53,7 @@ struct mt6397_regulator_info { .vsel_mask = vosel_mask,\ .enable_reg = enreg,\ .enable_mask = BIT(0), \ + .of_map_mode = mt6397_map_mode, \ }, \ .qi = BIT(13), \ .vselon_reg = voselon, \ @@ -144,6 +145,18 @@ static const unsigned int ldo_volt_table7[] = { 130, 150, 180, 200, 250, 280, 300, 330, }; +static unsigned int mt6397_map_mode(unsigned int mode) +{ + switch (mode) { + case MT6397_BUCK_MODE_AUTO: + return REGULATOR_MODE_NORMAL; + case MT6397_BUCK_MODE_FORCE_PWM: + return REGULATOR_MODE_FAST; + default: + return REGULATOR_MODE_INVALID; + } +} + static int mt6397_regulator_set_mode(struct regulator_dev *rdev, unsigned int mode) { -- 2.27.0.212.ge8ba1cc988-goog
[PATCH v2 0/4] regulator: mt6397: Implement of_map_mode regulator_desc function
This patchset adds support for being able to change regulator modes for the mt6397 regulator. This is needed to allow the voltage scaling support in the MT8173 SoC to be used on the elm (Acer Chromebook R13) and hana (several Lenovo Chromebooks) devices. Without a of_map_mode implementation, the regulator-allowed-modes devicetree field is skipped, and attempting to change the regulator mode results in an error: [1.439165] vpca15: mode operation not allowed Changes in v2: - Introduce constants in dt-bindings - Improve conditional readability Anand K Mistry (4): regulator: mt6397: Move buck modes into header file dt-bindings: regulator: mt6397: Document valid modes regulator: mt6397: Implement of_map_mode arm64: dts: mediatek: Update allowed mt6397 regulator modes for elm boards .../bindings/regulator/mt6397-regulator.txt | 3 +++ arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi| 4 +++- drivers/regulator/mt6397-regulator.c| 17 ++--- .../regulator/mediatek,mt6397-regulator.h | 15 +++ 4 files changed, 35 insertions(+), 4 deletions(-) create mode 100644 include/dt-bindings/regulator/mediatek,mt6397-regulator.h -- 2.27.0.212.ge8ba1cc988-goog
[PATCH 4/4] arm64: dts: mediatek: Update allowed regulator modes for elm boards
This sets the allowed regulator modes for elm (and derivative) boards. According to the datasheet, the da9211 does not support SLEEP mode. Signed-off-by: Anand K Mistry --- arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi index a5a12b2599a4..d4c84a856e7b 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi @@ -6,6 +6,7 @@ #include #include #include +#include #include "mt8173.dtsi" / { @@ -294,7 +295,8 @@ da9211_vcpu_reg: BUCKA { regulator-max-microamp = <440>; regulator-ramp-delay = <1>; regulator-always-on; - regulator-allowed-modes = <0 1>; + regulator-allowed-modes = ; }; da9211_vgpu_reg: BUCKB { -- 2.27.0.212.ge8ba1cc988-goog
[PATCH 3/4] regulator: da9211: Implement of_map_mode
Implementing of_map_mode is necessary to be able to specify operating modes in the devicetree using 'regulator-allowed-modes', and to change regulator modes. Signed-off-by: Anand K Mistry --- drivers/regulator/da9211-regulator.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/regulator/da9211-regulator.c b/drivers/regulator/da9211-regulator.c index 1f9b75b41346..297b3aa7c753 100644 --- a/drivers/regulator/da9211-regulator.c +++ b/drivers/regulator/da9211-regulator.c @@ -86,6 +86,20 @@ static const int da9215_current_limits[] = { 560, 580, 600, 620, 640, 660, 680, 700 }; +static unsigned int da9211_map_buck_mode(unsigned int mode) +{ + switch (mode) { + case DA9211_BUCK_MODE_SLEEP: + return REGULATOR_MODE_STANDBY; + case DA9211_BUCK_MODE_SYNC: + return REGULATOR_MODE_FAST; + case DA9211_BUCK_MODE_AUTO: + return REGULATOR_MODE_NORMAL; + default: + return REGULATOR_MODE_INVALID; + } +} + static unsigned int da9211_buck_get_mode(struct regulator_dev *rdev) { int id = rdev_get_id(rdev); @@ -233,6 +247,7 @@ static const struct regulator_ops da9211_buck_ops = { .vsel_reg = DA9211_REG_VBUCKA_A + DA9211_ID_##_id * 2,\ .vsel_mask = DA9211_VBUCK_MASK,\ .owner = THIS_MODULE,\ + .of_map_mode = da9211_map_buck_mode,\ } static struct regulator_desc da9211_regulators[] = { @@ -242,8 +257,14 @@ static struct regulator_desc da9211_regulators[] = { #ifdef CONFIG_OF static struct of_regulator_match da9211_matches[] = { - [DA9211_ID_BUCKA] = { .name = "BUCKA" }, - [DA9211_ID_BUCKB] = { .name = "BUCKB" }, + [DA9211_ID_BUCKA] = { + .name = "BUCKA", + .desc = _regulators[DA9211_ID_BUCKA], + }, + [DA9211_ID_BUCKB] = { + .name = "BUCKB", + .desc = _regulators[DA9211_ID_BUCKB], + }, }; static struct da9211_pdata *da9211_parse_regulators_dt( -- 2.27.0.212.ge8ba1cc988-goog
[PATCH 1/4] regulator: da9211: Move buck modes into header file
This will allow device trees to make use of these constants. Signed-off-by: Anand K Mistry --- drivers/regulator/da9211-regulator.c | 5 + .../dt-bindings/regulator/dlg,da9211-regulator.h | 16 2 files changed, 17 insertions(+), 4 deletions(-) create mode 100644 include/dt-bindings/regulator/dlg,da9211-regulator.h diff --git a/drivers/regulator/da9211-regulator.c b/drivers/regulator/da9211-regulator.c index 2ea4362ffa5c..1f9b75b41346 100644 --- a/drivers/regulator/da9211-regulator.c +++ b/drivers/regulator/da9211-regulator.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "da9211-regulator.h" /* DEVICE IDs */ @@ -24,10 +25,6 @@ #define DA9213_DEVICE_ID 0x23 #define DA9215_DEVICE_ID 0x24 -#define DA9211_BUCK_MODE_SLEEP 1 -#define DA9211_BUCK_MODE_SYNC 2 -#define DA9211_BUCK_MODE_AUTO 3 - /* DA9211 REGULATOR IDs */ #define DA9211_ID_BUCKA0 #define DA9211_ID_BUCKB1 diff --git a/include/dt-bindings/regulator/dlg,da9211-regulator.h b/include/dt-bindings/regulator/dlg,da9211-regulator.h new file mode 100644 index ..cdce2d54c8ba --- /dev/null +++ b/include/dt-bindings/regulator/dlg,da9211-regulator.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _DT_BINDINGS_REGULATOR_DLG_DA9211_H +#define _DT_BINDINGS_REGULATOR_DLG_DA9211_H + +/* + * These buck mode constants may be used to specify values in device tree + * properties (e.g. regulator-initial-mode, regulator-allowed-modes). + * A description of the following modes is in the manufacturers datasheet. + */ + +#define DA9211_BUCK_MODE_SLEEP 1 +#define DA9211_BUCK_MODE_SYNC 2 +#define DA9211_BUCK_MODE_AUTO 3 + +#endif -- 2.27.0.212.ge8ba1cc988-goog
[PATCH 0/4] regulator: da9211: support changing modes
This patchset adds support for being able to change regulator modes for the da9211 regulator. This is needed to allow the voltage scaling support in the MT8173 SoC to be used in the elm (Acer Chromebook R13) and hana (several Lenovo Chromebooks) devices. Anand K Mistry (4): regulator: da9211: Move buck modes into header file dt-bindings: regulator: da9211: Document allowed modes regulator: da9211: Implement of_map_mode arm64: dts: mediatek: Update allowed regulator modes for elm boards .../devicetree/bindings/regulator/da9211.txt | 4 +++ arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 4 ++- drivers/regulator/da9211-regulator.c | 30 +++ .../regulator/dlg,da9211-regulator.h | 16 ++ 4 files changed, 47 insertions(+), 7 deletions(-) create mode 100644 include/dt-bindings/regulator/dlg,da9211-regulator.h -- 2.27.0.212.ge8ba1cc988-goog
[PATCH 2/4] dt-bindings: regulator: da9211: Document allowed modes
This patch adds a description of how operating modes may be specified. Signed-off-by: Anand K Mistry --- Documentation/devicetree/bindings/regulator/da9211.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/regulator/da9211.txt b/Documentation/devicetree/bindings/regulator/da9211.txt index 27717e816e71..eb871447d508 100644 --- a/Documentation/devicetree/bindings/regulator/da9211.txt +++ b/Documentation/devicetree/bindings/regulator/da9211.txt @@ -15,6 +15,8 @@ Required properties: Optional properties: - enable-gpios: platform gpio for control of BUCKA/BUCKB. - Any optional property defined in regulator.txt + - regulator-initial-mode and regulator-allowed-modes may be specified using +mode values from dt-bindings/regulator/dlg,da9211-regulator.h Example 1) DA9211 pmic: da9211@68 { @@ -30,6 +32,8 @@ Example 1) DA9211 regulator-min-microamp = <200>; regulator-max-microamp = <500>; enable-gpios = < 27 0>; + regulator-allowed-modes = ; }; }; }; -- 2.27.0.212.ge8ba1cc988-goog
[PATCH] regulator: mt6397: Implement of_map_mode regulator_desc function
Without a of_map_mode implementation, the regulator-allowed-modes devicetree field is skipped, and attempting to change the regulator mode results in an error: [1.439165] vpca15: mode operation not allowed Signed-off-by: Anand K Mistry --- drivers/regulator/mt6397-regulator.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/regulator/mt6397-regulator.c b/drivers/regulator/mt6397-regulator.c index 269c2a6028e8..5c60b52addf1 100644 --- a/drivers/regulator/mt6397-regulator.c +++ b/drivers/regulator/mt6397-regulator.c @@ -55,6 +55,7 @@ struct mt6397_regulator_info { .vsel_mask = vosel_mask,\ .enable_reg = enreg,\ .enable_mask = BIT(0), \ + .of_map_mode = mt6397_map_mode, \ }, \ .qi = BIT(13), \ .vselon_reg = voselon, \ @@ -146,6 +147,12 @@ static const unsigned int ldo_volt_table7[] = { 130, 150, 180, 200, 250, 280, 300, 330, }; +static unsigned int mt6397_map_mode(unsigned int mode) +{ + return mode == MT6397_BUCK_MODE_AUTO ? + REGULATOR_MODE_NORMAL : REGULATOR_MODE_FAST; +} + static int mt6397_regulator_set_mode(struct regulator_dev *rdev, unsigned int mode) { -- 2.27.0.212.ge8ba1cc988-goog
Re: [PATCH v3] perf record: Use an eventfd to wakeup when done
On Sat, 23 May 2020 at 23:35, Andi Kleen wrote: > > Anand K Mistry writes: > > } > > > > + done_fd = eventfd(0, EFD_NONBLOCK); > > This will make perf depend on a recent glibc or other library > that implements eventfd. Wouldn't surprise me if some kind > of build time check is needed for this to pass all of Arnaldo's > built tests. Looks like Arnaldo made that change when merging: https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=e9db221d37f91409040cf7f3fbed08b44e055ae9 This makes me curious. How old a kernel should modern tools support? >From the man page, eventfd was added in 2.6.22 (and eventfd2 in 2.6.27), which was 2007 (or 2008 for eventfd2) and glibc-2.8 which was 2008. I understand the kernel's policy of never breaking userspace, but what about userspace tools? > > > -Andi -- Anand K. Mistry Software Engineer Google Australia
Re: [PATCH] perf record: Use an eventfd to wakeup when done
On Wed, 13 May 2020 at 00:12, Arnaldo Carvalho de Melo wrote: > > Em Tue, May 12, 2020 at 02:12:32PM +0200, Jiri Olsa escreveu: > > On Tue, May 12, 2020 at 02:59:36PM +1000, Anand K Mistry wrote: > > > > SNIP > > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > > > index 1ab349abe90469..099ecaa66732a2 100644 > > > --- a/tools/perf/builtin-record.c > > > +++ b/tools/perf/builtin-record.c > > > @@ -53,6 +53,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > @@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void > > > *to, void *bf, size_t size) > > > > > > static volatile int signr = -1; > > > static volatile int child_finished; > > > +static int done_fd = -1; > > > > > > static void sig_handler(int sig) > > > { > > > + u64 tmp = 1; > > > if (sig == SIGCHLD) > > > child_finished = 1; > > > else > > > signr = sig; > > > > > > done = 1; > > > + > > > + /* > > > +* It is possible for this signal handler to run after done is checked > > > +* in the main loop, but before the perf counter fds are polled. If > > > this > > > +* happens, the poll() will continue to wait even though done is set, > > > +* and will only break out if either another signal is received, or > > > the > > > +* counters are ready for read. To ensure the poll() doesn't sleep > > > when > > > +* done is set, use an eventfd (done_fd) to wake up the poll(). > > > +*/ > > > + if (write(done_fd, , sizeof(tmp)) < 0) > > > + pr_err("failed to signal wakeup fd\n"); > > > } > > > > > > static void sigsegv_handler(int sig) > > > @@ -1424,6 +1438,17 @@ static int __cmd_record(struct record *rec, int > > > argc, const char **argv) > > > int fd; > > > float ratio = 0; > > > > > > + done_fd = eventfd(0, EFD_NONBLOCK); > > > + if (done_fd < 0) { > > > + pr_err("Failed to create wakeup eventfd, error: %m\n"); > > > + return -1; > > > + } > > > + err = evlist__add_pollfd(rec->evlist, done_fd); > > > + if (err < 0) { > > > + pr_err("Failed to add wakeup eventfd to poll list\n"); > > > + return -1; > > > + } > > > > sorry I did not notice before, but I think we also > > need to close done_fd descriptor on the exit path > > > > also please change subject to PATCHv3 for the next version Apologies. I'm still getting the hang of this. > > Yeah, and, and don't take this as a requirement for this patch to be > processed, this can be made as a follow up patch by you or someone else > (me, maybe :)), that maybe tools/perf/builtin-top.c and > tools/perf/builtin-trace.c have the same issue? > > Could you please take a look there as well? I looked at 'top', 'trace', and 'kvm'. kvm doesn't really have this issue because the poll() has a 100ms timeout. Even though it's technically affected, the timeout will make it unnoticeable (just delaying the exit for 100ms). top is in the same boat (uses a timeout). trace is the affected one because it has the following code: int timeout = done ? 100 : -1; if (!draining && evlist__poll(evlist, timeout) > 0) { Different logic, but still a gap and an indefinite timeout. > > - Arnaldo > > > thanks, > > jirka > > > > > + > > > atexit(record__sig_exit); > > > signal(SIGCHLD, sig_handler); > > > signal(SIGINT, sig_handler); > > > -- > > > 2.26.2.645.ge9eca65c58-goog > > > > > > > -- > > - Arnaldo -- Anand K. Mistry Software Engineer Google Australia
[PATCH v3] perf record: Use an eventfd to wakeup when done
The setting and checking of 'done' contains a rare race where the signal handler setting 'done' is run after checking to break the loop, but before waiting in evlist__poll(). In this case, the main loop won't wake up until either another signal is sent, or the perf data fd causes a wake up. The following simple script can trigger this condition (but you might need to run it for several hours): for ((i = 0; i >= 0; i++)) ; do echo "Loop $i" delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc) ./perf record -- sleep 3000 >/dev/null& pid=$! sleep $delay kill -TERM $pid echo "PID $pid" wait $pid done At some point, the loop will stall. Adding logging, even though perf has received the SIGTERM and set 'done = 1', perf will remain sleeping until a second signal is sent. Signed-off-by: Anand K Mistry --- Changes in v3: - Move done_fd creation to below session initialisation - Close done_fd on exit - Log errno when write(done_fd) fails Changes in v2: - Added comment to signal handler explaining why the eventfd is added - Added error handling when creating done_fd tools/perf/builtin-record.c | 29 + 1 file changed, 29 insertions(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 1ab349abe90469..a1af6857f24748 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size) static volatile int signr = -1; static volatile int child_finished; +static int done_fd = -1; static void sig_handler(int sig) { + u64 tmp = 1; if (sig == SIGCHLD) child_finished = 1; else signr = sig; done = 1; + + /* +* It is possible for this signal handler to run after done is checked +* in the main loop, but before the perf counter fds are polled. If this +* happens, the poll() will continue to wait even though done is set, +* and will only break out if either another signal is received, or the +* counters are ready for read. To ensure the poll() doesn't sleep when +* done is set, use an eventfd (done_fd) to wake up the poll(). +*/ + if (write(done_fd, , sizeof(tmp)) < 0) + pr_err("failed to signal wakeup fd, error: %m\n"); } static void sigsegv_handler(int sig) @@ -1466,6 +1480,19 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) return -1; } + done_fd = eventfd(0, EFD_NONBLOCK); + if (done_fd < 0) { + pr_err("Failed to create wakeup eventfd, error: %m\n"); + status = -1; + goto out_delete_session; + } + err = evlist__add_pollfd(rec->evlist, done_fd); + if (err < 0) { + pr_err("Failed to add wakeup eventfd to poll list\n"); + status = err; + goto out_delete_session; + } + session->header.env.comp_type = PERF_COMP_ZSTD; session->header.env.comp_level = rec->opts.comp_level; @@ -1827,6 +1854,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) } out_delete_session: + if (done_fd >= 0) + close(done_fd); zstd_fini(>zstd_data); perf_session__delete(session); -- 2.26.2.645.ge9eca65c58-goog
[PATCH] perf record: Use an eventfd to wakeup when done
The setting and checking of 'done' contains a rare race where the signal handler setting 'done' is run after checking to break the loop, but before waiting in evlist__poll(). In this case, the main loop won't wake up until either another signal is sent, or the perf data fd causes a wake up. The following simple script can trigger this condition (but you might need to run it for several hours): for ((i = 0; i >= 0; i++)) ; do echo "Loop $i" delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc) ./perf record -- sleep 3000 >/dev/null& pid=$! sleep $delay kill -TERM $pid echo "PID $pid" wait $pid done At some point, the loop will stall. Adding logging, even though perf has received the SIGTERM and set 'done = 1', perf will remain sleeping until a second signal is sent. Signed-off-by: Anand K Mistry --- tools/perf/builtin-record.c | 25 + 1 file changed, 25 insertions(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 1ab349abe90469..099ecaa66732a2 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size) static volatile int signr = -1; static volatile int child_finished; +static int done_fd = -1; static void sig_handler(int sig) { + u64 tmp = 1; if (sig == SIGCHLD) child_finished = 1; else signr = sig; done = 1; + + /* +* It is possible for this signal handler to run after done is checked +* in the main loop, but before the perf counter fds are polled. If this +* happens, the poll() will continue to wait even though done is set, +* and will only break out if either another signal is received, or the +* counters are ready for read. To ensure the poll() doesn't sleep when +* done is set, use an eventfd (done_fd) to wake up the poll(). +*/ + if (write(done_fd, , sizeof(tmp)) < 0) + pr_err("failed to signal wakeup fd\n"); } static void sigsegv_handler(int sig) @@ -1424,6 +1438,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) int fd; float ratio = 0; + done_fd = eventfd(0, EFD_NONBLOCK); + if (done_fd < 0) { + pr_err("Failed to create wakeup eventfd, error: %m\n"); + return -1; + } + err = evlist__add_pollfd(rec->evlist, done_fd); + if (err < 0) { + pr_err("Failed to add wakeup eventfd to poll list\n"); + return -1; + } + atexit(record__sig_exit); signal(SIGCHLD, sig_handler); signal(SIGINT, sig_handler); -- 2.26.2.645.ge9eca65c58-goog
[PATCH] perf record: Use an eventfd to wakeup when done
The setting and checking of 'done' contains a rare race where the signal handler setting 'done' is run after checking to break the loop, but before waiting in evlist__poll(). In this case, the main loop won't wake up until either another signal is sent, or the perf data fd causes a wake up. The following simple script can trigger this condition (but you might need to run it for several hours): for ((i = 0; i >= 0; i++)) ; do echo "Loop $i" delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc) ./perf record -- sleep 3000 >/dev/null& pid=$! sleep $delay kill -TERM $pid echo "PID $pid" wait $pid done At some point, the loop will stall. Adding logging, even though perf has received the SIGTERM and set 'done = 1', perf will remain sleeping until a second signal is sent. Signed-off-by: Anand K Mistry --- tools/perf/builtin-record.c | 8 1 file changed, 8 insertions(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 1ab349abe90469..ce5fc3860131d2 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -518,15 +519,19 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size) static volatile int signr = -1; static volatile int child_finished; +static int done_fd = -1; static void sig_handler(int sig) { + u64 tmp = 1; if (sig == SIGCHLD) child_finished = 1; else signr = sig; done = 1; + if (write(done_fd, , sizeof(tmp)) < 0) + pr_err("failed to signal wakeup fd\n"); } static void sigsegv_handler(int sig) @@ -1424,6 +1429,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) int fd; float ratio = 0; + done_fd = eventfd(0, EFD_NONBLOCK); + evlist__add_pollfd(rec->evlist, done_fd); + atexit(record__sig_exit); signal(SIGCHLD, sig_handler); signal(SIGINT, sig_handler); -- 2.26.2.645.ge9eca65c58-goog