[tip: x86/cleanups] x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC

2021-02-05 Thread tip-bot2 for Anand K Mistry
The following commit has been merged into the x86/cleanups branch of tip:

Commit-ID: 3228e1dc80983ee1f5d2e533d010b3bd8b50f0e2
Gitweb:
https://git.kernel.org/tip/3228e1dc80983ee1f5d2e533d010b3bd8b50f0e2
Author:Anand K Mistry 
AuthorDate:Thu, 04 Feb 2021 18:32:32 +11:00
Committer: Thomas Gleixner 
CommitterDate: Fri, 05 Feb 2021 19:56:35 +01:00

x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC

The RTC config option was removed in commit f52ef24be21a ("rtc/alpha:
remove legacy rtc driver")

Signed-off-by: Anand K Mistry 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Randy Dunlap 
Link: 
https://lore.kernel.org/r/20210204183205.1.If5c6ded53a00ecad6a02a1e974316291cc0239d1@changeid
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b6dd10..865c1e7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -889,7 +889,7 @@ config HPET_TIMER
 
 config HPET_EMULATE_RTC
def_bool y
-   depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || 
RTC_DRV_CMOS=y)
+   depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
 
 config APB_TIMER
def_bool y if X86_INTEL_MID


Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC

2021-02-04 Thread Anand K. Mistry
> >
> > Hi,
> >
> > On plain vanilla 5.11-rc6, when I take this (partial) .config file
> > which contains:
> >
> > CONFIG_HPET=y
> > # CONFIG_HPET_EMULATE_RTC is not set
> > # CONFIG_HPET_MMAP is not set
> >
> > and run
> > $ make olddefconfig
> >
> > my new .config file contains
> >
> > CONFIG_HPET_TIMER=y
> > CONFIG_HPET_EMULATE_RTC=y
> > CONFIG_HPET=y
> > # CONFIG_HPET_MMAP is not set
> >
> >
> > Isn't that what you are expecting?
> > or are you trying to keep the opposite?
> >
> > # CONFIG_HPET_EMULATE_RTC is not set
> >
> > Hm, I bet that I misunderstood you and you want to keep
> > it disabled. Is that right?

Yes, I want to keep it disabled. My understanding (from the behaviour
I've seen with other settings, as well as reading the sources) is that
"is not set" is treated the same as "n".

In fact, if you change the old config to CONFIG_HPET_EMULATE_RTC=n,
you get the same before/after behaviour of this patch. Before,
olddefconfig will ignore the old setting and change it to "y". With
this patch, olddefconfig keeps the old setting, but re-writes it to
"is not set". This is even more surprising because the user is
explicit in what the old config setting is, but olddefconfig still
ignores it.

>
> If that's the case, then I agree that your original patch to
> make HPET_EMULATE_RTC user-visible is needed.
>
> Sorry to be so slow about understanding your goal (if I do
> understand it now).
>
> --
> ~Randy
>


-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH] x86: Remove HPET_EMULATE_RTC depends on RTC

2021-02-03 Thread Anand K Mistry
The RTC config option was removed in commit f52ef24be21a ("rtc/alpha:
remove legacy rtc driver")

Signed-off-by: Anand K Mistry 
---

 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..37775478d86f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -890,7 +890,7 @@ config HPET_TIMER
 
 config HPET_EMULATE_RTC
def_bool y
-   depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || 
RTC_DRV_CMOS=y)
+   depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
 
 config APB_TIMER
def_bool y if X86_INTEL_MID
-- 
2.30.0.365.g02bc693789-goog



Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC

2021-02-03 Thread Anand K. Mistry
On Thu, 4 Feb 2021 at 17:30, Randy Dunlap  wrote:
>
> On 2/3/21 10:13 PM, Anand K. Mistry wrote:
> >> Hi,
> >>
> >> When you run "make olddefconfig", should this "depends on"
> >> line evaluate to true or false?
> >
> > True. RTC_DRV_CMOS=y by default on x86 systems and HPET_TIMER also
> > appears to default yes (on x86-64 if I'm reading this right).
> >
> > Oddly, the RTC config option doesn't appear to exist. Probably a separate 
> > issue.
>
> Yes, just a separate simple patch.
>
> >> I.e., what are the settings of these symbols in the old .config file?
> >>
> >>
> >> depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || 
> >> RTC_DRV_CMOS=y)
> >
> > Actually, none of these options are set in the old config.
> > RTC_DRV_CMOS and HPET_TIMER appear to default to yes.
> >
> >  % grep HPET .config
> > CONFIG_HPET=y
> > # CONFIG_HPET_EMULATE_RTC is not set
> > # CONFIG_HPET_MMAP is not set
> >  % grep RTC .config
> > CONFIG_RTC_CLASS=y
> > # CONFIG_HPET_EMULATE_RTC is not set
> > CONFIG_PM_TRACE_RTC=y
> > # CONFIG_RTC_HCTOSYS is not set
> >
> >>
> >> If not, is there some out-of-tree driver involved?
> >
> > No out of tree drivers involved. I tried this on a vanilla 5.11-rc6.
> >
> >> I'm having a little trouble seeing why this is needed.
> >
> > So am I. But this is the magic that lets me keep
> > CONFIG_HPET_EMULATE_RTC from the old config. I did manage to trace
> > where the option is being overridden in the conf tool, but the
> > reasoning why is beyond my knowledge.
>
> Can you post the .config that you are feeding to 'make olddefconfig', please.
> I'll take a look (or the x86 guys can do so).

Sure. Here it is:

#
# Config options generated by splitconfig
#
CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ASHMEM=y
CONFIG_AUDIT=y
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_BINFMT_MISC=y
CONFIG_BLK_CGROUP=y
# CONFIG_BLK_DEBUG_FS is not set
CONFIG_BLK_DEV_DM=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=m
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_SYSCALL=y
CONFIG_BRIDGE=m
CONFIG_BT=m
CONFIG_BT_FEATURE_DEBUG=y
CONFIG_BT_FEATURE_DEBUG_FUNC_NAMES=y
CONFIG_BT_HCIBFUSB=m
CONFIG_BT_HCIBTUSB=m
CONFIG_BT_HCIBTUSB_AUTOSUSPEND=y
CONFIG_BT_HCIBTUSB_INTERVAL=y
CONFIG_BT_HCIVHCI=m
CONFIG_BT_HIDP=m
CONFIG_BT_RFCOMM=m
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_CFG80211=m
CONFIG_CFG80211_CERTIFICATION_ONUS=y
CONFIG_CFG80211_DEBUGFS=y
# CONFIG_CFG80211_REQUIRE_SIGNED_REGDB is not set
CONFIG_CFG80211_WEXT=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_BPF=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_SCHED=y
CONFIG_CHROME_PLATFORMS=y
CONFIG_CLS_U32_MARK=y
# CONFIG_COMPAT_BRK is not set
CONFIG_CONFIGFS_FS=y
CONFIG_CONNECTOR=y
CONFIG_CPUSETS=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_TEO=y
CONFIG_CRC7=m
CONFIG_CROS_EC=y
CONFIG_CROS_EC_PD_UPDATE=y
CONFIG_CROS_EC_SENSORHUB=m
CONFIG_CROS_EC_SPI=y
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_DAX=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_CREDENTIALS=y
CONFIG_DEBUG_DEVRES=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEFAULT_HOSTNAME="localhost"
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_DM_CRYPT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_INIT=y
CONFIG_DM_INTEGRITY=m
CONFIG_DM_THIN_PROVISIONING=y
CONFIG_DM_VERITY=y
CONFIG_DM_VERITY_CHROMEOS=y
# CONFIG_DNOTIFY is not set
CONFIG_DRM=y
CONFIG_DRM_DP_AUX_CHARDEV=y
CONFIG_DRM_EVDI=m
# CONFIG_DRM_FBDEV_EMULATION is not set
CONFIG_DRM_UDL=y
CONFIG_DRM_VGEM=y
CONFIG_ECRYPT_FS=y
CONFIG_EMBEDDED=y
CONFIG_ENCRYPTED_KEYS=y
CONFIG_ERROR_ON_WARNING=y
CONFIG_ESD_FS=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_FANOTIFY=y
CONFIG_FB=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_VERITY=y
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUSE_FS=m
CONFIG_GOOGLE_COREBOOT_TABLE=y
CONFIG_GOOGLE_FIRMWARE=y
CONFIG_GOOGLE_MEMCONSOLE_COREBOOT=y
CONFIG_GOOGLE_VPD=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_HFSPLUS_FS=m
CONFIG_HIDRAW=y
CONFIG_HID_APPLE=m
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=y
CONFIG_HID_HOLTEK=y
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH

Re: [PATCH] x86: Add a prompt for HPET_EMULATE_RTC

2021-02-03 Thread Anand K. Mistry
> Hi,
>
> When you run "make olddefconfig", should this "depends on"
> line evaluate to true or false?

True. RTC_DRV_CMOS=y by default on x86 systems and HPET_TIMER also
appears to default yes (on x86-64 if I'm reading this right).

Oddly, the RTC config option doesn't appear to exist. Probably a separate issue.

> I.e., what are the settings of these symbols in the old .config file?
>
>
> depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || 
> RTC_DRV_CMOS=y)

Actually, none of these options are set in the old config.
RTC_DRV_CMOS and HPET_TIMER appear to default to yes.

 % grep HPET .config
CONFIG_HPET=y
# CONFIG_HPET_EMULATE_RTC is not set
# CONFIG_HPET_MMAP is not set
 % grep RTC .config
CONFIG_RTC_CLASS=y
# CONFIG_HPET_EMULATE_RTC is not set
CONFIG_PM_TRACE_RTC=y
# CONFIG_RTC_HCTOSYS is not set

>
> If not, is there some out-of-tree driver involved?

No out of tree drivers involved. I tried this on a vanilla 5.11-rc6.

> I'm having a little trouble seeing why this is needed.

So am I. But this is the magic that lets me keep
CONFIG_HPET_EMULATE_RTC from the old config. I did manage to trace
where the option is being overridden in the conf tool, but the
reasoning why is beyond my knowledge.

-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH] x86: Add a prompt for HPET_EMULATE_RTC

2021-02-03 Thread Anand K Mistry
This does two things:
1. Makes the option visible in menuconfig, allowing the user to easily
   disable this option
2. Allows olddefconfig to respoct the option if it is set in the old
   .config file

It's not clear exactly why the second consequence is true, but it
appears to be because when the conf tool reads the config file, it only
respects the existing setting if the option is "visible" (see
scripts/kconfig/symbol.c:381).

Signed-off-by: Anand K Mistry 
---

 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..28f814493c7b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -890,6 +890,7 @@ config HPET_TIMER
 
 config HPET_EMULATE_RTC
def_bool y
+   prompt "HPET RTC emulation"
depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || 
RTC_DRV_CMOS=y)
 
 config APB_TIMER
-- 
2.30.0.365.g02bc693789-goog



[PATCH] ath10k: Fix lockdep assertion warning in ath10k_sta_statistics

2021-02-01 Thread Anand K Mistry
ath10k_debug_fw_stats_request just be called with conf_mutex held,
otherwise the following warning is seen when lock debugging is enabled:

WARNING: CPU: 0 PID: 793 at drivers/net/wireless/ath/ath10k/debug.c:357 
ath10k_debug_fw_stats_request+0x12c/0x133 [ath10k_core]
Modules linked in: snd_hda_codec_hdmi designware_i2s snd_hda_intel 
snd_intel_dspcfg snd_hda_codec i2c_piix4 snd_hwdep snd_hda_core acpi_als 
kfifo_buf industrialio snd_soc_max98357a snd_soc_adau7002 
snd_soc_acp_da7219mx98357_mach snd_soc_da7219 acp_audio_dma ccm xt_MASQUERADE 
fuse ath10k_pci ath10k_core lzo_rle ath lzo_compress mac80211 zram cfg80211 
r8152 mii joydev
CPU: 0 PID: 793 Comm: wpa_supplicant Tainted: GW 5.10.9 #5
Hardware name: HP Grunt/Grunt, BIOS Google_Grunt.11031.104.0 09/05/2019
RIP: 0010:ath10k_debug_fw_stats_request+0x12c/0x133 [ath10k_core]
Code: 1e bb a1 ff ff ff 4c 89 ef 48 c7 c6 d3 31 2e c0 89 da 31 c0 e8 bd f8 ff 
ff 89 d8 eb 02 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 04 ff ff ff 0f 
1f 44 00 00 55 48 89 e5 41 56 53 48 89 fb
RSP: 0018:b2478099f7d0 EFLAGS: 00010246
RAX:  RBX: 9e432700cce0 RCX: 11c85cfd6b8e3b00
RDX: 9e432700cce0 RSI: 9e43127c5668 RDI: 9e4318deddf0
RBP: b2478099f7f8 R08: 0002 R09: 0003fd7068cc
R10: c01b2749 R11: c029efaf R12: 9e432700c000
R13: 9e43127c33e0 R14: b2478099f918 R15: 9e43127c33e0
FS:  7f7ea48e2740() GS:9e432aa0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 59aa799ddf38 CR3: 000118de2000 CR4: 001506f0
Call Trace:
 ath10k_sta_statistics+0x4d/0x270 [ath10k_core]
 sta_set_sinfo+0x1be/0xaec [mac80211]
 ieee80211_get_station+0x58/0x76 [mac80211]
 rdev_get_station+0xf1/0x11e [cfg80211]
 nl80211_get_station+0x7f/0x146 [cfg80211]
 genl_rcv_msg+0x32e/0x35e
 ? nl80211_stop_ap+0x19/0x19 [cfg80211]
 ? nl80211_get_station+0x146/0x146 [cfg80211]
 ? genl_rcv+0x19/0x36
 ? genl_rcv+0x36/0x36
 netlink_rcv_skb+0x89/0xfb
 genl_rcv+0x28/0x36
 netlink_unicast+0x169/0x23b
 netlink_sendmsg+0x38a/0x402
 sock_sendmsg+0x72/0x76
 sys_sendmsg+0x153/0x1cc
 ? copy_msghdr_from_user+0x5d/0x85
 ___sys_sendmsg+0x7c/0xb5
 ? lock_acquire+0x181/0x23d
 ? syscall_trace_enter+0x15e/0x160
 ? find_held_lock+0x3d/0xb2
 ? syscall_trace_enter+0x15e/0x160
 ? sched_clock_cpu+0x15/0xc6
 __sys_sendmsg+0x62/0x9a
 do_syscall_64+0x43/0x55
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 4913e675630e ("ath10k: enable rx duration report default for wmi tlv")

Signed-off-by: Anand K Mistry 
---

 drivers/net/wireless/ath/ath10k/mac.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 7d98250380ec..e815aab412d7 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -9117,7 +9117,9 @@ static void ath10k_sta_statistics(struct ieee80211_hw *hw,
if (!ath10k_peer_stats_enabled(ar))
return;
 
+   mutex_lock(>conf_mutex);
ath10k_debug_fw_stats_request(ar);
+   mutex_unlock(>conf_mutex);
 
sinfo->rx_duration = arsta->rx_duration;
sinfo->filled |= BIT_ULL(NL80211_STA_INFO_RX_DURATION);
-- 
2.30.0.365.g02bc693789-goog



[PATCH] ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()

2021-02-01 Thread Anand K Mistry
The ieee80211_find_sta_by_ifaddr call in
ath10k_wmi_tlv_parse_peer_stats_info must be called while holding the
RCU read lock. Otherwise, the following warning will be seen when RCU
usage checking is enabled:

=
WARNING: suspicious RCU usage
5.10.3 #8 Tainted: GW
-
include/linux/rhashtable.h:594 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
no locks held by ksoftirqd/1/16.

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: GW 5.10.3 #8
Hardware name: HP Grunt/Grunt, BIOS Google_Grunt.11031.104.0 09/05/2019
Call Trace:
 dump_stack+0xab/0x115
 sta_info_hash_lookup+0x71/0x1e9 [mac80211]
 ? lock_is_held_type+0xe6/0x12f
 ? __kasan_kmalloc+0xfb/0x112
 ieee80211_find_sta_by_ifaddr+0x12/0x61 [mac80211]
 ath10k_wmi_tlv_parse_peer_stats_info+0xbd/0x10b [ath10k_core]
 ath10k_wmi_tlv_iter+0x8b/0x1a1 [ath10k_core]
 ? ath10k_wmi_tlv_iter+0x1a1/0x1a1 [ath10k_core]
 ath10k_wmi_tlv_event_peer_stats_info+0x103/0x13b [ath10k_core]
 ath10k_wmi_tlv_op_rx+0x722/0x80d [ath10k_core]
 ath10k_htc_rx_completion_handler+0x16e/0x1d7 [ath10k_core]
 ath10k_pci_process_rx_cb+0x116/0x22c [ath10k_pci]
 ? ath10k_htc_process_trailer+0x332/0x332 [ath10k_core]
 ? _raw_spin_unlock_irqrestore+0x34/0x61
 ? lockdep_hardirqs_on+0x8e/0x12e
 ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core]
 ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core]
 ath10k_pci_napi_poll+0x49/0x141 [ath10k_pci]
 net_rx_action+0x11a/0x347
 __do_softirq+0x2d3/0x539
 run_ksoftirqd+0x4b/0x86
 smpboot_thread_fn+0x1d0/0x2ab
 ? cpu_report_death+0x7f/0x7f
 kthread+0x189/0x191
 ? cpu_report_death+0x7f/0x7f
 ? kthread_blkcg+0x31/0x31
 ret_from_fork+0x22/0x30

Fixes: 0f7cb26830a6e ("ath10k: add rx bitrate report for SDIO")

Signed-off-by: Anand K Mistry 
---

 drivers/net/wireless/ath/ath10k/wmi-tlv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c 
b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 7b5834157fe5..e6135795719a 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -240,8 +240,10 @@ static int ath10k_wmi_tlv_parse_peer_stats_info(struct 
ath10k *ar, u16 tag, u16
   __le32_to_cpu(stat->last_tx_rate_code),
   __le32_to_cpu(stat->last_tx_bitrate_kbps));
 
+   rcu_read_lock();
sta = ieee80211_find_sta_by_ifaddr(ar->hw, stat->peer_macaddr.addr, 
NULL);
if (!sta) {
+   rcu_read_unlock();
ath10k_warn(ar, "not found station for peer stats\n");
return -EINVAL;
}
@@ -251,6 +253,7 @@ static int ath10k_wmi_tlv_parse_peer_stats_info(struct 
ath10k *ar, u16 tag, u16
arsta->rx_bitrate_kbps = __le32_to_cpu(stat->last_rx_bitrate_kbps);
arsta->tx_rate_code = __le32_to_cpu(stat->last_tx_rate_code);
arsta->tx_bitrate_kbps = __le32_to_cpu(stat->last_tx_bitrate_kbps);
+   rcu_read_unlock();
 
return 0;
 }
-- 
2.30.0.365.g02bc693789-goog



Re: [RFC PATCH] x86/speculation: Add finer control for when to issue IBPB

2021-01-20 Thread Anand K. Mistry
> > This proposal attempts to reduce that cost by letting the system
> > developer choose whether to issue the IBPB on entry or exit of an IB
> > speculation disabled process (default is both, which is current
> > behaviour). Documentation/admin-guide/hw-vuln/spectre.rst documents two
> > mitigation strategies that use conditional IBPB;
> > "Protect sensitive programs", and "Sandbox untrusted programs".
>
> Why make the setting system-wide?  Shouldn't this decision be made on a
> per-task basis, depending on whether the task is sensitive or untrusted?

It definitely could be. I didn't give it as much thought since for me,
the entire system uses a "sandbox" approach, so the behaviour would
apply to any IB spec disabled process. And conversely, any system
taking the "sensitive programs" approach would also expect the same
behaviour from all processes.

I'm open to making it per-process. It's just that making it
system-wide seemed to "fit" with the documented mitigation strategies,
and it's what I would use in production.


Re: [RFC PATCH] x86/speculation: Add finer control for when to issue IBPB

2021-01-20 Thread Anand K. Mistry
> >
> > Signed-off-by: Anand K Mistry 
> > Signed-off-by: Anand K Mistry 
>
> Two SoBs by you, why?

Tooling issues probably. Not intentional.

>
> > ---
> > Background:
> > IBPB is slow on some CPUs.
> >
> > More detailed background:
> > On some CPUs, issuing an IBPB can cause the address space switch to be
> > 10x more expensive (yes, 10x, not 10%).
>
> Which CPUs are those?!

AMD A4-9120C. Probably the A6-9220C too, but I don't have one of those
machines to test with,

>
> > On a system that makes heavy use of processes, this can cause a very
> > significant performance hit.
>
> You're not really trying to convince reviewers for why you need to add
> more complexity to an already too complex and confusing code. "some
> CPUs" and "can cause" is not good enough.

On a simple ping-ping test between two processes (using a pair of
pipes), a process switch is ~7us with IBPB disabled. But with it
enabled, it increases to around 80us (tested with the powersave CPU
governor).

On Chrome's IPC system, a perftest running 50,000 ping-pong messages:
without IBPB5579.49 ms
with IBPB21396 ms
(~4x difference)

And, doing video playback in the browser (which is already very
optimised), the IBPB hit turns out to be ~2.5% of CPU cycles. Doing a
webrtc video call (tested using http://appr.tc), it's ~9% of CPU
cycles. I don't have exact numbers, but it's worse on some real VC
apps.

>
> > I understand this is likely to be very contentious. Obviously, this
> > isn't ready for code review, but I'm hoping to get some thoughts on the
> > problem and this approach.
>
> Yes, in the absence of hard performance data, I'm not convinced at all.

With this change, I can get a >80% reduction in CPU cycles consumed by
IBPB. A  video call on my test device goes from ~9% to ~0.80% cycles
used by IBPB. It doesn't sound like much, but it's a significant
difference on these devices.


[RFC PATCH] x86/speculation: Add finer control for when to issue IBPB

2021-01-13 Thread Anand K Mistry
When IB speculation is conditionally disabled for a process (via prctl()
or seccomp), IBPB is issued whenever that process is switched to/from.
However, this results more IBPBs than necessary. The goal is to protect
a victim process from an attacker poisoning the BTB by issuing IBPB in
the attacker->victim switch. However, the current logic will also issue
IBPB in the victim->attacker switch, because there's no notion of
whether the attacker or victim has IB speculation disabled.

Instead of always issuing IBPB when either the previous or next process
has IB speculation disabled, add a boot flag to explicitly choose
to issue IBPB when the IB spec disabled process is entered or left.

Signed-off-by: Anand K Mistry 
Signed-off-by: Anand K Mistry 
---
Background:
IBPB is slow on some CPUs.

More detailed background:
On some CPUs, issuing an IBPB can cause the address space switch to be
10x more expensive (yes, 10x, not 10%). On a system that makes heavy use
of processes, this can cause a very significant performance hit.
Although we can choose which processes will pay the IBPB
cost by using prctl(), the performance hit is often still too high
because IBPB is being issued more often than necessary.

This proposal attempts to reduce that cost by letting the system
developer choose whether to issue the IBPB on entry or exit of an IB
speculation disabled process (default is both, which is current
behaviour). Documentation/admin-guide/hw-vuln/spectre.rst documents two
mitigation strategies that use conditional IBPB;
"Protect sensitive programs", and "Sandbox untrusted programs".

In the first case of protecting sensitive programs, the victim process
has IB spec disabled. So the attacker->victim switch is an _entry_ of
an IB spec disabled process. Conversly, the second case of sandboxing
and untrusted process, the attacker has IB spec disabled and so we want
to issue of IBPB on _exit_ of the IB spec disabled process.

I understand this is likely to be very contentious. Obviously, this
isn't ready for code review, but I'm hoping to get some thoughts on the
problem and this approach.

 arch/x86/include/asm/nospec-branch.h |  3 ++
 arch/x86/kernel/cpu/bugs.c   | 42 
 arch/x86/mm/tlb.c| 11 ++--
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index cb9ad6b73973..b153af75 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -250,6 +250,9 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+DECLARE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_enter);
+DECLARE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_leave);
+
 DECLARE_STATIC_KEY_FALSE(mds_user_clear);
 DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d41b70fe4918..a87200db7786 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -69,6 +69,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 /* Control unconditional IBPB in switch_mm() */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
+DEFINE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_enter);
+DEFINE_STATIC_KEY_TRUE(switch_mm_cond_ibpb_leave);
+
 /* Control MDS CPU buffer clear before returning to user space */
 DEFINE_STATIC_KEY_FALSE(mds_user_clear);
 EXPORT_SYMBOL_GPL(mds_user_clear);
@@ -640,6 +643,12 @@ enum spectre_v2_user_cmd {
SPECTRE_V2_USER_CMD_SECCOMP_IBPB,
 };
 
+enum spectre_v2_user_ibpb_mode {
+   SPECTRE_V2_USER_IBPB_BOTH,
+   SPECTRE_V2_USER_IBPB_ENTER,
+   SPECTRE_V2_USER_IBPB_LEAVE,
+};
+
 static const char * const spectre_v2_user_strings[] = {
[SPECTRE_V2_USER_NONE]  = "User space: Vulnerable",
[SPECTRE_V2_USER_STRICT]= "User space: Mitigation: 
STIBP protection",
@@ -700,12 +709,31 @@ spectre_v2_parse_user_cmdline(enum 
spectre_v2_mitigation_cmd v2_cmd)
return SPECTRE_V2_USER_CMD_AUTO;
 }
 
+static enum spectre_v2_user_ibpb_mode __init
+spectre_v2_parse_user_ibpb_mode(void)
+{
+   char arg[8];
+   int ret;
+
+   ret = cmdline_find_option(boot_command_line,
+ "spectre_v2_user_ibpb_mode",
+ arg, sizeof(arg));
+
+   if (ret == 5 && !strncmp(arg, "enter", 5))
+   return SPECTRE_V2_USER_IBPB_ENTER;
+   if (ret == 5 && !strncmp(arg, "leave", 5))
+   return SPECTRE_V2_USER_IBPB_LEAVE;
+
+   return SPECTRE_V2_USER_IBPB_BOTH;
+}
+
 static void __init
 spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 {
enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE;
bool smt_possible = IS_ENABLED(CONFIG_SMP);
enum spectre

[tip: x86/urgent] x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb

2020-11-25 Thread tip-bot2 for Anand K Mistry
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 33fc379df76b4991e5ae312f07bcd6820811971e
Gitweb:
https://git.kernel.org/tip/33fc379df76b4991e5ae312f07bcd6820811971e
Author:Anand K Mistry 
AuthorDate:Tue, 10 Nov 2020 12:33:53 +11:00
Committer: Borislav Petkov 
CommitterDate: Wed, 25 Nov 2020 20:17:09 +01:00

x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb

When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIPB is conditionally-enabled (or not
available).

However, since

  21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and 
enhanced IBRS.")

the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.

Since

  1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on 
CPUs with always-on STIBP")

this discrepency caused the misreporting of IB speculation via prctl().

On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.

Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.

 [ bp: Massage commit message. ]

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on 
STIBP and enhanced IBRS.")
Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled 
on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry 
Signed-off-by: Borislav Petkov 
Cc: 
Link: 
https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
---
 arch/x86/kernel/cpu/bugs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 581fb72..d41b70f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum 
spectre_v2_mitigation_cmd v2_cmd)
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 
+   spectre_v2_user_ibpb = mode;
switch (cmd) {
case SPECTRE_V2_USER_CMD_FORCE:
case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
static_branch_enable(_mm_always_ibpb);
+   spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT;
break;
case SPECTRE_V2_USER_CMD_PRCTL:
case SPECTRE_V2_USER_CMD_AUTO:
@@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum 
spectre_v2_mitigation_cmd v2_cmd)
pr_info("mitigation: Enabling %s Indirect Branch Prediction 
Barrier\n",
static_key_enabled(_mm_always_ibpb) ?
"always-on" : "conditional");
-
-   spectre_v2_user_ibpb = mode;
}
 
/*


[PATCH] x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb

2020-11-09 Thread Anand K Mistry
When spectre_v2_user={seccomp,prctl},ibpb, IBPB is force-enabled and
STIPB is conditionally-enabled (or not available). However, since
commit 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on
STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to
SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which
is the actual behaviour. Because the issuing of IBPB relies on the
switch_mm_*_ibpb static branches, the mitigations behave as expected.

Since commit 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally
enabled on CPUs with always-on STIBP") this discrepency caused the
misreporting of IB speculation via prctl().

On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.

Similarly for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on 
STIBP and enhanced IBRS.")
Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled 
on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry 

---
This is a follow up to my last patch 
(https://lore.kernel.org/lkml/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid/).
Unlike that one, this one does not change the behaviour or allow
enabling/disabling of IB speculation. Rather, it fixes the way the
prctl() responds. The last patch exposed an existing oversight in way
migitations were being tracked. Basically, the switch_mm_*_ibpb static
branches and the spectre_v2_user_ibpb variable were not consistent. This
patch tried to line those ducks in a row, as well as you can wrangle
ducks. Unfortunately, duck-wrangling is not one of my areas of
expertise.

 arch/x86/kernel/cpu/bugs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 581fb7223ad0..d41b70fe4918 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum 
spectre_v2_mitigation_cmd v2_cmd)
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 
+   spectre_v2_user_ibpb = mode;
switch (cmd) {
case SPECTRE_V2_USER_CMD_FORCE:
case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
static_branch_enable(_mm_always_ibpb);
+   spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT;
break;
case SPECTRE_V2_USER_CMD_PRCTL:
case SPECTRE_V2_USER_CMD_AUTO:
@@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum 
spectre_v2_mitigation_cmd v2_cmd)
pr_info("mitigation: Enabling %s Indirect Branch Prediction 
Barrier\n",
static_key_enabled(_mm_always_ibpb) ?
"always-on" : "conditional");
-
-   spectre_v2_user_ibpb = mode;
}
 
/*
-- 
2.29.2.222.g5d2a92d10f8-goog



[tip: x86/urgent] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-11-06 Thread tip-bot2 for Anand K Mistry
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 1978b3a53a74e3230cd46932b149c6e62e832e9a
Gitweb:
https://git.kernel.org/tip/1978b3a53a74e3230cd46932b149c6e62e832e9a
Author:Anand K Mistry 
AuthorDate:Thu, 05 Nov 2020 16:33:04 +11:00
Committer: Borislav Petkov 
CommitterDate: Thu, 05 Nov 2020 21:43:34 +01:00

x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on 
STIBP

On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

At the same time, IBPB can be set to conditional.

However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.

More generally, the following cases are possible:

1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
   X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

 [ bp: Massage commit message and comment; space out statements for
   better readability. ]

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on 
STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry 
Signed-off-by: Borislav Petkov 
Acked-by: Thomas Gleixner 
Acked-by: Tom Lendacky 
Link: 
https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
---
 arch/x86/kernel/cpu/bugs.c | 51 +++--
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db4..581fb72 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1254,6 +1254,14 @@ static int ssb_prctl_set(struct task_struct *task, 
unsigned long ctrl)
return 0;
 }
 
+static bool is_spec_ib_user_controlled(void)
+{
+   return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
+   spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
+   spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
+   spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP;
+}
+
 static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
 {
switch (ctrl) {
@@ -1261,16 +1269,26 @@ static int ib_prctl_set(struct task_struct *task, 
unsigned long ctrl)
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return 0;
+
/*
-* Indirect branch speculation is always disabled in strict
-* mode. It can neither be enabled if it was force-disabled
-* by a  previous prctl call.
+* With strict mode for both IBPB and STIBP, the instruction
+* code paths avoid checking this task flag and instead,
+* unconditionally run the instruction. However, STIBP and IBPB
+* are independent and either can be set to conditionally
+* enabled regardless of the mode of the other.
+*
+* If either is set to conditional, allow the task flag to be
+* updated, unless it was force-disabled by a previous prctl
+* call. Currently, this is possible on an AMD CPU which has the
+* feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the
+* kernel is booted with 'spectre_v2_user=seccomp', then
+* spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and
+* spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED.
 */
-   if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
+   if (!is_spec_ib_user_controlled() ||
task_spec_ib_force_disable(task))
return -EPERM;
+
task_clear_spec_ib_disable(task);
task_update_spec_tif(task);
break;
@@ -1283,10 +1301,10 @@ static int ib_prctl_set(struct task

[PATCH v2] proc: Provide details on indirect branch speculation

2020-11-05 Thread Anand K Mistry
Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.

Signed-off-by: Anand K Mistry 
---

Changes in v2:
- Remove underscores from field name to workaround documentation issue

 Documentation/filesystems/proc.rst |  2 ++
 fs/proc/array.c| 28 
 2 files changed, 30 insertions(+)

diff --git a/Documentation/filesystems/proc.rst 
b/Documentation/filesystems/proc.rst
index 533c79e8d2cd..531edaf07924 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -210,6 +210,7 @@ read the file /proc/PID/status::
   NoNewPrivs: 0
   Seccomp:0
   Speculation_Store_Bypass:   thread vulnerable
+  SpeculationIndirectBranch:  conditional enabled
   voluntary_ctxt_switches:0
   nonvoluntary_ctxt_switches: 1
 
@@ -292,6 +293,7 @@ It's slow but very precise.
  NoNewPrivs  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
  Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
  Speculation_Store_Bypassspeculative store bypass mitigation status
+ SpeculationIndirectBranch   indirect branch speculation mode
  Cpus_allowedmask of CPUs on which this process may run
  Cpus_allowed_list   Same as previous, but in "list format"
  Mems_allowedmask of memory nodes allowed to this process
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 65ec2029fa80..014c1859554d 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -368,6 +368,34 @@ static inline void task_seccomp(struct seq_file *m, struct 
task_struct *p)
seq_puts(m, "vulnerable");
break;
}
+
+   seq_puts(m, "\nSpeculationIndirectBranch:\t");
+   switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_INDIRECT_BRANCH)) {
+   case -EINVAL:
+   seq_puts(m, "unsupported");
+   break;
+   case PR_SPEC_NOT_AFFECTED:
+   seq_puts(m, "not affected");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE:
+   seq_puts(m, "conditional force disabled");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_DISABLE:
+   seq_puts(m, "conditional disabled");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_ENABLE:
+   seq_puts(m, "conditional enabled");
+   break;
+   case PR_SPEC_ENABLE:
+   seq_puts(m, "always enabled");
+   break;
+   case PR_SPEC_DISABLE:
+   seq_puts(m, "always disabled");
+   break;
+   default:
+   seq_puts(m, "unknown");
+   break;
+   }
seq_putc(m, '\n');
 }
 
-- 
2.29.1.341.ge80a0c044ae-goog



Re: linux-next: build warning after merge of the akpm-current tree

2020-11-04 Thread Anand K. Mistry
SNIPPED

> >
> > Looks like left column became too wide, so rather than shift the right
> > column to the right, I'd suggest to drop underscores in Speculation*.
>
> Hm. That makes it inconsistent with Speculation_Store_Bypass. I guess
> it's the lesser of two evils.

Oh, do you mean renaming the existing Speculation_Store_Bypass? I
thought that was a big no-no for the kernel?


Re: linux-next: build warning after merge of the akpm-current tree

2020-11-04 Thread Anand K. Mistry
On Thu, 5 Nov 2020 at 18:03, Mike Rapoport  wrote:
>
> On Thu, Nov 05, 2020 at 05:45:49PM +1100, Stephen Rothwell wrote:
> > Hi all,
> >
> > After merging the akpm-current tree, today's linux-next build (htmldocs)
> > produced this warning:
> >
> > Documentation/filesystems/proc.rst:296: WARNING: Malformed table.
> > Text in column margin in table line 61.
> >
> > ==  
> > ===
> > Field   Content
> > ==  
> > ===
>   ...
> > Speculation_Store_Bypassspeculative store bypass mitigation status
> > Speculation_Indirect_Branch indirect branch speculation mode
>   ...
> > ==  
> > ===
> > Documentation/filesystems/proc.rst:234: WARNING: Error parsing content 
> > block for the "table" directive: exactly one table expected.
>
> Looks like left column became too wide, so rather than shift the right
> column to the right, I'd suggest to drop underscores in Speculation*.

Hm. That makes it inconsistent with Speculation_Store_Bypass. I guess
it's the lesser of two evils.

How would I go about fixing this? Send a new (v2), fixed patch to the
mailing list? I'm not that familiar with how patches get merged
through the branches.

>
> >
> > .. table:: Table 1-2: Contents of the status files (as of 4.19)
> >
> >  ==  
> > ===
> >  Field   Content
> >  ==  
> > ===
>...
> >  Speculation_Store_Bypassspeculative store bypass mitigation status
> >  Speculation_Indirect_Branch indirect branch speculation mode
> >  Cpus_allowedmask of CPUs on which this process may run
> >  Cpus_allowed_list   Same as previous, but in "list format"
> >  Mems_allowedmask of memory nodes allowed to this process
> >  Mems_allowed_list   Same as previous, but in "list format"
> >  voluntary_ctxt_switches number of voluntary context switches
> >  nonvoluntary_ctxt_switches  number of non voluntary context switches
> >  ======  
> > ===
>
> Same here.
>
> > Introduced by commit
> >
> >   60b492d745d5 ("proc: provide details on indirect branch speculation")
> >
> > --
> > Cheers,
> > Stephen Rothwell
>
>
>
> --
> Sincerely yours,
> Mike.



-- 
Anand K. Mistry
Software Engineer
Google Australia


Re: [PATCH] proc: Provide details on indirect branch speculation

2020-11-04 Thread Anand K. Mistry
On Sun, 1 Nov 2020 at 07:05, Andrew Morton  wrote:
>
> On Fri, 30 Oct 2020 17:27:54 +1100 Anand K Mistry  wrote:
>
> > Similar to speculation store bypass, show information about the indirect
> > branch speculation mode of a task in /proc/$pid/status.
>
> Why is this considered useful?

For testing/benchmarking, I needed to see whether IB (Indirect Branch)
speculation (see Spectre-v2) is enabled on a task, to see whether an
IBPB instruction should be executed on an address space switch.
Unfortunately, this information isn't available anywhere else and
currently the only way to get it is to hack the kernel to expose it
(like this change). It also helped expose a bug with conditional IB
speculation on certain CPUs.

Another place this could be useful is to audit the system when using
sanboxing. With this change, I can confirm that seccomp-enabled
process have IB speculation force disabled as expected when the kernel
command line parameter `spectre_v2_user=seccomp`.

Since there's already a 'Speculation_Store_Bypass' field, I used that
as precedence for adding this one.

-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH v2] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-11-04 Thread Anand K Mistry
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and 'spectre_v2_user_stibp ==
SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to
conditional. However, this leads to the case where it's impossible to
turn on IBPB for a process because in the PR_SPEC_DISABLE case in
ib_prctl_set, the (spectre_v2_user_stibp ==
SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the
task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE
even though IBPB is set to conditional.

More generally, the following cases are possible:
1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
   X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB is
set to conditional, allow the prctl to change the task flag. Also,
reflect that capability when querying the state. This isn't perfect
since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

Signed-off-by: Anand K Mistry 

---

Changes in v2:
- Fix typo in commit message
- s/is_spec_ib_user/is_spec_ib_user_controlled
- Update comment in ib_prctl_set() to reference X86_FEATURE_AMD_STIBP_ALWAYS_ON
- Have is_spec_ib_user_controlled() check both IBPB and STIBP modes

 arch/x86/kernel/cpu/bugs.c | 46 +++---
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..534225afe832 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1254,6 +1254,14 @@ static int ssb_prctl_set(struct task_struct *task, 
unsigned long ctrl)
return 0;
 }
 
+static bool is_spec_ib_user_controlled(void)
+{
+   return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
+   spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
+   spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
+   spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP;
+}
+
 static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
 {
switch (ctrl) {
@@ -1262,13 +1270,20 @@ static int ib_prctl_set(struct task_struct *task, 
unsigned long ctrl)
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return 0;
/*
-* Indirect branch speculation is always disabled in strict
-* mode. It can neither be enabled if it was force-disabled
-* by a  previous prctl call.
+* With strict mode for both IBPB and STIBP, the instruction
+* code paths avoid checking this task flag and instead,
+* unconditionally run the instruction. However, STIBP and IBPB
+* are independent and either can be set to conditionally
+* enabled regardless of the mode of the other. If either is set
+* to conditional, allow the task flag to be updated, unless it
+* was force-disabled by a previous prctl call.
+* Currently, this is possible on an AMD CPU which has the
+* feature X86_FEATURE_AMD_STIBP_ALWAYS_ON. In this case, if the
+* kernel is booted with 'spectre_v2_user=seccomp', then
+* spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP and
+* spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED.
 */
-   if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
+   if (!is_spec_ib_user_controlled() ||
task_spec_ib_force_disable(task))
return -EPERM;
task_clear_spec_ib_disable(task);
@@ -1283,9 +1298,7 @@ static int ib_prctl_set(struct task_struct *task, 
unsigned long ctrl)
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return -EPERM;
-   if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
+   if (!is_spec_ib_user_controlled())
return 0;
task_set_spec_ib_disable(task);
if (ctrl == PR_SPEC_FORCE_DISABLE)
@@ -1351,20 +1364,17 @@ static int ib_prctl_get(struct task_struct *task)
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
 

Re: [PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-11-04 Thread Anand K. Mistry
On Tue, 3 Nov 2020 at 21:58, Borislav Petkov  wrote:
>
> On Mon, Nov 02, 2020 at 11:02:10AM +1100, Anand K. Mistry wrote:
> > > I like the idea of passing in the mode you want to check, but it appears
> > > they are never used independently. The ibpb and stibp modes are always
> > > checked together in one of the if statements below, so you could make this
> > > a function that checks both modes and just have a single call. I'll leave
> > > that up to the maintainers to see what is preferred.
> >
> > I can see both sides to this. Personally, I think I prefer it as-is
> > since I think it improves readability a bit by making the conditions
> > less complicated whilst not hiding too many details. I'll wait to see
> > what others say before changing this one.
>
> Yes, but if you make it a single function with a descriptive name, you'd
> make the call sites even more readable:
>
> if (!is_spec_ib_conditional(..))
> bla;
>
> or
>
> if (!is_spec_ib_user_controlled(..))
> blu;
>
> and that function should simply check both spectre_v2_user_ibpb *and*
> spectre_v2_user_stibp in one go.
>
> Why should we do that?
>
> Exactly because you both got your brains twisted just from looking at
> this. Because this mitigation crap is such an ugly and complex maze that
> we would take even the smallest simplification any day of the week!

Ok then, two votes for. I'll make the change in v2 and post that up today.

>
> Welcome to my life since meltdown. Brain twist feels good, doesn't it?

I don't think "feels good" are the words I'd use.

>
> :-)))
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette



-- 
Anand K. Mistry
Software Engineer
Google Australia


Re: [PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-11-01 Thread Anand K. Mistry
On Sun, 1 Nov 2020 at 02:05, Tom Lendacky  wrote:
>
> On 10/29/20 1:51 AM, Anand K Mistry wrote:
> > On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
> > STIBP is set to on and 'spectre_v2_user_stibp ==
> > SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to
> > conditional. However, this leads to the case where it's impossible to
> > turn on IBPB for a process because in the PR_SPEC_DISABLE case in
> > ib_prctl_set, the (spectre_v2_user_stibp ==
> > SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the
> > task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE
> > even though IBPB is set to conditional.
> >
> > More generally, the following cases are possible:
> > 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
> > 2. STIBP = on && IBPB = conditional for AMD CPUs with
> > X86_FEATURE_AMD_STIBP_ALWAYS_ON
> >
> > The first case functions correctly today, but only because
> > spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.
> >
> > At a high level, this change does one thing. If either STIBP is IBPB is
>
> s/STIBP is IBPB/STIBP or IBPB/

Oops. Will be fixed in v2.

>
> > set to conditional, allow the prctl to change the task flag. Also,
> > reflect that capability when querying the state. This isn't perfect
> > since it doesn't take into account if only STIBP or IBPB is
> > unconditionally on. But it allows the conditional feature to work as
> > expected, without affecting the unconditional one.
> >
> > Signed-off-by: Anand K Mistry 
> >
> > ---
> >
> >   arch/x86/kernel/cpu/bugs.c | 41 +-
> >   1 file changed, 23 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index d3f0db463f96..fb64e02eed6f 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1254,6 +1254,11 @@ static int ssb_prctl_set(struct task_struct *task, 
> > unsigned long ctrl)
> >   return 0;
> >   }
> >
> > +static bool is_spec_ib_user(enum spectre_v2_user_mitigation mode)
>
> Maybe something like is_spec_ib_user_controlled() would be a better name.

Changed in v2.

>
> > +{
> > + return mode == SPECTRE_V2_USER_PRCTL || mode == 
> > SPECTRE_V2_USER_SECCOMP;
> > +}
> > +
>
> I like the idea of passing in the mode you want to check, but it appears
> they are never used independently. The ibpb and stibp modes are always
> checked together in one of the if statements below, so you could make this
> a function that checks both modes and just have a single call. I'll leave
> that up to the maintainers to see what is preferred.

I can see both sides to this. Personally, I think I prefer it as-is
since I think it improves readability a bit by making the conditions
less complicated whilst not hiding too many details. I'll wait to see
what others say before changing this one.

>
> >   static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
> >   {
> >   switch (ctrl) {
> > @@ -1262,13 +1267,16 @@ static int ib_prctl_set(struct task_struct *task, 
> > unsigned long ctrl)
> >   spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
> >   return 0;
> >   /*
> > -  * Indirect branch speculation is always disabled in strict
> > -  * mode. It can neither be enabled if it was force-disabled
> > -  * by a  previous prctl call.
> > +  * With strict mode for both IBPB and STIBP, the instruction
> > +  * code paths avoid checking this task flag and instead,
> > +  * unconditionally run the instruction. However, STIBP and 
> > IBPB
> > +  * are independent and either can be set to conditionally
> > +  * enabled regardless of the mode of the other. If either is 
> > set
> > +  * to conditional, allow the task flag to be updated, unless 
> > it
> > +  * was force-disabled by a previous prctl call.
>
> You probably want to reference the STIBP always on mode that allows this
> situation.

Updated comment in v2 to mention the X86_FEATURE_AMD_STIBP_ALWAYS_ON case.

>
> >*/
> > - if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
> > - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
> > - spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED 
> > ||
> >

Re: [PATCH 0/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-11-01 Thread Anand K. Mistry
On Sun, 1 Nov 2020 at 01:50, Tom Lendacky  wrote:
>
> On 10/29/20 1:51 AM, Anand K Mistry wrote:
> > When attempting to do some performance testing of IBPB on and AMD
> > platform, I noticed the IBPB instruction was never being issued, even
> > though it was conditionally on and various seccomp protected processes
> > were force enabling it. Turns out, on those AMD CPUs, STIBP is set to
> > always-on and this was causing an early-out on the prctl() which turns
> > off IB speculation. Here is my attempt to fix it.
> >
> > I'm hoping someone that understands this better than me can explain why
> > I'm wrong.
>
> It all looks reasonable to me (some comments in the patch to follow). The
> thing that makes this tough is the command line option of being able to
> force IBPB using the "prctl,ibpb" or "seccomp,ibpb" while STIBP is prctl
> or seccomp controlled. There's an inherent quality that is assumed that if
> STIBP is forced then IBPB must be forced and it looks like 21998a351512
> ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced
> IBRS.") used that. However, with the STIBP always on support, that doesn't
> hold true.

Yeah, and this is what I found confusing. With that commit, the number
of combinations
of IBPB and STIBP is 25, but only a small subset is possible. For example:
- (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT &&
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT)
- (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
are the only possible combinations of STRICT.

But also, if 'spectre_v2_user=seccomp,ibpb' (or prctl,ibpb), then
spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP even though it is
logically SPECTRE_V2_USER_STRICT.

It took a bit of time to wrap my head around this, hence I'm a bit
hesitant about this change (even though I think it's right).

>
> Thanks,
> Tom
>
> >
> >
> > Anand K Mistry (1):
> >x86/speculation: Allow IBPB to be conditionally enabled on CPUs with
> >  always-on STIBP
> >
> >   arch/x86/kernel/cpu/bugs.c | 41 +-
> >   1 file changed, 23 insertions(+), 18 deletions(-)
> >


Re: [PATCH 1/1] debugfs: Add a helper to export atomic64_t values

2020-10-30 Thread Anand K. Mistry
On Fri, 30 Oct 2020 at 18:20, Greg Kroah-Hartman
 wrote:
>
> On Fri, Oct 30, 2020 at 06:04:42PM +1100, Anand K Mistry wrote:
> > This mirrors support for exporting atomic_t values.
> >
> > Signed-off-by: Anand K Mistry 
> >
> > ---
> >
> >  fs/debugfs/file.c   | 37 +
> >  include/linux/debugfs.h |  6 ++
> >  2 files changed, 43 insertions(+)
> >
> > diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> > index a768a09430c3..798bd3bdedec 100644
> > --- a/fs/debugfs/file.c
> > +++ b/fs/debugfs/file.c
> > @@ -770,6 +770,43 @@ void debugfs_create_atomic_t(const char *name, umode_t 
> > mode,
> >  }
> >  EXPORT_SYMBOL_GPL(debugfs_create_atomic_t);
> >
> > +static int debugfs_atomic64_t_set(void *data, u64 val)
> > +{
> > + atomic64_set((atomic64_t *)data, val);
> > + return 0;
> > +}
> > +static int debugfs_atomic64_t_get(void *data, u64 *val)
> > +{
> > + *val = atomic64_read((atomic64_t *)data);
> > + return 0;
> > +}
> > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t, debugfs_atomic64_t_get,
> > + debugfs_atomic64_t_set, "%lld\n");
> > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_ro, debugfs_atomic64_t_get, NULL,
> > + "%lld\n");
> > +DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_wo, NULL, debugfs_atomic64_t_set,
> > + "%lld\n");
> > +
> > +/**
> > + * debugfs_create_atomic64_t - create a debugfs file that is used to read 
> > and
> > + * write an atomic64_t value
> > + * @name: a pointer to a string containing the name of the file to create.
> > + * @mode: the permission that the file should have
> > + * @parent: a pointer to the parent dentry for this file.  This should be a
> > + *  directory dentry if set.  If this parameter is %NULL, then the
> > + *  file will be created in the root of the debugfs filesystem.
> > + * @value: a pointer to the variable that the file should read to and write
> > + * from.
> > + */
> > +void debugfs_create_atomic64_t(const char *name, umode_t mode,
> > +struct dentry *parent, atomic64_t *value)
> > +{
> > + debugfs_create_mode_unsafe(name, mode, parent, value,
> > +_atomic64_t, _atomic64_t_ro,
> > +_atomic64_t_wo);
> > +}
> > +EXPORT_SYMBOL_GPL(debugfs_create_atomic64_t);
> > +
> >  ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf,
> >  size_t count, loff_t *ppos)
> >  {
> > diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
> > index 851dd1f9a8a5..0fac84c53eab 100644
> > --- a/include/linux/debugfs.h
> > +++ b/include/linux/debugfs.h
> > @@ -126,6 +126,8 @@ void debugfs_create_size_t(const char *name, umode_t 
> > mode,
> >  struct dentry *parent, size_t *value);
> >  void debugfs_create_atomic_t(const char *name, umode_t mode,
> >struct dentry *parent, atomic_t *value);
> > +void debugfs_create_atomic64_t(const char *name, umode_t mode,
> > +  struct dentry *parent, atomic64_t 
> > *value);
> >  struct dentry *debugfs_create_bool(const char *name, umode_t mode,
> > struct dentry *parent, bool *value);
> >
> > @@ -291,6 +293,10 @@ static inline void debugfs_create_atomic_t(const char 
> > *name, umode_t mode,
> >  atomic_t *value)
> >  { }
> >
> > +static inline void debugfs_create_atomic64_t(const char *name, umode_t 
> > mode,
> > +  struct dentry *parent, 
> > atomic64_t *value)
> > +{ }
> > +
> >  static inline struct dentry *debugfs_create_bool(const char *name, umode_t 
> > mode,
> >struct dentry *parent,
> >bool *value)
>
> Looks good, but where is the user of this code?  I can't add new apis
> without a user.

Fair enough. Right now, the user is just some local
debugging/performance measuring which will never be upstreamed.
Happy to let this drop.

>
> And are you _SURE_ you want to be using an atomic64_t in the first
> place?  We are starting to reduce the "raw" usage of atomic variables as
> almost no one needs them, they should be using something else instead,
> or just a u64 as atomics are not needed for simple statistics.

I understand, and would generally never use atomics in real code. I
used an atomic since I wanted accuracy
(for some of the benchmarks I want to run) but can't use anything that
blocks (spinlock/mutex) since
the code is somewhere inside the scheduler.

>
> thanks,
>
> greg k-h

-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH 1/1] debugfs: Add a helper to export atomic64_t values

2020-10-30 Thread Anand K Mistry
This mirrors support for exporting atomic_t values.

Signed-off-by: Anand K Mistry 

---

 fs/debugfs/file.c   | 37 +
 include/linux/debugfs.h |  6 ++
 2 files changed, 43 insertions(+)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index a768a09430c3..798bd3bdedec 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -770,6 +770,43 @@ void debugfs_create_atomic_t(const char *name, umode_t 
mode,
 }
 EXPORT_SYMBOL_GPL(debugfs_create_atomic_t);
 
+static int debugfs_atomic64_t_set(void *data, u64 val)
+{
+   atomic64_set((atomic64_t *)data, val);
+   return 0;
+}
+static int debugfs_atomic64_t_get(void *data, u64 *val)
+{
+   *val = atomic64_read((atomic64_t *)data);
+   return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t, debugfs_atomic64_t_get,
+   debugfs_atomic64_t_set, "%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_ro, debugfs_atomic64_t_get, NULL,
+   "%lld\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_atomic64_t_wo, NULL, debugfs_atomic64_t_set,
+   "%lld\n");
+
+/**
+ * debugfs_create_atomic64_t - create a debugfs file that is used to read and
+ * write an atomic64_t value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *  directory dentry if set.  If this parameter is %NULL, then the
+ *  file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ * from.
+ */
+void debugfs_create_atomic64_t(const char *name, umode_t mode,
+  struct dentry *parent, atomic64_t *value)
+{
+   debugfs_create_mode_unsafe(name, mode, parent, value,
+  _atomic64_t, _atomic64_t_ro,
+  _atomic64_t_wo);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_atomic64_t);
+
 ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf,
   size_t count, loff_t *ppos)
 {
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 851dd1f9a8a5..0fac84c53eab 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -126,6 +126,8 @@ void debugfs_create_size_t(const char *name, umode_t mode,
   struct dentry *parent, size_t *value);
 void debugfs_create_atomic_t(const char *name, umode_t mode,
 struct dentry *parent, atomic_t *value);
+void debugfs_create_atomic64_t(const char *name, umode_t mode,
+struct dentry *parent, atomic64_t *value);
 struct dentry *debugfs_create_bool(const char *name, umode_t mode,
  struct dentry *parent, bool *value);
 
@@ -291,6 +293,10 @@ static inline void debugfs_create_atomic_t(const char 
*name, umode_t mode,
   atomic_t *value)
 { }
 
+static inline void debugfs_create_atomic64_t(const char *name, umode_t mode,
+struct dentry *parent, atomic64_t 
*value)
+{ }
+
 static inline struct dentry *debugfs_create_bool(const char *name, umode_t 
mode,
 struct dentry *parent,
 bool *value)
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH 0/1] debugfs: Add a helper to export atomic64_t values

2020-10-30 Thread Anand K Mistry
Here's my story:
Once upon a time, there lived a developer that wanted to export an
atomic64_t value to userspace using debugfs to help with debugging. But
that developer found there was no helper function to do so and was very
sad.
The End.

Yeah, it's a sad story. Here's my patch, merge me maybe?


Anand K Mistry (1):
  debugfs: Add a helper to export atomic64_t values

 fs/debugfs/file.c   | 37 +
 include/linux/debugfs.h |  6 ++
 2 files changed, 43 insertions(+)

-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH] proc: Provide details on indirect branch speculation

2020-10-30 Thread Anand K Mistry
Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.

Signed-off-by: Anand K Mistry 
---

 Documentation/filesystems/proc.rst |  2 ++
 fs/proc/array.c| 28 
 2 files changed, 30 insertions(+)

diff --git a/Documentation/filesystems/proc.rst 
b/Documentation/filesystems/proc.rst
index 533c79e8d2cd..710dd69614b9 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -210,6 +210,7 @@ read the file /proc/PID/status::
   NoNewPrivs: 0
   Seccomp:0
   Speculation_Store_Bypass:   thread vulnerable
+  Speculation_Indirect_Branch:conditional enabled
   voluntary_ctxt_switches:0
   nonvoluntary_ctxt_switches: 1
 
@@ -292,6 +293,7 @@ It's slow but very precise.
  NoNewPrivs  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
  Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
  Speculation_Store_Bypassspeculative store bypass mitigation status
+ Speculation_Indirect_Branch indirect branch speculation mode
  Cpus_allowedmask of CPUs on which this process may run
  Cpus_allowed_list   Same as previous, but in "list format"
  Mems_allowedmask of memory nodes allowed to this process
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 65ec2029fa80..ce4fa948c9dd 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -368,6 +368,34 @@ static inline void task_seccomp(struct seq_file *m, struct 
task_struct *p)
seq_puts(m, "vulnerable");
break;
}
+
+   seq_puts(m, "\nSpeculation_Indirect_Branch:\t");
+   switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_INDIRECT_BRANCH)) {
+   case -EINVAL:
+   seq_puts(m, "unsupported");
+   break;
+   case PR_SPEC_NOT_AFFECTED:
+   seq_puts(m, "not affected");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE:
+   seq_puts(m, "conditional force disabled");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_DISABLE:
+   seq_puts(m, "conditional disabled");
+   break;
+   case PR_SPEC_PRCTL | PR_SPEC_ENABLE:
+   seq_puts(m, "conditional enabled");
+   break;
+   case PR_SPEC_ENABLE:
+   seq_puts(m, "always enabled");
+   break;
+   case PR_SPEC_DISABLE:
+   seq_puts(m, "always disabled");
+   break;
+   default:
+   seq_puts(m, "unknown");
+   break;
+   }
seq_putc(m, '\n');
 }
 
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH 0/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-10-29 Thread Anand K Mistry
When attempting to do some performance testing of IBPB on and AMD
platform, I noticed the IBPB instruction was never being issued, even
though it was conditionally on and various seccomp protected processes
were force enabling it. Turns out, on those AMD CPUs, STIBP is set to
always-on and this was causing an early-out on the prctl() which turns
off IB speculation. Here is my attempt to fix it.

I'm hoping someone that understands this better than me can explain why
I'm wrong.


Anand K Mistry (1):
  x86/speculation: Allow IBPB to be conditionally enabled on CPUs with
always-on STIBP

 arch/x86/kernel/cpu/bugs.c | 41 +-
 1 file changed, 23 insertions(+), 18 deletions(-)

-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH 1/1] x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP

2020-10-29 Thread Anand K Mistry
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and 'spectre_v2_user_stibp ==
SPECTRE_V2_USER_STRICT_PREFERRED'. At the same time, IBPB can be set to
conditional. However, this leads to the case where it's impossible to
turn on IBPB for a process because in the PR_SPEC_DISABLE case in
ib_prctl_set, the (spectre_v2_user_stibp ==
SPECTRE_V2_USER_STRICT_PREFERRED) condition leads to a return before the
task flag is set. Similarly, ib_prctl_get will return PR_SPEC_DISABLE
even though IBPB is set to conditional.

More generally, the following cases are possible:
1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
   X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP is IBPB is
set to conditional, allow the prctl to change the task flag. Also,
reflect that capability when querying the state. This isn't perfect
since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

Signed-off-by: Anand K Mistry 

---

 arch/x86/kernel/cpu/bugs.c | 41 +-
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d3f0db463f96..fb64e02eed6f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1254,6 +1254,11 @@ static int ssb_prctl_set(struct task_struct *task, 
unsigned long ctrl)
return 0;
 }
 
+static bool is_spec_ib_user(enum spectre_v2_user_mitigation mode)
+{
+   return mode == SPECTRE_V2_USER_PRCTL || mode == SPECTRE_V2_USER_SECCOMP;
+}
+
 static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
 {
switch (ctrl) {
@@ -1262,13 +1267,16 @@ static int ib_prctl_set(struct task_struct *task, 
unsigned long ctrl)
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return 0;
/*
-* Indirect branch speculation is always disabled in strict
-* mode. It can neither be enabled if it was force-disabled
-* by a  previous prctl call.
+* With strict mode for both IBPB and STIBP, the instruction
+* code paths avoid checking this task flag and instead,
+* unconditionally run the instruction. However, STIBP and IBPB
+* are independent and either can be set to conditionally
+* enabled regardless of the mode of the other. If either is set
+* to conditional, allow the task flag to be updated, unless it
+* was force-disabled by a previous prctl call.
 */
-   if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
+   if ((!is_spec_ib_user(spectre_v2_user_ibpb) &&
+!is_spec_ib_user(spectre_v2_user_stibp)) ||
task_spec_ib_force_disable(task))
return -EPERM;
task_clear_spec_ib_disable(task);
@@ -1283,9 +1291,8 @@ static int ib_prctl_set(struct task_struct *task, 
unsigned long ctrl)
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return -EPERM;
-   if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
+   if (!is_spec_ib_user(spectre_v2_user_ibpb) &&
+   !is_spec_ib_user(spectre_v2_user_stibp))
return 0;
task_set_spec_ib_disable(task);
if (ctrl == PR_SPEC_FORCE_DISABLE)
@@ -1351,20 +1358,18 @@ static int ib_prctl_get(struct task_struct *task)
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_NONE &&
spectre_v2_user_stibp == SPECTRE_V2_USER_NONE)
return PR_SPEC_ENABLE;
-   else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
-   return PR_SPEC_DISABLE;
-   else if (spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
-   spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
-   spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP) {
+   else if (is_spec_ib_user(spectre_v

[PATCH v2] arm64: dts: mt8173-elm: Set GPU power regulator to always on

2020-08-26 Thread Anand K Mistry
Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana
devices where sometimes, the regulator is disabled before scpsys is
suspended, causing the suspension of scpsys to fail.

Usually, the GPU and scpsys are suspended by the runtime PM before the
system is suspended, due to the GPU being idle. In this case, scpsys is
suspended inline with the GPU suspend, which then disables the
regulator. However, if the GPU is still active when system is suspended,
GPU suspend occurs but defers suspending scpsys to the PM's noirq phase.
Since GPU suspend disables the regulator, scpsys isn't powered and
suspending it fails with the following error:
[  523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d

On resume, scpsys is resumed in the noirq phase. Since scpsys requires
power from the regulator, which is still disabled at this point,
attempting to turn it on will hang the CPU. A HW watchdog eventually
reboots the system.

The obvious solution would be to add a link to the regulator from scpsys
in the devicetree. This would prevent the regulator from being disabled
until scpsys is suspended. However, in the case where suspending scpsys
is deferred to the noirq phase, disabling the regulator will fail since
it is connected over I2C which requires IRQs to be enabled. Even in the
usual case where scpsys is suspended inline with the GPU, PM will always
attempt to resume scpsys in noirq. This will attempt to enable the
regulator, which will also fail due to being unable to communicate over
I2C.

Since I2C can't be using with IRQs disabled, a workaround is to never
turn off the regulator.

Measuring power on the GPU rail on an Elm DVT device shows that the
change in power usage is negligible. The two relavent cases are S0 with
an idle GPU, and S3 (system suspended). Measurements taken using a debug
board every 100ms for 1 minute.

In S0 with an idle GPU, current behaviour with the regulator off:
@@   NAME  COUNT  AVERAGE  STDDEV  MAXMIN
@@ gpu_mw600 1.741.31 6.75   0.00
... and with the regulator on, but no load:
@@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
@@ gpu_mw600 1.681.257.13   0.00
The difference being well within the margin of error.

In S3, current behaviour with the regulator off:
@@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
@@ gpu_mw600 0.940.743.25   0.00
... and with the regulator on:
@@   NAME  COUNT  AVERAGE  STDDEV MAX MIN
@@ gpu_mw600 0.830.663.250.00

Signed-off-by: Anand K Mistry 

---

Changes in v2:
- Remove CHROMIUM from subject line
- Correct device in subject line (8183 -> 8173)
- Grammar/clarity changes in description

 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
index a5a12b2599a4..1294f27b21c1 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
@@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB {
regulator-min-microamp  = <200>;
regulator-max-microamp  = <300>;
regulator-ramp-delay = <1>;
+   regulator-always-on;
};
};
};
-- 
2.28.0.297.g1956fa8f8d-goog



Re: [PATCH] CHROMIUM: arm64: dts: mt8183-elm: Set GPU power regulator to always on

2020-08-24 Thread Anand K. Mistry
:facepalm: sorry about the subject line. I'll fix it up in the next revision.

On Tue, 25 Aug 2020 at 15:26, Anand K Mistry  wrote:
>
> Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana
> devices where sometimes, the regulator is disabled before scpsys is
> suspended, causing the suspension of scpsys to fail.
>
> Usually, the GPU and scpsys are suspended by the runtime PM before the
> system is suspended, due to the GPU being idle. In this case, scpsys is
> suspended inline with the GPU suspend, which then disables the
> regulator. However, if the GPU is still active when system is suspended,
> GPU suspend occurs but defers suspending scpsys to the PM's noirq phase.
> Since GPU suspend disables the regulator, scpsys isn't powered and
> suspending it fails with the following error:
> [  523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d
>
> On resume, scpsys is resumed in the noirq phase. Since scpsys requires
> power from the regulator, which is still disabled at this point,
> attempting to turn it on will hang the CPU. A HW watchdog eventually
> reboots the system.
>
> The obvious solution would be to add a link to the regulator from scpsys
> in the devicetree. This would prevent the regulator from being disabled
> until scpsys is suspended. However, in the case where suspending scpsys
> is deferred to the noirq phase, disabling the regulator will fail since
> it is connected over I2C which requires IRQs to be enabled. Even in the
> usual case where scpsys is suspended inline with the GPU, PM will always
> attempt to resume scpsys in noirq. This will attempt to enable the
> regulator, which will also fail due to being unable to communicate over
> I2C.
>
> Since I2C can't be using with IRQs disabled, a workaround is to never
> turn off the regulator.
>
> Measuring power on the GPU rail on a Elm DVT shows that the change in
> power usage is negligible. The two relavent cases are S0 with an idle
> GPU, and S3.
>
> In S0 with an idle GPU, current behaviour with the regulator off:
> @@   NAME  COUNT  AVERAGE  STDDEV  MAXMIN
> @@ gpu_mw600 1.741.31 6.75   0.00
> ... and with the regulator on, but no load:
> @@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
> @@ gpu_mw600 1.681.257.13   0.00
> The difference being well within the margin of error.
>
> In S3, current behaviour with the regulator off:
> @@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
> @@ gpu_mw600 0.940.743.25   0.00
> ... and with the regulator on:
> @@   NAME  COUNT  AVERAGE  STDDEV MAX MIN
> @@ gpu_mw600 0.830.663.250.00
>
> Signed-off-by: Anand K Mistry 
>
> ---
>
>  arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi 
> b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
> index a5a12b2599a4..1294f27b21c1 100644
> --- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
> @@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB {
> regulator-min-microamp  = <200>;
> regulator-max-microamp  = <300>;
>         regulator-ramp-delay = <1>;
> +   regulator-always-on;
> };
> };
> };
> --
> 2.28.0.297.g1956fa8f8d-goog
>


-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH] CHROMIUM: arm64: dts: mt8183-elm: Set GPU power regulator to always on

2020-08-24 Thread Anand K Mistry
Keep the da9212 BUCKB always-on. This works around an issue on Elm/Hana
devices where sometimes, the regulator is disabled before scpsys is
suspended, causing the suspension of scpsys to fail.

Usually, the GPU and scpsys are suspended by the runtime PM before the
system is suspended, due to the GPU being idle. In this case, scpsys is
suspended inline with the GPU suspend, which then disables the
regulator. However, if the GPU is still active when system is suspended,
GPU suspend occurs but defers suspending scpsys to the PM's noirq phase.
Since GPU suspend disables the regulator, scpsys isn't powered and
suspending it fails with the following error:
[  523.773227] mtk-scpsys 10006000.scpsys: Failed to power off domain mfg_2d

On resume, scpsys is resumed in the noirq phase. Since scpsys requires
power from the regulator, which is still disabled at this point,
attempting to turn it on will hang the CPU. A HW watchdog eventually
reboots the system.

The obvious solution would be to add a link to the regulator from scpsys
in the devicetree. This would prevent the regulator from being disabled
until scpsys is suspended. However, in the case where suspending scpsys
is deferred to the noirq phase, disabling the regulator will fail since
it is connected over I2C which requires IRQs to be enabled. Even in the
usual case where scpsys is suspended inline with the GPU, PM will always
attempt to resume scpsys in noirq. This will attempt to enable the
regulator, which will also fail due to being unable to communicate over
I2C.

Since I2C can't be using with IRQs disabled, a workaround is to never
turn off the regulator.

Measuring power on the GPU rail on a Elm DVT shows that the change in
power usage is negligible. The two relavent cases are S0 with an idle
GPU, and S3.

In S0 with an idle GPU, current behaviour with the regulator off:
@@   NAME  COUNT  AVERAGE  STDDEV  MAXMIN
@@ gpu_mw600 1.741.31 6.75   0.00
... and with the regulator on, but no load:
@@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
@@ gpu_mw600 1.681.257.13   0.00
The difference being well within the margin of error.

In S3, current behaviour with the regulator off:
@@   NAME  COUNT  AVERAGE  STDDEV MAXMIN
@@ gpu_mw600 0.940.743.25   0.00
... and with the regulator on:
@@   NAME  COUNT  AVERAGE  STDDEV MAX MIN
@@ gpu_mw600 0.830.663.250.00

Signed-off-by: Anand K Mistry 

---

 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
index a5a12b2599a4..1294f27b21c1 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
@@ -304,6 +304,7 @@ da9211_vgpu_reg: BUCKB {
regulator-min-microamp  = <200>;
regulator-max-microamp  = <300>;
regulator-ramp-delay = <1>;
+   regulator-always-on;
};
};
};
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v2 2/4] dt-bindings: regulator: mt6397: Document valid modes

2020-07-02 Thread Anand K Mistry
Document valid mode values for BUCK regulators.

Signed-off-by: Anand K Mistry 
---

Changes in v2: None

 .../devicetree/bindings/regulator/mt6397-regulator.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt 
b/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt
index 01141fb00875..c080086d3e62 100644
--- a/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/mt6397-regulator.txt
@@ -16,6 +16,9 @@ LDO:
   ldo_vemc3v3, ldo_vgp1, ldo_vgp2, ldo_vgp3, ldo_vgp4, ldo_vgp5, ldo_vgp6,
   ldo_vibr
 
+BUCK regulators can set regulator-initial-mode and regulator-allowed-modes to
+values specified in dt-bindings/regulator/mediatek,mt6397-regulator.h
+
 Example:
pmic {
compatible = "mediatek,mt6397";
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH v2 4/4] arm64: dts: mediatek: Update allowed mt6397 regulator modes for elm boards

2020-07-02 Thread Anand K Mistry
This updates the allowed mt6397 regulator modes for elm (and derivative)
boards to use named constants.

Signed-off-by: Anand K Mistry 

---

Changes in v2:
- Introduce constants in dt-bindings
- Improve conditional readability

 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
index a5a12b2599a4..e9cfded307b3 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "mt8173.dtsi"
 
 / {
@@ -926,7 +927,8 @@ mt6397_vpca15_reg: buck_vpca15 {
regulator-max-microvolt = <135>;
regulator-ramp-delay = <12500>;
regulator-always-on;
-   regulator-allowed-modes = <0 1>;
+   regulator-allowed-modes = ;
};
 
mt6397_vpca7_reg: buck_vpca7 {
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH v2 1/4] regulator: mt6397: Move buck modes into header file

2020-07-02 Thread Anand K Mistry
This will allow device trees to make use of these constants.

Signed-off-by: Anand K Mistry 
---

Changes in v2: None

 drivers/regulator/mt6397-regulator.c  |  4 +---
 .../regulator/mediatek,mt6397-regulator.h | 15 +++
 2 files changed, 16 insertions(+), 3 deletions(-)
 create mode 100644 include/dt-bindings/regulator/mediatek,mt6397-regulator.h

diff --git a/drivers/regulator/mt6397-regulator.c 
b/drivers/regulator/mt6397-regulator.c
index 269c2a6028e8..d51e98ce1138 100644
--- a/drivers/regulator/mt6397-regulator.c
+++ b/drivers/regulator/mt6397-regulator.c
@@ -13,9 +13,7 @@
 #include 
 #include 
 #include 
-
-#define MT6397_BUCK_MODE_AUTO  0
-#define MT6397_BUCK_MODE_FORCE_PWM 1
+#include 
 
 /*
  * MT6397 regulators' information
diff --git a/include/dt-bindings/regulator/mediatek,mt6397-regulator.h 
b/include/dt-bindings/regulator/mediatek,mt6397-regulator.h
new file mode 100644
index ..99869a8665cf
--- /dev/null
+++ b/include/dt-bindings/regulator/mediatek,mt6397-regulator.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _DT_BINDINGS_REGULATOR_MEDIATEK_MT6397_H_
+#define _DT_BINDINGS_REGULATOR_MEDIATEK_MT6397_H_
+
+/*
+ * Buck mode constants which may be used in devicetree properties (eg.
+ * regulator-initial-mode, regulator-allowed-modes).
+ * See the manufacturer's datasheet for more information on these modes.
+ */
+
+#define MT6397_BUCK_MODE_AUTO  0
+#define MT6397_BUCK_MODE_FORCE_PWM 1
+
+#endif
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH v2 3/4] regulator: mt6397: Implement of_map_mode

2020-07-02 Thread Anand K Mistry
Implementing of_map_mode is necessary to be able to specify operating
modes in the devicetree using 'regulator-allowed-modes', and to change
regulator modes.

Signed-off-by: Anand K Mistry 
---

Changes in v2: None

 drivers/regulator/mt6397-regulator.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/regulator/mt6397-regulator.c 
b/drivers/regulator/mt6397-regulator.c
index d51e98ce1138..0a30df5e414f 100644
--- a/drivers/regulator/mt6397-regulator.c
+++ b/drivers/regulator/mt6397-regulator.c
@@ -53,6 +53,7 @@ struct mt6397_regulator_info {
.vsel_mask = vosel_mask,\
.enable_reg = enreg,\
.enable_mask = BIT(0),  \
+   .of_map_mode = mt6397_map_mode, \
},  \
.qi = BIT(13),  \
.vselon_reg = voselon,  \
@@ -144,6 +145,18 @@ static const unsigned int ldo_volt_table7[] = {
130, 150, 180, 200, 250, 280, 300, 330,
 };
 
+static unsigned int mt6397_map_mode(unsigned int mode)
+{
+   switch (mode) {
+   case MT6397_BUCK_MODE_AUTO:
+   return REGULATOR_MODE_NORMAL;
+   case MT6397_BUCK_MODE_FORCE_PWM:
+   return REGULATOR_MODE_FAST;
+   default:
+   return REGULATOR_MODE_INVALID;
+   }
+}
+
 static int mt6397_regulator_set_mode(struct regulator_dev *rdev,
 unsigned int mode)
 {
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH v2 0/4] regulator: mt6397: Implement of_map_mode regulator_desc function

2020-07-02 Thread Anand K Mistry
This patchset adds support for being able to change regulator modes for
the mt6397 regulator. This is needed to allow the voltage scaling
support in the MT8173 SoC to be used on the elm (Acer Chromebook R13)
and hana (several Lenovo Chromebooks) devices.

Without a of_map_mode implementation, the regulator-allowed-modes
devicetree field is skipped, and attempting to change the regulator mode
results in an error:
[1.439165] vpca15: mode operation not allowed

Changes in v2:
- Introduce constants in dt-bindings
- Improve conditional readability

Anand K Mistry (4):
  regulator: mt6397: Move buck modes into header file
  dt-bindings: regulator: mt6397: Document valid modes
  regulator: mt6397: Implement of_map_mode
  arm64: dts: mediatek: Update allowed mt6397 regulator modes for elm
boards

 .../bindings/regulator/mt6397-regulator.txt |  3 +++
 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi|  4 +++-
 drivers/regulator/mt6397-regulator.c| 17 ++---
 .../regulator/mediatek,mt6397-regulator.h   | 15 +++
 4 files changed, 35 insertions(+), 4 deletions(-)
 create mode 100644 include/dt-bindings/regulator/mediatek,mt6397-regulator.h

-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH 4/4] arm64: dts: mediatek: Update allowed regulator modes for elm boards

2020-07-01 Thread Anand K Mistry
This sets the allowed regulator modes for elm (and derivative) boards.
According to the datasheet, the da9211 does not support SLEEP mode.

Signed-off-by: Anand K Mistry 

---

 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
index a5a12b2599a4..d4c84a856e7b 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "mt8173.dtsi"
 
 / {
@@ -294,7 +295,8 @@ da9211_vcpu_reg: BUCKA {
regulator-max-microamp  = <440>;
regulator-ramp-delay = <1>;
regulator-always-on;
-   regulator-allowed-modes = <0 1>;
+   regulator-allowed-modes = ;
};
 
da9211_vgpu_reg: BUCKB {
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH 3/4] regulator: da9211: Implement of_map_mode

2020-07-01 Thread Anand K Mistry
Implementing of_map_mode is necessary to be able to specify operating
modes in the devicetree using 'regulator-allowed-modes', and to change
regulator modes.

Signed-off-by: Anand K Mistry 
---

 drivers/regulator/da9211-regulator.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/da9211-regulator.c 
b/drivers/regulator/da9211-regulator.c
index 1f9b75b41346..297b3aa7c753 100644
--- a/drivers/regulator/da9211-regulator.c
+++ b/drivers/regulator/da9211-regulator.c
@@ -86,6 +86,20 @@ static const int da9215_current_limits[] = {
560, 580, 600, 620, 640, 660, 680, 700
 };
 
+static unsigned int da9211_map_buck_mode(unsigned int mode)
+{
+   switch (mode) {
+   case DA9211_BUCK_MODE_SLEEP:
+   return REGULATOR_MODE_STANDBY;
+   case DA9211_BUCK_MODE_SYNC:
+   return REGULATOR_MODE_FAST;
+   case DA9211_BUCK_MODE_AUTO:
+   return REGULATOR_MODE_NORMAL;
+   default:
+   return REGULATOR_MODE_INVALID;
+   }
+}
+
 static unsigned int da9211_buck_get_mode(struct regulator_dev *rdev)
 {
int id = rdev_get_id(rdev);
@@ -233,6 +247,7 @@ static const struct regulator_ops da9211_buck_ops = {
.vsel_reg = DA9211_REG_VBUCKA_A + DA9211_ID_##_id * 2,\
.vsel_mask = DA9211_VBUCK_MASK,\
.owner = THIS_MODULE,\
+   .of_map_mode = da9211_map_buck_mode,\
 }
 
 static struct regulator_desc da9211_regulators[] = {
@@ -242,8 +257,14 @@ static struct regulator_desc da9211_regulators[] = {
 
 #ifdef CONFIG_OF
 static struct of_regulator_match da9211_matches[] = {
-   [DA9211_ID_BUCKA] = { .name = "BUCKA" },
-   [DA9211_ID_BUCKB] = { .name = "BUCKB" },
+   [DA9211_ID_BUCKA] = {
+   .name = "BUCKA",
+   .desc = _regulators[DA9211_ID_BUCKA],
+   },
+   [DA9211_ID_BUCKB] = {
+   .name = "BUCKB",
+   .desc = _regulators[DA9211_ID_BUCKB],
+   },
};
 
 static struct da9211_pdata *da9211_parse_regulators_dt(
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH 1/4] regulator: da9211: Move buck modes into header file

2020-07-01 Thread Anand K Mistry
This will allow device trees to make use of these constants.

Signed-off-by: Anand K Mistry 
---

 drivers/regulator/da9211-regulator.c |  5 +
 .../dt-bindings/regulator/dlg,da9211-regulator.h | 16 
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 include/dt-bindings/regulator/dlg,da9211-regulator.h

diff --git a/drivers/regulator/da9211-regulator.c 
b/drivers/regulator/da9211-regulator.c
index 2ea4362ffa5c..1f9b75b41346 100644
--- a/drivers/regulator/da9211-regulator.c
+++ b/drivers/regulator/da9211-regulator.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "da9211-regulator.h"
 
 /* DEVICE IDs */
@@ -24,10 +25,6 @@
 #define DA9213_DEVICE_ID   0x23
 #define DA9215_DEVICE_ID   0x24
 
-#define DA9211_BUCK_MODE_SLEEP 1
-#define DA9211_BUCK_MODE_SYNC  2
-#define DA9211_BUCK_MODE_AUTO  3
-
 /* DA9211 REGULATOR IDs */
 #define DA9211_ID_BUCKA0
 #define DA9211_ID_BUCKB1
diff --git a/include/dt-bindings/regulator/dlg,da9211-regulator.h 
b/include/dt-bindings/regulator/dlg,da9211-regulator.h
new file mode 100644
index ..cdce2d54c8ba
--- /dev/null
+++ b/include/dt-bindings/regulator/dlg,da9211-regulator.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _DT_BINDINGS_REGULATOR_DLG_DA9211_H
+#define _DT_BINDINGS_REGULATOR_DLG_DA9211_H
+
+/*
+ * These buck mode constants may be used to specify values in device tree
+ * properties (e.g. regulator-initial-mode, regulator-allowed-modes).
+ * A description of the following modes is in the manufacturers datasheet.
+ */
+
+#define DA9211_BUCK_MODE_SLEEP 1
+#define DA9211_BUCK_MODE_SYNC  2
+#define DA9211_BUCK_MODE_AUTO  3
+
+#endif
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH 0/4] regulator: da9211: support changing modes

2020-07-01 Thread Anand K Mistry
This patchset adds support for being able to change regulator modes for
the da9211 regulator. This is needed to allow the voltage scaling
support in the MT8173 SoC to be used in the elm (Acer Chromebook R13)
and hana (several Lenovo Chromebooks) devices.


Anand K Mistry (4):
  regulator: da9211: Move buck modes into header file
  dt-bindings: regulator: da9211: Document allowed modes
  regulator: da9211: Implement of_map_mode
  arm64: dts: mediatek: Update allowed regulator modes for elm boards

 .../devicetree/bindings/regulator/da9211.txt  |  4 +++
 arch/arm64/boot/dts/mediatek/mt8173-elm.dtsi  |  4 ++-
 drivers/regulator/da9211-regulator.c  | 30 +++
 .../regulator/dlg,da9211-regulator.h  | 16 ++
 4 files changed, 47 insertions(+), 7 deletions(-)
 create mode 100644 include/dt-bindings/regulator/dlg,da9211-regulator.h

-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH 2/4] dt-bindings: regulator: da9211: Document allowed modes

2020-07-01 Thread Anand K Mistry
This patch adds a description of how operating modes may be specified.

Signed-off-by: Anand K Mistry 
---

 Documentation/devicetree/bindings/regulator/da9211.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/da9211.txt 
b/Documentation/devicetree/bindings/regulator/da9211.txt
index 27717e816e71..eb871447d508 100644
--- a/Documentation/devicetree/bindings/regulator/da9211.txt
+++ b/Documentation/devicetree/bindings/regulator/da9211.txt
@@ -15,6 +15,8 @@ Required properties:
 Optional properties:
 - enable-gpios: platform gpio for control of BUCKA/BUCKB.
 - Any optional property defined in regulator.txt
+  - regulator-initial-mode and regulator-allowed-modes may be specified using
+mode values from dt-bindings/regulator/dlg,da9211-regulator.h
 
 Example 1) DA9211
pmic: da9211@68 {
@@ -30,6 +32,8 @@ Example 1) DA9211
regulator-min-microamp  = <200>;
regulator-max-microamp  = <500>;
enable-gpios = < 27 0>;
+   regulator-allowed-modes = ;
};
};
};
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH] regulator: mt6397: Implement of_map_mode regulator_desc function

2020-06-29 Thread Anand K Mistry
Without a of_map_mode implementation, the regulator-allowed-modes
devicetree field is skipped, and attempting to change the regulator mode
results in an error:
[1.439165] vpca15: mode operation not allowed

Signed-off-by: Anand K Mistry 

---

 drivers/regulator/mt6397-regulator.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/regulator/mt6397-regulator.c 
b/drivers/regulator/mt6397-regulator.c
index 269c2a6028e8..5c60b52addf1 100644
--- a/drivers/regulator/mt6397-regulator.c
+++ b/drivers/regulator/mt6397-regulator.c
@@ -55,6 +55,7 @@ struct mt6397_regulator_info {
.vsel_mask = vosel_mask,\
.enable_reg = enreg,\
.enable_mask = BIT(0),  \
+   .of_map_mode = mt6397_map_mode, \
},  \
.qi = BIT(13),  \
.vselon_reg = voselon,  \
@@ -146,6 +147,12 @@ static const unsigned int ldo_volt_table7[] = {
130, 150, 180, 200, 250, 280, 300, 330,
 };
 
+static unsigned int mt6397_map_mode(unsigned int mode)
+{
+   return mode == MT6397_BUCK_MODE_AUTO ?
+   REGULATOR_MODE_NORMAL : REGULATOR_MODE_FAST;
+}
+
 static int mt6397_regulator_set_mode(struct regulator_dev *rdev,
 unsigned int mode)
 {
-- 
2.27.0.212.ge8ba1cc988-goog



Re: [PATCH v3] perf record: Use an eventfd to wakeup when done

2020-05-24 Thread Anand K. Mistry
On Sat, 23 May 2020 at 23:35, Andi Kleen  wrote:
>
> Anand K Mistry  writes:
> >   }
> >
> > + done_fd = eventfd(0, EFD_NONBLOCK);
>
> This will make perf depend on a recent glibc or other library
> that implements eventfd. Wouldn't surprise me if some kind
> of build time check is needed for this to pass all of Arnaldo's
> built tests.

Looks like Arnaldo made that change when merging:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=e9db221d37f91409040cf7f3fbed08b44e055ae9

This makes me curious. How old a kernel should modern tools support?
>From the man page, eventfd was added in 2.6.22 (and eventfd2 in
2.6.27), which was 2007 (or 2008 for eventfd2) and glibc-2.8 which was
2008. I understand the kernel's policy of never breaking userspace,
but what about userspace tools?

>
>
> -Andi



-- 
Anand K. Mistry
Software Engineer
Google Australia


Re: [PATCH] perf record: Use an eventfd to wakeup when done

2020-05-12 Thread Anand K. Mistry
On Wed, 13 May 2020 at 00:12, Arnaldo Carvalho de Melo
 wrote:
>
> Em Tue, May 12, 2020 at 02:12:32PM +0200, Jiri Olsa escreveu:
> > On Tue, May 12, 2020 at 02:59:36PM +1000, Anand K Mistry wrote:
> >
> > SNIP
> >
> > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > index 1ab349abe90469..099ecaa66732a2 100644
> > > --- a/tools/perf/builtin-record.c
> > > +++ b/tools/perf/builtin-record.c
> > > @@ -53,6 +53,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void 
> > > *to, void *bf, size_t size)
> > >
> > >  static volatile int signr = -1;
> > >  static volatile int child_finished;
> > > +static int done_fd = -1;
> > >
> > >  static void sig_handler(int sig)
> > >  {
> > > +   u64 tmp = 1;
> > > if (sig == SIGCHLD)
> > > child_finished = 1;
> > > else
> > > signr = sig;
> > >
> > > done = 1;
> > > +
> > > +   /*
> > > +* It is possible for this signal handler to run after done is checked
> > > +* in the main loop, but before the perf counter fds are polled. If 
> > > this
> > > +* happens, the poll() will continue to wait even though done is set,
> > > +* and will only break out if either another signal is received, or 
> > > the
> > > +* counters are ready for read. To ensure the poll() doesn't sleep 
> > > when
> > > +* done is set, use an eventfd (done_fd) to wake up the poll().
> > > +*/
> > > +   if (write(done_fd, , sizeof(tmp)) < 0)
> > > +   pr_err("failed to signal wakeup fd\n");
> > >  }
> > >
> > >  static void sigsegv_handler(int sig)
> > > @@ -1424,6 +1438,17 @@ static int __cmd_record(struct record *rec, int 
> > > argc, const char **argv)
> > > int fd;
> > > float ratio = 0;
> > >
> > > +   done_fd = eventfd(0, EFD_NONBLOCK);
> > > +   if (done_fd < 0) {
> > > +   pr_err("Failed to create wakeup eventfd, error: %m\n");
> > > +   return -1;
> > > +   }
> > > +   err = evlist__add_pollfd(rec->evlist, done_fd);
> > > +   if (err < 0) {
> > > +   pr_err("Failed to add wakeup eventfd to poll list\n");
> > > +   return -1;
> > > +   }
> >
> > sorry I did not notice before, but I think we also
> > need to close done_fd descriptor on the exit path
> >
> > also please change subject to PATCHv3 for the next version

Apologies. I'm still getting the hang of this.

>
> Yeah, and, and don't take this as a requirement for this patch to be
> processed, this can be made as a follow up patch by you or someone else
> (me, maybe :)), that maybe tools/perf/builtin-top.c and
> tools/perf/builtin-trace.c have the same issue?
>
> Could you please take a look there as well?

I looked at 'top', 'trace', and 'kvm'. kvm doesn't really have this
issue because
the poll() has a 100ms timeout. Even though it's technically affected,
the timeout
will make it unnoticeable (just delaying the exit for 100ms). top is
in the same boat
(uses a timeout).

trace is the affected one because it has the following code:
int timeout = done ? 100 : -1;
if (!draining && evlist__poll(evlist, timeout) > 0) {

Different logic, but still a gap and an indefinite timeout.

>
> - Arnaldo
>
> > thanks,
> > jirka
> >
> > > +
> > > atexit(record__sig_exit);
> > > signal(SIGCHLD, sig_handler);
> > > signal(SIGINT, sig_handler);
> > > --
> > > 2.26.2.645.ge9eca65c58-goog
> > >
> >
>
> --
>
> - Arnaldo



-- 
Anand K. Mistry
Software Engineer
Google Australia


[PATCH v3] perf record: Use an eventfd to wakeup when done

2020-05-12 Thread Anand K Mistry
The setting and checking of 'done' contains a rare race where the signal
handler setting 'done' is run after checking to break the loop, but
before waiting in evlist__poll(). In this case, the main loop won't wake
up until either another signal is sent, or the perf data fd causes a
wake up.

The following simple script can trigger this condition (but you might
need to run it for several hours):
for ((i = 0; i >= 0; i++)) ; do
  echo "Loop $i"
  delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc)
  ./perf record -- sleep 3000 >/dev/null&
  pid=$!
  sleep $delay
  kill -TERM $pid
  echo "PID $pid"
  wait $pid
done

At some point, the loop will stall. Adding logging, even though perf has
received the SIGTERM and set 'done = 1', perf will remain sleeping until
a second signal is sent.

Signed-off-by: Anand K Mistry 

---

Changes in v3:
- Move done_fd creation to below session initialisation
- Close done_fd on exit
- Log errno when write(done_fd) fails

Changes in v2:
- Added comment to signal handler explaining why the eventfd is added
- Added error handling when creating done_fd

 tools/perf/builtin-record.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1ab349abe90469..a1af6857f24748 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void *to, 
void *bf, size_t size)
 
 static volatile int signr = -1;
 static volatile int child_finished;
+static int done_fd = -1;
 
 static void sig_handler(int sig)
 {
+   u64 tmp = 1;
if (sig == SIGCHLD)
child_finished = 1;
else
signr = sig;
 
done = 1;
+
+   /*
+* It is possible for this signal handler to run after done is checked
+* in the main loop, but before the perf counter fds are polled. If this
+* happens, the poll() will continue to wait even though done is set,
+* and will only break out if either another signal is received, or the
+* counters are ready for read. To ensure the poll() doesn't sleep when
+* done is set, use an eventfd (done_fd) to wake up the poll().
+*/
+   if (write(done_fd, , sizeof(tmp)) < 0)
+   pr_err("failed to signal wakeup fd, error: %m\n");
 }
 
 static void sigsegv_handler(int sig)
@@ -1466,6 +1480,19 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
return -1;
}
 
+   done_fd = eventfd(0, EFD_NONBLOCK);
+   if (done_fd < 0) {
+   pr_err("Failed to create wakeup eventfd, error: %m\n");
+   status = -1;
+   goto out_delete_session;
+   }
+   err = evlist__add_pollfd(rec->evlist, done_fd);
+   if (err < 0) {
+   pr_err("Failed to add wakeup eventfd to poll list\n");
+   status = err;
+   goto out_delete_session;
+   }
+
session->header.env.comp_type  = PERF_COMP_ZSTD;
session->header.env.comp_level = rec->opts.comp_level;
 
@@ -1827,6 +1854,8 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
}
 
 out_delete_session:
+   if (done_fd >= 0)
+   close(done_fd);
zstd_fini(>zstd_data);
perf_session__delete(session);
 
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH] perf record: Use an eventfd to wakeup when done

2020-05-11 Thread Anand K Mistry
The setting and checking of 'done' contains a rare race where the signal
handler setting 'done' is run after checking to break the loop, but
before waiting in evlist__poll(). In this case, the main loop won't wake
up until either another signal is sent, or the perf data fd causes a
wake up.

The following simple script can trigger this condition (but you might
need to run it for several hours):
for ((i = 0; i >= 0; i++)) ; do
  echo "Loop $i"
  delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc)
  ./perf record -- sleep 3000 >/dev/null&
  pid=$!
  sleep $delay
  kill -TERM $pid
  echo "PID $pid"
  wait $pid
done

At some point, the loop will stall. Adding logging, even though perf has
received the SIGTERM and set 'done = 1', perf will remain sleeping until
a second signal is sent.

Signed-off-by: Anand K Mistry 

---

 tools/perf/builtin-record.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1ab349abe90469..099ecaa66732a2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -518,15 +519,28 @@ static int record__pushfn(struct mmap *map, void *to, 
void *bf, size_t size)
 
 static volatile int signr = -1;
 static volatile int child_finished;
+static int done_fd = -1;
 
 static void sig_handler(int sig)
 {
+   u64 tmp = 1;
if (sig == SIGCHLD)
child_finished = 1;
else
signr = sig;
 
done = 1;
+
+   /*
+* It is possible for this signal handler to run after done is checked
+* in the main loop, but before the perf counter fds are polled. If this
+* happens, the poll() will continue to wait even though done is set,
+* and will only break out if either another signal is received, or the
+* counters are ready for read. To ensure the poll() doesn't sleep when
+* done is set, use an eventfd (done_fd) to wake up the poll().
+*/
+   if (write(done_fd, , sizeof(tmp)) < 0)
+   pr_err("failed to signal wakeup fd\n");
 }
 
 static void sigsegv_handler(int sig)
@@ -1424,6 +1438,17 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
int fd;
float ratio = 0;
 
+   done_fd = eventfd(0, EFD_NONBLOCK);
+   if (done_fd < 0) {
+   pr_err("Failed to create wakeup eventfd, error: %m\n");
+   return -1;
+   }
+   err = evlist__add_pollfd(rec->evlist, done_fd);
+   if (err < 0) {
+   pr_err("Failed to add wakeup eventfd to poll list\n");
+   return -1;
+   }
+
atexit(record__sig_exit);
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
-- 
2.26.2.645.ge9eca65c58-goog



[PATCH] perf record: Use an eventfd to wakeup when done

2020-05-07 Thread Anand K Mistry
The setting and checking of 'done' contains a rare race where the signal
handler setting 'done' is run after checking to break the loop, but
before waiting in evlist__poll(). In this case, the main loop won't wake
up until either another signal is sent, or the perf data fd causes a
wake up.

The following simple script can trigger this condition (but you might
need to run it for several hours):
for ((i = 0; i >= 0; i++)) ; do
  echo "Loop $i"
  delay=$(echo "scale=4; 0.1 * $RANDOM/32768" | bc)
  ./perf record -- sleep 3000 >/dev/null&
  pid=$!
  sleep $delay
  kill -TERM $pid
  echo "PID $pid"
  wait $pid
done

At some point, the loop will stall. Adding logging, even though perf has
received the SIGTERM and set 'done = 1', perf will remain sleeping until
a second signal is sent.

Signed-off-by: Anand K Mistry 

---

 tools/perf/builtin-record.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1ab349abe90469..ce5fc3860131d2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -518,15 +519,19 @@ static int record__pushfn(struct mmap *map, void *to, 
void *bf, size_t size)
 
 static volatile int signr = -1;
 static volatile int child_finished;
+static int done_fd = -1;
 
 static void sig_handler(int sig)
 {
+   u64 tmp = 1;
if (sig == SIGCHLD)
child_finished = 1;
else
signr = sig;
 
done = 1;
+   if (write(done_fd, , sizeof(tmp)) < 0)
+   pr_err("failed to signal wakeup fd\n");
 }
 
 static void sigsegv_handler(int sig)
@@ -1424,6 +1429,9 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
int fd;
float ratio = 0;
 
+   done_fd = eventfd(0, EFD_NONBLOCK);
+   evlist__add_pollfd(rec->evlist, done_fd);
+
atexit(record__sig_exit);
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);
-- 
2.26.2.645.ge9eca65c58-goog