[rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: sparse: sparse: symbol 'clocksource_verify_one_cpu' was not declared. Should it be static?
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2020.12.23a
head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check per-CPU clock synchronization when marked unstable
config: x86_64-randconfig-s021-20201223 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-184-g1b896707-dirty
        # https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
        git remote add rcu https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
        git fetch --no-tags rcu dev.2020.12.23a
        git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

sparse warnings: (new ones prefixed by >>)

>> kernel/time/clocksource.c:220:6: sparse: sparse: symbol 'clocksource_verify_one_cpu' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
Re: Time to re-enable Runtime PM per default for PCI devices?
On 17.11.2020 17:57, Rafael J. Wysocki wrote:
> On Tue, Nov 17, 2020 at 5:38 PM Bjorn Helgaas wrote:
>>
>> [+to Rafael, author of the commit you mentioned,
>> +cc Mika, Kai Heng, Lukas, linux-pm, linux-kernel]
>>
>> On Tue, Nov 17, 2020 at 04:56:09PM +0100, Heiner Kallweit wrote:
>>> More than 10 yrs ago Runtime PM was disabled per default by bb910a7040
>>> ("PCI/PM Runtime: Make runtime PM of PCI devices inactive by default").
>>>
>>> Reason given: "avoid breakage on systems where ACPI-based wake-up is
>>> known to fail for some devices"
>>> Unfortunately the commit message doesn't mention any affected devices
>>> or systems.
>
> Even if it did that, it wouldn't have been a full list almost for sure.
>
> We had received multiple problem reports related to that, most likely
> because the ACPI PM in BIOSes at that time was tailored for
> system-wide PM transitions only.

To follow up on this discussion: We could call pm_runtime_forbid()
conditionally, e.g. with the following condition. This would enable
runtime PM per default for all non-ACPI systems, and it uses the BIOS
date as an indicator for a hopefully not-that-broken ACPI
implementation. However, I could understand the argument that this
looks a little hacky:

	if (IS_ENABLED(CONFIG_ACPI) && dmi_get_bios_year() <= 2016)

>>> With Runtime PM disabled e.g. the PHY on network devices may remain
>>> powered up even with no cable plugged in, affecting battery lifetime
>>> on mobile devices. Currently we have to rely on the respective distro
>>> or user to enable Runtime PM via sysfs (echo auto > power/control).
>>> Some devices work around this restriction by calling pm_runtime_allow
>>> in their probe routine, even though that's not recommended by
>>> https://www.kernel.org/doc/Documentation/power/pci.txt
>>>
>>> Disabling Runtime PM per default seems to be a big hammer, a quirk
>>> for affected devices / systems may have been better. And we still
>>> have the option to disable Runtime PM for selected devices via sysfs.
>>>
>>> So, to cut a long story short: Wouldn't it be time to remove this
>>> restriction?
>>
>> I don't know the history of this, but maybe Rafael or the others can
>> shed some light on it.
>
> The systems that had those problems 10 years ago would still have
> them, but I expect there to be more systems where runtime PM can be
> enabled by default for PCI devices without issues.
Re: [rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu'
On Fri, Dec 25, 2020 at 06:55:07PM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2020.12.23a
> head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
> commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check per-CPU clock synchronization when marked unstable
> config: x86_64-randconfig-r013-20201223 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install x86_64 cross compiling tool for clang build
>         # apt-get install binutils-x86-64-linux-gnu
>         # https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
>         git remote add rcu https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>         git fetch --no-tags rcu dev.2020.12.23a
>         git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot

Good catch!  I will fold the fix into the original with attribution, thank you!

							Thanx, Paul

> All warnings (new ones prefixed by >>):
>
> >> kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu' [-Wmissing-prototypes]
>    void clocksource_verify_one_cpu(void *csin)
>         ^
>    kernel/time/clocksource.c:220:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
>    void clocksource_verify_one_cpu(void *csin)
>    ^
>    static
>    1 warning generated.
>
>
> vim +/clocksource_verify_one_cpu +220 kernel/time/clocksource.c
>
>    219
>  > 220	void clocksource_verify_one_cpu(void *csin)
>    221	{
>    222		struct clocksource *cs = (struct clocksource *)csin;
>    223
>    224		__this_cpu_write(csnow_mid, cs->read(cs));
>    225	}
>    226
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu'
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2020.12.23a
head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check per-CPU clock synchronization when marked unstable
config: x86_64-randconfig-r013-20201223 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
        git remote add rcu https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
        git fetch --no-tags rcu dev.2020.12.23a
        git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

>> kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu' [-Wmissing-prototypes]
   void clocksource_verify_one_cpu(void *csin)
        ^
   kernel/time/clocksource.c:220:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void clocksource_verify_one_cpu(void *csin)
   ^
   static
   1 warning generated.

vim +/clocksource_verify_one_cpu +220 kernel/time/clocksource.c

   219
 > 220	void clocksource_verify_one_cpu(void *csin)
   221	{
   222		struct clocksource *cs = (struct clocksource *)csin;
   223
   224		__this_cpu_write(csnow_mid, cs->read(cs));
   225	}
   226

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
[PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks
Linux uses check_xstate_against_struct() to sanity-check the size of
XSTATE-enabled features. AMX is an XSAVE-enabled feature, and its size
is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are both
user state components. The first is the XTILECFG state of a 64-byte
tile-related control register. State component 18, called XTILEDATA,
contains the actual tile data, and its size varies between
implementations. The architectural maximum, as defined in
CPUID(0x1d, 1): EAX[15:0], is a byte less than 64KB. The first
implementation supports 8KB. Check the XTILEDATA state size
dynamically.

The feature introduces the new tile register, TMM. Define one register
struct only and read the number of registers from CPUID. Cross-check
the overall size with CPUID again.

Signed-off-by: Chang S. Bae
Reviewed-by: Len Brown
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the code comments.
Changes from v1: * Rebased on the upstream kernel (5.10) --- arch/x86/include/asm/fpu/types.h | 27 ++ arch/x86/include/asm/fpu/xstate.h | 2 + arch/x86/kernel/fpu/xstate.c | 62 +++ 3 files changed, 91 insertions(+) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index 3fc6dbbe3ede..bf9511efd546 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -120,6 +120,9 @@ enum xfeature { XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, + XFEATURE_RSRVD_COMP_16, + XFEATURE_XTILE_CFG, + XFEATURE_XTILE_DATA, XFEATURE_MAX, }; @@ -136,11 +139,15 @@ enum xfeature { #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID(1 << XFEATURE_PASID) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) +#define XFEATURE_MASK_XTILE_CFG(1 << XFEATURE_XTILE_CFG) +#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE) #define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \ | XFEATURE_MASK_ZMM_Hi256 \ | XFEATURE_MASK_Hi16_ZMM) +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILE_DATA \ +| XFEATURE_MASK_XTILE_CFG) #define FIRST_EXTENDED_XFEATUREXFEATURE_YMM @@ -153,6 +160,9 @@ struct reg_256_bit { struct reg_512_bit { u8 regbytes[512/8]; }; +struct reg_1024_byte { + u8 regbytes[1024]; +}; /* * State component 2: @@ -255,6 +265,23 @@ struct arch_lbr_state { u64 ler_to; u64 ler_info; struct lbr_entryentries[]; +}; + +/* + * State component 17: 64-byte tile configuration register. + */ +struct xtile_cfg { + u64 tcfg[8]; +} __packed; + +/* + * State component 18: 1KB tile data register. + * Each register represents 16 64-byte rows of the matrix + * data. But the number of registers depends on the actual + * implementation. 
+ */ +struct xtile_data { + struct reg_1024_bytetmm; } __packed; /* diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 5927033e017f..08d3dd18d7d8 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -13,6 +13,8 @@ #define XSTATE_CPUID 0x000d +#define TILE_CPUID 0x001d + #define FXSAVE_SIZE512 #define XSAVE_HDR_SIZE 64 diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c2acfee581ba..f54ff1d4a44b 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -41,6 +41,14 @@ static const char *xfeature_names[] = "Protection Keys User registers", "PASID state", "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config" , + "AMX Tile data" , + "unknown xstate feature", }; struct xfeature_capflag_info { @@ -60,6 +68,8 @@ static struct xfeature_capflag_info xfeature_capflags[] __initdata = { { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, X86_FEATURE_INTEL_PT }, { XFEATURE_PKRU,X86_FEATURE_PKU }, { XFEATURE_PASID, X86_FEATURE_ENQCMD }, + { XFEATURE_XTILE_CFG, X86_FEATURE_AMX_TILE }, + { XFEATURE_XTILE_DATA, X86_FEATURE
NASA scientists achieve long-distance 'quantum teleportation' over 27 miles for the first time – paving the way for unhackable networks that transfer data faster than the speed of light
Subject: NASA scientists achieve long-distance 'quantum teleportation' over 27 miles for the first time – paving the way for unhackable networks that transfer data faster than the speed of light

Good day from Singapore,

I am sharing the below news article:

News Article: NASA scientists achieve long-distance 'quantum teleportation' over 27 miles for the first time – paving the way for unhackable networks that transfer data faster than the speed of light
Author: JOE PINKSTONE FOR MAILONLINE
Date Published: 22 December 2020
Link: https://www.dailymail.co.uk/sciencetech/article-9078855/NASA-scientists-achieve-long-distance-quantum-teleportation-time.html
Publisher: MailOnline UK

Synopsis:
- Scientists built a 27-mile long prototype quantum internet in the US
- They successfully used quantum entanglement to teleport signals instantly
- The phenomenon sees qubits, the quantum equivalent of computer bits, pair up and respond instantly

Scientists have demonstrated long-distance 'quantum teleportation' – the instant transfer of units of quantum information known as qubits – for the first time.

The qubits were transferred faster than the speed of light over a distance of 27 miles, laying the foundations for a quantum internet service, which could one day revolutionise computing.

Quantum communication systems are faster and more secure than regular networks because they use photons rather than computer code, which can be hacked. But their development relies on cutting-edge scientific theory which transforms our understanding of how computers work.

In a quantum internet, information stored in qubits (the quantum equivalent of computer bits) is shuttled, or 'teleported', over long distances through entanglement. Entanglement is a phenomenon whereby two particles are linked in such a way that information shared with one is shared with the other at exactly the same time.
This means that the quantum state of each particle is dependent on the state of the other – even when they are separated by a large distance. Quantum teleportation, therefore, is the transfer of quantum states from one location to the other.

However, it is highly sensitive to environmental interference that can easily disrupt the quality or 'fidelity' of teleportation, so proving the theory in practice has been technologically challenging.

In their latest experiment, researchers from Caltech, NASA, and Fermilab (Fermi National Accelerator Laboratory) built a unique system between two labs separated by 27 miles (44km). The system comprises three nodes which interact with one another to trigger a sequence of qubits, which pass a signal from one place to the other instantly.

The 'teleportation' is instant, occurring faster than the speed of light, and the researchers reported a fidelity of more than 90 percent, according to the new study, published in PRX Quantum. Fidelity is used to measure how close the resulting qubit signal is to the original message that was sent.

'This high fidelity is important especially in the case of quantum networks designed to connect advanced quantum devices, including quantum sensors,' explains Professor Maria Spiropulu from Caltech.

The findings of the project are crucial to hopes of a future quantum internet as well as pushing the boundaries of what scientists know about the quantum realm. Although the technology is yet to reach the point of being rolled out beyond sophisticated tests such as this, there are already plans for how policy makers will employ the technology.

For example, the US Department of Energy hopes to erect a quantum network between its laboratories across the states. The power of a quantum computer running on quantum internet will likely exceed the speeds of the world's current most sophisticated supercomputers by around 100 trillion times.
'People on social media are asking if they should sign up for a quantum internet provider (jokingly of course),' Professor Spiropulu told Motherboard. 'We need (a lot) more R&D work.'

WHAT IS QUANTUM ENTANGLEMENT?

In quantum physics, entangled particles remain connected so that actions performed by one affect the behaviour of the other, even if they are separated by huge distances. This means if you measure 'up' for the spin of one photon from an entangled pair, the spin of the other, measured an instant later, will be 'down' - even if the two are on opposite sides of the world.

Entanglement takes place when a pair of particles interact physically. For instance, a laser beam fired through a certain type of crystal can cause individual light particles to be split into pairs of entangled photons.

The theory that so riled Einstein is also referred to as 'spooky action at a distance'. Einstein wasn't happy with the theory, because
[PATCH AUTOSEL 5.4 106/130] iwlwifi: avoid endless HW errors at assert time
From: Mordechay Goodstein

[ Upstream commit 861bae42e1f125a5a255ace3ccd731e59f58ddec ]

Currently we only mark the HW error state "after" trying to collect HW
data, but if any HW error happens while collecting HW data we go into
an endless loop. Avoid this by setting the HW error state "before"
collecting HW data.

Signed-off-by: Mordechay Goodstein
Signed-off-by: Luca Coelho
Link: https://lore.kernel.org/r/iwlwifi.20201209231352.4c7e5a87da15.Ic35b2f28ff08f7ac23143c80f224d52eb97a0454@changeid
Signed-off-by: Luca Coelho
Signed-off-by: Sasha Levin
---
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index 3acbd5b7ab4b2..87f53810fdac3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -1291,6 +1291,12 @@ void iwl_mvm_nic_restart(struct iwl_mvm *mvm, bool fw_error)
 	} else if (mvm->fwrt.cur_fw_img == IWL_UCODE_REGULAR &&
 		   mvm->hw_registered &&
 		   !test_bit(STATUS_TRANS_DEAD, &mvm->trans->status)) {
+		/* This should be first thing before trying to collect any
+		 * data to avoid endless loops if any HW error happens while
+		 * collecting debug data.
+		 */
+		set_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, &mvm->status);
+
 		if (mvm->fw->ucode_capa.error_log_size) {
 			u32 src_size = mvm->fw->ucode_capa.error_log_size;
 			u32 src_addr = mvm->fw->ucode_capa.error_log_addr;
@@ -1309,7 +1315,6 @@ void iwl_mvm_nic_restart(struct iwl_mvm *mvm, bool fw_error)
 		if (fw_error && mvm->fw_restart > 0)
 			mvm->fw_restart--;
 
-		set_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, &mvm->status);
 		ieee80211_restart_hw(mvm->hw);
 	}
 }
-- 
2.27.0
Re: [PATCH] MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
On Thu, Dec 17, 2020 at 8:16 AM Lukas Bulwahn wrote:
>
> The current pattern in the file entry does not make the files in the
> governors subdirectory to be a part of the CPU IDLE TIME MANAGEMENT
> FRAMEWORK.
>
> Adjust the file pattern to include files in governors.
>
> Signed-off-by: Lukas Bulwahn
> ---
> applies cleanly on current master and next-20201215
>
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 952731d1e43c..ac679aa00e0d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4596,7 +4596,7 @@ B:	https://bugzilla.kernel.org
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
>  F:	Documentation/admin-guide/pm/cpuidle.rst
>  F:	Documentation/driver-api/pm/cpuidle.rst
> -F:	drivers/cpuidle/*
> +F:	drivers/cpuidle/
>  F:	include/linux/cpuidle.h
>
> CPU POWER MONITORING SUBSYSTEM
> --

Applied as 5.11-rc material, thanks!
[PATCH v1 13/15] powerpc/32: Enable instruction translation at the same time as data translation
On 8xx, kernel text is pinned. On book3s/32, kernel text is mapped by BATs. Enable instruction translation at the same time as data translation, it makes things simpler. In syscall handler, MSR_RI can also be set at the same time because srr0/srr1 are already saved and r1 is set properly. Also update comment in power_save_ppc32_restore(). Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/entry_32.S | 15 - arch/powerpc/kernel/head_6xx_8xx.h | 35 +++--- arch/powerpc/kernel/idle_6xx.S | 4 +--- 3 files changed, 28 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 9ef75efaff47..2c38106c2c93 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -213,12 +213,8 @@ transfer_to_handler_cont: 3: mflrr9 tovirt_novmstack r2, r2 /* set r2 to current */ - tovirt_vmstack r9, r9 lwz r11,0(r9) /* virtual address of handler */ lwz r9,4(r9)/* where to go when done */ -#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS) - mtspr SPRN_NRI, r0 -#endif #ifdef CONFIG_TRACE_IRQFLAGS /* * When tracing IRQ state (lockdep) we enable the MMU before we call @@ -235,6 +231,11 @@ transfer_to_handler_cont: /* MSR isn't changing, just transition directly */ #endif +#ifdef CONFIG_HAVE_ARCH_VMAP_STACK + mtctr r11 + mtlrr9 + bctr/* jump to handler */ +#else mtspr SPRN_SRR0,r11 mtspr SPRN_SRR1,r10 mtlrr9 @@ -242,6 +243,7 @@ transfer_to_handler_cont: #ifdef CONFIG_40x b . /* Prevent prefetch past rfi */ #endif +#endif #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500) 4: rlwinm r12,r12,0,~_TLF_NAPPING @@ -261,7 +263,9 @@ _ASM_NOKPROBE_SYMBOL(transfer_to_handler) _ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont) #ifdef CONFIG_TRACE_IRQFLAGS -1: /* MSR is changing, re-enable MMU so we can notify lockdep. We need to +1: +#ifndef CONFIG_HAVE_ARCH_VMAP_STACK + /* MSR is changing, re-enable MMU so we can notify lockdep. 
We need to * keep interrupts disabled at this point otherwise we might risk * taking an interrupt before we tell lockdep they are enabled. */ @@ -276,6 +280,7 @@ _ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont) #endif reenable_mmu: +#endif /* * We save a bunch of GPRs, * r3 can be different from GPR3(r1) at this point, r9 and r11 diff --git a/arch/powerpc/kernel/head_6xx_8xx.h b/arch/powerpc/kernel/head_6xx_8xx.h index 11b608b6f4b7..bedbf37c2a0c 100644 --- a/arch/powerpc/kernel/head_6xx_8xx.h +++ b/arch/powerpc/kernel/head_6xx_8xx.h @@ -49,10 +49,14 @@ .endm .macro EXCEPTION_PROLOG_2 handle_dar_dsisr=0 - li r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */ - mtmsr r11 - isync + li r11, MSR_KERNEL & ~MSR_RI /* re-enable MMU */ + mtspr SPRN_SRR1, r11 + lis r11, 1f@h + ori r11, r11, 1f@l + mtspr SPRN_SRR0, r11 mfspr r11, SPRN_SPRG_SCRATCH2 + rfi +1: stw r11,GPR1(r1) stw r11,0(r1) mr r11, r1 @@ -75,7 +79,7 @@ .endif lwz r9, SRR1(r12) lwz r12, SRR0(r12) - li r10, MSR_KERNEL & ~MSR_IR /* can take exceptions */ + li r10, MSR_KERNEL /* can take exceptions */ mtmsr r10 /* (except for mach check in rtas) */ stw r0,GPR0(r11) lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ @@ -95,9 +99,13 @@ lwz r1,TASK_STACK-THREAD(r12) beq-99f addir1, r1, THREAD_SIZE - INT_FRAME_SIZE - li r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */ - mtmsr r10 - isync + li r10, MSR_KERNEL /* can take exceptions */ + mtspr SPRN_SRR1, r10 + lis r10, 1f@h + ori r10, r10, 1f@l + mtspr SPRN_SRR0, r10 + rfi +1: tovirt(r12, r12) stw r11,GPR1(r1) stw r11,0(r1) @@ -108,8 +116,6 @@ mfcrr10 rlwinm r10,r10,0,4,2 /* Clear SO bit in CR */ stw r10,_CCR(r1)/* save registers */ - LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */ - mtmsr r10 /* (except for mach check in rtas) */ lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ stw r2,GPR2(r1) addir10,r10,STACK_FRAME_REGS_MARKER@l @@ -126,8 +132,6 @@ ACCOUNT_CPU_USER_ENTRY(r2, r11, r12) 3: - lis 
r11, transfer_to_syscall@h - ori r11, r11, transfer_to_syscall@l #ifdef CONFIG_TRACE_IRQFLAGS /* * If MSR is changing we ne
Re: [PATCH] powerpc/time: Force inlining of get_tb()
On Sun, 20 Dec 2020 18:18:26 +0000 (UTC), Christophe Leroy wrote:
> Force inlining of get_tb() in order to avoid getting
> following function in vdso32, leading to suboptimal
> performance in clock_gettime()
>
> 0688 <.get_tb>:
>  688:	7c 6d 42 a6 	mftbu   r3
>  68c:	7c 8c 42 a6 	mftb    r4
>  690:	7d 2d 42 a6 	mftbu   r9
>  694:	7c 03 48 40 	cmplw   r3,r9
>  698:	40 e2 ff f0 	bne+    688 <.get_tb>
>  69c:	4e 80 00 20 	blr

Applied to powerpc/fixes.

[1/1] powerpc/time: Force inlining of get_tb()
      https://git.kernel.org/powerpc/c/0faa22f09caadc11af2aa7570870ebd2ac5b8170

cheers
[PATCH v1 5/7] perf arm-spe: Assign kernel time to synthesized event
In the current code, the synthesized samples for the Arm SPE trace are
assigned the raw arch timer counter value, so they don't carry kernel
time. To fix this, convert the timer counter to kernel time and assign
it to the sample timestamp.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index bc512c3479f7..2b008b973387 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -232,7 +232,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	struct arm_spe_record *record = &speq->decoder->record;
 
 	if (!spe->timeless_decoding)
-		sample->time = speq->timestamp;
+		sample->time = tsc_to_perf_time(record->timestamp, &spe->tc);
 
 	sample->ip = record->from_ip;
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
-- 
2.17.1
[PATCH v1 4/7] perf arm-spe: Convert event kernel time to counter value
When handling a perf event, the Arm SPE decoder needs to decide whether
the perf event is earlier or later than the samples from the Arm SPE
trace data; to compare them, it needs to use the same time unit.

Convert the event's kernel time to the arch timer's counter value, so
it can be compared with the counter value contained in the Arm SPE
timestamp packet.

Signed-off-by: Leo Yan
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index a504ceec2de6..bc512c3479f7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -588,7 +588,7 @@ static int arm_spe_process_event(struct perf_session *session,
 	}
 
 	if (sample->time && (sample->time != (u64) -1))
-		timestamp = sample->time;
+		timestamp = perf_time_to_tsc(sample->time, &spe->tc);
 	else
 		timestamp = 0;
-- 
2.17.1
[PATCH] powerpc/time: Force inlining of get_tb()
Force inlining of get_tb() in order to avoid getting following function
in vdso32, leading to suboptimal performance in clock_gettime()

0688 <.get_tb>:
 688:	7c 6d 42 a6 	mftbu   r3
 68c:	7c 8c 42 a6 	mftb    r4
 690:	7d 2d 42 a6 	mftbu   r9
 694:	7c 03 48 40 	cmplw   r3,r9
 698:	40 e2 ff f0 	bne+    688 <.get_tb>
 69c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/vdso/timebase.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/vdso/timebase.h b/arch/powerpc/include/asm/vdso/timebase.h
index b558b07959ce..881f655caa0a 100644
--- a/arch/powerpc/include/asm/vdso/timebase.h
+++ b/arch/powerpc/include/asm/vdso/timebase.h
@@ -49,7 +49,7 @@ static inline unsigned long get_tbl(void)
 	return mftb();
 }
 
-static inline u64 get_tb(void)
+static __always_inline u64 get_tb(void)
 {
 	unsigned int tbhi, tblo, tbhi2;
-- 
2.25.0
[PATCH 5.4 24/34] xhci: Give USB2 ports time to enter U3 in bus suspend
From: Li Jun

commit c1373f10479b624fb6dba0805d673e860f1b421d upstream.

If a USB2 device wakeup is not enabled/supported the link state may
still be in U0 in xhci_bus_suspend(), where it's then manually put to
suspended U3 state.

Just as with selective suspend the device needs time to enter U3
suspend before continuing with further suspend operations (e.g. system
suspend), otherwise we may enter system suspend with link state in U0.

[commit message rewording -Mathias]

Cc:
Signed-off-by: Li Jun
Signed-off-by: Mathias Nyman
Link: https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com
Signed-off-by: Greg Kroah-Hartman
---
 drivers/usb/host/xhci-hub.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1705,6 +1705,10 @@ retry:
 	hcd->state = HC_STATE_SUSPENDED;
 	bus_state->next_statechange = jiffies + msecs_to_jiffies(10);
 	spin_unlock_irqrestore(&xhci->lock, flags);
+
+	if (bus_state->bus_suspended)
+		usleep_range(5000, 10000);
+
 	return 0;
 }
[PATCH 5.9 39/49] xhci: Give USB2 ports time to enter U3 in bus suspend
From: Li Jun

commit c1373f10479b624fb6dba0805d673e860f1b421d upstream.

If a USB2 device wakeup is not enabled/supported the link state may
still be in U0 in xhci_bus_suspend(), where it's then manually put to
suspended U3 state.

Just as with selective suspend the device needs time to enter U3
suspend before continuing with further suspend operations (e.g. system
suspend), otherwise we may enter system suspend with link state in U0.

[commit message rewording -Mathias]

Cc:
Signed-off-by: Li Jun
Signed-off-by: Mathias Nyman
Link: https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com
Signed-off-by: Greg Kroah-Hartman
---
 drivers/usb/host/xhci-hub.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1712,6 +1712,10 @@ retry:
 	hcd->state = HC_STATE_SUSPENDED;
 	bus_state->next_statechange = jiffies + msecs_to_jiffies(10);
 	spin_unlock_irqrestore(&xhci->lock, flags);
+
+	if (bus_state->bus_suspended)
+		usleep_range(5000, 10000);
+
 	return 0;
 }
[PATCH 5.9 28/49] bonding: fix feature flag setting at init time
From: Jarod Wilson [ Upstream commit 007ab5345545aba2f9cbe4c096cc35d2fd3275ac ] Don't try to adjust XFRM support flags if the bond device isn't yet registered. Bad things can currently happen when netdev_change_features() is called without having wanted_features fully filled in yet. This code runs both on post-module-load mode changes, as well as at module init time, and when run at module init time, it is before register_netdevice() has been called and filled in wanted_features. The empty wanted_features led to features also getting emptied out, which was definitely not the intended behavior, so prevent that from happening. Originally, I'd hoped to stop adjusting wanted_features at all in the bonding driver, as it's documented as being something only the network core should touch, but we actually do need to do this to properly update both the features and wanted_features fields when changing the bond type, or we get to a situation where ethtool sees: esp-hw-offload: off [requested on] I do think we should be using netdev_update_features instead of netdev_change_features here though, so we only send notifiers when the features actually changed. 
Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Reported-by: Ivan Vecera Suggested-by: Ivan Vecera Cc: Jay Vosburgh Cc: Veaceslav Falico Cc: Andy Gospodarek Signed-off-by: Jarod Wilson Link: https://lore.kernel.org/r/20201205172229.576587-1-ja...@redhat.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/bonding/bond_options.c | 22 +++--- include/net/bonding.h |2 -- 2 files changed, 15 insertions(+), 9 deletions(-) --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -745,6 +745,19 @@ const struct bond_option *bond_opt_get(u return &bond_opts[option]; } +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode) +{ + if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD)) + return; + + if (mode == BOND_MODE_ACTIVEBACKUP) + bond_dev->wanted_features |= BOND_XFRM_FEATURES; + else + bond_dev->wanted_features &= ~BOND_XFRM_FEATURES; + + netdev_update_features(bond_dev); +} + static int bond_option_mode_set(struct bonding *bond, const struct bond_opt_value *newval) { @@ -767,13 +780,8 @@ static int bond_option_mode_set(struct b if (newval->value == BOND_MODE_ALB) bond->params.tlb_dynamic_lb = 1; -#ifdef CONFIG_XFRM_OFFLOAD - if (newval->value == BOND_MODE_ACTIVEBACKUP) - bond->dev->wanted_features |= BOND_XFRM_FEATURES; - else - bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; - netdev_change_features(bond->dev); -#endif /* CONFIG_XFRM_OFFLOAD */ + if (bond->dev->reg_state == NETREG_REGISTERED) + bond_set_xfrm_features(bond->dev, newval->value); /* don't cache arp_validate between modes */ bond->params.arp_validate = BOND_ARP_VALIDATE_NONE; --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -86,10 +86,8 @@ #define bond_for_each_slave_rcu(bond, pos, iter) \ netdev_for_each_lower_private_rcu((bond)->dev, pos, iter) -#ifdef CONFIG_XFRM_OFFLOAD #define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \ NETIF_F_GSO_ESP) -#endif /* CONFIG_XFRM_OFFLOAD */ #ifdef 
CONFIG_NET_POLL_CONTROLLER extern atomic_t netpoll_block_tx;
[PATCH 5.10 09/16] xhci: Give USB2 ports time to enter U3 in bus suspend
From: Li Jun commit c1373f10479b624fb6dba0805d673e860f1b421d upstream. If a USB2 device wakeup is not enabled/supported the link state may still be in U0 in xhci_bus_suspend(), where it's then manually put to suspended U3 state. Just as with selective suspend the device needs time to enter U3 suspend before continuing with further suspend operations (e.g. system suspend), otherwise we may enter system suspend with link state in U0. [commit message rewording -Mathias] Cc: Signed-off-by: Li Jun Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-hub.c | 4 ++++ 1 file changed, 4 insertions(+) --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -1712,6 +1712,10 @@ retry: hcd->state = HC_STATE_SUSPENDED; bus_state->next_statechange = jiffies + msecs_to_jiffies(10); spin_unlock_irqrestore(&xhci->lock, flags); + + if (bus_state->bus_suspended) + usleep_range(5000, 10000); + return 0; }
[PATCH v1 05/13] sparc32: Drop run-time patching of ipi trap
There is no longer any need for the run-time patching of the ipi trap with the removal of sun4m and sun4d. Remove the patching and drop the ipi implementation for the two machines. The patch includes removal of patching from pcic as this was needed to fix the build. pcic will be removed in a later commit. Signed-off-by: Sam Ravnborg Cc: Mike Rapoport Cc: Andrew Morton Cc: Sam Ravnborg Cc: Christian Brauner Cc: "David S. Miller" Cc: Geert Uytterhoeven Cc: Pekka Enberg Cc: Arnd Bergmann Cc: Andreas Larsson --- arch/sparc/kernel/entry.S | 70 ++- arch/sparc/kernel/kernel.h| 4 -- arch/sparc/kernel/leon_smp.c | 3 -- arch/sparc/kernel/pcic.c | 11 -- arch/sparc/kernel/sun4d_smp.c | 3 -- arch/sparc/kernel/ttable_32.S | 9 ++--- 6 files changed, 7 insertions(+), 93 deletions(-) diff --git a/arch/sparc/kernel/entry.S b/arch/sparc/kernel/entry.S index 9985b08a3467..1a2e20a7e584 100644 --- a/arch/sparc/kernel/entry.S +++ b/arch/sparc/kernel/entry.S @@ -174,32 +174,6 @@ maybe_smp4m_msg_check_resched: maybe_smp4m_msg_out: RESTORE_ALL - .align 4 - .globl linux_trap_ipi15_sun4m -linux_trap_ipi15_sun4m: - SAVE_ALL - sethi %hi(0x8000), %o2 - GET_PROCESSOR4M_ID(o0) - sethi %hi(sun4m_irq_percpu), %l5 - or %l5, %lo(sun4m_irq_percpu), %o5 - sll %o0, 2, %o0 - ld [%o5 + %o0], %o5 - ld [%o5 + 0x00], %o3 ! sun4m_irq_percpu[cpu]->pending - andcc %o3, %o2, %g0 - be sun4m_nmi_error ! Must be an NMI async memory error -st %o2, [%o5 + 0x04] ! sun4m_irq_percpu[cpu]->clear=0x8000 - WRITE_PAUSE - ld [%o5 + 0x00], %g0 ! sun4m_irq_percpu[cpu]->pending - WRITE_PAUSE - or %l0, PSR_PIL, %l4 - wr %l4, 0x0, %psr - WRITE_PAUSE - wr %l4, PSR_ET, %psr - WRITE_PAUSE - callsmp4m_cross_call_irq -nop - b ret_trap_lockless_ipi -clr%l6 .globl smp4d_ticker /* SMP per-cpu ticker interrupts are handled specially. 
*/ @@ -220,44 +194,6 @@ smp4d_ticker: WRITE_PAUSE RESTORE_ALL - .align 4 - .globl linux_trap_ipi15_sun4d -linux_trap_ipi15_sun4d: - SAVE_ALL - sethi %hi(CC_BASE), %o4 - sethi %hi(MXCC_ERR_ME|MXCC_ERR_PEW|MXCC_ERR_ASE|MXCC_ERR_PEE), %o2 - or %o4, (CC_EREG - CC_BASE), %o0 - ldda[%o0] ASI_M_MXCC, %o0 - andcc %o0, %o2, %g0 - bne 1f -sethi %hi(BB_STAT2), %o2 - lduba [%o2] ASI_M_CTL, %o2 - andcc %o2, BB_STAT2_MASK, %g0 - bne 2f -or %o4, (CC_ICLR - CC_BASE), %o0 - sethi %hi(1 << 15), %o1 - stha%o1, [%o0] ASI_M_MXCC /* Clear PIL 15 in MXCC's ICLR */ - or %l0, PSR_PIL, %l4 - wr %l4, 0x0, %psr - WRITE_PAUSE - wr %l4, PSR_ET, %psr - WRITE_PAUSE - callsmp4d_cross_call_irq -nop - b ret_trap_lockless_ipi -clr%l6 - -1: /* MXCC error */ -2: /* BB error */ - /* Disable PIL 15 */ - set CC_IMSK, %l4 - lduha [%l4] ASI_M_MXCC, %l5 - sethi %hi(1 << 15), %l7 - or %l5, %l7, %l5 - stha%l5, [%l4] ASI_M_MXCC - /* FIXME */ -1: b,a 1b - .globl smpleon_ipi .extern leon_ipi_interrupt /* SMP per-cpu IPI interrupts are handled specially. 
*/ @@ -618,11 +554,11 @@ sun4m_nmi_error: #ifndef CONFIG_SMP .align 4 - .globl linux_trap_ipi15_sun4m -linux_trap_ipi15_sun4m: + .globl linux_trap_ipi15_leon +linux_trap_ipi15_leon: SAVE_ALL - ba sun4m_nmi_error + ba sun4m_nmi_error nop #endif /* CONFIG_SMP */ diff --git a/arch/sparc/kernel/kernel.h b/arch/sparc/kernel/kernel.h index 7328d13875e4..c9e1b13d955f 100644 --- a/arch/sparc/kernel/kernel.h +++ b/arch/sparc/kernel/kernel.h @@ -135,10 +135,6 @@ void leonsmp_ipi_interrupt(void); void leon_cross_call_irq(void); /* head_32.S */ -extern unsigned int t_nmi[]; -extern unsigned int linux_trap_ipi15_sun4d[]; -extern unsigned int linux_trap_ipi15_sun4m[]; - extern struct tt_entry trapbase; extern struct tt_entry trapbase_cpu1; extern struct tt_entry trapbase_cpu2; diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c index 1eed26d423fb..b0d75783d337 100644 --- a/arch/sparc/kernel/leon_smp.c +++ b/arch/sparc/kernel/leon_smp.c @@ -461,8 +461,5 @@ static const struct sparc32_ipi_ops leon_ipi_ops = { void __init leon_init_smp(void) { - /* Patch ipi15 trap table */ - t_nmi[1] = t_nmi[1] + (linux_trap_ipi15_leon - linux_trap_ipi15_sun4m); - sparc32_ipi_ops = &leon_ipi_ops; } diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c index ee4c9a9a171c..87bdcb16019b 100644 --- a/arch/sparc/kern
RE: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()
From: Juergen Gross Sent: Thursday, December 17, 2020 1:31 AM > The time pvops functions are the only ones left which might be > used in 32-bit mode and which return a 64-bit value. > > Switch them to use the static_call() mechanism instead of pvops, as > this allows quite some simplification of the pvops implementation. > > Due to include hell this requires to split out the time interfaces > into a new header file. > > Signed-off-by: Juergen Gross > --- > arch/x86/Kconfig | 1 + > arch/x86/include/asm/mshyperv.h | 11 > arch/x86/include/asm/paravirt.h | 14 -- > arch/x86/include/asm/paravirt_time.h | 38 +++ > arch/x86/include/asm/paravirt_types.h | 6 - > arch/x86/kernel/cpu/vmware.c | 5 ++-- > arch/x86/kernel/kvm.c | 3 ++- > arch/x86/kernel/kvmclock.c| 3 ++- > arch/x86/kernel/paravirt.c| 16 --- > arch/x86/kernel/tsc.c | 3 ++- > arch/x86/xen/time.c | 12 - > drivers/clocksource/hyperv_timer.c| 5 ++-- > drivers/xen/time.c| 3 ++- > kernel/sched/sched.h | 1 + > 14 files changed, 71 insertions(+), 50 deletions(-) > create mode 100644 arch/x86/include/asm/paravirt_time.h > [snip] > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h > index ffc289992d1b..45942d420626 100644 > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)( > #define hv_get_raw_timer() rdtsc_ordered() > #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR > > -/* > - * Reference to pv_ops must be inline so objtool > - * detection of noinstr violations can work correctly. 
> - */ > -static __always_inline void hv_setup_sched_clock(void *sched_clock) > -{ > -#ifdef CONFIG_PARAVIRT > - pv_ops.time.sched_clock = sched_clock; > -#endif > -} > - > void hyperv_vector_handler(struct pt_regs *regs); > > static inline void hv_enable_stimer0_percpu_irq(int irq) {} [snip] > diff --git a/drivers/clocksource/hyperv_timer.c > b/drivers/clocksource/hyperv_timer.c > index ba04cb381cd3..1ed79993fc50 100644 > --- a/drivers/clocksource/hyperv_timer.c > +++ b/drivers/clocksource/hyperv_timer.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > static struct clock_event_device __percpu *hv_clock_event; > static u64 hv_sched_clock_offset __ro_after_init; > @@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void) > clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100); > > hv_sched_clock_offset = hv_read_reference_counter(); > - hv_setup_sched_clock(read_hv_sched_clock_tsc); > + paravirt_set_sched_clock(read_hv_sched_clock_tsc); > > return true; > } > @@ -470,6 +471,6 @@ void __init hv_init_clocksource(void) > clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100); > > hv_sched_clock_offset = hv_read_reference_counter(); > - hv_setup_sched_clock(read_hv_sched_clock_msr); > + static_call_update(pv_sched_clock, read_hv_sched_clock_msr); > } > EXPORT_SYMBOL_GPL(hv_init_clocksource); These Hyper-V changes are problematic as we want to keep hyperv_timer.c architecture independent. While only the code for x86/x64 is currently accepted upstream, code for ARM64 support is in progress. So we need to use hv_setup_sched_clock() in hyperv_timer.c, and have the per-arch implementation in mshyperv.h. Michael
[PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()
The time pvops functions are the only ones left which might be used in 32-bit mode and which return a 64-bit value. Switch them to use the static_call() mechanism instead of pvops, as this allows quite some simplification of the pvops implementation. Due to include hell this requires to split out the time interfaces into a new header file. Signed-off-by: Juergen Gross --- arch/x86/Kconfig | 1 + arch/x86/include/asm/mshyperv.h | 11 arch/x86/include/asm/paravirt.h | 14 -- arch/x86/include/asm/paravirt_time.h | 38 +++ arch/x86/include/asm/paravirt_types.h | 6 - arch/x86/kernel/cpu/vmware.c | 5 ++-- arch/x86/kernel/kvm.c | 3 ++- arch/x86/kernel/kvmclock.c| 3 ++- arch/x86/kernel/paravirt.c| 16 --- arch/x86/kernel/tsc.c | 3 ++- arch/x86/xen/time.c | 12 - drivers/clocksource/hyperv_timer.c| 5 ++-- drivers/xen/time.c| 3 ++- kernel/sched/sched.h | 1 + 14 files changed, 71 insertions(+), 50 deletions(-) create mode 100644 arch/x86/include/asm/paravirt_time.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a8bd298e45b1..ebabd8bf4064 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -769,6 +769,7 @@ if HYPERVISOR_GUEST config PARAVIRT bool "Enable paravirtualization code" + depends on HAVE_STATIC_CALL help This changes the kernel so it can modify itself when it is run under a hypervisor, potentially improving performance significantly diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ffc289992d1b..45942d420626 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)( #define hv_get_raw_timer() rdtsc_ordered() #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR -/* - * Reference to pv_ops must be inline so objtool - * detection of noinstr violations can work correctly. 
- */ -static __always_inline void hv_setup_sched_clock(void *sched_clock) -{ -#ifdef CONFIG_PARAVIRT - pv_ops.time.sched_clock = sched_clock; -#endif -} - void hyperv_vector_handler(struct pt_regs *regs); static inline void hv_enable_stimer0_percpu_irq(int irq) {} diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 4abf110e2243..0785a9686e32 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -17,25 +17,11 @@ #include #include -static inline unsigned long long paravirt_sched_clock(void) -{ - return PVOP_CALL0(unsigned long long, time.sched_clock); -} - -struct static_key; -extern struct static_key paravirt_steal_enabled; -extern struct static_key paravirt_steal_rq_enabled; - __visible void __native_queued_spin_unlock(struct qspinlock *lock); bool pv_is_native_spin_unlock(void); __visible bool __native_vcpu_is_preempted(long cpu); bool pv_is_native_vcpu_is_preempted(void); -static inline u64 paravirt_steal_clock(int cpu) -{ - return PVOP_CALL1(u64, time.steal_clock, cpu); -} - /* The paravirtualized I/O functions */ static inline void slow_down_io(void) { diff --git a/arch/x86/include/asm/paravirt_time.h b/arch/x86/include/asm/paravirt_time.h new file mode 100644 index ..76cf94b7c899 --- /dev/null +++ b/arch/x86/include/asm/paravirt_time.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PARAVIRT_TIME_H +#define _ASM_X86_PARAVIRT_TIME_H + +/* Time related para-virtualized functions. 
*/ + +#ifdef CONFIG_PARAVIRT + +#include +#include +#include + +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +u64 dummy_sched_clock(void); + +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); +DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock); + +extern bool paravirt_using_native_sched_clock; + +void paravirt_set_sched_clock(u64 (*func)(void)); + +static inline u64 paravirt_sched_clock(void) +{ + return static_call(pv_sched_clock)(); +} + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +#endif /* CONFIG_PARAVIRT */ + +#endif /* _ASM_X86_PARAVIRT_TIME_H */ diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index de87087d3bde..1fff349e4792 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -95,11 +95,6 @@ struct pv_lazy_ops { } __no_randomize_layout; #endif -struct pv_time_ops { - unsigned long long (*sched_clock)(void); - unsigned long long (*steal_clock)(int cpu); -} __no_randomize_layout; - struct pv_cpu_ops { /* hooks for various privileged instructions */ void (*io_delay)(void); @@ -291,7 +286,6 @@ struct p
[PATCH] MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
The current pattern in the file entry does not make the files in the governors subdirectory to be a part of the CPU IDLE TIME MANAGEMENT FRAMEWORK. Adjust the file pattern to include files in governors. Signed-off-by: Lukas Bulwahn --- applies cleanly on current master and next-20201215 MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 952731d1e43c..ac679aa00e0d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4596,7 +4596,7 @@ B:https://bugzilla.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git F: Documentation/admin-guide/pm/cpuidle.rst F: Documentation/driver-api/pm/cpuidle.rst -F: drivers/cpuidle/* +F: drivers/cpuidle/ F: include/linux/cpuidle.h CPU POWER MONITORING SUBSYSTEM -- 2.17.1
[RFC PATCH 2/8] x86/cpu: Load Key Locker internal key at boot-time
Internal (Wrapping) Key is a new entity of Intel Key Locker feature. This internal key is loaded in a software-inaccessible CPU state and used to encode a data encryption key. The kernel makes random data and loads it as the internal key in each CPU. The data need to be invalidated as soon as the load is done. The BIOS may disable the feature. Check the dynamic CPUID bit (KEYLOCKER_CPUID_EBX_AESKLE) at first. Add byte code for LOADIWKEY -- an instruction to load the internal key, in the 'x86-opcode-map.txt' file to avoid objtool's misinterpretation. Signed-off-by: Chang S. Bae Cc: x...@kernel.org Cc: linux-kernel@vger.kernel.org --- arch/x86/include/asm/keylocker.h | 11 + arch/x86/kernel/Makefile | 1 + arch/x86/kernel/cpu/common.c | 38 +- arch/x86/kernel/keylocker.c | 71 +++ arch/x86/kernel/smpboot.c | 2 + arch/x86/lib/x86-opcode-map.txt | 2 +- tools/arch/x86/lib/x86-opcode-map.txt | 2 +- 7 files changed, 124 insertions(+), 3 deletions(-) create mode 100644 arch/x86/kernel/keylocker.c diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h index 2fe13c21c63f..daf0734a4095 100644 --- a/arch/x86/include/asm/keylocker.h +++ b/arch/x86/include/asm/keylocker.h @@ -14,5 +14,16 @@ #define KEYLOCKER_CPUID_EBX_BACKUP BIT(4) #define KEYLOCKER_CPUID_ECX_RAND BIT(1) +bool check_keylocker_readiness(void); + +bool load_keylocker(void); + +void make_keylocker_data(void); +#ifdef CONFIG_X86_KEYLOCKER +void invalidate_keylocker_data(void); +#else +#define invalidate_keylocker_data() do { } while (0) +#endif + #endif /*__ASSEMBLY__ */ #endif /* _ASM_KEYLOCKER_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 68608bd892c0..085dbf49b3b9 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o obj-$(CONFIG_X86_UMIP) += umip.o +obj-$(CONFIG_X86_KEYLOCKER)+= keylocker.o 
obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 35ad8480c464..d675075848bb 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -57,6 +57,8 @@ #include #include #include +#include + #include #include "cpu.h" @@ -459,6 +461,39 @@ static __init int x86_nofsgsbase_setup(char *arg) } __setup("nofsgsbase", x86_nofsgsbase_setup); +static __always_inline void setup_keylocker(struct cpuinfo_x86 *c) +{ + bool keyloaded; + + if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER)) + goto out; + + cr4_set_bits(X86_CR4_KEYLOCKER); + + if (c == &boot_cpu_data) { + if (!check_keylocker_readiness()) + goto disable_keylocker; + + make_keylocker_data(); + } + + keyloaded = load_keylocker(); + if (!keyloaded) { + pr_err_once("x86/keylocker: Failed to load internal key\n"); + goto disable_keylocker; + } + + pr_info_once("x86/keylocker: Activated\n"); + return; + +disable_keylocker: + clear_cpu_cap(c, X86_FEATURE_KEYLOCKER); + pr_info_once("x86/keylocker: Disabled\n"); +out: + /* Make sure the feature disabled for kexec-reboot. */ + cr4_clear_bits(X86_CR4_KEYLOCKER); +} + /* * Protection Keys are not available in 32-bit mode. */ @@ -1554,10 +1589,11 @@ static void identify_cpu(struct cpuinfo_x86 *c) /* Disable the PN if appropriate */ squash_the_stupid_serial_number(c); - /* Set up SMEP/SMAP/UMIP */ + /* Setup various Intel-specific CPU security features */ setup_smep(c); setup_smap(c); setup_umip(c); + setup_keylocker(c); /* Enable FSGSBASE instructions if available. 
*/ if (cpu_has(c, X86_FEATURE_FSGSBASE)) { diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c new file mode 100644 index ..e455d806b80c --- /dev/null +++ b/arch/x86/kernel/keylocker.c @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Key Locker feature check and support the internal key + */ + +#include + +#include +#include +#include + +bool check_keylocker_readiness(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx); + /* BIOS may not enable it on some systems. */ + if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE)) { + pr_debug("x86/keylocker: not fully enabled\n"); + return false; + } + + return true; +} + +/* Load Internal (Wrapping) Key */ +#define LOADIWKEY ".byte 0xf3,0x0f,0x38,0xdc,0xd1" +#define LOADIWKEY_NUM_OPERANDS 3 +
[PATCH] usemem: Add option init-time
From: Hui Zhu This commit adds a new option init-time to remove the initialization time from the run time and show the initialization time. Signed-off-by: Hui Zhu --- usemem.c | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/usemem.c b/usemem.c index 823647e..6d1d575 100644 --- a/usemem.c +++ b/usemem.c @@ -96,6 +96,7 @@ int opt_bind_interval = 0; unsigned long opt_delay = 0; int opt_read_again = 0; int opt_punch_holes = 0; +int opt_init_time = 0; int nr_task; int nr_thread; int nr_cpu; @@ -155,6 +156,7 @@ void usage(int ok) "-U|--hugetlb allocate hugetlbfs page\n" "-Z|--read-again read memory again after access the memory\n" "--punch-holes free every other page after allocation\n" + "--init-time remove the initialization time from the run time and show the initialization time\n" "-h|--help show this message\n" , ourname); @@ -193,7 +195,8 @@ static const struct option opts[] = { { "delay" , 1, NULL, 'e' }, { "hugetlb" , 0, NULL, 'U' }, { "read-again" , 0, NULL, 'Z' }, - { "punch-holes" , 0, NULL, 0 }, + { "punch-holes" , 0, NULL, 0 }, + { "init-time" , 0, NULL, 0 }, { "help", 0, NULL, 'h' }, { NULL , 0, NULL, 0 } }; @@ -945,6 +948,8 @@ int main(int argc, char *argv[]) case 0: if (strcmp(opts[opt_index].name, "punch-holes") == 0) { opt_punch_holes = 1; + } else if (strcmp(opts[opt_index].name, "init-time") == 0) { + opt_init_time = 1; } else usage(1); break; @@ -1128,7 +1133,7 @@ int main(int argc, char *argv[]) if (optind != argc - 1) usage(0); - if (!opt_write_signal_read) + if (!opt_write_signal_read || opt_init_time) gettimeofday(&start_time, NULL); opt_bytes = memparse(argv[optind], NULL); @@ -1263,5 +1268,25 @@ int main(int argc, char *argv[]) if (!nr_task) nr_task = 1; + if (opt_init_time) { + struct timeval stop; + char buf[1024]; + size_t len; + unsigned long delta_us; + + gettimeofday(&stop, NULL); + delta_us = (stop.tv_sec - start_time.tv_sec) * 1000000 + + (stop.tv_usec - start_time.tv_usec); + len = snprintf(buf, sizeof(buf), + "the 
initialization time is %lu secs %lu usecs\n", + delta_us / 1000000, delta_us % 1000000); + fflush(stdout); + if (write(1, buf, len) != len) + fprintf(stderr, "WARNING: statistics output may be incomplete.\n"); + + if (!opt_write_signal_read) + gettimeofday(&start_time, NULL); + } + return do_tasks(); } -- 2.17.1
RE: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM
> -Original Message- > From: Anup Patel [mailto:a...@brainfault.org] > Sent: Wednesday, December 16, 2020 2:40 PM > To: Jiangyifei > Cc: Anup Patel ; Atish Patra ; > Paul Walmsley ; Palmer Dabbelt > ; Albert Ou ; Paolo Bonzini > ; Zhanghailiang ; > KVM General ; yinyipeng ; > Zhangxiaofeng (F) ; > linux-kernel@vger.kernel.org List ; > kvm-ri...@lists.infradead.org; linux-riscv ; > Wubin (H) ; dengkai (A) > Subject: Re: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM > > On Thu, Dec 3, 2020 at 5:51 PM Yifei Jiang wrote: > > > > This series implements guest time scaling based on RDTIME instruction > > emulation so that we can allow migrating Guest/VM across Hosts with > > different time frequency. > > > > Why not through para-virt. From arm's experience[1], para-virt > > implementation doesn't really solve the problem for the following two main > reasons: > > - RDTIME not only be used in linux, but also in firmware and userspace. > > - It is difficult to be compatible with nested virtualization. > > I think this approach is rather incomplete. Also, I don't see how para-virt > time > scaling will be difficult for nested virtualization. > > If trap-n-emulate TIME CSR for Guest Linux then it will have significant > performance impact of systems where TIME CSR is implemented in HW. > > Best approach will be to have VDSO-style para-virt time-scale SBI calls > (similar > to what KVM x86 does). If the Guest software (Linux/Bootloader) does not > enable para-virt time-scaling then we trap-n-emulate TIME CSR (this series). > > Please propose VDSO-style para-virt time-scale SBI call and expand this this > series to provide both: > 1. VDSO-style para-virt time-scaling > 2. Trap-n-emulation of TIME CSR when #1 is disabled > > Regards, > Anup > OK, it sounds good. We will look into the para-virt time-scaling for more details. 
Yifei > > > > [1] https://lore.kernel.org/patchwork/cover/1288153/ > > > > Yifei Jiang (3): > > RISC-V: KVM: Change the method of calculating cycles to nanoseconds > > RISC-V: KVM: Support dynamic time frequency from userspace > > RISC-V: KVM: Implement guest time scaling > > > > arch/riscv/include/asm/csr.h| 3 ++ > > arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +-- > > arch/riscv/kvm/vcpu_exit.c | 35 + > > arch/riscv/kvm/vcpu_timer.c | 51 > ++--- > > 4 files changed, 93 insertions(+), 9 deletions(-) > > > > -- > > 2.19.1 > > > > > > -- > > kvm-riscv mailing list > > kvm-ri...@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kvm-riscv
[PATCH v4 7/9] regulator: mt6359: Set the enable time for LDOs
Add the enable time for LDOs. This patch is preparing for adding mt6359p regulator support. Signed-off-by: Hsin-Hsiung Wang Acked-by: Mark Brown --- drivers/regulator/mt6359-regulator.c | 65 ++-- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/drivers/regulator/mt6359-regulator.c b/drivers/regulator/mt6359-regulator.c index 4ac6380f9875..e46fb95b87e2 100644 --- a/drivers/regulator/mt6359-regulator.c +++ b/drivers/regulator/mt6359-regulator.c @@ -103,7 +103,7 @@ struct mt6359_regulator_info { #define MT6359_LDO(match, _name, _volt_table, \ _enable_reg, _enable_mask, _status_reg, \ - _vsel_reg, _vsel_mask) \ + _vsel_reg, _vsel_mask, _en_delay) \ [MT6359_ID_##_name] = {\ .desc = { \ .name = #_name, \ @@ -118,6 +118,7 @@ struct mt6359_regulator_info { .vsel_mask = _vsel_mask,\ .enable_reg = _enable_reg, \ .enable_mask = BIT(_enable_mask), \ + .enable_time = _en_delay, \ }, \ .status_reg = _status_reg, \ .qi = BIT(0), \ @@ -466,15 +467,18 @@ static struct mt6359_regulator_info mt6359_regulators[] = { MT6359_LDO("ldo_vsim1", VSIM1, vsim1_voltages, MT6359_RG_LDO_VSIM1_EN_ADDR, MT6359_RG_LDO_VSIM1_EN_SHIFT, MT6359_DA_VSIM1_B_EN_ADDR, MT6359_RG_VSIM1_VOSEL_ADDR, - MT6359_RG_VSIM1_VOSEL_MASK << MT6359_RG_VSIM1_VOSEL_SHIFT), + MT6359_RG_VSIM1_VOSEL_MASK << MT6359_RG_VSIM1_VOSEL_SHIFT, + 480), MT6359_LDO("ldo_vibr", VIBR, vibr_voltages, MT6359_RG_LDO_VIBR_EN_ADDR, MT6359_RG_LDO_VIBR_EN_SHIFT, MT6359_DA_VIBR_B_EN_ADDR, MT6359_RG_VIBR_VOSEL_ADDR, - MT6359_RG_VIBR_VOSEL_MASK << MT6359_RG_VIBR_VOSEL_SHIFT), + MT6359_RG_VIBR_VOSEL_MASK << MT6359_RG_VIBR_VOSEL_SHIFT, + 240), MT6359_LDO("ldo_vrf12", VRF12, vrf12_voltages, MT6359_RG_LDO_VRF12_EN_ADDR, MT6359_RG_LDO_VRF12_EN_SHIFT, MT6359_DA_VRF12_B_EN_ADDR, MT6359_RG_VRF12_VOSEL_ADDR, - MT6359_RG_VRF12_VOSEL_MASK << MT6359_RG_VRF12_VOSEL_SHIFT), + MT6359_RG_VRF12_VOSEL_MASK << MT6359_RG_VRF12_VOSEL_SHIFT, + 120), MT6359_REG_FIXED("ldo_vusb", VUSB, MT6359_RG_LDO_VUSB_EN_0_ADDR, MT6359_DA_VUSB_B_EN_ADDR, 300), 
MT6359_LDO_LINEAR("ldo_vsram_proc2", VSRAM_PROC2, 50, 1293750, 6250, @@ -486,11 +490,13 @@ static struct mt6359_regulator_info mt6359_regulators[] = { MT6359_LDO("ldo_vio18", VIO18, volt18_voltages, MT6359_RG_LDO_VIO18_EN_ADDR, MT6359_RG_LDO_VIO18_EN_SHIFT, MT6359_DA_VIO18_B_EN_ADDR, MT6359_RG_VIO18_VOSEL_ADDR, - MT6359_RG_VIO18_VOSEL_MASK << MT6359_RG_VIO18_VOSEL_SHIFT), + MT6359_RG_VIO18_VOSEL_MASK << MT6359_RG_VIO18_VOSEL_SHIFT, + 960), MT6359_LDO("ldo_vcamio", VCAMIO, volt18_voltages, MT6359_RG_LDO_VCAMIO_EN_ADDR, MT6359_RG_LDO_VCAMIO_EN_SHIFT, MT6359_DA_VCAMIO_B_EN_ADDR, MT6359_RG_VCAMIO_VOSEL_ADDR, - MT6359_RG_VCAMIO_VOSEL_MASK << MT6359_RG_VCAMIO_VOSEL_SHIFT), + MT6359_RG_VCAMIO_VOSEL_MASK << MT6359_RG_VCAMIO_VOSEL_SHIFT, + 1290), MT6359_REG_FIXED("ldo_vcn18", VCN18, MT6359_RG_LDO_VCN18_EN_ADDR, MT6359_DA_VCN18_B_EN_ADDR, 180), MT6359_REG_FIXED("ldo_vfe28", VFE28, MT6359_RG_LDO_VFE28_EN_ADDR, @@ -498,19 +504,20 @@ static struct mt6359_regulator_info mt6359_regulators[] = { MT6359_LDO("ldo_vcn13", VCN13, vcn13_voltages, MT6359_RG_LDO_VCN13_EN_ADDR, MT6359_RG_LDO_VCN13_EN_SHIFT, MT6359_DA_VCN13_B_EN_ADDR, MT6359_RG_VCN13_VOSEL_ADDR, - MT6359_RG_VCN13_VOSEL_MASK << MT6359_RG_VCN13_VOSEL_SHIFT), + MT6359_RG_VCN13_VOSEL_MASK << MT6359_RG_VCN13_VOSEL_SHIFT, + 240), MT6359_LDO("ldo_vcn33_1_bt", VCN33_1_BT, vcn33_voltages, MT6359_RG_LDO_VCN33_1_EN_0_ADDR, MT6359_RG_LDO_VCN33_1_EN_0_SHIFT, MT6359_DA_VCN33_1_B_EN_ADDR, MT6359_RG_VCN33_1_VOSEL_ADDR, MT6359_R
Re: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM
On Thu, Dec 3, 2020 at 5:51 PM Yifei Jiang wrote: > > This series implements guest time scaling based on RDTIME instruction > emulation so that we can allow migrating Guest/VM across Hosts with > different time frequency. > > Why not through para-virt. From arm's experience[1], para-virt implementation > doesn't really solve the problem for the following two main reasons: > - RDTIME not only be used in linux, but also in firmware and userspace. > - It is difficult to be compatible with nested virtualization. I think this approach is rather incomplete. Also, I don't see how para-virt time scaling will be difficult for nested virtualization. If trap-n-emulate TIME CSR for Guest Linux then it will have significant performance impact of systems where TIME CSR is implemented in HW. Best approach will be to have VDSO-style para-virt time-scale SBI calls (similar to what KVM x86 does). If the Guest software (Linux/Bootloader) does not enable para-virt time-scaling then we trap-n-emulate TIME CSR (this series). Please propose VDSO-style para-virt time-scale SBI call and expand this this series to provide both: 1. VDSO-style para-virt time-scaling 2. Trap-n-emulation of TIME CSR when #1 is disabled Regards, Anup > > [1] https://lore.kernel.org/patchwork/cover/1288153/ > > Yifei Jiang (3): > RISC-V: KVM: Change the method of calculating cycles to nanoseconds > RISC-V: KVM: Support dynamic time frequency from userspace > RISC-V: KVM: Implement guest time scaling > > arch/riscv/include/asm/csr.h| 3 ++ > arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +-- > arch/riscv/kvm/vcpu_exit.c | 35 + > arch/riscv/kvm/vcpu_timer.c | 51 ++--- > 4 files changed, 93 insertions(+), 9 deletions(-) > > -- > 2.19.1 > > > -- > kvm-riscv mailing list > kvm-ri...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kvm-riscv
Re: [GIT PULL] time namespace fixes for v5.11
The pull request you sent on Mon, 14 Dec 2020 12:57:44 +0100: > g...@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux > tags/time-namespace-v5.11 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/6d93a1971a0ded67887eeab8d00a02074490f071 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
[PATCH 5.9 040/105] iwlwifi: pcie: limit memory read spin time
From: Johannes Berg [ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ] When we read device memory, we lock a spinlock, write the address we want to read from the device and then spin in a loop reading the data in 32-bit quantities from another register. As the description makes clear, this is rather inefficient, incurring a PCIe bus transaction for every read. In a typical device today, we want to read 786k SMEM if it crashes, leading to 192k register reads. Occasionally, we've seen the whole loop take over 20 seconds and then trigger the soft lockup detector. Clearly, it is unreasonable to spin here for such extended periods of time. To fix this, break the loop down into an outer and an inner loop, and break out of the inner loop if more than half a second elapsed. To avoid too much overhead, check for that only every 128 reads, though there's no particular reason for that number. Then, unlock and relock to obtain NIC access again, reprogram the start address and continue. This will keep (interrupt) latencies on the CPU down to a reasonable time. 
Signed-off-by: Johannes Berg Signed-off-by: Mordechay Goodstein Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid Signed-off-by: Sasha Levin --- .../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index e5160d6208688..6393e895f95c6 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -2155,18 +2155,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr, void *buf, int dwords) { unsigned long flags; - int offs, ret = 0; + int offs = 0; u32 *vals = buf; - if (iwl_trans_grab_nic_access(trans, &flags)) { - iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr); - for (offs = 0; offs < dwords; offs++) - vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT); - iwl_trans_release_nic_access(trans, &flags); - } else { - ret = -EBUSY; + while (offs < dwords) { + /* limit the time we spin here under lock to 1/2s */ + ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC); + + if (iwl_trans_grab_nic_access(trans, &flags)) { + iwl_write32(trans, HBUS_TARG_MEM_RADDR, + addr + 4 * offs); + + while (offs < dwords) { + vals[offs] = iwl_read32(trans, + HBUS_TARG_MEM_RDAT); + offs++; + + /* calling ktime_get is expensive so +* do it once in 128 reads +*/ + if (offs % 128 == 0 && ktime_after(ktime_get(), + timeout)) + break; + } + iwl_trans_release_nic_access(trans, &flags); + } else { + return -EBUSY; + } } - return ret; + + return 0; } static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr, -- 2.27.0
[PATCH 5.9 021/105] ibmvnic: reduce wait for completion time
From: Dany Madden [ Upstream commit 98c41f04a67abf5e7f7191d55d286e905d1430ef ] Reduce the wait time for Command Response Queue response from 30 seconds to 20 seconds, as recommended by VIOS and Power Hypervisor teams. Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into ibmvnic_open") Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter link state") Signed-off-by: Dany Madden Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/ibm/ibmvnic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index de25d1860f16f..a1556673300a0 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -846,7 +846,7 @@ static void release_napi(struct ibmvnic_adapter *adapter) static int ibmvnic_login(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); int retry_count = 0; int retries = 10; bool retry; @@ -950,7 +950,7 @@ static void release_resources(struct ibmvnic_adapter *adapter) static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state) { struct net_device *netdev = adapter->netdev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); union ibmvnic_crq crq; bool resend; int rc; @@ -5081,7 +5081,7 @@ map_failed: static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) { struct device *dev = &adapter->vdev->dev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); u64 old_num_rx_queues, old_num_tx_queues; int rc; -- 2.27.0
[PATCH 5.4 03/36] iwlwifi: pcie: limit memory read spin time
From: Johannes Berg [ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ] When we read device memory, we lock a spinlock, write the address we want to read from the device and then spin in a loop reading the data in 32-bit quantities from another register. As the description makes clear, this is rather inefficient, incurring a PCIe bus transaction for every read. In a typical device today, we want to read 786k SMEM if it crashes, leading to 192k register reads. Occasionally, we've seen the whole loop take over 20 seconds and then triggering the soft lockup detector. Clearly, it is unreasonable to spin here for such extended periods of time. To fix this, break the loop down into an outer and an inner loop, and break out of the inner loop if more than half a second elapsed. To avoid too much overhead, check for that only every 128 reads, though there's no particular reason for that number. Then, unlock and relock to obtain NIC access again, reprogram the start address and continue. This will keep (interrupt) latencies on the CPU down to a reasonable time. 
Signed-off-by: Johannes Berg Signed-off-by: Mordechay Goodstein Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid Signed-off-by: Sasha Levin --- .../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index c76d26708e659..ef5a8ecabc60a 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -2178,18 +2178,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr, void *buf, int dwords) { unsigned long flags; - int offs, ret = 0; + int offs = 0; u32 *vals = buf; - if (iwl_trans_grab_nic_access(trans, &flags)) { - iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr); - for (offs = 0; offs < dwords; offs++) - vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT); - iwl_trans_release_nic_access(trans, &flags); - } else { - ret = -EBUSY; + while (offs < dwords) { + /* limit the time we spin here under lock to 1/2s */ + ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC); + + if (iwl_trans_grab_nic_access(trans, &flags)) { + iwl_write32(trans, HBUS_TARG_MEM_RADDR, + addr + 4 * offs); + + while (offs < dwords) { + vals[offs] = iwl_read32(trans, + HBUS_TARG_MEM_RDAT); + offs++; + + /* calling ktime_get is expensive so +* do it once in 128 reads +*/ + if (offs % 128 == 0 && ktime_after(ktime_get(), + timeout)) + break; + } + iwl_trans_release_nic_access(trans, &flags); + } else { + return -EBUSY; + } } - return ret; + + return 0; } static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr, -- 2.27.0
[GIT PULL] time namespace fixes for v5.11
Hi Linus, Here are some time namespace fixes for v5.11. /* Summary */ When time namespaces were introduced we neglected to virtualize the "btime" field in /proc/stat. This confuses tasks which are in a time namespace with a virtualized boottime, which is common in some container workloads. This PR contains Michael's series to fix "btime", which Thomas asked me to take through my tree. To fix "btime" virtualization we simply subtract the offset of the time namespace's boottime from btime before printing the stats. Note that since the start_boottime of processes is in seconds since boottime and the boottime stamp is now shifted according to the time namespace's offset, the offset of the time namespace also needs to be applied before the process stats are given to userspace. This prevents processes shown by tools such as "ps" from appearing as time travelers in the corresponding time namespace. Selftests are included to verify that btime virtualization in /proc/stat works as expected. The following changes since commit 3cea11cd5e3b00d91caf0b4730194039b45c5891: Linux 5.10-rc2 (2020-11-01 14:43:51 -0800) are available in the Git repository at: g...@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/time-namespace-v5.11 for you to fetch changes up to 5c62634fc65101d350cbd47722fb76f02693059d: namespace: make timens_on_fork() return nothing (2020-11-18 11:06:47 +0100) /* Testing */ All patches are based on v5.10-rc2. All old and new tests are passing. Please note that I failed to merge these fixes into my for-next branch, so linux-next has not contained these commits. I'm still sending this PR because these are fairly trivial bugfixes and have seen vetting from multiple people. I have also now merged this tag into my for-next branch, so these commits will show up in linux-next soon. If you feel more comfortable with this sitting in linux-next for a while, please just ignore this PR and I'll resend after rc1 has been released.
/* Conflicts */ At the time of creating this PR no merge conflicts were reported from linux-next, and no merge conflicts showed up in a test merge with 2c85ebc57b3e ("Linux 5.10"). Please consider pulling these changes from the signed time-namespace-v5.11 tag. Thanks! Christian time-namespace-v5.11 Hui Su (1): namespace: make timens_on_fork() return nothing Michael Weiß (3): timens: additional helper functions for boottime offset handling fs/proc: apply the time namespace offset to /proc/stat btime selftests/timens: added selftest for /proc/stat btime fs/proc/array.c | 6 ++-- fs/proc/stat.c | 3 ++ include/linux/time_namespace.h | 28 ++-- kernel/nsproxy.c| 7 +--- kernel/time/namespace.c | 6 ++-- tools/testing/selftests/timens/procfs.c | 58 - 6 files changed, 92 insertions(+), 16 deletions(-)
[tip: core/rcu] torture: Accept time units on kvm.sh --duration argument
The following commit has been merged into the core/rcu branch of tip: Commit-ID: 7de1ca35269ee20e40c35666c810cbaea528c719 Gitweb: https://git.kernel.org/tip/7de1ca35269ee20e40c35666c810cbaea528c719 Author:Paul E. McKenney AuthorDate:Tue, 22 Sep 2020 17:20:11 -07:00 Committer: Paul E. McKenney CommitterDate: Fri, 06 Nov 2020 17:13:55 -08:00 torture: Accept time units on kvm.sh --duration argument The "--duration <minutes>" argument has worked well for a very long time, but it can be inconvenient to compute the minutes for (say) a 28-hour run. It can also be annoying to have to let a simple boot test run for a full minute. This commit therefore permits an "s" suffix to specify seconds, "m" to specify minutes (which remains the default), "h" suffix to specify hours, and "d" to specify days. With this change, "--duration 5" still specifies that each scenario run for five minutes, but "--duration 30s" runs for only 30 seconds, "--duration 8h" runs for eight hours, and "--duration 2d" runs for two days. Signed-off-by: Paul E.
McKenney --- tools/testing/selftests/rcutorture/bin/kvm.sh | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index 5ad3882..c348d96 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -58,7 +58,7 @@ usage () { echo " --datestamp string" echo " --defconfig string" echo " --dryrun sched|script" - echo " --duration minutes" + echo " --duration minutes | <seconds>s | <hours>h | <days>d" echo " --gdb" echo " --help" echo " --interactive" @@ -128,8 +128,20 @@ do shift ;; --duration) - checkarg --duration "(minutes)" $# "$2" '^[0-9]*$' '^error' - dur=$(($2*60)) + checkarg --duration "(minutes)" $# "$2" '^[0-9][0-9]*\(s\|m\|h\|d\|\)$' '^error' + mult=60 + if echo "$2" | grep -q 's$' + then + mult=1 + elif echo "$2" | grep -q 'h$' + then + mult=3600 + elif echo "$2" | grep -q 'd$' + then + mult=86400 + fi + ts=`echo $2 | sed -e 's/[smhd]$//'` + dur=$(($ts*mult)) shift ;; --gdb)
[tip: core/rcu] locktorture: Track time of last ->writeunlock()
The following commit has been merged into the core/rcu branch of tip: Commit-ID: 3480d6774f07341e3e1cf3114f58bef98ea58ae0 Gitweb: https://git.kernel.org/tip/3480d6774f07341e3e1cf3114f58bef98ea58ae0 Author:Paul E. McKenney AuthorDate:Sun, 30 Aug 2020 21:48:23 -07:00 Committer: Paul E. McKenney CommitterDate: Fri, 06 Nov 2020 17:13:29 -08:00 locktorture: Track time of last ->writeunlock() This commit adds a last_lock_release variable that tracks the time of the last ->writeunlock() call, which allows easier diagnosing of lock hangs when using a kernel debugger. Acked-by: Davidlohr Bueso Signed-off-by: Paul E. McKenney --- kernel/locking/locktorture.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index 62d215b..316531d 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -60,6 +60,7 @@ static struct task_struct **reader_tasks; static bool lock_is_write_held; static bool lock_is_read_held; +static unsigned long last_lock_release; struct lock_stress_stats { long n_lock_fail; @@ -632,6 +633,7 @@ static int lock_torture_writer(void *arg) lwsp->n_lock_acquired++; cxt.cur_ops->write_delay(&rand); lock_is_write_held = false; + WRITE_ONCE(last_lock_release, jiffies); cxt.cur_ops->writeunlock(); stutter_wait("lock_torture_writer");
Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
On Thu, Dec 10, 2020 at 11:13:50AM +0200, Cristian Ciocaltea wrote: > Hi Rob, > > On Wed, Dec 09, 2020 at 09:37:08PM -0600, Rob Herring wrote: > > On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote: > > > Add a new common property 'reset-time-sec' to be used in conjunction > > > with the devices supporting the key pressed reset feature. > > > > > > Signed-off-by: Cristian Ciocaltea > > > --- > > > Changes in v3: > > > - This patch was not present in v2 > > > > > > Documentation/devicetree/bindings/input/input.yaml | 7 +++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/Documentation/devicetree/bindings/input/input.yaml > > > b/Documentation/devicetree/bindings/input/input.yaml > > > index ab407f266bef..caba93209ae7 100644 > > > --- a/Documentation/devicetree/bindings/input/input.yaml > > > +++ b/Documentation/devicetree/bindings/input/input.yaml > > > @@ -34,4 +34,11 @@ properties: > > >specify this property. > > > $ref: /schemas/types.yaml#/definitions/uint32 > > > > > > + reset-time-sec: > > > > Humm, I'm pretty sure we already have something for this. Or maybe just > > power off. > > We only have 'power-off-time-sec', so I added 'reset-time-sec' according > to your review in v2: > https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/ I'm doing good if I remember reviews from a week ago. From 3 months ago, no chance without some reminder. Reviewed-by: Rob Herring Rob
[PATCH 2/2] platform: cros_ec: Call interrupt bottom half at probe time
While the AP was powered off, the EC may have sent messages. If a message is not serviced within 3s, the EC stops sending messages. Unlock the EC by purging stale messages at probe time. Signed-off-by: Gwendal Grignou --- drivers/platform/chrome/cros_ec.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/platform/chrome/cros_ec.c b/drivers/platform/chrome/cros_ec.c index 4ac33491d0d18..a45d6a6640d50 100644 --- a/drivers/platform/chrome/cros_ec.c +++ b/drivers/platform/chrome/cros_ec.c @@ -252,6 +252,13 @@ int cros_ec_register(struct cros_ec_device *ec_dev) dev_info(dev, "Chrome EC device registered\n"); + /* +* Unlock EC that may be waiting for AP to process MKBP events. +* If the AP takes too long to answer, the EC would stop sending events. +*/ + if (ec_dev->mkbp_event_supported) + cros_ec_irq_thread(0, ec_dev); + return 0; } EXPORT_SYMBOL(cros_ec_register); -- 2.29.2.576.ga3fc446d84-goog
Re: [PATCH v2 1/2] Add save/restore of Precision Time Measurement capability
On Mon, Dec 07, 2020 at 02:39:50PM -0800, David E. Box wrote: > The PCI subsystem does not currently save and restore the configuration > space for the Precision Time Measurement (PTM) PCIe extended capability > leading to the possibility of the feature returning disabled on S3 resume. > This has been observed on Intel Coffee Lake desktops. Add save/restore of > the PTM control register. This saves the PTM Enable, Root Select, and > Effective Granularity bits. > > Suggested-by: Rafael J. Wysocki > Signed-off-by: David E. Box Applied both to pci/ptm for v5.11, thanks! > --- > > Changes from V1: > - Move save/restore functions to ptm.c > - Move pci_add_ext_cap_save_buffer() to pci_ptm_init in ptm.c > > drivers/pci/pci.c | 2 ++ > drivers/pci/pci.h | 8 > drivers/pci/pcie/ptm.c | 43 ++ > 3 files changed, 53 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e578d34095e9..12ba6351c05b 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1566,6 +1566,7 @@ int pci_save_state(struct pci_dev *dev) > pci_save_ltr_state(dev); > pci_save_dpc_state(dev); > pci_save_aer_state(dev); > + pci_save_ptm_state(dev); > return pci_save_vc_state(dev); > } > EXPORT_SYMBOL(pci_save_state); > @@ -1677,6 +1678,7 @@ void pci_restore_state(struct pci_dev *dev) > pci_restore_vc_state(dev); > pci_restore_rebar_state(dev); > pci_restore_dpc_state(dev); > + pci_restore_ptm_state(dev); > > pci_aer_clear_status(dev); > pci_restore_aer_state(dev); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index f86cae9aa1f4..62cdacba5954 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -516,6 +516,14 @@ static inline int pci_iov_bus_range(struct pci_bus *bus) > > #endif /* CONFIG_PCI_IOV */ > > +#ifdef CONFIG_PCIE_PTM > +void pci_save_ptm_state(struct pci_dev *dev); > +void pci_restore_ptm_state(struct pci_dev *dev); > +#else > +static inline void pci_save_ptm_state(struct pci_dev *dev) {} > +static inline void pci_restore_ptm_state(struct pci_dev *dev)
{} > +#endif > + > unsigned long pci_cardbus_resource_alignment(struct resource *); > > static inline resource_size_t pci_resource_alignment(struct pci_dev *dev, > diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c > index 357a454cafa0..6b24a1c9327a 100644 > --- a/drivers/pci/pcie/ptm.c > +++ b/drivers/pci/pcie/ptm.c > @@ -29,6 +29,47 @@ static void pci_ptm_info(struct pci_dev *dev) >dev->ptm_root ? " (root)" : "", clock_desc); > } > > +void pci_save_ptm_state(struct pci_dev *dev) > +{ > + int ptm; > + struct pci_cap_saved_state *save_state; > + u16 *cap; > + > + if (!pci_is_pcie(dev)) > + return; > + > + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM); > + if (!ptm) > + return; > + > + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM); > + if (!save_state) { > + pci_err(dev, "no suspend buffer for PTM\n"); > + return; > + } > + > + cap = (u16 *)&save_state->cap.data[0]; > + pci_read_config_word(dev, ptm + PCI_PTM_CTRL, cap); > +} > + > +void pci_restore_ptm_state(struct pci_dev *dev) > +{ > + struct pci_cap_saved_state *save_state; > + int ptm; > + u16 *cap; > + > + if (!pci_is_pcie(dev)) > + return; > + > + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM); > + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM); > + if (!save_state || !ptm) > + return; > + > + cap = (u16 *)&save_state->cap.data[0]; > + pci_write_config_word(dev, ptm + PCI_PTM_CTRL, *cap); > +} > + > void pci_ptm_init(struct pci_dev *dev) > { > int pos; > @@ -65,6 +106,8 @@ void pci_ptm_init(struct pci_dev *dev) > if (!pos) > return; > > + pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_PTM, sizeof(u16)); > + > pci_read_config_dword(dev, pos + PCI_PTM_CAP, &cap); > local_clock = (cap & PCI_PTM_GRANULARITY_MASK) >> 8; > > -- > 2.20.1 >
Re: [PATCH rdma-next 0/3] Various fixes collected over time
On Tue, Dec 08, 2020 at 09:35:42AM +0200, Leon Romanovsky wrote: > From: Leon Romanovsky > > Hi, > > This is a set of various unrelated fixes that we collected over time. > > Thanks > > Avihai Horon (1): > RDMA/uverbs: Fix incorrect variable type > > Jack Morgenstein (2): > RDMA/core: Clean up cq pool mechanism > RDMA/core: Do not indicate device ready when device enablement fails > > drivers/infiniband/core/core_priv.h | 3 +-- > drivers/infiniband/core/cq.c | 12 ++-- > drivers/infiniband/core/device.c | 16 ++-- > .../infiniband/core/uverbs_std_types_device.c| 14 +- > include/rdma/uverbs_ioctl.h | 10 ++ > 5 files changed, 28 insertions(+), 27 deletions(-) Applied to for-next, thanks Jason
[PATCH v2 3/7] watchdog/softlockup: Report the overall time of softlockups
The softlockup detector currently shows the time spent since the last report. As a result it is not clear whether a CPU is infinitely hogged by a single task or if it is a repeated event. The situation can be simulated with a simple busy loop: while (true) cpu_relax(); The softlockup detector produces: [ 168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865] [ 196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865] [ 236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865] But it should be something like: [ 480.372418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4943] [ 508.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [cat:4943] [ 548.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 89s! [cat:4943] [ 576.372351] watchdog: BUG: soft lockup - CPU#2 stuck for 115s! [cat:4943] For better output, add an additional timestamp of the last report. Only this timestamp is reset when the watchdog is intentionally touched from slow code paths or when printing the report. Signed-off-by: Petr Mladek --- kernel/watchdog.c | 40 ++++++++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 12 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 7776d53a015c..6259590d6474 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -154,7 +154,11 @@ static void lockup_detector_update_enable(void) #ifdef CONFIG_SOFTLOCKUP_DETECTOR -#define SOFTLOCKUP_RESET ULONG_MAX +/* + * Delay the softlockup report when running known slow code. + * It does _not_ affect the timestamp of the last successful reschedule. + */ +#define SOFTLOCKUP_DELAY_REPORT ULONG_MAX #ifdef CONFIG_SMP int __read_mostly sysctl_softlockup_all_cpu_backtrace; @@ -169,7 +173,10 @@ unsigned int __read_mostly softlockup_panic = static bool softlockup_initialized __read_mostly; static u64 __read_mostly sample_period; +/* Timestamp taken after the last successful reschedule.
*/ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); +/* Timestamp of the last softlockup report. */ +static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); static DEFINE_PER_CPU(bool, softlockup_touch_sync); static DEFINE_PER_CPU(bool, soft_watchdog_warn); @@ -235,10 +242,16 @@ static void set_sample_period(void) watchdog_update_hrtimer_threshold(sample_period); } +static void update_report_ts(void) +{ + __this_cpu_write(watchdog_report_ts, get_timestamp()); +} + /* Commands for resetting the watchdog */ static void update_touch_ts(void) { __this_cpu_write(watchdog_touch_ts, get_timestamp()); + update_report_ts(); } /** @@ -252,10 +265,10 @@ static void update_touch_ts(void) notrace void touch_softlockup_watchdog_sched(void) { /* -* Preemption can be enabled. It doesn't matter which CPU's timestamp -* gets zeroed here, so use the raw_ operation. +* Preemption can be enabled. It doesn't matter which CPU's watchdog +* report period gets restarted here, so use the raw_ operation. */ - raw_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET); + raw_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT); } notrace void touch_softlockup_watchdog(void) @@ -279,23 +292,23 @@ void touch_all_softlockup_watchdogs(void) * the softlockup check. */ for_each_cpu(cpu, &watchdog_allowed_mask) - per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET; + per_cpu(watchdog_report_ts, cpu) = SOFTLOCKUP_DELAY_REPORT; wq_watchdog_touch(-1); } void touch_softlockup_watchdog_sync(void) { __this_cpu_write(softlockup_touch_sync, true); - __this_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET); + __this_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT); } -static int is_softlockup(unsigned long touch_ts) +static int is_softlockup(unsigned long touch_ts, unsigned long period_ts) { unsigned long now = get_timestamp(); if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){ /* Warn about unreasonable delays. 
*/ - if (time_after(now, touch_ts + get_softlockup_thresh())) + if (time_after(now, period_ts + get_softlockup_thresh())) return now - touch_ts; } return 0; @@ -341,6 +354,7 @@ static int softlockup_fn(void *data) static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) { unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts); + unsigned long period_ts = __this_cpu_read(watchdog_report_ts); struct pt_regs *regs = get_irq_regs(); int duration; int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; @@ -362,7 +376,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct
[PATCH v2 0/7] watchdog/softlockup: Report overall time and some cleanup
I dug deep into the softlockup watchdog history when time permitted this year, and reworked the patchset that fixed timestamps and cleaned up the code[*]. I split it into very small steps and did even more code clean up. The result looks quite straightforward and I am pretty confident with the changes. [*] v1: https://lore.kernel.org/r/20191024114928.15377-1-pmla...@suse.com Petr Mladek (7): watchdog: Rename __touch_watchdog() to a better descriptive name watchdog: Explicitly update timestamp when reporting softlockup watchdog/softlockup: Report the overall time of softlockups watchdog/softlockup: Remove logic that tried to prevent repeated reports watchdog: Fix barriers when printing backtraces from all CPUs watchdog: Cleanup handling of false positives Test softlockup fs/proc/consoles.c | 5 +++ fs/proc/version.c | 7 kernel/watchdog.c | 97 ++ 3 files changed, 66 insertions(+), 43 deletions(-) -- 2.26.2
Re: [PATCH] powerpc/time: Remove ifdef in get_vtb()
On Thu, 1 Oct 2020 10:59:20 + (UTC), Christophe Leroy wrote: > SPRN_VTB and CPU_FTR_ARCH_207S are always defined, > no need of an ifdef. Applied to powerpc/next. [1/1] powerpc/time: Remove ifdef in get_vtb() https://git.kernel.org/powerpc/c/c3cb5dbd85dbd9ae51fadf867782dc34806f04d8 cheers
Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time
On Wed, Dec 09, 2020 at 12:24:32PM -0800, Saravana Kannan wrote: > On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman > wrote: > > > > On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote: > > > The current implementation of fw_devlink is very inefficient because it > > > tries to get away without creating fwnode links in the name of saving > > > memory usage. Past attempts to optimize runtime at the cost of memory > > > usage were blocked with request for data showing that the optimization > > > made significant improvement for real world scenarios. > > > > > > We have those scenarios now. There have been several reports of boot > > > time increase in the order of seconds in this thread [1]. Several OEMs > > > and SoC manufacturers have also privately reported significant > > > (350-400ms) increase in boot time due to all the parsing done by > > > fw_devlink. > > > > > > So this patch series refactors fw_devlink to be more efficient. The key > > > difference now is the addition of support for fwnode links -- just a few > > > simple APIs. This also allows most of the code to be moved out of > > > firmware specific (DT mostly) code into driver core. > > > > > > This brings the following benefits: > > > - Instead of parsing the device tree multiple times (complexity was > > > close to O(N^3) where N in the number of properties) during bootup, > > > fw_devlink parses each fwnode node/property only once and creates > > > fwnode links. The rest of the fw_devlink code then just looks at these > > > fwnode links to do rest of the work. > > > > > > - Makes it much easier to debug probe issue due to fw_devlink in the > > > future. fw_devlink=on blocks the probing of devices if they depend on > > > a device that hasn't been added yet. With this refactor, it'll be very > > > easy to tell what that device is because we now have a reference to > > > the fwnode of the device. > > > > > > - Much easier to add fw_devlink support to ACPI and other firmware > > > types. 
A refactor to move the common bits from DT specific code to > > > driver core was in my TODO list as a prerequisite to adding ACPI > > > support to fw_devlink. This series gets that done. > > > > > > Laurent and Grygorii tested the v1 series and they saw boot time > > > improvement of about 12 seconds and 3 seconds, respectively. > > > > Now queued up to my tree. Note, I had to hand-apply patches 13 and 16 > > due to some reason (for 13, I have no idea, for 16 it was due to a > > previous patch applied to my tree that I cc:ed you on.) > > > > Verifying I got it all correct would be great :) > > A quick diff of drivers/base/core.c between driver-core-testing and my > local tree doesn't show any major diff (only some unrelated comment > fixes). So, it looks fine. > > The patch 13 conflict is probably due to having to rebase the v2 > series on top of this: > https://lore.kernel.org/lkml/20201104205431.3795207-1-sarava...@google.com/ > > And looks like Patch 16 was handled fine. Great, thanks for verifying! greg k-h
Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
Hi Mani, On Thu, Dec 10, 2020 at 09:36:44AM +0530, Manivannan Sadhasivam wrote: > On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote: > > Add a new common property 'reset-time-sec' to be used in conjunction > > with the devices supporting the key pressed reset feature. > > > > Signed-off-by: Cristian Ciocaltea > > --- > > Changes in v3: > > - This patch was not present in v2 > > > > Documentation/devicetree/bindings/input/input.yaml | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/input/input.yaml > > b/Documentation/devicetree/bindings/input/input.yaml > > index ab407f266bef..caba93209ae7 100644 > > --- a/Documentation/devicetree/bindings/input/input.yaml > > +++ b/Documentation/devicetree/bindings/input/input.yaml > > @@ -34,4 +34,11 @@ properties: > >specify this property. > > $ref: /schemas/types.yaml#/definitions/uint32 > > > > + reset-time-sec: > > +description: > > + Duration in seconds which the key should be kept pressed for device > > to > > + reset automatically. Device with key pressed reset feature can > > specify > > + this property. > > +$ref: /schemas/types.yaml#/definitions/uint32 > > + > > Why can't you just use "power-off-time-sec"? I think the common behavior of keeping the power button pressed is to trigger a power off rather than a reset. Hence, per Rob's suggestion in the previous revision of this patch series, I added the reset variant: https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/ Thanks, Cristi > Thanks, > Mani > > > additionalProperties: true > > -- > > 2.29.2 > >
Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
Hi Rob, On Wed, Dec 09, 2020 at 09:37:08PM -0600, Rob Herring wrote: > On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote: > > Add a new common property 'reset-time-sec' to be used in conjunction > > with the devices supporting the key pressed reset feature. > > > > Signed-off-by: Cristian Ciocaltea > > --- > > Changes in v3: > > - This patch was not present in v2 > > > > Documentation/devicetree/bindings/input/input.yaml | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/input/input.yaml > > b/Documentation/devicetree/bindings/input/input.yaml > > index ab407f266bef..caba93209ae7 100644 > > --- a/Documentation/devicetree/bindings/input/input.yaml > > +++ b/Documentation/devicetree/bindings/input/input.yaml > > @@ -34,4 +34,11 @@ properties: > >specify this property. > > $ref: /schemas/types.yaml#/definitions/uint32 > > > > + reset-time-sec: > > Humm, I'm pretty sure we already have something for this. Or maybe just > power off. We only have 'power-off-time-sec', so I added 'reset-time-sec' according to your review in v2: https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/ Thanks, Cristi > > +description: > > + Duration in seconds which the key should be kept pressed for device > > to > > + reset automatically. Device with key pressed reset feature can > > specify > > + this property. > > +$ref: /schemas/types.yaml#/definitions/uint32 > > + > > additionalProperties: true > > -- > > 2.29.2 > >
Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote: > Add a new common property 'reset-time-sec' to be used in conjunction > with the devices supporting the key pressed reset feature. > > Signed-off-by: Cristian Ciocaltea > --- > Changes in v3: > - This patch was not present in v2 > > Documentation/devicetree/bindings/input/input.yaml | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/Documentation/devicetree/bindings/input/input.yaml > b/Documentation/devicetree/bindings/input/input.yaml > index ab407f266bef..caba93209ae7 100644 > --- a/Documentation/devicetree/bindings/input/input.yaml > +++ b/Documentation/devicetree/bindings/input/input.yaml > @@ -34,4 +34,11 @@ properties: >specify this property. > $ref: /schemas/types.yaml#/definitions/uint32 > > + reset-time-sec: > +description: > + Duration in seconds which the key should be kept pressed for device to > + reset automatically. Device with key pressed reset feature can specify > + this property. > + $ref: /schemas/types.yaml#/definitions/uint32 > + Why can't you just use "power-off-time-sec"? Thanks, Mani > additionalProperties: true > -- > 2.29.2 >
Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote: > Add a new common property 'reset-time-sec' to be used in conjunction > with the devices supporting the key pressed reset feature. > > Signed-off-by: Cristian Ciocaltea > --- > Changes in v3: > - This patch was not present in v2 > > Documentation/devicetree/bindings/input/input.yaml | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/Documentation/devicetree/bindings/input/input.yaml > b/Documentation/devicetree/bindings/input/input.yaml > index ab407f266bef..caba93209ae7 100644 > --- a/Documentation/devicetree/bindings/input/input.yaml > +++ b/Documentation/devicetree/bindings/input/input.yaml > @@ -34,4 +34,11 @@ properties: >specify this property. > $ref: /schemas/types.yaml#/definitions/uint32 > > + reset-time-sec: Humm, I'm pretty sure we already have something for this. Or maybe just power off. > +description: > + Duration in seconds which the key should be kept pressed for device to > + reset automatically. Device with key pressed reset feature can specify > + this property. > +$ref: /schemas/types.yaml#/definitions/uint32 > + > additionalProperties: true > -- > 2.29.2 >
Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time
On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman wrote: > > On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote: > > The current implementation of fw_devlink is very inefficient because it > > tries to get away without creating fwnode links in the name of saving > > memory usage. Past attempts to optimize runtime at the cost of memory > > usage were blocked with request for data showing that the optimization > > made significant improvement for real world scenarios. > > > > We have those scenarios now. There have been several reports of boot > > time increase in the order of seconds in this thread [1]. Several OEMs > > and SoC manufacturers have also privately reported significant > > (350-400ms) increase in boot time due to all the parsing done by > > fw_devlink. > > > > So this patch series refactors fw_devlink to be more efficient. The key > > difference now is the addition of support for fwnode links -- just a few > > simple APIs. This also allows most of the code to be moved out of > > firmware specific (DT mostly) code into driver core. > > > > This brings the following benefits: > > - Instead of parsing the device tree multiple times (complexity was > > close to O(N^3) where N in the number of properties) during bootup, > > fw_devlink parses each fwnode node/property only once and creates > > fwnode links. The rest of the fw_devlink code then just looks at these > > fwnode links to do rest of the work. > > > > - Makes it much easier to debug probe issue due to fw_devlink in the > > future. fw_devlink=on blocks the probing of devices if they depend on > > a device that hasn't been added yet. With this refactor, it'll be very > > easy to tell what that device is because we now have a reference to > > the fwnode of the device. > > > > - Much easier to add fw_devlink support to ACPI and other firmware > > types. 
A refactor to move the common bits from DT specific code to > > driver core was in my TODO list as a prerequisite to adding ACPI > > support to fw_devlink. This series gets that done. > > > > Laurent and Grygorii tested the v1 series and they saw boot time > > improvement of about 12 seconds and 3 seconds, respectively. > > Now queued up to my tree. Note, I had to hand-apply patches 13 and 16 > due to some reason (for 13, I have no idea, for 16 it was due to a > previous patch applied to my tree that I cc:ed you on.) > > Verifying I got it all correct would be great :) A quick diff of drivers/base/core.c between driver-core-testing and my local tree doesn't show any major diff (only some unrelated comment fixes). So, it looks fine. The patch 13 conflict is probably due to having to rebase the v2 series on top of this: https://lore.kernel.org/lkml/20201104205431.3795207-1-sarava...@google.com/ And looks like Patch 16 was handled fine. Thanks for applying the series. -Saravana
Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time
On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote: > The current implementation of fw_devlink is very inefficient because it > tries to get away without creating fwnode links in the name of saving > memory usage. Past attempts to optimize runtime at the cost of memory > usage were blocked with request for data showing that the optimization > made significant improvement for real world scenarios. > > We have those scenarios now. There have been several reports of boot > time increase in the order of seconds in this thread [1]. Several OEMs > and SoC manufacturers have also privately reported significant > (350-400ms) increase in boot time due to all the parsing done by > fw_devlink. > > So this patch series refactors fw_devlink to be more efficient. The key > difference now is the addition of support for fwnode links -- just a few > simple APIs. This also allows most of the code to be moved out of > firmware specific (DT mostly) code into driver core. > > This brings the following benefits: > - Instead of parsing the device tree multiple times (complexity was > close to O(N^3) where N in the number of properties) during bootup, > fw_devlink parses each fwnode node/property only once and creates > fwnode links. The rest of the fw_devlink code then just looks at these > fwnode links to do rest of the work. > > - Makes it much easier to debug probe issue due to fw_devlink in the > future. fw_devlink=on blocks the probing of devices if they depend on > a device that hasn't been added yet. With this refactor, it'll be very > easy to tell what that device is because we now have a reference to > the fwnode of the device. > > - Much easier to add fw_devlink support to ACPI and other firmware > types. A refactor to move the common bits from DT specific code to > driver core was in my TODO list as a prerequisite to adding ACPI > support to fw_devlink. This series gets that done. 
> > Laurent and Grygorii tested the v1 series and they saw boot time > improvement of about 12 seconds and 3 seconds, respectively. Now queued up to my tree. Note, I had to hand-apply patches 13 and 16 due to some reason (for 13, I have no idea, for 16 it was due to a previous patch applied to my tree that I cc:ed you on.) Verifying I got it all correct would be great :) thanks, greg k-h
[PATCH v16 4/9] time: Add mechanism to recognize clocksource in time_get_snapshot
From: Thomas Gleixner System time snapshots are not conveying information about the current clocksource which was used, but callers like the PTP KVM guest implementation have the requirement to evaluate the clocksource type to select the appropriate mechanism. Introduce a clocksource id field in struct clocksource which is by default set to CSID_GENERIC (0). Clocksource implementations can set that field to a value which allows to identify the clocksource. Store the clocksource id of the current clocksource in the system_time_snapshot so callers can evaluate which clocksource was used to take the snapshot and act accordingly. Signed-off-by: Thomas Gleixner Signed-off-by: Jianyong Wu --- include/linux/clocksource.h | 6 ++ include/linux/clocksource_ids.h | 11 +++ include/linux/timekeeping.h | 12 +++- kernel/time/clocksource.c | 2 ++ kernel/time/timekeeping.c | 1 + 5 files changed, 27 insertions(+), 5 deletions(-) create mode 100644 include/linux/clocksource_ids.h diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 86d143db6523..1290d0dce840 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -62,6 +63,10 @@ struct module; * 400-499: Perfect * The ideal clocksource. A must-use where * available. + * @id:Defaults to CSID_GENERIC. The id value is captured + * in certain snapshot functions to allow callers to + * validate the clocksource from which the snapshot was + * taken. 
* @flags: Flags describing special properties * @enable:Optional function to enable the clocksource * @disable: Optional function to disable the clocksource @@ -100,6 +105,7 @@ struct clocksource { const char *name; struct list_headlist; int rating; + enum clocksource_idsid; enum vdso_clock_modevdso_clock_mode; unsigned long flags; diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h new file mode 100644 index ..4d8e19e05328 --- /dev/null +++ b/include/linux/clocksource_ids.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CLOCKSOURCE_IDS_H +#define _LINUX_CLOCKSOURCE_IDS_H + +/* Enum to give clocksources a unique identifier */ +enum clocksource_ids { + CSID_GENERIC= 0, + CSID_MAX, +}; + +#endif diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index d47009611109..688ec2e1a3bf 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -3,6 +3,7 @@ #define _LINUX_TIMEKEEPING_H #include +#include /* Included from linux/ktime.h */ @@ -243,11 +244,12 @@ struct ktime_timestamps { * @cs_was_changed_seq:The sequence number of clocksource change events */ struct system_time_snapshot { - u64 cycles; - ktime_t real; - ktime_t raw; - unsigned intclock_was_set_seq; - u8 cs_was_changed_seq; + u64 cycles; + ktime_t real; + ktime_t raw; + enum clocksource_idscs_id; + unsigned intclock_was_set_seq; + u8 cs_was_changed_seq; }; /** diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index cce484a2cc7c..4fe1df894ee5 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -920,6 +920,8 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq) clocksource_arch_init(cs); + if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX)) + cs->id = CSID_GENERIC; if (cs->vdso_clock_mode < 0 || cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) { pr_warn("clocksource %s registered with invalid VDSO mode %d. 
Disabling VDSO support.\n", diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index a45cedda93a7..50f08632165c 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1049,6 +1049,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot) do { seq = read_seqcount_begin(&tk_core.seq); now = tk_clock_read(&tk->tkr_mono); + systime_snapshot->cs_id = tk->tkr_mono.clock->id; systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; base_real = ktime_add(tk->tkr_mono.base, -- 2.17.1
Re: [PATCH] soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
On Tue, 8 Dec 2020 16:08:02 +0200 Tony Lindgren wrote: > We have rst_map_012 used for various accelerators like dsp, ipu and iva. > For these use cases, we have rstctrl bit 2 control the subsystem module > reset, and have and bits 0 and 1 control the accelerator specific > features. > > If the bootloader, or kexec boot, has left any accelerator specific > reset bits deasserted, deasserting bit 2 reset will potentially enable > an accelerator with unconfigured MMU and no firmware. And we may get > spammed with a lot by warnings on boot with "Data Access in User mode > during Functional access", or depending on the accelerator, the system > can also just hang. > > This issue can be quite easily reproduced by setting a rst_map_012 type > rstctrl register to 0 or 4 in the bootloader, and booting the system. > > Let's just assert all reset bits for rst_map_012 type resets. So far > it looks like the other rstctrl types don't need this. If it turns out > that the other type rstctrl bits also need reset on init, we need to > add an instance specific reset mask for the bits to avoid resetting > unwanted bits. > > Reported-by: Carl Philipp Klemm > Cc: Philipp Zabel > Cc: Santosh Shilimkar > Cc: Suman Anna > Cc: Tero Kristo > Signed-off-by: Tony Lindgren > --- > drivers/soc/ti/omap_prm.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c > --- a/drivers/soc/ti/omap_prm.c > +++ b/drivers/soc/ti/omap_prm.c > @@ -860,6 +860,7 @@ static int omap_prm_reset_init(struct platform_device > *pdev, > const struct omap_rst_map *map; > struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev); > char buf[32]; > + u32 v; > > /* >* Check if we have controllable resets. 
If either rstctrl is non-zero > @@ -907,6 +908,16 @@ static int omap_prm_reset_init(struct platform_device > *pdev, > map++; > } > > + /* Quirk handling to assert rst_map_012 bits on reset and avoid errors > */ > + if (prm->data->rstmap == rst_map_012) { > + v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl); > + if ((v & reset->mask) != reset->mask) { > + dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v); > + writel_relaxed(reset->mask, reset->prm->base + > +reset->prm->data->rstctrl); > + } > + } > + > return devm_reset_controller_register(&pdev->dev, &reset->rcdev); > } > > -- > 2.29.2 Works for me on xt875, idle now also works without userspace hack. Tested-by: Carl Philipp Klemm
Re: [PATCH net v4] bonding: fix feature flag setting at init time
On Sat, 5 Dec 2020 12:22:29 -0500 Jarod Wilson wrote: > Don't try to adjust XFRM support flags if the bond device isn't yet > registered. Bad things can currently happen when netdev_change_features() > is called without having wanted_features fully filled in yet. This code > runs both on post-module-load mode changes, as well as at module init > time, and when run at module init time, it is before register_netdevice() > has been called and filled in wanted_features. The empty wanted_features > led to features also getting emptied out, which was definitely not the > intended behavior, so prevent that from happening. > > Originally, I'd hoped to stop adjusting wanted_features at all in the > bonding driver, as it's documented as being something only the network > core should touch, but we actually do need to do this to properly update > both the features and wanted_features fields when changing the bond type, > or we get to a situation where ethtool sees: > > esp-hw-offload: off [requested on] > > I do think we should be using netdev_update_features instead of > netdev_change_features here though, so we only send notifiers when the > features actually changed. > > Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") > Reported-by: Ivan Vecera > Suggested-by: Ivan Vecera Applied, thanks!
[PATCH] soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
We have rst_map_012 used for various accelerators like dsp, ipu and iva. For these use cases, we have rstctrl bit 2 control the subsystem module reset, and have bits 0 and 1 control the accelerator specific features. If the bootloader, or kexec boot, has left any accelerator specific reset bits deasserted, deasserting bit 2 reset will potentially enable an accelerator with unconfigured MMU and no firmware. And we may get spammed with a lot of warnings on boot with "Data Access in User mode during Functional access", or depending on the accelerator, the system can also just hang. This issue can be quite easily reproduced by setting a rst_map_012 type rstctrl register to 0 or 4 in the bootloader, and booting the system. Let's just assert all reset bits for rst_map_012 type resets. So far it looks like the other rstctrl types don't need this. If it turns out that the other type rstctrl bits also need reset on init, we need to add an instance specific reset mask for the bits to avoid resetting unwanted bits. Reported-by: Carl Philipp Klemm Cc: Philipp Zabel Cc: Santosh Shilimkar Cc: Suman Anna Cc: Tero Kristo Signed-off-by: Tony Lindgren --- drivers/soc/ti/omap_prm.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c --- a/drivers/soc/ti/omap_prm.c +++ b/drivers/soc/ti/omap_prm.c @@ -860,6 +860,7 @@ static int omap_prm_reset_init(struct platform_device *pdev, const struct omap_rst_map *map; struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev); char buf[32]; + u32 v; /* * Check if we have controllable resets.
If either rstctrl is non-zero @@ -907,6 +908,16 @@ static int omap_prm_reset_init(struct platform_device *pdev, map++; } + /* Quirk handling to assert rst_map_012 bits on reset and avoid errors */ + if (prm->data->rstmap == rst_map_012) { + v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl); + if ((v & reset->mask) != reset->mask) { + dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v); + writel_relaxed(reset->mask, reset->prm->base + + reset->prm->data->rstctrl); + } + } + return devm_reset_controller_register(&pdev->dev, &reset->rcdev); } -- 2.29.2
Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time
On Tue, Dec 8, 2020 at 5:51 PM Marc Zyngier wrote: > > On 2020-12-08 09:43, Pingfan Liu wrote: > > On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier wrote: > >> > >> On 2020-12-08 09:21, Pingfan Liu wrote: > >> > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had > >> > better > >> > do the check during built time, and associate these related code > >> > together. > >> > > >> > Signed-off-by: Pingfan Liu > >> > Cc: Catalin Marinas > >> > Cc: Will Deacon > >> > Cc: Thomas Gleixner > >> > Cc: Jason Cooper > >> > Cc: Marc Zyngier > >> > Cc: Mark Rutland > >> > To: linux-arm-ker...@lists.infradead.org > >> > Cc: linux-kernel@vger.kernel.org > >> > --- > >> > arch/arm64/kernel/smp.c| 2 ++ > >> > drivers/irqchip/irq-gic-v3.c | 2 +- > >> > drivers/irqchip/irq-gic.c | 2 +- > >> > include/linux/irqchip/arm-gic-common.h | 2 ++ > >> > 4 files changed, 6 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > >> > index 18e9727..9fc383c 100644 > >> > --- a/arch/arm64/kernel/smp.c > >> > +++ b/arch/arm64/kernel/smp.c > >> > @@ -33,6 +33,7 @@ > >> > #include > >> > #include > >> > #include > >> > +#include > >> > > >> > #include > >> > #include > >> > @@ -76,6 +77,7 @@ enum ipi_msg_type { > >> > IPI_WAKEUP, > >> > NR_IPI > >> > }; > >> > +static_assert(NR_IPI <= MAX_SGI_NUM); > >> > >> I am trying *very hard* to remove dependencies between the > >> architecture > >> code and random drivers, so this kind of check really is > >> counter-productive. > >> > >> Driver code should not have to know the number of IPIs, because there > >> is > >> no requirement that all IPIs should map 1:1 to SGIs. Conflating the > >> two > > > > Just curious about this. Is there an IPI which is not implemented by > > SGI? Or mapping several IPIs to a single SGI, and scatter out due to a > > global variable value? 
> > We currently have a single NS SGI left, and I'd like to move some of the > non-critical IPIs over to dispatching mechanism (the two "CPU stop" IPIs > definitely are candidate for merging). That's not implemented yet, but > I don't see a need to add checks that would otherwise violate this > IPI/SGI distinction. Got it. Thanks for your detailed explanation. Regards, Pingfan
Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time
On 2020-12-08 09:43, Pingfan Liu wrote: On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier wrote: On 2020-12-08 09:21, Pingfan Liu wrote: > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had > better > do the check during built time, and associate these related code > together. > > Signed-off-by: Pingfan Liu > Cc: Catalin Marinas > Cc: Will Deacon > Cc: Thomas Gleixner > Cc: Jason Cooper > Cc: Marc Zyngier > Cc: Mark Rutland > To: linux-arm-ker...@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > --- > arch/arm64/kernel/smp.c| 2 ++ > drivers/irqchip/irq-gic-v3.c | 2 +- > drivers/irqchip/irq-gic.c | 2 +- > include/linux/irqchip/arm-gic-common.h | 2 ++ > 4 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 18e9727..9fc383c 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -33,6 +33,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -76,6 +77,7 @@ enum ipi_msg_type { > IPI_WAKEUP, > NR_IPI > }; > +static_assert(NR_IPI <= MAX_SGI_NUM); I am trying *very hard* to remove dependencies between the architecture code and random drivers, so this kind of check really is counter-productive. Driver code should not have to know the number of IPIs, because there is no requirement that all IPIs should map 1:1 to SGIs. Conflating the two Just curious about this. Is there an IPI which is not implemented by SGI? Or mapping several IPIs to a single SGI, and scatter out due to a global variable value? We currently have a single NS SGI left, and I'd like to move some of the non-critical IPIs over to dispatching mechanism (the two "CPU stop" IPIs definitely are candidate for merging). That's not implemented yet, but I don't see a need to add checks that would otherwise violate this IPI/SGI distinction. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time
On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier wrote: > > On 2020-12-08 09:21, Pingfan Liu wrote: > > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had > > better > > do the check during built time, and associate these related code > > together. > > > > Signed-off-by: Pingfan Liu > > Cc: Catalin Marinas > > Cc: Will Deacon > > Cc: Thomas Gleixner > > Cc: Jason Cooper > > Cc: Marc Zyngier > > Cc: Mark Rutland > > To: linux-arm-ker...@lists.infradead.org > > Cc: linux-kernel@vger.kernel.org > > --- > > arch/arm64/kernel/smp.c| 2 ++ > > drivers/irqchip/irq-gic-v3.c | 2 +- > > drivers/irqchip/irq-gic.c | 2 +- > > include/linux/irqchip/arm-gic-common.h | 2 ++ > > 4 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > > index 18e9727..9fc383c 100644 > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -33,6 +33,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -76,6 +77,7 @@ enum ipi_msg_type { > > IPI_WAKEUP, > > NR_IPI > > }; > > +static_assert(NR_IPI <= MAX_SGI_NUM); > > I am trying *very hard* to remove dependencies between the architecture > code and random drivers, so this kind of check really is > counter-productive. > > Driver code should not have to know the number of IPIs, because there is > no requirement that all IPIs should map 1:1 to SGIs. Conflating the two Just curious about this. Is there an IPI which is not implemented by SGI? Or mapping several IPIs to a single SGI, and scatter out due to a global variable value? Thanks, Pingfan > is already wrong, and I really don't want to add more of that. > > Thanks, > > M. > -- > Jazz is not dead. It just smells funny...
Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time
On 2020-12-08 09:21, Pingfan Liu wrote: Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had better do the check during built time, and associate these related code together. Signed-off-by: Pingfan Liu Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Gleixner Cc: Jason Cooper Cc: Marc Zyngier Cc: Mark Rutland To: linux-arm-ker...@lists.infradead.org Cc: linux-kernel@vger.kernel.org --- arch/arm64/kernel/smp.c| 2 ++ drivers/irqchip/irq-gic-v3.c | 2 +- drivers/irqchip/irq-gic.c | 2 +- include/linux/irqchip/arm-gic-common.h | 2 ++ 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 18e9727..9fc383c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -76,6 +77,7 @@ enum ipi_msg_type { IPI_WAKEUP, NR_IPI }; +static_assert(NR_IPI <= MAX_SGI_NUM); I am trying *very hard* to remove dependencies between the architecture code and random drivers, so this kind of check really is counter-productive. Driver code should not have to know the number of IPIs, because there is no requirement that all IPIs should map 1:1 to SGIs. Conflating the two is already wrong, and I really don't want to add more of that. Thanks, M. -- Jazz is not dead. It just smells funny...
[PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time
Although there is a runtime WARN_ON() when NR_IPI > max SGI, it had better do the check during build time, and associate the related code together. Signed-off-by: Pingfan Liu Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Gleixner Cc: Jason Cooper Cc: Marc Zyngier Cc: Mark Rutland To: linux-arm-ker...@lists.infradead.org Cc: linux-kernel@vger.kernel.org --- arch/arm64/kernel/smp.c| 2 ++ drivers/irqchip/irq-gic-v3.c | 2 +- drivers/irqchip/irq-gic.c | 2 +- include/linux/irqchip/arm-gic-common.h | 2 ++ 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 18e9727..9fc383c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -76,6 +77,7 @@ enum ipi_msg_type { IPI_WAKEUP, NR_IPI }; +static_assert(NR_IPI <= MAX_SGI_NUM); static int ipi_irq_base __read_mostly; static int nr_ipi __read_mostly = NR_IPI; diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 16fecc0..ee13f85 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1162,7 +1162,7 @@ static void __init gic_smp_init(void) gic_starting_cpu, NULL); /* Register all 8 non-secure SGIs */ - base_sgi = __irq_domain_alloc_irqs(gic_data.domain, -1, 8, + base_sgi = __irq_domain_alloc_irqs(gic_data.domain, -1, MAX_SGI_NUM, NUMA_NO_NODE, &sgi_fwspec, false, NULL); if (WARN_ON(base_sgi <= 0)) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 6053245..07d36de 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -845,7 +845,7 @@ static __init void gic_smp_init(void) "irqchip/arm/gic:starting", gic_starting_cpu, NULL); - base_sgi = __irq_domain_alloc_irqs(gic_data[0].domain, -1, 8, + base_sgi = __irq_domain_alloc_irqs(gic_data[0].domain, -1, MAX_SGI_NUM, NUMA_NO_NODE, &sgi_fwspec, false, NULL); if (WARN_ON(base_sgi <= 0)) diff --git a/include/linux/irqchip/arm-gic-common.h
b/include/linux/irqchip/arm-gic-common.h index fa8c045..7e45a9f 100644 --- a/include/linux/irqchip/arm-gic-common.h +++ b/include/linux/irqchip/arm-gic-common.h @@ -16,6 +16,8 @@ (GICD_INT_DEF_PRI << 8) |\ GICD_INT_DEF_PRI) +#define MAX_SGI_NUM 8 + enum gic_type { GIC_V2, GIC_V3, -- 2.7.5
[PATCH rdma-next 0/3] Various fixes collected over time
From: Leon Romanovsky Hi, This is a set of various unrelated fixes that we collected over time. Thanks Avihai Horon (1): RDMA/uverbs: Fix incorrect variable type Jack Morgenstein (2): RDMA/core: Clean up cq pool mechanism RDMA/core: Do not indicate device ready when device enablement fails drivers/infiniband/core/core_priv.h | 3 +-- drivers/infiniband/core/cq.c | 12 ++-- drivers/infiniband/core/device.c | 16 ++-- .../infiniband/core/uverbs_std_types_device.c| 14 +- include/rdma/uverbs_ioctl.h | 10 ++ 5 files changed, 28 insertions(+), 27 deletions(-) -- 2.28.0
Re: Ftrace startup test and boot-time tracing
On Tue, 8 Dec 2020 08:26:49 +0900 Masami Hiramatsu wrote: > Hi Steve, > > On Mon, 7 Dec 2020 15:25:40 -0500 > Steven Rostedt wrote: > > > On Mon, 7 Dec 2020 23:02:59 +0900 > > Masami Hiramatsu wrote: > > > > > There will be the 2 options, one is to change kconfig so that user can not > > > select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide > > > a flag from trace_boot and all tests checks the flag at runtime. > > > (moreover, that flag will be good to be set from other command-line > > > options) > > > What would you think? > > > > Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And > > something that could also be on the kernel command line itself :-) > > > > "disabe_ftrace_startup_tests" > > > > Sometimes when debugging something, I don't want the tests running, even > > though the config has them, and I don't want to change the config. > > OK, BTW, I found tracing_selftest_disabled, it seemed what we need. > Yeah, I thought we had something like this. It's getting hard to keep track of ;-) -- Steve
Re: Ftrace startup test and boot-time tracing
Hi Steve, On Mon, 7 Dec 2020 15:25:40 -0500 Steven Rostedt wrote: > On Mon, 7 Dec 2020 23:02:59 +0900 > Masami Hiramatsu wrote: > > > There will be the 2 options, one is to change kconfig so that user can not > > select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide > > a flag from trace_boot and all tests checks the flag at runtime. > > (moreover, that flag will be good to be set from other command-line options) > > What would you think? > > Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And > something that could also be on the kernel command line itself :-) > > "disabe_ftrace_startup_tests" > > Sometimes when debugging something, I don't want the tests running, even > though the config has them, and I don't want to change the config. OK, BTW, I found tracing_selftest_disabled, it seemed what we need. Thank you, -- Masami Hiramatsu
[PATCH v2 1/2] Add save/restore of Precision Time Measurement capability
The PCI subsystem does not currently save and restore the configuration space for the Precision Time Measurement (PTM) PCIe extended capability leading to the possibility of the feature returning disabled on S3 resume. This has been observed on Intel Coffee Lake desktops. Add save/restore of the PTM control register. This saves the PTM Enable, Root Select, and Effective Granularity bits. Suggested-by: Rafael J. Wysocki Signed-off-by: David E. Box --- Changes from V1: - Move save/restore functions to ptm.c - Move pci_add_ext_cap_save_buffer() to pci_ptm_init in ptm.c drivers/pci/pci.c | 2 ++ drivers/pci/pci.h | 8 drivers/pci/pcie/ptm.c | 43 ++ 3 files changed, 53 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e578d34095e9..12ba6351c05b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1566,6 +1566,7 @@ int pci_save_state(struct pci_dev *dev) pci_save_ltr_state(dev); pci_save_dpc_state(dev); pci_save_aer_state(dev); + pci_save_ptm_state(dev); return pci_save_vc_state(dev); } EXPORT_SYMBOL(pci_save_state); @@ -1677,6 +1678,7 @@ void pci_restore_state(struct pci_dev *dev) pci_restore_vc_state(dev); pci_restore_rebar_state(dev); pci_restore_dpc_state(dev); + pci_restore_ptm_state(dev); pci_aer_clear_status(dev); pci_restore_aer_state(dev); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index f86cae9aa1f4..62cdacba5954 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -516,6 +516,14 @@ static inline int pci_iov_bus_range(struct pci_bus *bus) #endif /* CONFIG_PCI_IOV */ +#ifdef CONFIG_PCIE_PTM +void pci_save_ptm_state(struct pci_dev *dev); +void pci_restore_ptm_state(struct pci_dev *dev); +#else +static inline void pci_save_ptm_state(struct pci_dev *dev) {} +static inline void pci_restore_ptm_state(struct pci_dev *dev) {} +#endif + unsigned long pci_cardbus_resource_alignment(struct resource *); static inline resource_size_t pci_resource_alignment(struct pci_dev *dev, diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
index 357a454cafa0..6b24a1c9327a 100644 --- a/drivers/pci/pcie/ptm.c +++ b/drivers/pci/pcie/ptm.c @@ -29,6 +29,47 @@ static void pci_ptm_info(struct pci_dev *dev) dev->ptm_root ? " (root)" : "", clock_desc); } +void pci_save_ptm_state(struct pci_dev *dev) +{ + int ptm; + struct pci_cap_saved_state *save_state; + u16 *cap; + + if (!pci_is_pcie(dev)) + return; + + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM); + if (!ptm) + return; + + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM); + if (!save_state) { + pci_err(dev, "no suspend buffer for PTM\n"); + return; + } + + cap = (u16 *)&save_state->cap.data[0]; + pci_read_config_word(dev, ptm + PCI_PTM_CTRL, cap); +} + +void pci_restore_ptm_state(struct pci_dev *dev) +{ + struct pci_cap_saved_state *save_state; + int ptm; + u16 *cap; + + if (!pci_is_pcie(dev)) + return; + + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM); + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM); + if (!save_state || !ptm) + return; + + cap = (u16 *)&save_state->cap.data[0]; + pci_write_config_word(dev, ptm + PCI_PTM_CTRL, *cap); +} + void pci_ptm_init(struct pci_dev *dev) { int pos; @@ -65,6 +106,8 @@ void pci_ptm_init(struct pci_dev *dev) if (!pos) return; + pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_PTM, sizeof(u16)); + pci_read_config_dword(dev, pos + PCI_PTM_CAP, &cap); local_clock = (cap & PCI_PTM_GRANULARITY_MASK) >> 8; -- 2.20.1
Re: Ftrace startup test and boot-time tracing
On Mon, 7 Dec 2020 23:02:59 +0900 Masami Hiramatsu wrote: > There will be the 2 options, one is to change kconfig so that user can not > select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide > a flag from trace_boot and all tests checks the flag at runtime. > (moreover, that flag will be good to be set from other command-line options) > What would you think? Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And something that could also be on the kernel command line itself :-) "disable_ftrace_startup_tests" Sometimes when debugging something, I don't want the tests running, even though the config has them, and I don't want to change the config. -- Steve
Ftrace startup test and boot-time tracing
Hi Steve,

I found that if I enabled CONFIG_FTRACE_STARTUP_TEST=y and booted the
kernel with kprobe-events defined by boot-time tracing, a warning was
output:

[ 59.803496] trace_kprobe: Testing kprobe tracing:
[ 59.804258] [ cut here ]
[ 59.805682] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:1987 kprobe_trace_self_tests_ib
[ 59.806944] Modules linked in:
[ 59.807335] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7+ #172
[ 59.808029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/204
[ 59.808999] RIP: 0010:kprobe_trace_self_tests_init+0x5f/0x42b
[ 59.809696] Code: e8 03 00 00 48 c7 c7 30 8e 07 82 e8 6d 3c 46 ff 48 c7 c6 00 b2 1a 81 48 c7 c7 7
[ 59.812439] RSP: 0018:c9013e78 EFLAGS: 00010282
[ 59.813038] RAX: ffef RBX: RCX: 00049443
[ 59.813780] RDX: 00049403 RSI: 00049403 RDI: 0002deb0
[ 59.814589] RBP: c9013e90 R08: 0001 R09: 0001
[ 59.815349] R10: 0001 R11: R12: ffef
[ 59.816138] R13: 888004613d80 R14: 82696940 R15: 888004429138
[ 59.816877] FS: () GS:88807dcc() knlGS:
[ 59.817772] CS: 0010 DS: ES: CR0: 80050033
[ 59.818395] CR2: 01a8dd38 CR3: 0000 CR4: 06a0
[ 59.819144] Call Trace:
[ 59.819469] ? init_kprobe_trace+0x6b/0x6b
[ 59.819948] do_one_initcall+0x5f/0x300
[ 59.820392] ? rcu_read_lock_sched_held+0x4f/0x80
[ 59.820916] kernel_init_freeable+0x22a/0x271
[ 59.821416] ? rest_init+0x241/0x241
[ 59.821841] kernel_init+0xe/0x10f
[ 59.822251] ret_from_fork+0x22/0x30
[ 59.822683] irq event stamp: 16403349
[ 59.823121] hardirqs last enabled at (16403359): [] console_unlock+0x48e/0x580
[ 59.824074] hardirqs last disabled at (16403368): [] console_unlock+0x3f6/0x580
[ 59.825036] softirqs last enabled at (16403200): [] __do_softirq+0x33a/0x484
[ 59.825982] softirqs last disabled at (16403087): [] asm_call_irq_on_stack+0x10
[ 59.827034] ---[ end trace 200c544775cdfeb3 ]---
[ 59.827635] trace_kprobe: error on probing function entry.
This is actually a similar issue to the one you fixed with commit
b6399cc78934 ("tracing/kprobe: Do not run kprobe boot tests if
kprobe_event is on cmdline").

Fixing this kprobes warning is easy (see attached below), but I think
this has to be fixed more widely, because other testcases also change
the boot-time tracing results or may not work correctly with it.

There will be the 2 options, one is to change kconfig so that user can not
select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide
a flag from trace_boot and all tests checks the flag at runtime.
(moreover, that flag will be good to be set from other command-line options)

What would you think?

Thank you,

>From 00037083baca07a8705da39852480f6f53a8297c Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu
Date: Mon, 7 Dec 2020 22:53:16 +0900
Subject: [PATCH] tracing/kprobes: Fix to skip kprobe-events startup test if
 kprobe-events is used

commit b6399cc78934 ("tracing/kprobe: Do not run kprobe boot tests if
kprobe_event is on cmdline") fixed the same issue for kprobe-events on
the kernel cmdline, but boot-time tracing re-introduced a similar issue.

When boot-time tracing uses kprobe-events with the ftrace startup test,
it produces a warning in the kprobe-events startup test, because the
testcase doesn't expect any kprobe events to exist already.

To mitigate the warning, skip the kprobe-events startup test if any
kprobe-event is defined before starting the test.
Fixes: 4d655281eb1b ("tracing/boot Add kprobe event support")
Cc: sta...@vger.kernel.org
Signed-off-by: Masami Hiramatsu
---
 kernel/trace/trace_kprobe.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b911e9f6d9f5..515e139236f2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -25,7 +25,6 @@
 
 /* Kprobe early definition from command line */
 static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
-static bool kprobe_boot_events_enabled __initdata;
 
 static int __init set_kprobe_boot_events(char *str)
 {
@@ -1887,8 +1886,6 @@ static __init void setup_boot_kprobe_events(void)
 		ret = trace_run_command(cmd, create_or_delete_trace_kprobe);
 		if (ret)
 			pr_warn("Failed to add event(%d): %s\n", ret, cmd);
-		else
-			kprobe_boot_events_enabled = true;
 
 		cmd = p;
 	}
@@ -1959,6 +1956,20 @@ find_trace_probe_file(struct trace_kprobe *tk, struct trace_array *tr)
 	return NULL;
 }
 
+static __init int trace_kprobe_exist(void)
+{
+	struct trace_kprobe *tk;
[PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property
Add a new common property 'reset-time-sec' to be used in conjunction with
the devices supporting the key pressed reset feature.

Signed-off-by: Cristian Ciocaltea
---
Changes in v3:
 - This patch was not present in v2

 Documentation/devicetree/bindings/input/input.yaml | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/input/input.yaml b/Documentation/devicetree/bindings/input/input.yaml
index ab407f266bef..caba93209ae7 100644
--- a/Documentation/devicetree/bindings/input/input.yaml
+++ b/Documentation/devicetree/bindings/input/input.yaml
@@ -34,4 +34,11 @@ properties:
       specify this property.
     $ref: /schemas/types.yaml#/definitions/uint32
 
+  reset-time-sec:
+    description:
+      Duration in seconds which the key should be kept pressed for device to
+      reset automatically. Device with key pressed reset feature can specify
+      this property.
+    $ref: /schemas/types.yaml#/definitions/uint32
+
 additionalProperties: true
-- 
2.29.2
Re: [PATCH net v3] bonding: fix feature flag setting at init time
On Thu, Dec 3, 2020 at 11:45 AM Jakub Kicinski wrote:
...
> nit: let's narrow down the ifdef-enery
>
> no need for the ifdef here, if the helper looks like this:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +#ifdef CONFIG_XFRM_OFFLOAD
> +	if (mode == BOND_MODE_ACTIVEBACKUP)
> +		bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +	else
> +		bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +	netdev_update_features(bond_dev);
> +#endif /* CONFIG_XFRM_OFFLOAD */
> +}
>
> Even better:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +	if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
> +		return;
> +
> +	if (mode == BOND_MODE_ACTIVEBACKUP)
> +		bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +	else
> +		bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +	netdev_update_features(bond_dev);
> +}
>
> (Assuming BOND_XFRM_FEATURES doesn't itself hide under an ifdef.)

It is, but doesn't need to be. I can mix these changes in as well.

-- 
Jarod Wilson
ja...@redhat.com
[PATCH net v4] bonding: fix feature flag setting at init time
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes, as well as at module init
time, and when run at module init time, it is before register_netdevice()
has been called and filled in wanted_features. The empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

  esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera
Suggested-by: Ivan Vecera
Cc: Jay Vosburgh
Cc: Veaceslav Falico
Cc: Andy Gospodarek
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Thomas Davis
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson
---
v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto
v4: drop hunk not directly related to fix, clean up ifdeffery

 drivers/net/bonding/bond_options.c | 22 +++++++++++++++-------
 include/net/bonding.h              |  2 --
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..a4e4e15f574d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,19 @@ const struct bond_option *bond_opt_get(unsigned int option)
 	return &bond_opts[option];
 }
 
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+	if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
+		return;
+
+	if (mode == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+	else
+		bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+	netdev_update_features(bond_dev);
+}
+
 static int bond_option_mode_set(struct bonding *bond,
 				const struct bond_opt_value *newval)
 {
@@ -767,13 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
 	if (newval->value == BOND_MODE_ALB)
 		bond->params.tlb_dynamic_lb = 1;
 
-#ifdef CONFIG_XFRM_OFFLOAD
-	if (newval->value == BOND_MODE_ACTIVEBACKUP)
-		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-	else
-		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
-#endif /* CONFIG_XFRM_OFFLOAD */
+	if (bond->dev->reg_state == NETREG_REGISTERED)
+		bond_set_xfrm_features(bond->dev, newval->value);
 
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;

diff --git a/include/net/bonding.h b/include/net/bonding.h
index d9d0ff3b0ad3..adc3da776970 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -86,10 +86,8 @@
 #define bond_for_each_slave_rcu(bond, pos, iter) \
 	netdev_for_each_lower_private_rcu((bond)->dev, pos, iter)
 
-#ifdef CONFIG_XFRM_OFFLOAD
 #define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \
			    NETIF_F_GSO_ESP)
-#endif /* CONFIG_XFRM_OFFLOAD */
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 extern atomic_t netpoll_block_tx;
-- 
2.28.0
Re: [PATCH net v3] bonding: fix feature flag setting at init time
On Thu, 3 Dec 2020 22:14:12 -0500 Jarod Wilson wrote: > On Thu, Dec 3, 2020 at 11:50 AM Jakub Kicinski wrote: > > > > On Wed, 2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote: > > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4; > > > -#ifdef CONFIG_XFRM_OFFLOAD > > > - bond_dev->hw_features |= BOND_XFRM_FEATURES; > > > -#endif /* CONFIG_XFRM_OFFLOAD */ > > > bond_dev->features |= bond_dev->hw_features; > > > bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | > > > NETIF_F_HW_VLAN_STAG_TX; > > > #ifdef CONFIG_XFRM_OFFLOAD > > > - /* Disable XFRM features if this isn't an active-backup config */ > > > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) > > > - bond_dev->features &= ~BOND_XFRM_FEATURES; > > > + bond_dev->hw_features |= BOND_XFRM_FEATURES; > > > + /* Only enable XFRM features if this is an active-backup config */ > > > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) > > > + bond_dev->features |= BOND_XFRM_FEATURES; > > > #endif /* CONFIG_XFRM_OFFLOAD */ > > > > This makes no functional change, or am I reading it wrong? > > You are correct, there's ultimately no functional change there, it > primarily just condenses the code down to a single #ifdef block, and > doesn't add and then remove BOND_XFRM_FEATURES from > bond_dev->features, instead omitting it initially and only adding it > when in AB mode. I'd poked at the code in that area while trying to > get to the bottom of this, thought it made it more understandable, so > I left it in, but ultimately, it's not necessary to fix the problem > here. Makes sense, but please split it out and send separately to net-next.
Re: [PATCH v2 1/2] KVM: arm64: Some fixes of PV-time interface document
On 2020/12/3 23:04, Marc Zyngier wrote:
> On 2020-08-17 12:07, Keqian Zhu wrote:
>> Rename PV_FEATURES to PV_TIME_FEATURES.
>>
>> Signed-off-by: Keqian Zhu
>> Reviewed-by: Andrew Jones
>> Reviewed-by: Steven Price
>> ---
>>  Documentation/virt/kvm/arm/pvtime.rst | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.rst
>> b/Documentation/virt/kvm/arm/pvtime.rst
>> index 687b60d..94bffe2 100644
>> --- a/Documentation/virt/kvm/arm/pvtime.rst
>> +++ b/Documentation/virt/kvm/arm/pvtime.rst
>> @@ -3,7 +3,7 @@
>>  Paravirtualized time support for arm64
>>  ======================================
>>
>> -Arm specification DEN0057/A defines a standard for paravirtualised time
>> +Arm specification DEN0057/A defines a standard for paravirtualized time
>>  support for AArch64 guests:
>
> nit: I do object to this change (some of us are British! ;-).

Oh, I will pay attention to this.

Thanks!
Keqian

>
> M.
Re: [PATCH net v3] bonding: fix feature flag setting at init time
On Thu, Dec 3, 2020 at 11:50 AM Jakub Kicinski wrote: > > On Wed, 2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote: > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4; > > -#ifdef CONFIG_XFRM_OFFLOAD > > - bond_dev->hw_features |= BOND_XFRM_FEATURES; > > -#endif /* CONFIG_XFRM_OFFLOAD */ > > bond_dev->features |= bond_dev->hw_features; > > bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | > > NETIF_F_HW_VLAN_STAG_TX; > > #ifdef CONFIG_XFRM_OFFLOAD > > - /* Disable XFRM features if this isn't an active-backup config */ > > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) > > - bond_dev->features &= ~BOND_XFRM_FEATURES; > > + bond_dev->hw_features |= BOND_XFRM_FEATURES; > > + /* Only enable XFRM features if this is an active-backup config */ > > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) > > + bond_dev->features |= BOND_XFRM_FEATURES; > > #endif /* CONFIG_XFRM_OFFLOAD */ > > This makes no functional change, or am I reading it wrong? You are correct, there's ultimately no functional change there, it primarily just condenses the code down to a single #ifdef block, and doesn't add and then remove BOND_XFRM_FEATURES from bond_dev->features, instead omitting it initially and only adding it when in AB mode. I'd poked at the code in that area while trying to get to the bottom of this, thought it made it more understandable, so I left it in, but ultimately, it's not necessary to fix the problem here. -- Jarod Wilson ja...@redhat.com
Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time
On Tue, Nov 24, 2020 at 12:29 AM 'Tomi Valkeinen' via kernel-team wrote: > > Hi, > > On 21/11/2020 04:02, Saravana Kannan wrote: > > The current implementation of fw_devlink is very inefficient because it > > tries to get away without creating fwnode links in the name of saving > > memory usage. Past attempts to optimize runtime at the cost of memory > > usage were blocked with request for data showing that the optimization > > made significant improvement for real world scenarios. > > > > We have those scenarios now. There have been several reports of boot > > time increase in the order of seconds in this thread [1]. Several OEMs > > and SoC manufacturers have also privately reported significant > > (350-400ms) increase in boot time due to all the parsing done by > > fw_devlink. > > > > So this patch series refactors fw_devlink to be more efficient. The key > > difference now is the addition of support for fwnode links -- just a few > > simple APIs. This also allows most of the code to be moved out of > > firmware specific (DT mostly) code into driver core. > > > > This brings the following benefits: > > - Instead of parsing the device tree multiple times (complexity was > > close to O(N^3) where N in the number of properties) during bootup, > > fw_devlink parses each fwnode node/property only once and creates > > fwnode links. The rest of the fw_devlink code then just looks at these > > fwnode links to do rest of the work. > > > > - Makes it much easier to debug probe issue due to fw_devlink in the > > future. fw_devlink=on blocks the probing of devices if they depend on > > a device that hasn't been added yet. With this refactor, it'll be very > > easy to tell what that device is because we now have a reference to > > the fwnode of the device. > > > > - Much easier to add fw_devlink support to ACPI and other firmware > > types. 
> > A refactor to move the common bits from DT specific code to
> > driver core was in my TODO list as a prerequisite to adding ACPI
> > support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvement of about 12 seconds and 3 seconds, respectively.
>
> Tested v2 on OMAP4 SDP. With my particular config, boot time to
> starting init went from 18.5 seconds to 12.5 seconds.
>
> Tomi

Rafael, Friendly reminder for a review.

-Saravana
[for-next][PATCH 3/3] ring-buffer: Add test to validate the time stamp deltas
From: "Steven Rostedt (VMware)"

While debugging a situation where a delta for an event was calculated
wrong, I realized there was nothing making sure that the deltas of
events are correct. If a single event has an incorrect delta, then all
events after it will also have one. If the discrepancy gets large
enough, it could cause the time stamps to go backwards when crossing
sub buffers, which record a full 64 bit time stamp that the new deltas
are added to.

Add a way to validate the deltas at most events and when crossing a
buffer page. This will help make sure that the deltas are always
correct. This test will detect if they are ever corrupted.

The test adds a high overhead to the ring buffer recording, as it does
the audit for almost every event, and should only be used for testing
the ring buffer.

This will catch the bug that is fixed by commit 55ea4cf40380
("ring-buffer: Update write stamp with the correct ts"), which is not
applied when this commit is applied.

Signed-off-by: Steven Rostedt (VMware)
---
 kernel/trace/Kconfig       |  20 +++++
 kernel/trace/ring_buffer.c | 150 +++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c9b64dea1216..fe60f9d7a0e6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -845,6 +845,26 @@ config RING_BUFFER_STARTUP_TEST
 
	 If unsure, say N
 
+config RING_BUFFER_VALIDATE_TIME_DELTAS
+	bool "Verify ring buffer time stamp deltas"
+	depends on RING_BUFFER
+	help
+	  This will audit the time stamps on the ring buffer sub
+	  buffer to make sure that all the time deltas for the
+	  events on a sub buffer matches the current time stamp.
+	  This audit is performed for every event that is not
+	  interrupted, or interrupting another event. A check
+	  is also made when traversing sub buffers to make sure
+	  that all the deltas on the previous sub buffer do not
+	  add up to be greater than the current time stamp.
+ + NOTE: This adds significant overhead to recording of events, + and should only be used to test the logic of the ring buffer. + Do not use it on production systems. + + Only say Y if you understand what this does, and you + still want it enabled. Otherwise say N + config MMIOTRACE_TEST tristate "Test module for mmiotrace" depends on MMIOTRACE && m diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index ab68f28b8f4b..7cd888ee9ac7 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3193,6 +3193,153 @@ int ring_buffer_unlock_commit(struct trace_buffer *buffer, } EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit); +/* Special value to validate all deltas on a page. */ +#define CHECK_FULL_PAGE1L + +#ifdef CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS +static void dump_buffer_page(struct buffer_data_page *bpage, +struct rb_event_info *info, +unsigned long tail) +{ + struct ring_buffer_event *event; + u64 ts, delta; + int e; + + ts = bpage->time_stamp; + pr_warn(" [%lld] PAGE TIME STAMP\n", ts); + + for (e = 0; e < tail; e += rb_event_length(event)) { + + event = (struct ring_buffer_event *)(bpage->data + e); + + switch (event->type_len) { + + case RINGBUF_TYPE_TIME_EXTEND: + delta = ring_buffer_event_time_stamp(event); + ts += delta; + pr_warn(" [%lld] delta:%lld TIME EXTEND\n", ts, delta); + break; + + case RINGBUF_TYPE_TIME_STAMP: + delta = ring_buffer_event_time_stamp(event); + ts = delta; + pr_warn(" [%lld] absolute:%lld TIME STAMP\n", ts, delta); + break; + + case RINGBUF_TYPE_PADDING: + ts += event->time_delta; + pr_warn(" [%lld] delta:%d PADDING\n", ts, event->time_delta); + break; + + case RINGBUF_TYPE_DATA: + ts += event->time_delta; + pr_warn(" [%lld] delta:%d\n", ts, event->time_delta); + break; + + default: + break; + } + } +} + +static DEFINE_PER_CPU(atomic_t, checking); +static atomic_t ts_dump; + +/* + * Check if the current event time stamp matches the deltas on + * the buffer page. 
+ */ +static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer, +struct rb_event_info *info, +unsigned long tail) +{ + struct ring_buffer_event *event; + struct buffer_
Re: [PATCH net v3] bonding: fix feature flag setting at init time
On Wed, 2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:
>  	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> -#ifdef CONFIG_XFRM_OFFLOAD
> -	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>  	bond_dev->features |= bond_dev->hw_features;
>  	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> -	/* Disable XFRM features if this isn't an active-backup config */
> -	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> -		bond_dev->features &= ~BOND_XFRM_FEATURES;
> +	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> +	/* Only enable XFRM features if this is an active-backup config */
> +	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> +		bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */

This makes no functional change, or am I reading it wrong?
Re: [PATCH net v3] bonding: fix feature flag setting at init time
On Wed, 2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote: > Don't try to adjust XFRM support flags if the bond device isn't yet > registered. Bad things can currently happen when netdev_change_features() > is called without having wanted_features fully filled in yet. This code > runs on post-module-load mode changes, as well as at module init time > and new bond creation time, and in the latter two scenarios, it is > running prior to register_netdevice() having been called and > subsequently filling in wanted_features. The empty wanted_features led > to features also getting emptied out, which was definitely not the > intended behavior, so prevent that from happening. > > Originally, I'd hoped to stop adjusting wanted_features at all in the > bonding driver, as it's documented as being something only the network > core should touch, but we actually do need to do this to properly update > both the features and wanted_features fields when changing the bond type, > or we get to a situation where ethtool sees: > > esp-hw-offload: off [requested on] > > I do think we should be using netdev_update_features instead of > netdev_change_features here though, so we only send notifiers when the > features actually changed. > > v2: rework based on further testing and suggestions from ivecera > v3: add helper function, remove goto, fix problem description > > Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") > Reported-by: Ivan Vecera > Suggested-by: Ivan Vecera > Cc: Jay Vosburgh > Cc: Veaceslav Falico > Cc: Andy Gospodarek > Cc: "David S. 
Miller" > Cc: Jakub Kicinski > Cc: Thomas Davis > Cc: net...@vger.kernel.org > Signed-off-by: Jarod Wilson > --- > drivers/net/bonding/bond_main.c| 10 -- > drivers/net/bonding/bond_options.c | 19 ++- > 2 files changed, 18 insertions(+), 11 deletions(-) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 47afc5938c26..7905534a763b 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -4747,15 +4747,13 @@ void bond_setup(struct net_device *bond_dev) > NETIF_F_HW_VLAN_CTAG_FILTER; > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4; > -#ifdef CONFIG_XFRM_OFFLOAD > - bond_dev->hw_features |= BOND_XFRM_FEATURES; > -#endif /* CONFIG_XFRM_OFFLOAD */ > bond_dev->features |= bond_dev->hw_features; > bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; > #ifdef CONFIG_XFRM_OFFLOAD > - /* Disable XFRM features if this isn't an active-backup config */ > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) > - bond_dev->features &= ~BOND_XFRM_FEATURES; > + bond_dev->hw_features |= BOND_XFRM_FEATURES; > + /* Only enable XFRM features if this is an active-backup config */ > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) > + bond_dev->features |= BOND_XFRM_FEATURES; > #endif /* CONFIG_XFRM_OFFLOAD */ > } > > diff --git a/drivers/net/bonding/bond_options.c > b/drivers/net/bonding/bond_options.c > index 9abfaae1c6f7..1ae0e5ab8c67 100644 > --- a/drivers/net/bonding/bond_options.c > +++ b/drivers/net/bonding/bond_options.c > @@ -745,6 +745,18 @@ const struct bond_option *bond_opt_get(unsigned int > option) > return &bond_opts[option]; > } > > +#ifdef CONFIG_XFRM_OFFLOAD > +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode) > +{ > + if (mode == BOND_MODE_ACTIVEBACKUP) > + bond_dev->wanted_features |= BOND_XFRM_FEATURES; > + else > + bond_dev->wanted_features &= ~BOND_XFRM_FEATURES; > + > + netdev_update_features(bond_dev); > +} > +#endif /* 
CONFIG_XFRM_OFFLOAD */ > + > static int bond_option_mode_set(struct bonding *bond, > const struct bond_opt_value *newval) > { > @@ -768,11 +780,8 @@ static int bond_option_mode_set(struct bonding *bond, > bond->params.tlb_dynamic_lb = 1; > > #ifdef CONFIG_XFRM_OFFLOAD > - if (newval->value == BOND_MODE_ACTIVEBACKUP) > - bond->dev->wanted_features |= BOND_XFRM_FEATURES; > - else > - bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > - netdev_change_features(bond->dev); > + if (bond->dev->reg_state == NETREG_REGISTERED) > + bond_set_xfrm_features(bond->dev, newval->value); > #endif /* CONFIG_XFRM_OFFLOAD */ nit: let's narrow down the ifdef-enery no need for the ifdef here,
Re: [PATCH v2 1/2] KVM: arm64: Some fixes of PV-time interface document
On 2020-08-17 12:07, Keqian Zhu wrote:
> Rename PV_FEATURES to PV_TIME_FEATURES.
>
> Signed-off-by: Keqian Zhu
> Reviewed-by: Andrew Jones
> Reviewed-by: Steven Price
> ---
>  Documentation/virt/kvm/arm/pvtime.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst
> index 687b60d..94bffe2 100644
> --- a/Documentation/virt/kvm/arm/pvtime.rst
> +++ b/Documentation/virt/kvm/arm/pvtime.rst
> @@ -3,7 +3,7 @@
>  Paravirtualized time support for arm64
>  ======================================
>
> -Arm specification DEN0057/A defines a standard for paravirtualised time
> +Arm specification DEN0057/A defines a standard for paravirtualized time
>  support for AArch64 guests:

nit: I do object to this change (some of us are British! ;-).

        M.

-- 
Jazz is not dead. It just smells funny...
[PATCH 0/1] Fix for a recent regression in kvm/queue (guest using 100% cpu time)
I did a quick bisect yesterday after noticing that my VMs started to
take 100% cpu time.

Looks like we don't ignore SIPIs that are received while the CPU isn't
waiting for them, and that makes KVM think that CPU always has pending
events which makes it never enter an idle state.

Best regards,
	Maxim Levitsky

Maxim Levitsky (1):
  KVM: x86: ignore SIPIs that are received while not in wait-for-sipi
    state

 arch/x86/kvm/lapic.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

-- 
2.26.2
[RFC PATCH 00/10] Reduce time complexity of select_idle_sibling
This is an early prototype that has not been tested heavily. While parts
of it may stand on its own, the motivation to release early is Aubrey
Li's series on using an idle cpumask to optimise the search and Barry
Song's series on representing clusters on die. The series is based on
tip/sched/core rebased to 5.10-rc6.

Patches 1-2 add schedstats to track the search efficiency of
select_idle_sibling. They can be dropped from the final version but are
useful when looking at select_idle_sibling in general. MMTests can
already parse the stats and generate useful data including graphs over
time.

Patch 3 kills SIS_AVG_CPU but is partially reintroduced later in the
context of SIS_PROP.

Patch 4 notes that select_idle_core() can find an idle CPU that is not
a free core yet it is ignored and a second search is conducted in
select_idle_cpu() which is wasteful. Note that this patch will
definitely change in the final version.

Patch 5 adjusts p->recent_used_cpu so that it has a higher success rate
and avoids searching the domain in some cases.

Patch 6 notes that select_idle_* always starts with a CPU that is
definitely not idle and fixes that.

Patch 7 notes that SIS_PROP is only partially accounting for search
costs. While this might be accidentally beneficial, it makes it much
harder to reason about the effectiveness of SIS_PROP.

Patch 8 uses similar logic to SIS_AVG_CPU but in the context of
SIS_PROP to throttle the search depth.

Patches 9 and 10 are stupid in the context of this series. They are
included even though it makes no sense to use SIS_PROP logic in
select_idle_core() as it already has throttling logic. The point is to
illustrate that the select_idle_mask can be initialised at the start of
a domain search used to mask out CPUs that have already been visited.
In the context of Aubrey's and Barry's work, select_idle_mask would be
initialised *after* select_idle_core as select_idle_core uses
select_idle_mask for its own purposes.

In Aubrey's case, the next step would be to scan idle_cpus_span as
those CPUs may still be idle and bias the search towards likely idle
candidates. If that fails, select_idle_mask clears all the bits set in
idle_cpus_span and then scans the remainder. Similar observations apply
to Barry's work, scan the local domain first, mask out those bits then
scan the remaining CPUs in the cluster.

The final version of this series will drop patches 1-2 unless there is
demand and definitely drop patches 9-10. However, all 4 patches may be
useful in the context of Aubrey's and Barry's work. Patches 1-2 would
give more precise results on exactly how much they are improving "SIS
Domain Search Efficiency" which may be more illustrative than just the
headline performance figures of a given workload.

The final version of this series will also adjust patch 4. If
select_idle_core() runs at all then it definitely should return a CPU --
either an idle CPU or the target as it has already searched the entire
domain and no further searching should be conducted. Barry might change
that back so that a cluster can be scanned but it would be done in the
context of the cluster series.

-- 
2.26.2
[PATCH AUTOSEL 4.19 01/14] iwlwifi: pcie: limit memory read spin time
From: Johannes Berg [ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ] When we read device memory, we lock a spinlock, write the address we want to read from the device and then spin in a loop reading the data in 32-bit quantities from another register. As the description makes clear, this is rather inefficient, incurring a PCIe bus transaction for every read. In a typical device today, we want to read 786k SMEM if it crashes, leading to 192k register reads. Occasionally, we've seen the whole loop take over 20 seconds and then triggering the soft lockup detector. Clearly, it is unreasonable to spin here for such extended periods of time. To fix this, break the loop down into an outer and an inner loop, and break out of the inner loop if more than half a second elapsed. To avoid too much overhead, check for that only every 128 reads, though there's no particular reason for that number. Then, unlock and relock to obtain NIC access again, reprogram the start address and continue. This will keep (interrupt) latencies on the CPU down to a reasonable time. 
Signed-off-by: Johannes Berg Signed-off-by: Mordechay Goodstein Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid Signed-off-by: Sasha Levin --- .../net/wireless/intel/iwlwifi/pcie/trans.c | 36 ++- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index 24da496151353..f48c7cac122e9 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -2121,18 +2121,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr, void *buf, int dwords) { unsigned long flags; - int offs, ret = 0; + int offs = 0; u32 *vals = buf; - if (iwl_trans_grab_nic_access(trans, &flags)) { - iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr); - for (offs = 0; offs < dwords; offs++) - vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT); - iwl_trans_release_nic_access(trans, &flags); - } else { - ret = -EBUSY; + while (offs < dwords) { + /* limit the time we spin here under lock to 1/2s */ + ktime_t timeout = ktime_add_us(ktime_get(), 500 * USEC_PER_MSEC); + + if (iwl_trans_grab_nic_access(trans, &flags)) { + iwl_write32(trans, HBUS_TARG_MEM_RADDR, + addr + 4 * offs); + + while (offs < dwords) { + vals[offs] = iwl_read32(trans, + HBUS_TARG_MEM_RDAT); + offs++; + + /* calling ktime_get is expensive so +* do it once in 128 reads +*/ + if (offs % 128 == 0 && ktime_after(ktime_get(), + timeout)) + break; + } + iwl_trans_release_nic_access(trans, &flags); + } else { + return -EBUSY; + } } - return ret; + + return 0; } static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr, -- 2.27.0
[PATCH AUTOSEL 4.14 1/9] iwlwifi: pcie: limit memory read spin time
[PATCH AUTOSEL 4.9 1/5] iwlwifi: pcie: limit memory read spin time
[PATCH AUTOSEL 5.4 01/23] iwlwifi: pcie: limit memory read spin time
[PATCH AUTOSEL 5.9 03/39] iwlwifi: pcie: limit memory read spin time
[PATCH RFC 3/3] RISC-V: KVM: Implement guest time scaling
When the time frequency needs to scale, the RDTIME/RDTIMEH instructions
executed in the guest don't work correctly, because they still use the
host's time frequency. To read the correct time, RDTIME/RDTIMEH executed
by the guest should trap to HS-mode. The TM bit of the HCOUNTEREN CSR
controls whether these instructions are trapped to HS-mode. Therefore,
we can implement guest time scaling by setting the TM bit in
kvm_riscv_vcpu_timer_restore() and emulating the RDTIME/RDTIMEH
instructions in system_opcode_insn().

Signed-off-by: Yifei Jiang
Signed-off-by: Yipeng Yin
---
 arch/riscv/include/asm/csr.h            | 3 +++
 arch/riscv/include/asm/kvm_vcpu_timer.h | 1 +
 arch/riscv/kvm/vcpu_exit.c              | 35 +
 arch/riscv/kvm/vcpu_timer.c             | 10 +++
 4 files changed, 49 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index bc825693e0e3..a4d8ca76cf1d 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -241,6 +241,9 @@
 #define IE_TIE		(_AC(0x1, UL) << RV_IRQ_TIMER)
 #define IE_EIE		(_AC(0x1, UL) << RV_IRQ_EXT)
 
+/* The counteren flag */
+#define CE_TM	1
+
 #ifndef __ASSEMBLY__
 
 #define csr_swap(csr, val)	\
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
index 41b5503de9e4..61384eb57334 100644
--- a/arch/riscv/include/asm/kvm_vcpu_timer.h
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -41,6 +41,7 @@ int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
 int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
 int kvm_riscv_guest_timer_init(struct kvm *kvm);
+u64 kvm_riscv_read_guest_time(struct kvm_vcpu *vcpu);
 
 static inline bool kvm_riscv_need_scale(struct kvm_guest_timer *gt)
 {
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index f054406792a6..4beb9d25049a 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -18,6 +18,10 @@
 #define INSN_MASK_WFI		0xff00
 #define INSN_MATCH_WFI		0x1050
 
+#define INSN_MASK_RDTIME	0xfff03000
+#define INSN_MATCH_RDTIME	0xc0102000
+#define INSN_MASK_RDTIMEH	0xfff03000
+#define INSN_MATCH_RDTIMEH	0xc8102000
+
 #define INSN_MATCH_LB		0x3
 #define INSN_MASK_LB		0x707f
 
@@ -138,6 +142,34 @@ static int truly_illegal_insn(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int system_opcode_insn_rdtime(struct kvm_vcpu *vcpu,
+				     struct kvm_run *run,
+				     ulong insn)
+{
+#ifdef CONFIG_64BIT
+	if ((insn & INSN_MASK_RDTIME) == INSN_MATCH_RDTIME) {
+		u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+		SET_RD(insn, &vcpu->arch.guest_context, guest_time);
+		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+		return 1;
+	}
+#else
+	if ((insn & INSN_MASK_RDTIME) == INSN_MATCH_RDTIME) {
+		u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+		SET_RD(insn, &vcpu->arch.guest_context, (u32)guest_time);
+		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+		return 1;
+	}
+	if ((insn & INSN_MASK_RDTIMEH) == INSN_MATCH_RDTIMEH) {
+		u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+		SET_RD(insn, &vcpu->arch.guest_context, (u32)(guest_time >> 32));
+		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+		return 1;
+	}
+#endif
+	return 0;
+}
+
 static int system_opcode_insn(struct kvm_vcpu *vcpu,
 			      struct kvm_run *run,
 			      ulong insn)
@@ -154,6 +186,9 @@ static int system_opcode_insn(struct kvm_vcpu *vcpu,
 		return 1;
 	}
 
+	if (system_opcode_insn_rdtime(vcpu, run, insn))
+		return 1;
+
 	return truly_illegal_insn(vcpu, run, insn);
 }
 
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
index 2d203660a7e9..2040dbe57ee6 100644
--- a/arch/riscv/kvm/vcpu_timer.c
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -49,6 +49,11 @@ static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
 	return kvm_riscv_scale_time(gt, host_time) + gt->time_delta;
 }
 
+u64 kvm_riscv_read_guest_time(struct kvm_vcpu *vcpu)
+{
+	return kvm_riscv_current_cycles(&vcpu->kvm->arch.timer);
+}
+
 static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
 				     struct kvm_guest_timer *gt,
 				     struct kvm_vcpu_timer *t)
@@ -241,6 +246,11 @@ void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu)
 	csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
 	csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
 #endif
[PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM
This series implements guest time scaling based on RDTIME instruction
emulation, so that we can allow migrating a Guest/VM across Hosts with
different time frequencies.

Why not through para-virt? From arm's experience[1], a para-virt
implementation doesn't really solve the problem, for the following two
main reasons:
- RDTIME is not only used in Linux, but also in firmware and userspace.
- It is difficult to make compatible with nested virtualization.

[1] https://lore.kernel.org/patchwork/cover/1288153/

Yifei Jiang (3):
  RISC-V: KVM: Change the method of calculating cycles to nanoseconds
  RISC-V: KVM: Support dynamic time frequency from userspace
  RISC-V: KVM: Implement guest time scaling

 arch/riscv/include/asm/csr.h            | 3 ++
 arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +--
 arch/riscv/kvm/vcpu_exit.c              | 35 +
 arch/riscv/kvm/vcpu_timer.c             | 51 ++---
 4 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.19.1
[PATCH RFC 2/3] RISC-V: KVM: Support dynamic time frequency from userspace
This patch implements KVM_S/GET_ONE_REG of the time frequency to
support setting a dynamic time frequency from userspace. When the time
frequency specified by userspace is inconsistent with the host's
'riscv_timebase', it uses scale_mult and scale_shift to calculate the
guest's scaled time.

Signed-off-by: Yifei Jiang
Signed-off-by: Yipeng Yin
---
 arch/riscv/include/asm/kvm_vcpu_timer.h | 9 ++
 arch/riscv/kvm/vcpu_timer.c             | 40 +
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
index 87e00d878999..41b5503de9e4 100644
--- a/arch/riscv/include/asm/kvm_vcpu_timer.h
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -12,6 +12,10 @@
 #include
 
 struct kvm_guest_timer {
+	u64 frequency;
+	bool need_scale;
+	u64 scale_mult;
+	u64 scale_shift;
 	/* Time delta value */
 	u64 time_delta;
 };
@@ -38,4 +42,9 @@ int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
 int kvm_riscv_guest_timer_init(struct kvm *kvm);
 
+static inline bool kvm_riscv_need_scale(struct kvm_guest_timer *gt)
+{
+	return gt->need_scale;
+}
+
 #endif
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
index f6b35180199a..2d203660a7e9 100644
--- a/arch/riscv/kvm/vcpu_timer.c
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -15,9 +15,38 @@
 #include
 #include
 
+#define SCALE_SHIFT_VALUE	48
+#define SCALE_TOLERANCE_HZ	1000
+
+static void kvm_riscv_set_time_freq(struct kvm_guest_timer *gt, u64 freq)
+{
+	/*
+	 * Guest time frequency and Host time frequency are identical
+	 * if the error between them is limited within SCALE_TOLERANCE_HZ.
+	 */
+	u64 diff = riscv_timebase > freq ?
+		   riscv_timebase - freq : freq - riscv_timebase;
+	gt->need_scale = (diff >= SCALE_TOLERANCE_HZ);
+	if (gt->need_scale) {
+		gt->scale_shift = SCALE_SHIFT_VALUE;
+		gt->scale_mult = mul_u64_u32_div(1ULL << gt->scale_shift,
+						 freq, riscv_timebase);
+	}
+	gt->frequency = freq;
+}
+
+static u64 kvm_riscv_scale_time(struct kvm_guest_timer *gt, u64 time)
+{
+	if (kvm_riscv_need_scale(gt))
+		return mul_u64_u64_shr(time, gt->scale_mult, gt->scale_shift);
+
+	return time;
+}
+
 static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
 {
-	return get_cycles64() + gt->time_delta;
+	u64 host_time = get_cycles64();
+
+	return kvm_riscv_scale_time(gt, host_time) + gt->time_delta;
 }
 
 static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
@@ -33,7 +62,7 @@ static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
 		cycles_delta = cycles - cycles_now;
 	else
 		cycles_delta = 0;
-	delta_ns = mul_u64_u64_div_u64(cycles_delta, NSEC_PER_SEC, riscv_timebase);
+	delta_ns = mul_u64_u64_div_u64(cycles_delta, NSEC_PER_SEC, gt->frequency);
 	local_irq_restore(flags);
 
 	return delta_ns;
@@ -106,7 +135,7 @@ int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
 
 	switch (reg_num) {
 	case KVM_REG_RISCV_TIMER_REG(frequency):
-		reg_val = riscv_timebase;
+		reg_val = gt->frequency;
 		break;
 	case KVM_REG_RISCV_TIMER_REG(time):
 		reg_val = kvm_riscv_current_cycles(gt);
@@ -150,10 +179,10 @@ int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
 
 	switch (reg_num) {
 	case KVM_REG_RISCV_TIMER_REG(frequency):
-		ret = -EOPNOTSUPP;
+		kvm_riscv_set_time_freq(gt, reg_val);
 		break;
 	case KVM_REG_RISCV_TIMER_REG(time):
-		gt->time_delta = reg_val - get_cycles64();
+		gt->time_delta = reg_val - kvm_riscv_scale_time(gt, get_cycles64());
 		break;
 	case KVM_REG_RISCV_TIMER_REG(compare):
 		t->next_cycles = reg_val;
@@ -219,6 +248,7 @@ int kvm_riscv_guest_timer_init(struct kvm *kvm)
 	struct kvm_guest_timer *gt = &kvm->arch.timer;
 
 	gt->time_delta = -get_cycles64();
+	gt->frequency = riscv_timebase;
 
 	return 0;
 }
-- 
2.19.1
[PATCH net-next v5 8/9] net: dsa: microchip: ksz9477: remaining hardware time stamping support
Add data path routines required for TX hardware time stamping. PTP mode
is enabled depending on the filter setup (changes tail tag). TX time
stamps are reported via an interrupt / device registers, whilst RX time
stamps are reported via an additional tail tag.

One step TX time stamping of PDelay_Resp requires the RX time stamp
from the associated PDelay_Req message. The user space PTP stack
assumes that the RX time stamp has already been subtracted from the
PDelay_Req correction field (as done by the ZHAW InES PTP time stamping
core). It will echo back the value of the correction field in the
PDelay_Resp message. In order to be compatible with this already
established interface, the KSZ9563 code emulates this behavior. When
processing the PDelay_Resp message, the time stamp is moved back from
the correction field to the tail tag, as the hardware generates an
invalid UDP checksum if this field is negative. Of course, the UDP
checksums (if any) have to be corrected after this (for both
directions).

Everything has been tested on a Microchip KSZ9563 switch.

Signed-off-by: Christian Eggers
---
Changes in v5:
--------------
- Fix compile error reported by kernel test robot (NET_DSA_TAG_KSZ must
  select NET_PTP_CLASSIFY)

Changes in v4:
--------------
- s/low active/active low/
- 80 chars per line
- Use IEEE 802.1AS mode (to suppress forwarding of PDelay messages)
- Enable/disable hardware timestamping at runtime (port_hwtstamp_set)
- Use mutex in port_hwtstamp_set
- Don't use port specific struct hwtstamp_config
- removed #ifdefs from tag_ksz.c
- Set port's tx_latency and rx_latency to 0
- added include/linux/dsa/ksz_common.h to MAINTAINERS

On Saturday, 21 November 2020, 02:26:11 CET, Vladimir Oltean wrote:
> If you don't like the #ifdef's, I am not in love with them either. But
> maybe Christian is just optimizing too aggressively, and doesn't
> actually need to put those #ifdef's there and provide stub
> implementations, but could actually just leave the
> ksz9477_rcv_timestamp and ksz9477_xmit_timestamp always compiled-in,
> and "dead at runtime" in the case there is no PTP.

I removed the #ifdefs.

> [...]
> The thing is, ptp4l already has ingressLatency and egressLatency
> settings, and I would not be surprised if those config options would
> get extended to cover values at multiple link speeds.
>
> In the general case, the ksz9477 MAC could be attached to any external
> PHY, having its own propagation delay characteristics, or any number
> of other things that cause clock domain crossings. I'm not sure how
> feasible it is for the kernel to abstract this away completely, and
> adjust timestamps automatically based on any and all combinations of
> MAC and PHY. Maybe this is just wishful thinking.
>
> Oh, and by the way, Christian, I'm not even sure if you aren't in fact
> just beating around the bush with these tstamp_rx_latency_ns and
> tstamp_tx_latency_ns values? I mean, the switch adds the latency value
> to the timestamps. And you, from the driver, read the value of the
> register, so you can subtract the value from the timestamp, to
> compensate for its correction. So, all in all, there is no net latency
> compensation seen by the outside world?! If that is the case, can't
> you just set the latency registers to zero, do your compensation from
> the application stack and call it a day?

At first I thought that I would have to move these values to
ptp4l.conf. But after setting the hardware registers to zero, it turned
out that I also have to use zero values in ptp4l.conf. So you are
right.

On Monday, 23 November 2020, 13:09:38 CET, Vladimir Oltean wrote:
> On Mon, Nov 23, 2020 at 12:32:33PM +0100, Christian Eggers wrote:
> > please let me know, how I shall finally implement this. Enabling the
> > PTP mode on the switch and sending the extra 4 byte tail on tx must
> > be done in sync. Currently, both simply depends on the PTP define.
>
> I, too, would prefer that the reconfiguration is done at ioctl time.
> Distributions typically enable whatever kernel config options they
> can. However, for users, the behavior should not change. Therefore the
> tail tag should remain small even though the PTP kernel config option
> is enabled, as long as hardware timestamping has not been explicitly
> enabled.

I moved this to port_hwtstamp_set. But I am not sure whether enabling
PTP mode should depend on tx_type or rx_filter.

> [...]
> When forwarding what packet? What profile are you testing with?
> What commands do you run?
> A P2P capable switch should not forward Peer delay messages.

With the 802.1AS settings, no SYNC/Announce messages are forwarded
anymore. Peer delay messages have never been forwarded.

 MAINTAINERS | 1 +
 drivers/net/dsa/microchip/ksz9477_main.c | 12 +-
 drive
[PATCH net-next v5 7/9] net: dsa: microchip: ksz9477: initial hardware time stamping support
Add control routines required for TX hardware time stamping. The
KSZ9563 only supports one step time stamping (HWTSTAMP_TX_ONESTEP_P2P),
which requires linuxptp-2.0 or later.

Currently, only P2P delay measurement is supported. See patchwork
discussion and comments in ksz9477_ptp_init() for details:
https://patchwork.ozlabs.org/project/netdev/patch/20201019172435.4416-8-cegg...@arri.de/

Signed-off-by: Christian Eggers
Reviewed-by: Vladimir Oltean
---
Changes in v4:
--------------
- Remove useless case statement
- Reviewed-by: Vladimir Oltean

 drivers/net/dsa/microchip/ksz9477_main.c | 6 +
 drivers/net/dsa/microchip/ksz9477_ptp.c  | 186 +++
 drivers/net/dsa/microchip/ksz9477_ptp.h  | 21 +++
 drivers/net/dsa/microchip/ksz_common.h   | 4 +
 4 files changed, 217 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz9477_main.c b/drivers/net/dsa/microchip/ksz9477_main.c
index 2cb33e9beb4c..0ade40bf27c7 100644
--- a/drivers/net/dsa/microchip/ksz9477_main.c
+++ b/drivers/net/dsa/microchip/ksz9477_main.c
@@ -1387,6 +1387,7 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
 	.phy_read		= ksz9477_phy_read16,
 	.phy_write		= ksz9477_phy_write16,
 	.phylink_mac_link_down	= ksz_mac_link_down,
+	.get_ts_info		= ksz9477_ptp_get_ts_info,
 	.port_enable		= ksz_enable_port,
 	.get_strings		= ksz9477_get_strings,
 	.get_ethtool_stats	= ksz_get_ethtool_stats,
@@ -1407,6 +1408,11 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
 	.port_mdb_del		= ksz9477_port_mdb_del,
 	.port_mirror_add	= ksz9477_port_mirror_add,
 	.port_mirror_del	= ksz9477_port_mirror_del,
+	.port_hwtstamp_get	= ksz9477_ptp_port_hwtstamp_get,
+	.port_hwtstamp_set	= ksz9477_ptp_port_hwtstamp_set,
+	.port_txtstamp		= NULL,
+	/* never defer rx delivery, tstamping is done via tail tagging */
+	.port_rxtstamp		= NULL,
 };
 
 static u32 ksz9477_get_port_addr(int port, int offset)
diff --git a/drivers/net/dsa/microchip/ksz9477_ptp.c b/drivers/net/dsa/microchip/ksz9477_ptp.c
index 0ffc4504a290..a1ca1923ec0c 100644
--- a/drivers/net/dsa/microchip/ksz9477_ptp.c
+++ b/drivers/net/dsa/microchip/ksz9477_ptp.c
@@ -218,6 +218,18 @@ static int ksz9477_ptp_enable(struct ptp_clock_info *ptp,
 	return -EOPNOTSUPP;
 }
 
+static long ksz9477_ptp_do_aux_work(struct ptp_clock_info *ptp)
+{
+	struct ksz_device *dev = container_of(ptp, struct ksz_device, ptp_caps);
+	struct timespec64 ts;
+
+	mutex_lock(&dev->ptp_mutex);
+	_ksz9477_ptp_gettime(dev, &ts);
+	mutex_unlock(&dev->ptp_mutex);
+
+	return HZ;  /* reschedule in 1 second */
+}
+
 static int ksz9477_ptp_start_clock(struct ksz_device *dev)
 {
 	u16 data;
@@ -257,6 +269,54 @@ static int ksz9477_ptp_stop_clock(struct ksz_device *dev)
 	return ksz_write16(dev, REG_PTP_CLK_CTRL, data);
 }
 
+/* device attributes */
+
+enum ksz9477_ptp_tcmode {
+	KSZ9477_PTP_TCMODE_E2E,
+	KSZ9477_PTP_TCMODE_P2P,
+};
+
+static int ksz9477_ptp_tcmode_set(struct ksz_device *dev,
+				  enum ksz9477_ptp_tcmode tcmode)
+{
+	u16 data;
+	int ret;
+
+	ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data);
+	if (ret)
+		return ret;
+
+	if (tcmode == KSZ9477_PTP_TCMODE_P2P)
+		data |= PTP_TC_P2P;
+	else
+		data &= ~PTP_TC_P2P;
+
+	return ksz_write16(dev, REG_PTP_MSG_CONF1, data);
+}
+
+enum ksz9477_ptp_ocmode {
+	KSZ9477_PTP_OCMODE_SLAVE,
+	KSZ9477_PTP_OCMODE_MASTER,
+};
+
+static int ksz9477_ptp_ocmode_set(struct ksz_device *dev,
+				  enum ksz9477_ptp_ocmode ocmode)
+{
+	u16 data;
+	int ret;
+
+	ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data);
+	if (ret)
+		return ret;
+
+	if (ocmode == KSZ9477_PTP_OCMODE_MASTER)
+		data |= PTP_MASTER;
+	else
+		data &= ~PTP_MASTER;
+
+	return ksz_write16(dev, REG_PTP_MSG_CONF1, data);
+}
+
 int ksz9477_ptp_init(struct ksz_device *dev)
 {
 	int ret;
@@ -282,6 +342,7 @@ int ksz9477_ptp_init(struct ksz_device *dev)
 	dev->ptp_caps.gettime64 = ksz9477_ptp_gettime;
 	dev->ptp_caps.settime64 = ksz9477_ptp_settime;
 	dev->ptp_caps.enable = ksz9477_ptp_enable;
+	dev->ptp_caps.do_aux_work = ksz9477_ptp_do_aux_work;
 
 	/* Start hardware counter (will overflow after 136 years) */
 	ret = ksz9477_ptp_start_clock(dev);
@@ -294,8 +355,31 @@ int ksz9477_ptp_init(struct ksz_device *dev)
 		goto error_stop_clock;
 	}
 
+	/* Currently, only P2P delay measurement is supported. Setting ocmode
+	 * to slave will work independently of actually being master or slave.
+
Re: [PATCH v2 net-next] net: ipa: fix build-time bug in ipa_hardware_config_qsb()
On Wed, 2 Dec 2020 08:15:02 -0600 Alex Elder wrote:
> Jon Hunter reported observing a build bug in the IPA driver:
>
> https://lore.kernel.org/netdev/5b5d9d40-94d5-5dad-b861-fd9bef826...@nvidia.com
>
> The problem is that the QMB0 max read value set for IPA v4.5 (16) is
> too large to fit in the 4-bit field.
>
> The actual value we want is 0, which requests that the hardware use
> the maximum it is capable of.
>
> Reported-by: Jon Hunter
> Tested-by: Jon Hunter
> Signed-off-by: Alex Elder

Applied, thanks!
[PATCH net v3] bonding: fix feature flag setting at init time
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when
netdev_change_features() is called without having wanted_features fully
filled in yet. This code runs on post-module-load mode changes, as well
as at module init time and new bond creation time, and in the latter
two scenarios, it is running prior to register_netdevice() having been
called and subsequently filling in wanted_features. The empty
wanted_features led to features also getting emptied out, which was
definitely not the intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly
update both the features and wanted_features fields when changing the
bond type, or we get to a situation where ethtool sees:

  esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto, fix problem description

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera
Suggested-by: Ivan Vecera
Cc: Jay Vosburgh
Cc: Veaceslav Falico
Cc: Andy Gospodarek
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Thomas Davis
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson
---
 drivers/net/bonding/bond_main.c    | 10 --
 drivers/net/bonding/bond_options.c | 19 ++-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47afc5938c26..7905534a763b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4747,15 +4747,13 @@ void bond_setup(struct net_device *bond_dev)
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
 	bond_dev->features |= bond_dev->hw_features;
 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-	/* Disable XFRM features if this isn't an active-backup config */
-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-		bond_dev->features &= ~BOND_XFRM_FEATURES;
+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
+	/* Only enable XFRM features if this is an active-backup config */
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..1ae0e5ab8c67 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,18 @@ const struct bond_option *bond_opt_get(unsigned int option)
 	return &bond_opts[option];
 }
 
+#ifdef CONFIG_XFRM_OFFLOAD
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+	if (mode == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+	else
+		bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+	netdev_update_features(bond_dev);
+}
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 static int bond_option_mode_set(struct bonding *bond,
 				const struct bond_opt_value *newval)
 {
@@ -768,11 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
 		bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-	if (newval->value == BOND_MODE_ACTIVEBACKUP)
-		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-	else
-		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
+	if (bond->dev->reg_state == NETREG_REGISTERED)
+		bond_set_xfrm_features(bond->dev, newval->value);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't cache arp_validate between modes */
-- 
2.28.0
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, Dec 2, 2020 at 3:17 PM Jay Vosburgh wrote:
>
> Jarod Wilson wrote:
>
> >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh
> >wrote:
> >>
> >> Jarod Wilson wrote:
> >>
> >> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >> >registered. Bad things can currently happen when netdev_change_features()
> >> >is called without having wanted_features fully filled in yet. Basically,
> >> >this code was racing against register_netdevice() filling in
> >> >wanted_features, and when it got there first, the empty wanted_features
> >> >led to features also getting emptied out, which was definitely not the
> >> >intended behavior, so prevent that from happening.
> >>
> >>	Is this an actual race? Reading Ivan's prior message, it sounds
> >> like it's an ordering problem (in that bond_newlink calls
> >> register_netdevice after bond_changelink).
> >
> >Sorry, yeah, this is not actually a race condition, just an ordering
> >issue, bond_check_params() gets called at init time, which leads to
> >bond_option_mode_set() being called, and does so prior to
> >bond_create() running, which is where we actually call
> >register_netdevice().
>
>	So this only happens if there's a "mode" module parameter? That
> doesn't sound like the call path that Ivan described (coming in via
> bond_newlink).

Ah. I think there are actually two different pathways that can trigger
this. The first is for bonds created at module load time, which I was
describing; the second is for a new bond created via bond_newlink()
after the bonding module is already loaded, as described by Ivan. Both
have the problem of bond_option_mode_set() running prior to
register_netdevice(). Of course, that would suggest every bond
currently comes up with unintentionally neutered flags, which I
neglected to catch in earlier testing and development.

-- 
Jarod Wilson
ja...@redhat.com
Re: [PATCH net v2] bonding: fix feature flag setting at init time
Jarod Wilson wrote:
>On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh
>wrote:
>>
>> Jarod Wilson wrote:
>>
>> >Don't try to adjust XFRM support flags if the bond device isn't yet
>> >registered. Bad things can currently happen when netdev_change_features()
>> >is called without having wanted_features fully filled in yet. Basically,
>> >this code was racing against register_netdevice() filling in
>> >wanted_features, and when it got there first, the empty wanted_features
>> >led to features also getting emptied out, which was definitely not the
>> >intended behavior, so prevent that from happening.
>>
>>	Is this an actual race? Reading Ivan's prior message, it sounds
>> like it's an ordering problem (in that bond_newlink calls
>> register_netdevice after bond_changelink).
>
>Sorry, yeah, this is not actually a race condition, just an ordering
>issue, bond_check_params() gets called at init time, which leads to
>bond_option_mode_set() being called, and does so prior to
>bond_create() running, which is where we actually call
>register_netdevice().

	So this only happens if there's a "mode" module parameter? That
doesn't sound like the call path that Ivan described (coming in via
bond_newlink).

	-J

>>	The change to bond_option_mode_set tests against reg_state, so
>> presumably it wants to skip the first(?) time through, before the
>> register_netdevice call; is that right?
>
>Correct. Later on, when the bonding driver is already loaded, and
>parameter changes are made, bond_option_mode_set() gets called and if
>the mode changes to or from active-backup, we do need/want this code
>to run to update wanted and features flags properly.
>
>
>-- 
>Jarod Wilson
>ja...@redhat.com

---
	-Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, Dec 2, 2020 at 2:23 PM Jakub Kicinski wrote:
>
> On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski wrote:
> > >
> > > On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > > > +	if (bond->dev->reg_state != NETREG_REGISTERED)
> > > > +		goto noreg;
> > > > +
> > > >  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > > >  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > > >  	else
> > > >  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > > -	netdev_change_features(bond->dev);
> > > > +	netdev_update_features(bond->dev);
> > > > +noreg:
> > >
> > > Why the goto?
> >
> > Seemed cleaner to prevent an extra level of indentation of the code
> > following the goto and before the label, but I'm not that attached to
> > it if it's not wanted for coding style reasons.
>
> Yes, please don't use gotos where a normal if statement is sufficient.
> If you must avoid the indentation move the code to a helper.
>
> Also - this patch did not apply to net, please make sure you're
> developing on the correct base.

Argh, I must have been working in net-next instead of net, apologies.
Okay, I'll clarify the description per what Jay pointed out and adjust
the code to not include a goto, then make it on the right branch.

-- 
Jarod Wilson
ja...@redhat.com
Re: [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting
On 02.12.20 12:57, Frederic Weisbecker wrote:
> account_irq_enter_time() and account_irq_exit_time() are not called
> from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
> cputime accounting functions called from there.
>
> Signed-off-by: Frederic Weisbecker
> Cc: Peter Zijlstra
> Cc: Tony Luck
> Cc: Fenghua Yu
> Cc: Michael Ellerman
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> ---
>  arch/s390/kernel/vtime.c | 10 +-
>  kernel/sched/cputime.c   |  2 --
>  2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
> index 8df10d3c8f6c..f9f2a11958a5 100644
> --- a/arch/s390/kernel/vtime.c
> +++ b/arch/s390/kernel/vtime.c
> @@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
>   * Update process times based on virtual cpu times stored by entry.S
>   * to the lowcore fields user_timer, system_timer & steal_clock.
>   */
> -void vtime_account_irq_enter(struct task_struct *tsk)
> +void vtime_account_kernel(struct task_struct *tsk)
>  {
>  	u64 timer;
>
> @@ -245,12 +245,12 @@ void vtime_account_irq_enter(struct task_struct *tsk)
>
>  	virt_timer_forward(timer);
>  }
> -EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
> -
> -void vtime_account_kernel(struct task_struct *tsk)
> -__attribute__((alias("vtime_account_irq_enter")));
>  EXPORT_SYMBOL_GPL(vtime_account_kernel);
>
> +void vtime_account_irq_enter(struct task_struct *tsk)
> +__attribute__((alias("vtime_account_kernel")));
> +
> +

One new line is enough, I think. Apart from that this looks sane from
an s390 perspective.

Acked-by: Christian Borntraeger
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh wrote:
>
> Jarod Wilson wrote:
>
> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >registered. Bad things can currently happen when netdev_change_features()
> >is called without having wanted_features fully filled in yet. Basically,
> >this code was racing against register_netdevice() filling in
> >wanted_features, and when it got there first, the empty wanted_features
> >led to features also getting emptied out, which was definitely not the
> >intended behavior, so prevent that from happening.
>
>	Is this an actual race? Reading Ivan's prior message, it sounds
> like it's an ordering problem (in that bond_newlink calls
> register_netdevice after bond_changelink).

Sorry, yeah, this is not actually a race condition, just an ordering
issue, bond_check_params() gets called at init time, which leads to
bond_option_mode_set() being called, and does so prior to
bond_create() running, which is where we actually call
register_netdevice().

>	The change to bond_option_mode_set tests against reg_state, so
> presumably it wants to skip the first(?) time through, before the
> register_netdevice call; is that right?

Correct. Later on, when the bonding driver is already loaded, and
parameter changes are made, bond_option_mode_set() gets called and if
the mode changes to or from active-backup, we do need/want this code
to run to update wanted and features flags properly.

-- 
Jarod Wilson
ja...@redhat.com
[tip: irq/core] sched/vtime: Consolidate IRQ time accounting
The following commit has been merged into the irq/core branch of tip:

Commit-ID:     8a6a5920d3286eb0eae9f36a4ec4fc9df511eccb
Gitweb:        https://git.kernel.org/tip/8a6a5920d3286eb0eae9f36a4ec4fc9df511eccb
Author:        Frederic Weisbecker
AuthorDate:    Wed, 02 Dec 2020 12:57:30 +01:00
Committer:     Thomas Gleixner
CommitterDate: Wed, 02 Dec 2020 20:20:05 +01:00

sched/vtime: Consolidate IRQ time accounting

The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all
have their own version of irq time accounting that dispatch the cputime
to the appropriate index: hardirq, softirq, system, idle, guest... from
an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20201202115732.27827-4-frede...@kernel.org
---
 arch/ia64/kernel/time.c    | 20 +
 arch/powerpc/kernel/time.c | 56 ++---
 arch/s390/kernel/vtime.c   | 45 +-
 include/linux/vtime.h      | 16 ---
 kernel/sched/cputime.c     | 13 ++---
 5 files changed, 102 insertions(+), 48 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 7abc5f3..733e0e3 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -138,12 +138,8 @@ void vtime_account_kernel(struct task_struct *tsk)
 	struct thread_info *ti = task_thread_info(tsk);
 	__u64 stime = vtime_delta(tsk);
 
-	if ((tsk->flags & PF_VCPU) && !irq_count())
+	if (tsk->flags & PF_VCPU)
 		ti->gtime += stime;
-	else if (hardirq_count())
-		ti->hardirq_time += stime;
-	else if (in_serving_softirq())
-		ti->softirq_time += stime;
 	else
 		ti->stime += stime;
 }
@@ -156,6 +152,20 @@ void vtime_account_idle(struct task_struct *tsk)
 	ti->idle_time += vtime_delta(tsk);
 }
 
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->softirq_time += vtime_delta(tsk);
+}
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->hardirq_time += vtime_delta(tsk);
+}
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 static irqreturn_t
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46..cf3f8db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -311,12 +311,11 @@ static unsigned long vtime_delta_scaled(struct cpu_accounting_data *acct,
 	return stime_scaled;
 }
 
-static unsigned long vtime_delta(struct task_struct *tsk,
+static unsigned long vtime_delta(struct cpu_accounting_data *acct,
				 unsigned long *stime_scaled,
				 unsigned long *steal_time)
 {
 	unsigned long now, stime;
-	struct cpu_accounting_data *acct = get_accounting(tsk);
 
 	WARN_ON_ONCE(!irqs_disabled());
 
@@ -331,29 +330,30 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 	return stime;
 }
 
+static void vtime_delta_kernel(struct cpu_accounting_data *acct,
+			       unsigned long *stime, unsigned long *stime_scaled)
+{
+	unsigned long steal_time;
+
+	*stime = vtime_delta(acct, stime_scaled, &steal_time);
+	*stime -= min(*stime, steal_time);
+	acct->steal_time += steal_time;
+}
+
 void vtime_account_kernel(struct task_struct *tsk)
 {
-	unsigned long stime, stime_scaled, steal_time;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
+	unsigned long stime, stime_scaled;
 
-	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
-
-	stime -= min(stime, steal_time);
-	acct->steal_time += steal_time;
+	vtime_delta_kernel(acct, &stime, &stime_scaled);
 
-	if ((tsk->flags & PF_VCPU) && !irq_count()) {
+	if (tsk->flags & PF_VCPU) {
 		acct->gtime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->utime_scaled += stime_scaled;
 #endif
 	} else {
-		if (hardirq_count())
-			acct->hardirq_time += stime;
-		else if (in_serving_softirq())
-			acct->softirq_time += stime;
-		else
-			acct->stime += stime;
-
+		acct->stime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->stime_scaled += stime_scaled;
 #endif
@@ -366,10 +366,34 @@ void vtime_account_idle(struct task_struct *tsk)
 	unsigned long stime, stime_scaled, steal_time;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
[tip: irq/core] sched/cputime: Remove symbol exports from IRQ time accounting
The following commit has been merged into the irq/core branch of tip:

Commit-ID:     7197688b2006357da75a014e0a76be89ca9c2d46
Gitweb:        https://git.kernel.org/tip/7197688b2006357da75a014e0a76be89ca9c2d46
Author:        Frederic Weisbecker
AuthorDate:    Wed, 02 Dec 2020 12:57:28 +01:00
Committer:     Thomas Gleixner
CommitterDate: Wed, 02 Dec 2020 20:20:04 +01:00

sched/cputime: Remove symbol exports from IRQ time accounting

account_irq_enter_time() and account_irq_exit_time() are not called
from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
cputime accounting functions called from there.

Signed-off-by: Frederic Weisbecker
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20201202115732.27827-2-frede...@kernel.org
---
 arch/s390/kernel/vtime.c | 10 +-
 kernel/sched/cputime.c   |  2 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 8df10d3..f9f2a11 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_kernel(struct task_struct *tsk)
 {
 	u64 timer;
 
@@ -245,12 +245,12 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 
 	virt_timer_forward(timer);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
-
-void vtime_account_kernel(struct task_struct *tsk)
-__attribute__((alias("vtime_account_irq_enter")));
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
+void vtime_account_irq_enter(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
+
+
 /*
  * Sorted add to a list. List is linear searched until first bigger
  * element is found.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5a55d23..61ce9f9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -71,7 +71,6 @@ void irqtime_account_irq(struct task_struct *curr)
 	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
-EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
 static u64 irqtime_tick_accounted(u64 maxtime)
 {
@@ -434,7 +433,6 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 	else
 		vtime_account_kernel(tsk);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski wrote:
> >
> > On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > > +	if (bond->dev->reg_state != NETREG_REGISTERED)
> > > +		goto noreg;
> > > +
> > >  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > >  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > >  	else
> > >  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > -	netdev_change_features(bond->dev);
> > > +	netdev_update_features(bond->dev);
> > > +noreg:
> >
> > Why the goto?
>
> Seemed cleaner to prevent an extra level of indentation of the code
> following the goto and before the label, but I'm not that attached to
> it if it's not wanted for coding style reasons.

Yes, please don't use gotos where a normal if statement is sufficient.
If you must avoid the indentation move the code to a helper.

Also - this patch did not apply to net, please make sure you're
developing on the correct base.
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski wrote:
>
> On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > +	if (bond->dev->reg_state != NETREG_REGISTERED)
> > +		goto noreg;
> > +
> >  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
> >  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> >  	else
> >  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > -	netdev_change_features(bond->dev);
> > +	netdev_update_features(bond->dev);
> > +noreg:
>
> Why the goto?

Seemed cleaner to prevent an extra level of indentation of the code
following the goto and before the label, but I'm not that attached to
it if it's not wanted for coding style reasons.

-- 
Jarod Wilson
ja...@redhat.com
Re: [PATCH net v2] bonding: fix feature flag setting at init time
Jarod Wilson wrote:
>Don't try to adjust XFRM support flags if the bond device isn't yet
>registered. Bad things can currently happen when netdev_change_features()
>is called without having wanted_features fully filled in yet. Basically,
>this code was racing against register_netdevice() filling in
>wanted_features, and when it got there first, the empty wanted_features
>led to features also getting emptied out, which was definitely not the
>intended behavior, so prevent that from happening.

	Is this an actual race? Reading Ivan's prior message, it sounds
like it's an ordering problem (in that bond_newlink calls
register_netdevice after bond_changelink).

	The change to bond_option_mode_set tests against reg_state, so
presumably it wants to skip the first(?) time through, before the
register_netdevice call; is that right?

	-J

>Originally, I'd hoped to stop adjusting wanted_features at all in the
>bonding driver, as it's documented as being something only the network
>core should touch, but we actually do need to do this to properly update
>both the features and wanted_features fields when changing the bond type,
>or we get to a situation where ethtool sees:
>
>esp-hw-offload: off [requested on]
>
>I do think we should be using netdev_update_features instead of
>netdev_change_features here though, so we only send notifiers when the
>features actually changed.
>
>v2: rework based on further testing and suggestions from ivecera
>
>Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
>Reported-by: Ivan Vecera
>Suggested-by: Ivan Vecera
>Cc: Jay Vosburgh
>Cc: Veaceslav Falico
>Cc: Andy Gospodarek
>Cc: "David S. Miller"
>Cc: Jakub Kicinski
>Cc: Thomas Davis
>Cc: net...@vger.kernel.org
>Signed-off-by: Jarod Wilson
>---
> drivers/net/bonding/bond_main.c    | 10 --
> drivers/net/bonding/bond_options.c |  6 +-
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e0880a3840d7..5fe5232cc3f3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
> 			   NETIF_F_HW_VLAN_CTAG_FILTER;
> 
> 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
>-#ifdef CONFIG_XFRM_OFFLOAD
>-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
>-#endif /* CONFIG_XFRM_OFFLOAD */
> 	bond_dev->features |= bond_dev->hw_features;
> 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
> #ifdef CONFIG_XFRM_OFFLOAD
>-	/* Disable XFRM features if this isn't an active-backup config */
>-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>-		bond_dev->features &= ~BOND_XFRM_FEATURES;
>+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
>+	/* Only enable XFRM features if this is an active-backup config */
>+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
>+		bond_dev->features |= BOND_XFRM_FEATURES;
> #endif /* CONFIG_XFRM_OFFLOAD */
> }
> 
>diff --git a/drivers/net/bonding/bond_options.c
>b/drivers/net/bonding/bond_options.c
>index 9abfaae1c6f7..19205cfac751 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
> 		bond->params.tlb_dynamic_lb = 1;
> 
> #ifdef CONFIG_XFRM_OFFLOAD
>+	if (bond->dev->reg_state != NETREG_REGISTERED)
>+		goto noreg;
>+
> 	if (newval->value == BOND_MODE_ACTIVEBACKUP)
> 		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> 	else
> 		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
>-	netdev_change_features(bond->dev);
>+	netdev_update_features(bond->dev);
>+noreg:
> #endif /* CONFIG_XFRM_OFFLOAD */
> 
> 	/* don't cache arp_validate between modes */
>-- 
>2.28.0
>

---
	-Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> +	if (bond->dev->reg_state != NETREG_REGISTERED)
> +		goto noreg;
> +
>  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
>  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>  	else
>  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> -	netdev_change_features(bond->dev);
> +	netdev_update_features(bond->dev);
> +noreg:

Why the goto?
Re: [PATCH net v2] bonding: fix feature flag setting at init time
On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> Don't try to adjust XFRM support flags if the bond device isn't yet
> registered. Bad things can currently happen when netdev_change_features()
> is called without having wanted_features fully filled in yet. Basically,
> this code was racing against register_netdevice() filling in
> wanted_features, and when it got there first, the empty wanted_features
> led to features also getting emptied out, which was definitely not the
> intended behavior, so prevent that from happening.
>
> Originally, I'd hoped to stop adjusting wanted_features at all in the
> bonding driver, as it's documented as being something only the network
> core should touch, but we actually do need to do this to properly update
> both the features and wanted_features fields when changing the bond type,
> or we get to a situation where ethtool sees:
>
> esp-hw-offload: off [requested on]
>
> I do think we should be using netdev_update_features instead of
> netdev_change_features here though, so we only send notifiers when the
> features actually changed.
>
> v2: rework based on further testing and suggestions from ivecera
>
> Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
> Reported-by: Ivan Vecera
> Suggested-by: Ivan Vecera
> Cc: Jay Vosburgh
> Cc: Veaceslav Falico
> Cc: Andy Gospodarek
> Cc: "David S. Miller"
> Cc: Jakub Kicinski
> Cc: Thomas Davis
> Cc: net...@vger.kernel.org
> Signed-off-by: Jarod Wilson
> ---
>  drivers/net/bonding/bond_main.c    | 10 --
>  drivers/net/bonding/bond_options.c |  6 +-
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e0880a3840d7..5fe5232cc3f3 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
>  			   NETIF_F_HW_VLAN_CTAG_FILTER;
>  
>  	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
> -#ifdef CONFIG_XFRM_OFFLOAD
> -	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>  	bond_dev->features |= bond_dev->hw_features;
>  	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> -	/* Disable XFRM features if this isn't an active-backup config */
> -	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> -		bond_dev->features &= ~BOND_XFRM_FEATURES;
> +	bond_dev->hw_features |= BOND_XFRM_FEATURES;
> +	/* Only enable XFRM features if this is an active-backup config */
> +	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> +		bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  }
>  
> diff --git a/drivers/net/bonding/bond_options.c
> b/drivers/net/bonding/bond_options.c
> index 9abfaae1c6f7..19205cfac751 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
>  		bond->params.tlb_dynamic_lb = 1;
>  
>  #ifdef CONFIG_XFRM_OFFLOAD
> +	if (bond->dev->reg_state != NETREG_REGISTERED)
> +		goto noreg;
> +
>  	if (newval->value == BOND_MODE_ACTIVEBACKUP)
>  		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>  	else
>  		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> -	netdev_change_features(bond->dev);
> +	netdev_update_features(bond->dev);
> +noreg:
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  
>  	/* don't cache arp_validate between modes */

Tested-by: Ivan Vecera
[PATCH net v2] bonding: fix feature flag setting at init time
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. Basically,
this code was racing against register_netdevice() filling in
wanted_features, and when it got there first, the empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

v2: rework based on further testing and suggestions from ivecera

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera
Suggested-by: Ivan Vecera
Cc: Jay Vosburgh
Cc: Veaceslav Falico
Cc: Andy Gospodarek
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Thomas Davis
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson
---
 drivers/net/bonding/bond_main.c    | 10 --
 drivers/net/bonding/bond_options.c |  6 +-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..5fe5232cc3f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
 			   NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
 	bond_dev->features |= bond_dev->hw_features;
 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-	/* Disable XFRM features if this isn't an active-backup config */
-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-		bond_dev->features &= ~BOND_XFRM_FEATURES;
+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
+	/* Only enable XFRM features if this is an active-backup config */
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..19205cfac751 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
 		bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
+	if (bond->dev->reg_state != NETREG_REGISTERED)
+		goto noreg;
+
 	if (newval->value == BOND_MODE_ACTIVEBACKUP)
 		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
 	else
 		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
+	netdev_update_features(bond->dev);
+noreg:
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't cache arp_validate between modes */
-- 
2.28.0