[rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: sparse: sparse: symbol 'clocksource_verify_one_cpu' was not declared. Should it be

2020-12-26 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
dev.2020.12.23a
head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check 
per-CPU clock synchronization when marked unstable
config: x86_64-randconfig-s021-20201223 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-184-g1b896707-dirty
# 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
git remote add rcu 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git fetch --no-tags rcu dev.2020.12.23a
git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 


"sparse warnings: (new ones prefixed by >>)"
>> kernel/time/clocksource.c:220:6: sparse: sparse: symbol 
>> 'clocksource_verify_one_cpu' was not declared. Should it be static?

Please review and possibly fold the followup patch.
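
For reference, a minimal sketch of the kind of fix being suggested (illustrative only; the actual followup patch is what should be folded): the helper is only used within kernel/time/clocksource.c, so it can simply be declared static, e.g.:

static void clocksource_verify_one_cpu(void *csin)
{
	struct clocksource *cs = (struct clocksource *)csin;

	__this_cpu_write(csnow_mid, cs->read(cs));
}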

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: Time to re-enable Runtime PM per default for PCI devices?

2020-12-26 Thread Heiner Kallweit
On 17.11.2020 17:57, Rafael J. Wysocki wrote:
> On Tue, Nov 17, 2020 at 5:38 PM Bjorn Helgaas  wrote:
>>
>> [+to Rafael, author of the commit you mentioned,
>> +cc Mika, Kai Heng, Lukas, linux-pm, linux-kernel]
>>
>> On Tue, Nov 17, 2020 at 04:56:09PM +0100, Heiner Kallweit wrote:
>>> More than 10 yrs ago Runtime PM was disabled per default by bb910a7040
>>> ("PCI/PM Runtime: Make runtime PM of PCI devices inactive by default").
>>>
>>> Reason given: "avoid breakage on systems where ACPI-based wake-up is
>>> known to fail for some devices"
>>> Unfortunately the commit message doesn't mention any affected  devices
>>> or systems.
> 
> Even if it did, it almost certainly wouldn't have been a complete list.
> 
> We had received multiple problem reports related to that, most likely
> because the ACPI PM in BIOSes at that time was tailored for
> system-wide PM transitions only.
> 

To follow up on this discussion:
We could call pm_runtime_forbid() conditionally, e.g. with the following
condition. This would enable runtime PM by default for all non-ACPI
systems, and it uses the BIOS date as an indicator of a hopefully
less broken ACPI implementation. However, I could understand the
argument that this looks a little hacky.

if (IS_ENABLED(CONFIG_ACPI) && dmi_get_bios_year() <= 2016)
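
In context, a rough and untested sketch of what this could look like
(assuming the existing pm_runtime_forbid() call in pci_pm_init() is the one
being made conditional; the helper name is made up for illustration):

/* drivers/pci/pci.c -- illustrative sketch only, not a tested patch */
static void pci_pm_forbid_if_suspect_acpi(struct pci_dev *dev)
{
	/*
	 * Old ACPI firmware is the historically problematic case; keep
	 * runtime PM forbidden there, allow it everywhere else.
	 */
	if (IS_ENABLED(CONFIG_ACPI) && dmi_get_bios_year() <= 2016)
		pm_runtime_forbid(&dev->dev);
}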



>>> With Runtime PM disabled e.g. the PHY on network devices may remain
>>> powered up even with no cable plugged in, affecting battery lifetime
>>> on mobile devices. Currently we have to rely on the respective distro
>>> or user to enable Runtime PM via sysfs (echo auto > power/control).
>>> Some devices work around this restriction by calling pm_runtime_allow
>>> in their probe routine, even though that's not recommended by
>>> https://www.kernel.org/doc/Documentation/power/pci.txt
>>>
>>> Disabling Runtime PM by default seems to be a big hammer; a quirk
>>> for affected devices / systems may have been better. And we still
>>> have the option to disable Runtime PM for selected devices via sysfs.
>>>
>>> So, to cut a long story short: Wouldn't it be time to remove this
>>> restriction?
>>
>> I don't know the history of this, but maybe Rafael or the others can
>> shed some light on it.
> 
> The systems that had those problems 10 years ago would still have
> them, but I expect there to be more systems where runtime PM can be
> enabled by default for PCI devices without issues.
> 



Re: [rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu'

2020-12-25 Thread Paul E. McKenney
On Fri, Dec 25, 2020 at 06:55:07PM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
> dev.2020.12.23a
> head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
> commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check 
> per-CPU clock synchronization when marked unstable
> config: x86_64-randconfig-r013-20201223 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
> cee1e7d14f4628d6174b33640d502bff3b54ae45)
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install x86_64 cross compiling tool for clang build
> # apt-get install binutils-x86-64-linux-gnu
> # 
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
> git remote add rcu 
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> git fetch --no-tags rcu dev.2020.12.23a
> git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
> ARCH=x86_64 
> 
> If you fix the issue, kindly add the following tag as appropriate
> Reported-by: kernel test robot 

Good catch!  I will fold the fix into the original with attribution,
thank you!

Thanx, Paul

> All warnings (new ones prefixed by >>):
> 
> >> kernel/time/clocksource.c:220:6: warning: no previous prototype for 
> >> function 'clocksource_verify_one_cpu' [-Wmissing-prototypes]
>void clocksource_verify_one_cpu(void *csin)
> ^
>kernel/time/clocksource.c:220:1: note: declare 'static' if the function is 
> not intended to be used outside of this translation unit
>    void clocksource_verify_one_cpu(void *csin)
>^
>static 
>1 warning generated.
> 
> 
> vim +/clocksource_verify_one_cpu +220 kernel/time/clocksource.c
> 
>219
>  > 220void clocksource_verify_one_cpu(void *csin)
>221{
>222struct clocksource *cs = (struct clocksource *)csin;
>223
>224__this_cpu_write(csnow_mid, cs->read(cs));
>225}
>226
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[rcu:dev.2020.12.23a 133/149] kernel/time/clocksource.c:220:6: warning: no previous prototype for function 'clocksource_verify_one_cpu'

2020-12-25 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
dev.2020.12.23a
head:   7cc07f4867eb9618d4f7c35ddfbd746131b52f51
commit: 6a70298420b2bd6d3e3dc86d81b993f618df8569 [133/149] clocksource: Check 
per-CPU clock synchronization when marked unstable
config: x86_64-randconfig-r013-20201223 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=6a70298420b2bd6d3e3dc86d81b993f618df8569
git remote add rcu 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git fetch --no-tags rcu dev.2020.12.23a
git checkout 6a70298420b2bd6d3e3dc86d81b993f618df8569
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> kernel/time/clocksource.c:220:6: warning: no previous prototype for function 
>> 'clocksource_verify_one_cpu' [-Wmissing-prototypes]
   void clocksource_verify_one_cpu(void *csin)
^
   kernel/time/clocksource.c:220:1: note: declare 'static' if the function is 
not intended to be used outside of this translation unit
   void clocksource_verify_one_cpu(void *csin)
   ^
   static 
   1 warning generated.


vim +/clocksource_verify_one_cpu +220 kernel/time/clocksource.c

   219  
 > 220  void clocksource_verify_one_cpu(void *csin)
   221  {
   222  struct clocksource *cs = (struct clocksource *)csin;
   223  
   224  __this_cpu_write(csnow_mid, cs->read(cs));
   225  }
   226  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[PATCH v3 17/21] x86/fpu/amx: Define AMX state components and have it used for boot-time checks

2020-12-23 Thread Chang S. Bae
Linux uses check_xstate_against_struct() to sanity-check the size of
XSTATE-enabled features. AMX is an XSAVE-enabled feature, and its size is
not hard-coded but discoverable at run time via CPUID.

The AMX state is composed of state components 17 and 18, both of which are
user state components. Component 17, XTILECFG, is the state of a 64-byte
tile-related control register. Component 18, called XTILEDATA, contains
the actual tile data, and its size varies across implementations. The
architectural maximum, as defined by CPUID(0x1d, 1): EAX[15:0], is a byte
less than 64KB. The first implementation supports 8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile registers (TMM). Define only one register struct and read the number
of registers from CPUID. Cross-check the overall size with CPUID again.
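
For illustration only, a minimal sketch of the run-time size query described
above (the helper name is made up; TILE_CPUID and cpuid_count() are the
existing definitions used elsewhere in this patch):

/* Sketch: read the XTILEDATA size from CPUID(0x1d, 1).  EAX[15:0]
 * reports the total tile data size in bytes (architecturally at most
 * 64KB - 1; 8KB on the first implementation).
 */
static unsigned int xtile_data_size(void)
{
	u32 eax, ebx, ecx, edx;

	cpuid_count(TILE_CPUID, 1, &eax, &ebx, &ecx, &edx);
	return eax & 0xffff;
}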

Signed-off-by: Chang S. Bae 
Reviewed-by: Len Brown 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Updated the code comments.

Changes from v1:
* Rebased on the upstream kernel (5.10)
---
 arch/x86/include/asm/fpu/types.h  | 27 ++
 arch/x86/include/asm/fpu/xstate.h |  2 +
 arch/x86/kernel/fpu/xstate.c  | 62 +++
 3 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 3fc6dbbe3ede..bf9511efd546 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -120,6 +120,9 @@ enum xfeature {
XFEATURE_RSRVD_COMP_13,
XFEATURE_RSRVD_COMP_14,
XFEATURE_LBR,
+   XFEATURE_RSRVD_COMP_16,
+   XFEATURE_XTILE_CFG,
+   XFEATURE_XTILE_DATA,
 
XFEATURE_MAX,
 };
@@ -136,11 +139,15 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR  (1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG(1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA   (1 << XFEATURE_XTILE_DATA)
 
 #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512   (XFEATURE_MASK_OPMASK \
 | XFEATURE_MASK_ZMM_Hi256 \
 | XFEATURE_MASK_Hi16_ZMM)
+#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILE_DATA \
+| XFEATURE_MASK_XTILE_CFG)
 
 #define FIRST_EXTENDED_XFEATUREXFEATURE_YMM
 
@@ -153,6 +160,9 @@ struct reg_256_bit {
 struct reg_512_bit {
u8  regbytes[512/8];
 };
+struct reg_1024_byte {
+   u8  regbytes[1024];
+};
 
 /*
  * State component 2:
@@ -255,6 +265,23 @@ struct arch_lbr_state {
u64 ler_to;
u64 ler_info;
struct lbr_entryentries[];
+};
+
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+   u64 tcfg[8];
+} __packed;
+
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+   struct reg_1024_bytetmm;
 } __packed;
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h 
b/arch/x86/include/asm/fpu/xstate.h
index 5927033e017f..08d3dd18d7d8 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -13,6 +13,8 @@
 
 #define XSTATE_CPUID   0x000d
 
+#define TILE_CPUID 0x001d
+
 #define FXSAVE_SIZE512
 
 #define XSAVE_HDR_SIZE 64
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c2acfee581ba..f54ff1d4a44b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -41,6 +41,14 @@ static const char *xfeature_names[] =
"Protection Keys User registers",
"PASID state",
"unknown xstate feature",
+   "unknown xstate feature",
+   "unknown xstate feature",
+   "unknown xstate feature",
+   "unknown xstate feature",
+   "unknown xstate feature",
+   "AMX Tile config"   ,
+   "AMX Tile data" ,
+   "unknown xstate feature",
 };
 
 struct xfeature_capflag_info {
@@ -60,6 +68,8 @@ static struct xfeature_capflag_info xfeature_capflags[] 
__initdata = {
{ XFEATURE_PT_UNIMPLEMENTED_SO_FAR, X86_FEATURE_INTEL_PT },
{ XFEATURE_PKRU,X86_FEATURE_PKU },
{ XFEATURE_PASID,   X86_FEATURE_ENQCMD },
+   { XFEATURE_XTILE_CFG,   X86_FEATURE_AMX_TILE },
+   { XFEATURE_XTILE_DATA,  X86_FEATURE

NASA scientists achieve long-distance 'quantum teleportation' over 27 miles for the first time – paving the way for unhackable networks that transfer data faster than the speed of light

2020-12-23 Thread Turritopsis Dohrnii Teo En Ming
Subject: NASA scientists achieve long-distance 'quantum teleportation' 
over 27 miles for the first time – paving the way for unhackable 
networks that transfer data faster than the speed of light


Good day from Singapore,

I am sharing the below news article:

News Article: NASA scientists achieve long-distance 'quantum 
teleportation' over 27 miles for the first time – paving the way for 
unhackable networks that transfer data faster than the speed of light


Author: JOE PINKSTONE FOR MAILONLINE

Date Published: 22 December 2020

Link: 
https://www.dailymail.co.uk/sciencetech/article-9078855/NASA-scientists-achieve-long-distance-quantum-teleportation-time.html


Publisher: MailOnline UK

Synopsis:

- Scientists built a 27-mile long prototype quantum internet in the US
- They successfully used quantum entanglement to teleport signals 
instantly
- The phenomenon sees qubits, the quantum equivalent of computer bits, 
pair up and respond instantly


Scientists have demonstrated long-distance 'quantum teleportation' – the 
instant transfer of units of quantum information known as qubits – for 
the first time.


The qubits were transferred faster than the speed of light over a 
distance of 27 miles, laying the foundations for a quantum internet 
service, which could one day revolutionise computing.


Quantum communication systems are faster and more secure than regular 
networks because they use photons rather than computer code, which can 
be hacked.


But their development relies on cutting-edge scientific theory which 
transforms our understanding of how computers work.


In a quantum internet, information stored in qubits (the quantum 
equivalent of computer bits) is shuttled, or 'teleported', over long 
distances through entanglement.


Entanglement is a phenomenon whereby two particles are linked in such a 
way that information shared with one is shared with the other at exactly 
the same time.


This means that the quantum state of each particle is dependent on the 
state of the other – even when they are separated by a large distance.


Quantum teleportation, therefore, is the transfer of quantum states from 
one location to the other.


However, it is highly sensitive to environmental interference that can 
easily disrupt the quality or 'fidelity' of teleportation, so proving 
the theory in practice has been technologically challenging.


In their latest experiment, researchers from Caltech, NASA, and Fermilab 
(Fermi National Accelerator Laboratory) built a unique system between 
two labs separated by 27 miles (44km).


The system comprises three nodes which interact with one another to 
trigger a sequence of qubits, which pass a signal from one place to the 
other instantly.


The 'teleportation' is instant, occurring faster than the speed of 
light, and the researchers reported a fidelity of more than 90 percent, 
according to the new study, published in PRX Quantum.


Fidelity is used to measure how close the resulting qubit signal is to 
the original message that was sent.


'This high fidelity is important especially in the case of quantum 
networks designed to connect advanced quantum devices, including quantum 
sensors,' explains Professor Maria Spiropulu from Caltech.


The findings of the project are crucial to hopes of a future quantum 
internet, as well as pushing the boundaries of what scientists know 
about the quantum realm.


Although the technology is yet to reach the point of being rolled out 
beyond sophisticated tests such as this, there are already plans for how 
policy makers will employ the technology.


For example, the US Department of Energy hopes to erect a quantum 
network between its laboratories across the states.


The power of a quantum computer running on quantum internet will likely 
exceed the speeds of the world's current most sophisticated 
supercomputers by around 100 trillion times.


'People on social media are asking if they should sign up for a quantum 
internet provider (jokingly of course),' Professor Spiropulu told 
Motherboard.


'We need (a lot) more R&D work.'

WHAT IS QUANTUM ENTANGLEMENT?

In quantum physics, entangled particles remain connected so that actions 
performed by one affects the behaviour of the other, even if they are 
separated by huge distances.


This means if you measure 'up' for the spin of one photon from an 
entangled pair, the spin of the other, measured an instant later, will 
be 'down' - even if the two are on opposite sides of the world.


Entanglement takes place when a pair of particles interact physically.

For instance, a laser beam fired through a certain type of crystal can 
cause individual light particles to be split into pairs of entangled 
photons.


The theory that so riled Einstein is also referred to as 'spooky action 
at a distance'.


Einstein wasn't happy with the theory, because 

[PATCH AUTOSEL 5.4 106/130] iwlwifi: avoid endless HW errors at assert time

2020-12-22 Thread Sasha Levin
From: Mordechay Goodstein 

[ Upstream commit 861bae42e1f125a5a255ace3ccd731e59f58ddec ]

Currently we only mark the HW error state "after" trying to collect HW data,
but if any HW error happens while collecting HW data we go into an endless
loop. Avoid this by setting the HW error state "before" collecting HW data.

Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Link: 
https://lore.kernel.org/r/iwlwifi.20201209231352.4c7e5a87da15.Ic35b2f28ff08f7ac23143c80f224d52eb97a0454@changeid
Signed-off-by: Luca Coelho 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index 3acbd5b7ab4b2..87f53810fdac3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -1291,6 +1291,12 @@ void iwl_mvm_nic_restart(struct iwl_mvm *mvm, bool 
fw_error)
} else if (mvm->fwrt.cur_fw_img == IWL_UCODE_REGULAR &&
   mvm->hw_registered &&
   !test_bit(STATUS_TRANS_DEAD, &mvm->trans->status)) {
+   /* This should be first thing before trying to collect any
+* data to avoid endless loops if any HW error happens while
+* collecting debug data.
+*/
+   set_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, &mvm->status);
+
if (mvm->fw->ucode_capa.error_log_size) {
u32 src_size = mvm->fw->ucode_capa.error_log_size;
u32 src_addr = mvm->fw->ucode_capa.error_log_addr;
@@ -1309,7 +1315,6 @@ void iwl_mvm_nic_restart(struct iwl_mvm *mvm, bool 
fw_error)
 
if (fw_error && mvm->fw_restart > 0)
mvm->fw_restart--;
-   set_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, &mvm->status);
ieee80211_restart_hw(mvm->hw);
}
 }
-- 
2.27.0



Re: [PATCH] MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

2020-12-22 Thread Rafael J. Wysocki
On Thu, Dec 17, 2020 at 8:16 AM Lukas Bulwahn  wrote:
>
> The current pattern in the file entry does not make the files in the
> governors subdirectory part of the CPU IDLE TIME MANAGEMENT
> FRAMEWORK.
>
> Adjust the file pattern to include files in governors.
>
> Signed-off-by: Lukas Bulwahn 
> ---
> applies cleanly on current master and next-20201215
>
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 952731d1e43c..ac679aa00e0d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4596,7 +4596,7 @@ B:https://bugzilla.kernel.org
>  T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
>  F: Documentation/admin-guide/pm/cpuidle.rst
>  F: Documentation/driver-api/pm/cpuidle.rst
> -F: drivers/cpuidle/*
> +F: drivers/cpuidle/
>  F: include/linux/cpuidle.h
>
>  CPU POWER MONITORING SUBSYSTEM
> --

Applied as 5.11-rc material, thanks!

>


[PATCH v1 13/15] powerpc/32: Enable instruction translation at the same time as data translation

2020-12-22 Thread Christophe Leroy
On 8xx, kernel text is pinned.
On book3s/32, kernel text is mapped by BATs.

Enable instruction translation at the same time as data translation; it
makes things simpler.

In the syscall handler, MSR_RI can also be set at the same time, because
srr0/srr1 are already saved and r1 is set properly.

Also update comment in power_save_ppc32_restore().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 15 -
 arch/powerpc/kernel/head_6xx_8xx.h | 35 +++---
 arch/powerpc/kernel/idle_6xx.S |  4 +---
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 9ef75efaff47..2c38106c2c93 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -213,12 +213,8 @@ transfer_to_handler_cont:
 3:
mflrr9
tovirt_novmstack r2, r2 /* set r2 to current */
-   tovirt_vmstack r9, r9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
-#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
-   mtspr   SPRN_NRI, r0
-#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
/*
 * When tracing IRQ state (lockdep) we enable the MMU before we call
@@ -235,6 +231,11 @@ transfer_to_handler_cont:
 
/* MSR isn't changing, just transition directly */
 #endif
+#ifdef CONFIG_HAVE_ARCH_VMAP_STACK
+   mtctr   r11
+   mtlrr9
+   bctr/* jump to handler */
+#else
mtspr   SPRN_SRR0,r11
mtspr   SPRN_SRR1,r10
mtlrr9
@@ -242,6 +243,7 @@ transfer_to_handler_cont:
 #ifdef CONFIG_40x
b . /* Prevent prefetch past rfi */
 #endif
+#endif
 
 #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
 4: rlwinm  r12,r12,0,~_TLF_NAPPING
@@ -261,7 +263,9 @@ _ASM_NOKPROBE_SYMBOL(transfer_to_handler)
 _ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont)
 
 #ifdef CONFIG_TRACE_IRQFLAGS
-1: /* MSR is changing, re-enable MMU so we can notify lockdep. We need to
+1:
+#ifndef CONFIG_HAVE_ARCH_VMAP_STACK
+   /* MSR is changing, re-enable MMU so we can notify lockdep. We need to
 * keep interrupts disabled at this point otherwise we might risk
 * taking an interrupt before we tell lockdep they are enabled.
 */
@@ -276,6 +280,7 @@ _ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont)
 #endif
 
 reenable_mmu:
+#endif
/*
 * We save a bunch of GPRs,
 * r3 can be different from GPR3(r1) at this point, r9 and r11
diff --git a/arch/powerpc/kernel/head_6xx_8xx.h 
b/arch/powerpc/kernel/head_6xx_8xx.h
index 11b608b6f4b7..bedbf37c2a0c 100644
--- a/arch/powerpc/kernel/head_6xx_8xx.h
+++ b/arch/powerpc/kernel/head_6xx_8xx.h
@@ -49,10 +49,14 @@
 .endm
 
 .macro EXCEPTION_PROLOG_2 handle_dar_dsisr=0
-   li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
-   mtmsr   r11
-   isync
+   li  r11, MSR_KERNEL & ~MSR_RI /* re-enable MMU */
+   mtspr   SPRN_SRR1, r11
+   lis r11, 1f@h
+   ori r11, r11, 1f@l
+   mtspr   SPRN_SRR0, r11
mfspr   r11, SPRN_SPRG_SCRATCH2
+   rfi
+1:
stw r11,GPR1(r1)
stw r11,0(r1)
mr  r11, r1
@@ -75,7 +79,7 @@
.endif
lwz r9, SRR1(r12)
lwz r12, SRR0(r12)
-   li  r10, MSR_KERNEL & ~MSR_IR /* can take exceptions */
+   li  r10, MSR_KERNEL /* can take exceptions */
mtmsr   r10 /* (except for mach check in rtas) */
stw r0,GPR0(r11)
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
@@ -95,9 +99,13 @@
lwz r1,TASK_STACK-THREAD(r12)
beq-99f
addir1, r1, THREAD_SIZE - INT_FRAME_SIZE
-   li  r10, MSR_KERNEL & ~(MSR_IR | MSR_RI) /* can take DTLB miss */
-   mtmsr   r10
-   isync
+   li  r10, MSR_KERNEL /* can take exceptions */
+   mtspr   SPRN_SRR1, r10
+   lis r10, 1f@h
+   ori r10, r10, 1f@l
+   mtspr   SPRN_SRR0, r10
+   rfi
+1:
tovirt(r12, r12)
stw r11,GPR1(r1)
stw r11,0(r1)
@@ -108,8 +116,6 @@
mfcrr10
rlwinm  r10,r10,0,4,2   /* Clear SO bit in CR */
stw r10,_CCR(r1)/* save registers */
-   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~MSR_IR) /* can take exceptions */
-   mtmsr   r10 /* (except for mach check in rtas) */
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
stw r2,GPR2(r1)
addir10,r10,STACK_FRAME_REGS_MARKER@l
@@ -126,8 +132,6 @@
ACCOUNT_CPU_USER_ENTRY(r2, r11, r12)
 
 3:
-   lis r11, transfer_to_syscall@h
-   ori r11, r11, transfer_to_syscall@l
 #ifdef CONFIG_TRACE_IRQFLAGS
/*
 * If MSR is changing we ne

Re: [PATCH] powerpc/time: Force inlining of get_tb()

2020-12-22 Thread Michael Ellerman
On Sun, 20 Dec 2020 18:18:26 + (UTC), Christophe Leroy wrote:
> Force inlining of get_tb() in order to avoid getting the
> following function in vdso32, leading to suboptimal
> performance in clock_gettime()
> 
> 0688 <.get_tb>:
>  688: 7c 6d 42 a6 mftbu   r3
 68c: 7c 8c 42 a6 mftb    r4
>  690: 7d 2d 42 a6 mftbu   r9
>  694: 7c 03 48 40 cmplw   r3,r9
>  698: 40 e2 ff f0 bne+688 <.get_tb>
>  69c: 4e 80 00 20 blr

Applied to powerpc/fixes.

[1/1] powerpc/time: Force inlining of get_tb()
  https://git.kernel.org/powerpc/c/0faa22f09caadc11af2aa7570870ebd2ac5b8170

cheers


[PATCH v1 5/7] perf arm-spe: Assign kernel time to synthesized event

2020-12-21 Thread Leo Yan
In the current code, the raw arch timer counter is assigned to the samples
synthesized from the Arm SPE trace, so the samples don't contain kernel
time but only the raw counter value.

To fix the issue, this patch converts the timer counter to kernel time
and assigns it to the sample timestamp.
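
For reference, the conversion follows the usual perf counter-to-time formula
driven by the time_shift/time_mult/time_zero parameters exported by the
kernel; below is a simplified sketch (not the exact tools/perf
implementation; field and function names are illustrative):

struct tsc_conv {
	u16 time_shift;
	u32 time_mult;
	u64 time_zero;
};

/* counter -> kernel time in ns; perf_time_to_tsc() is the inverse */
static u64 counter_to_ns(u64 cyc, const struct tsc_conv *tc)
{
	u64 quot = cyc >> tc->time_shift;
	u64 rem  = cyc & (((u64)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}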

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index bc512c3479f7..2b008b973387 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -232,7 +232,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
struct arm_spe_record *record = &speq->decoder->record;
 
if (!spe->timeless_decoding)
-   sample->time = speq->timestamp;
+   sample->time = tsc_to_perf_time(record->timestamp, &spe->tc);
 
sample->ip = record->from_ip;
sample->cpumode = arm_spe_cpumode(spe, sample->ip);
-- 
2.17.1



[PATCH v1 4/7] perf arm-spe: Convert event kernel time to counter value

2020-12-21 Thread Leo Yan
When handling a perf event, the Arm SPE decoder needs to decide whether the
perf event is earlier or later than the samples from the Arm SPE trace
data; to do the comparison, both need to use the same time unit.

This patch converts the event's kernel time to the arch timer's counter
value, so it can be compared with the counter value contained in the Arm
SPE timestamp packet.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index a504ceec2de6..bc512c3479f7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -588,7 +588,7 @@ static int arm_spe_process_event(struct perf_session 
*session,
}
 
if (sample->time && (sample->time != (u64) -1))
-   timestamp = sample->time;
+   timestamp = perf_time_to_tsc(sample->time, &spe->tc);
else
timestamp = 0;
 
-- 
2.17.1



[PATCH] powerpc/time: Force inlining of get_tb()

2020-12-20 Thread Christophe Leroy
Force inlining of get_tb() in order to avoid getting the
following function in vdso32, leading to suboptimal
performance in clock_gettime()

0688 <.get_tb>:
 688:   7c 6d 42 a6 mftbu   r3
 68c:   7c 8c 42 a6 mftb    r4
 690:   7d 2d 42 a6 mftbu   r9
 694:   7c 03 48 40 cmplw   r3,r9
 698:   40 e2 ff f0 bne+688 <.get_tb>
 69c:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso/timebase.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/vdso/timebase.h 
b/arch/powerpc/include/asm/vdso/timebase.h
index b558b07959ce..881f655caa0a 100644
--- a/arch/powerpc/include/asm/vdso/timebase.h
+++ b/arch/powerpc/include/asm/vdso/timebase.h
@@ -49,7 +49,7 @@ static inline unsigned long get_tbl(void)
return mftb();
 }
 
-static inline u64 get_tb(void)
+static __always_inline u64 get_tb(void)
 {
unsigned int tbhi, tblo, tbhi2;
 
-- 
2.25.0



[PATCH 5.4 24/34] xhci: Give USB2 ports time to enter U3 in bus suspend

2020-12-19 Thread Greg Kroah-Hartman
From: Li Jun 

commit c1373f10479b624fb6dba0805d673e860f1b421d upstream.

If a USB2 device wakeup is not enabled/supported the link state may
still be in U0 in xhci_bus_suspend(), where it's then manually put
to suspended U3 state.

Just as with selective suspend the device needs time to enter U3
suspend before continuing with further suspend operations
(e.g. system suspend), otherwise we may enter system suspend with link
state in U0.

[commit message rewording -Mathias]

Cc: 
Signed-off-by: Li Jun 
Signed-off-by: Mathias Nyman 
Link: 
https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/host/xhci-hub.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1705,6 +1705,10 @@ retry:
hcd->state = HC_STATE_SUSPENDED;
bus_state->next_statechange = jiffies + msecs_to_jiffies(10);
spin_unlock_irqrestore(&xhci->lock, flags);
+
+   if (bus_state->bus_suspended)
+   usleep_range(5000, 10000);
+
return 0;
 }
 




[PATCH 5.9 39/49] xhci: Give USB2 ports time to enter U3 in bus suspend

2020-12-19 Thread Greg Kroah-Hartman
From: Li Jun 

commit c1373f10479b624fb6dba0805d673e860f1b421d upstream.

If a USB2 device wakeup is not enabled/supported the link state may
still be in U0 in xhci_bus_suspend(), where it's then manually put
to suspended U3 state.

Just as with selective suspend the device needs time to enter U3
suspend before continuing with further suspend operations
(e.g. system suspend), otherwise we may enter system suspend with link
state in U0.

[commit message rewording -Mathias]

Cc: 
Signed-off-by: Li Jun 
Signed-off-by: Mathias Nyman 
Link: 
https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/host/xhci-hub.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1712,6 +1712,10 @@ retry:
hcd->state = HC_STATE_SUSPENDED;
bus_state->next_statechange = jiffies + msecs_to_jiffies(10);
spin_unlock_irqrestore(&xhci->lock, flags);
+
+   if (bus_state->bus_suspended)
+   usleep_range(5000, 10000);
+
return 0;
 }
 




[PATCH 5.9 28/49] bonding: fix feature flag setting at init time

2020-12-19 Thread Greg Kroah-Hartman
From: Jarod Wilson 

[ Upstream commit 007ab5345545aba2f9cbe4c096cc35d2fd3275ac ]

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes and at module init time, and in
the latter case it runs before register_netdevice() has been called and
filled in wanted_features. The empty wanted_features led to features also
getting emptied out, which was definitely not the intended behavior, so
prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Signed-off-by: Jarod Wilson 
Link: https://lore.kernel.org/r/20201205172229.576587-1-ja...@redhat.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/bonding/bond_options.c |   22 +++---
 include/net/bonding.h  |2 --
 2 files changed, 15 insertions(+), 9 deletions(-)

--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,19 @@ const struct bond_option *bond_opt_get(u
return &bond_opts[option];
 }
 
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+   if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
+   return;
+
+   if (mode == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+   netdev_update_features(bond_dev);
+}
+
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
@@ -767,13 +780,8 @@ static int bond_option_mode_set(struct b
if (newval->value == BOND_MODE_ALB)
bond->params.tlb_dynamic_lb = 1;
 
-#ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
-#endif /* CONFIG_XFRM_OFFLOAD */
+   if (bond->dev->reg_state == NETREG_REGISTERED)
+   bond_set_xfrm_features(bond->dev, newval->value);
 
/* don't cache arp_validate between modes */
bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -86,10 +86,8 @@
 #define bond_for_each_slave_rcu(bond, pos, iter) \
netdev_for_each_lower_private_rcu((bond)->dev, pos, iter)
 
-#ifdef CONFIG_XFRM_OFFLOAD
 #define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \
NETIF_F_GSO_ESP)
-#endif /* CONFIG_XFRM_OFFLOAD */
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 extern atomic_t netpoll_block_tx;




[PATCH 5.10 09/16] xhci: Give USB2 ports time to enter U3 in bus suspend

2020-12-19 Thread Greg Kroah-Hartman
From: Li Jun 

commit c1373f10479b624fb6dba0805d673e860f1b421d upstream.

If a USB2 device wakeup is not enabled/supported the link state may
still be in U0 in xhci_bus_suspend(), where it's then manually put
to suspended U3 state.

Just as with selective suspend the device needs time to enter U3
suspend before continuing with further suspend operations
(e.g. system suspend), otherwise we may enter system suspend with link
state in U0.

[commit message rewording -Mathias]

Cc: 
Signed-off-by: Li Jun 
Signed-off-by: Mathias Nyman 
Link: 
https://lore.kernel.org/r/20201208092912.1773650-6-mathias.ny...@linux.intel.com
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/host/xhci-hub.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1712,6 +1712,10 @@ retry:
hcd->state = HC_STATE_SUSPENDED;
bus_state->next_statechange = jiffies + msecs_to_jiffies(10);
spin_unlock_irqrestore(&xhci->lock, flags);
+
+   if (bus_state->bus_suspended)
+   usleep_range(5000, 10000);
+
return 0;
 }
 




[PATCH v1 05/13] sparc32: Drop run-time patching of ipi trap

2020-12-18 Thread Sam Ravnborg
With the removal of sun4m and sun4d there is no longer any need for the
run-time patching of the IPI trap. Remove the patching and drop the IPI
implementations for the two machines.

The patch also removes the patching from pcic, as this was needed to fix
the build. pcic will be removed in a later commit.

Signed-off-by: Sam Ravnborg 
Cc: Mike Rapoport 
Cc: Andrew Morton 
Cc: Sam Ravnborg 
Cc: Christian Brauner 
Cc: "David S. Miller" 
Cc: Geert Uytterhoeven 
Cc: Pekka Enberg 
Cc: Arnd Bergmann 
Cc: Andreas Larsson 
---
 arch/sparc/kernel/entry.S | 70 ++-
 arch/sparc/kernel/kernel.h|  4 --
 arch/sparc/kernel/leon_smp.c  |  3 --
 arch/sparc/kernel/pcic.c  | 11 --
 arch/sparc/kernel/sun4d_smp.c |  3 --
 arch/sparc/kernel/ttable_32.S |  9 ++---
 6 files changed, 7 insertions(+), 93 deletions(-)

diff --git a/arch/sparc/kernel/entry.S b/arch/sparc/kernel/entry.S
index 9985b08a3467..1a2e20a7e584 100644
--- a/arch/sparc/kernel/entry.S
+++ b/arch/sparc/kernel/entry.S
@@ -174,32 +174,6 @@ maybe_smp4m_msg_check_resched:
 maybe_smp4m_msg_out:
RESTORE_ALL
 
-   .align  4
-   .globl  linux_trap_ipi15_sun4m
-linux_trap_ipi15_sun4m:
-   SAVE_ALL
-   sethi   %hi(0x80000000), %o2
-   GET_PROCESSOR4M_ID(o0)
-   sethi   %hi(sun4m_irq_percpu), %l5
-   or  %l5, %lo(sun4m_irq_percpu), %o5
-   sll %o0, 2, %o0
-   ld  [%o5 + %o0], %o5
-   ld  [%o5 + 0x00], %o3   ! sun4m_irq_percpu[cpu]->pending
-   andcc   %o3, %o2, %g0
-   be  sun4m_nmi_error ! Must be an NMI async memory error
-st %o2, [%o5 + 0x04]   ! 
sun4m_irq_percpu[cpu]->clear=0x80000000
-   WRITE_PAUSE
-   ld  [%o5 + 0x00], %g0   ! sun4m_irq_percpu[cpu]->pending
-   WRITE_PAUSE
-   or  %l0, PSR_PIL, %l4
-   wr  %l4, 0x0, %psr
-   WRITE_PAUSE
-   wr  %l4, PSR_ET, %psr
-   WRITE_PAUSE
-   callsmp4m_cross_call_irq
-nop
-   b   ret_trap_lockless_ipi
-clr%l6
 
.globl  smp4d_ticker
/* SMP per-cpu ticker interrupts are handled specially. */
@@ -220,44 +194,6 @@ smp4d_ticker:
WRITE_PAUSE
RESTORE_ALL
 
-   .align  4
-   .globl  linux_trap_ipi15_sun4d
-linux_trap_ipi15_sun4d:
-   SAVE_ALL
-   sethi   %hi(CC_BASE), %o4
-   sethi   %hi(MXCC_ERR_ME|MXCC_ERR_PEW|MXCC_ERR_ASE|MXCC_ERR_PEE), %o2
-   or  %o4, (CC_EREG - CC_BASE), %o0
-   ldda[%o0] ASI_M_MXCC, %o0
-   andcc   %o0, %o2, %g0
-   bne 1f
-sethi  %hi(BB_STAT2), %o2
-   lduba   [%o2] ASI_M_CTL, %o2
-   andcc   %o2, BB_STAT2_MASK, %g0
-   bne 2f
-or %o4, (CC_ICLR - CC_BASE), %o0
-   sethi   %hi(1 << 15), %o1
-   stha%o1, [%o0] ASI_M_MXCC   /* Clear PIL 15 in MXCC's ICLR */
-   or  %l0, PSR_PIL, %l4
-   wr  %l4, 0x0, %psr
-   WRITE_PAUSE
-   wr  %l4, PSR_ET, %psr
-   WRITE_PAUSE
-   callsmp4d_cross_call_irq
-nop
-   b   ret_trap_lockless_ipi
-clr%l6
-
-1: /* MXCC error */
-2: /* BB error */
-   /* Disable PIL 15 */
-   set CC_IMSK, %l4
-   lduha   [%l4] ASI_M_MXCC, %l5
-   sethi   %hi(1 << 15), %l7
-   or  %l5, %l7, %l5
-   stha%l5, [%l4] ASI_M_MXCC
-   /* FIXME */
-1: b,a 1b
-
.globl  smpleon_ipi
.extern leon_ipi_interrupt
/* SMP per-cpu IPI interrupts are handled specially. */
@@ -618,11 +554,11 @@ sun4m_nmi_error:
 
 #ifndef CONFIG_SMP
.align  4
-   .globl  linux_trap_ipi15_sun4m
-linux_trap_ipi15_sun4m:
+   .globl  linux_trap_ipi15_leon
+linux_trap_ipi15_leon:
SAVE_ALL
 
-   ba  sun4m_nmi_error
+   ba  sun4m_nmi_error
 nop
 #endif /* CONFIG_SMP */
 
diff --git a/arch/sparc/kernel/kernel.h b/arch/sparc/kernel/kernel.h
index 7328d13875e4..c9e1b13d955f 100644
--- a/arch/sparc/kernel/kernel.h
+++ b/arch/sparc/kernel/kernel.h
@@ -135,10 +135,6 @@ void leonsmp_ipi_interrupt(void);
 void leon_cross_call_irq(void);
 
 /* head_32.S */
-extern unsigned int t_nmi[];
-extern unsigned int linux_trap_ipi15_sun4d[];
-extern unsigned int linux_trap_ipi15_sun4m[];
-
 extern struct tt_entry trapbase;
 extern struct tt_entry trapbase_cpu1;
 extern struct tt_entry trapbase_cpu2;
diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 1eed26d423fb..b0d75783d337 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -461,8 +461,5 @@ static const struct sparc32_ipi_ops leon_ipi_ops = {
 
 void __init leon_init_smp(void)
 {
-   /* Patch ipi15 trap table */
-   t_nmi[1] = t_nmi[1] + (linux_trap_ipi15_leon - linux_trap_ipi15_sun4m);
-
sparc32_ipi_ops = &leon_ipi_ops;
 }
diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c
index ee4c9a9a171c..87bdcb16019b 100644
--- a/arch/sparc/kern

RE: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2020-12-17 Thread Michael Kelley
From: Juergen Gross  Sent: Thursday, December 17, 2020 1:31 AM

> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires splitting out the time interfaces
> into a new header file.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   | 11 
>  arch/x86/include/asm/paravirt.h   | 14 --
>  arch/x86/include/asm/paravirt_time.h  | 38 +++
>  arch/x86/include/asm/paravirt_types.h |  6 -
>  arch/x86/kernel/cpu/vmware.c  |  5 ++--
>  arch/x86/kernel/kvm.c |  3 ++-
>  arch/x86/kernel/kvmclock.c|  3 ++-
>  arch/x86/kernel/paravirt.c| 16 ---
>  arch/x86/kernel/tsc.c |  3 ++-
>  arch/x86/xen/time.c   | 12 -
>  drivers/clocksource/hyperv_timer.c|  5 ++--
>  drivers/xen/time.c|  3 ++-
>  kernel/sched/sched.h  |  1 +
>  14 files changed, 71 insertions(+), 50 deletions(-)
>  create mode 100644 arch/x86/include/asm/paravirt_time.h
>

[snip]
 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..45942d420626 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
>  #define hv_get_raw_timer() rdtsc_ordered()
>  #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
> 
> -/*
> - * Reference to pv_ops must be inline so objtool
> - * detection of noinstr violations can work correctly.
> - */
> -static __always_inline void hv_setup_sched_clock(void *sched_clock)
> -{
> -#ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> -#endif
> -}
> -
>  void hyperv_vector_handler(struct pt_regs *regs);
> 
>  static inline void hv_enable_stimer0_percpu_irq(int irq) {}

[snip]

> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..1ed79993fc50 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  static struct clock_event_device __percpu *hv_clock_event;
>  static u64 hv_sched_clock_offset __ro_after_init;
> @@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_tsc);
> + paravirt_set_sched_clock(read_hv_sched_clock_tsc);
> 
>   return true;
>  }
> @@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_msr);
> + static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
>  }
>  EXPORT_SYMBOL_GPL(hv_init_clocksource);

These Hyper-V changes are problematic as we want to keep hyperv_timer.c
architecture independent.  While only the code for x86/x64 is currently
accepted upstream, code for ARM64 support is in progress.   So we need
to use hv_setup_sched_clock() in hyperv_timer.c, and have the per-arch
implementation in mshyperv.h.
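
For example, a rough and untested sketch of keeping the arch-neutral hook on
x86 while forwarding to the new static_call plumbing (names taken from this
patch; the cast mirrors what the removed pv_ops assignment did implicitly):

/* arch/x86/include/asm/mshyperv.h -- illustrative only,
 * needs the new <asm/paravirt_time.h> header from this patch.
 */
static __always_inline void hv_setup_sched_clock(void *sched_clock)
{
#ifdef CONFIG_PARAVIRT
	paravirt_set_sched_clock((u64 (*)(void))sched_clock);
#endif
}

hyperv_timer.c would then keep calling hv_setup_sched_clock(), and the
ARM64 code can provide its own definition.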

Michael


[PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2020-12-17 Thread Juergen Gross
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Due to include hell this requires splitting out the time interfaces
into a new header file.

Signed-off-by: Juergen Gross 
---
 arch/x86/Kconfig  |  1 +
 arch/x86/include/asm/mshyperv.h   | 11 
 arch/x86/include/asm/paravirt.h   | 14 --
 arch/x86/include/asm/paravirt_time.h  | 38 +++
 arch/x86/include/asm/paravirt_types.h |  6 -
 arch/x86/kernel/cpu/vmware.c  |  5 ++--
 arch/x86/kernel/kvm.c |  3 ++-
 arch/x86/kernel/kvmclock.c|  3 ++-
 arch/x86/kernel/paravirt.c| 16 ---
 arch/x86/kernel/tsc.c |  3 ++-
 arch/x86/xen/time.c   | 12 -
 drivers/clocksource/hyperv_timer.c|  5 ++--
 drivers/xen/time.c|  3 ++-
 kernel/sched/sched.h  |  1 +
 14 files changed, 71 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8bd298e45b1..ebabd8bf4064 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -769,6 +769,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
bool "Enable paravirtualization code"
+   depends on HAVE_STATIC_CALL
help
  This changes the kernel so it can modify itself when it is run
  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..45942d420626 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
 #define hv_get_raw_timer() rdtsc_ordered()
 #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
 
-/*
- * Reference to pv_ops must be inline so objtool
- * detection of noinstr violations can work correctly.
- */
-static __always_inline void hv_setup_sched_clock(void *sched_clock)
-{
-#ifdef CONFIG_PARAVIRT
-   pv_ops.time.sched_clock = sched_clock;
-#endif
-}
-
 void hyperv_vector_handler(struct pt_regs *regs);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4abf110e2243..0785a9686e32 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,25 +17,11 @@
 #include 
 #include 
 
-static inline unsigned long long paravirt_sched_clock(void)
-{
-   return PVOP_CALL0(unsigned long long, time.sched_clock);
-}
-
-struct static_key;
-extern struct static_key paravirt_steal_enabled;
-extern struct static_key paravirt_steal_rq_enabled;
-
 __visible void __native_queued_spin_unlock(struct qspinlock *lock);
 bool pv_is_native_spin_unlock(void);
 __visible bool __native_vcpu_is_preempted(long cpu);
 bool pv_is_native_vcpu_is_preempted(void);
 
-static inline u64 paravirt_steal_clock(int cpu)
-{
-   return PVOP_CALL1(u64, time.steal_clock, cpu);
-}
-
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
diff --git a/arch/x86/include/asm/paravirt_time.h 
b/arch/x86/include/asm/paravirt_time.h
new file mode 100644
index ..76cf94b7c899
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_time.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PARAVIRT_TIME_H
+#define _ASM_X86_PARAVIRT_TIME_H
+
+/* Time related para-virtualized functions. */
+
+#ifdef CONFIG_PARAVIRT
+
+#include 
+#include 
+#include 
+
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
+{
+   return static_call(pv_sched_clock)();
+}
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif /* _ASM_X86_PARAVIRT_TIME_H */
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index de87087d3bde..1fff349e4792 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,11 +95,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-   unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
/* hooks for various privileged instructions */
void (*io_delay)(void);
@@ -291,7 +286,6 @@ struct p

[PATCH] MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

2020-12-16 Thread Lukas Bulwahn
The current pattern in the file entry does not make the files in the
governors subdirectory part of the CPU IDLE TIME MANAGEMENT
FRAMEWORK.

Adjust the file pattern to include files in governors.

Signed-off-by: Lukas Bulwahn 
---
applies cleanly on current master and next-20201215

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 952731d1e43c..ac679aa00e0d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4596,7 +4596,7 @@ B:https://bugzilla.kernel.org
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 F: Documentation/admin-guide/pm/cpuidle.rst
 F: Documentation/driver-api/pm/cpuidle.rst
-F: drivers/cpuidle/*
+F: drivers/cpuidle/
 F: include/linux/cpuidle.h
 
 CPU POWER MONITORING SUBSYSTEM
-- 
2.17.1



[RFC PATCH 2/8] x86/cpu: Load Key Locker internal key at boot-time

2020-12-16 Thread Chang S. Bae
The internal (wrapping) key is a new entity of the Intel Key Locker
feature. This internal key is loaded into a software-inaccessible CPU
state and used to encode data encryption keys.

The kernel generates random data and loads it as the internal key on each
CPU. The data needs to be invalidated as soon as the load is done.

The BIOS may disable the feature, so check the dynamic CPUID bit
(KEYLOCKER_CPUID_EBX_AESKLE) first.

Add the byte code for LOADIWKEY -- the instruction that loads the internal
key -- to the 'x86-opcode-map.txt' file to avoid objtool's
misinterpretation.

Signed-off-by: Chang S. Bae 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/keylocker.h  | 11 +
 arch/x86/kernel/Makefile  |  1 +
 arch/x86/kernel/cpu/common.c  | 38 +-
 arch/x86/kernel/keylocker.c   | 71 +++
 arch/x86/kernel/smpboot.c |  2 +
 arch/x86/lib/x86-opcode-map.txt   |  2 +-
 tools/arch/x86/lib/x86-opcode-map.txt |  2 +-
 7 files changed, 124 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 2fe13c21c63f..daf0734a4095 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -14,5 +14,16 @@
 #define KEYLOCKER_CPUID_EBX_BACKUP BIT(4)
 #define KEYLOCKER_CPUID_ECX_RAND   BIT(1)
 
+bool check_keylocker_readiness(void);
+
+bool load_keylocker(void);
+
+void make_keylocker_data(void);
+#ifdef CONFIG_X86_KEYLOCKER
+void invalidate_keylocker_data(void);
+#else
+#define invalidate_keylocker_data() do { } while (0)
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 68608bd892c0..085dbf49b3b9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -145,6 +145,7 @@ obj-$(CONFIG_PERF_EVENTS)   += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
 obj-$(CONFIG_X86_UMIP) += umip.o
+obj-$(CONFIG_X86_KEYLOCKER)+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)   += unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..d675075848bb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -57,6 +57,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 #include "cpu.h"
@@ -459,6 +461,39 @@ static __init int x86_nofsgsbase_setup(char *arg)
 }
 __setup("nofsgsbase", x86_nofsgsbase_setup);
 
+static __always_inline void setup_keylocker(struct cpuinfo_x86 *c)
+{
+   bool keyloaded;
+
+   if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+   goto out;
+
+   cr4_set_bits(X86_CR4_KEYLOCKER);
+
+   if (c == &boot_cpu_data) {
+   if (!check_keylocker_readiness())
+   goto disable_keylocker;
+
+   make_keylocker_data();
+   }
+
+   keyloaded = load_keylocker();
+   if (!keyloaded) {
+   pr_err_once("x86/keylocker: Failed to load internal key\n");
+   goto disable_keylocker;
+   }
+
+   pr_info_once("x86/keylocker: Activated\n");
+   return;
+
+disable_keylocker:
+   clear_cpu_cap(c, X86_FEATURE_KEYLOCKER);
+   pr_info_once("x86/keylocker: Disabled\n");
+out:
+   /* Make sure the feature disabled for kexec-reboot. */
+   cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1554,10 +1589,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP/UMIP */
+   /* Setup various Intel-specific CPU security features */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+   setup_keylocker(c);
 
/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index ..e455d806b80c
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Key Locker feature check and support the internal key
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+bool check_keylocker_readiness(void)
+{
+   u32 eax, ebx, ecx, edx;
+
+   cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+   /* BIOS may not enable it on some systems. */
+   if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE)) {
+   pr_debug("x86/keylocker: not fully enabled\n");
+   return false;
+   }
+
+   return true;
+}
+
+/* Load Internal (Wrapping) Key */
+#define LOADIWKEY  ".byte 0xf3,0x0f,0x38,0xdc,0xd1"
+#define LOADIWKEY_NUM_OPERANDS 3
+

[PATCH] usemem: Add option init-time

2020-12-16 Thread Hui Zhu
From: Hui Zhu 

This commit adds a new option, --init-time, which excludes the
initialization time from the run time and reports it separately.

Signed-off-by: Hui Zhu 
---
 usemem.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/usemem.c b/usemem.c
index 823647e..6d1d575 100644
--- a/usemem.c
+++ b/usemem.c
@@ -96,6 +96,7 @@ int opt_bind_interval = 0;
 unsigned long opt_delay = 0;
 int opt_read_again = 0;
 int opt_punch_holes = 0;
+int opt_init_time = 0;
 int nr_task;
 int nr_thread;
 int nr_cpu;
@@ -155,6 +156,7 @@ void usage(int ok)
"-U|--hugetlballocate hugetlbfs page\n"
"-Z|--read-again read memory again after access the memory\n"
"--punch-holes   free every other page after allocation\n"
+   "--init-time remove the initialization time from the run 
time and show the initialization time\n"
"-h|--help   show this message\n"
,   ourname);
 
@@ -193,7 +195,8 @@ static const struct option opts[] = {
{ "delay"   , 1, NULL, 'e' },
{ "hugetlb" , 0, NULL, 'U' },
{ "read-again"  , 0, NULL, 'Z' },
-   { "punch-holes" , 0, NULL,   0 },
+   { "punch-holes" , 0, NULL,   0 },
+   { "init-time"   , 0, NULL,   0 },
{ "help", 0, NULL, 'h' },
{ NULL  , 0, NULL, 0 }
 };
@@ -945,6 +948,8 @@ int main(int argc, char *argv[])
case 0:
if (strcmp(opts[opt_index].name, "punch-holes") == 0) {
opt_punch_holes = 1;
+   } else if (strcmp(opts[opt_index].name, "init-time") == 
0) { 
+   opt_init_time = 1;
} else
usage(1);
break;
@@ -1128,7 +1133,7 @@ int main(int argc, char *argv[])
if (optind != argc - 1)
usage(0);
 
-   if (!opt_write_signal_read)
+   if (!opt_write_signal_read || opt_init_time)
gettimeofday(&start_time, NULL);
 
opt_bytes = memparse(argv[optind], NULL);
@@ -1263,5 +1268,25 @@ int main(int argc, char *argv[])
if (!nr_task)
nr_task = 1;
 
+   if (opt_init_time) {
+   struct timeval stop;
+   char buf[1024];
+   size_t len;
+   unsigned long delta_us;
+
+   gettimeofday(&stop, NULL);
+   delta_us = (stop.tv_sec - start_time.tv_sec) * 1000000 +
+   (stop.tv_usec - start_time.tv_usec);
+   len = snprintf(buf, sizeof(buf),
+   "the initialization time is %lu secs %lu usecs\n",
+   delta_us / 1000000, delta_us % 1000000);
+   fflush(stdout);
+   if (write(1, buf, len) != len)
+   fprintf(stderr, "WARNING: statistics output may be 
incomplete.\n");
+
+   if (!opt_write_signal_read)
+   gettimeofday(&start_time, NULL);
+   }
+
return do_tasks();
 }
-- 
2.17.1



RE: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM

2020-12-16 Thread Jiangyifei

> -Original Message-
> From: Anup Patel [mailto:a...@brainfault.org]
> Sent: Wednesday, December 16, 2020 2:40 PM
> To: Jiangyifei 
> Cc: Anup Patel ; Atish Patra ;
> Paul Walmsley ; Palmer Dabbelt
> ; Albert Ou ; Paolo Bonzini
> ; Zhanghailiang ;
> KVM General ; yinyipeng ;
> Zhangxiaofeng (F) ;
> linux-kernel@vger.kernel.org List ;
> kvm-ri...@lists.infradead.org; linux-riscv ;
> Wubin (H) ; dengkai (A) 
> Subject: Re: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM
> 
> On Thu, Dec 3, 2020 at 5:51 PM Yifei Jiang  wrote:
> >
> > This series implements guest time scaling based on RDTIME instruction
> > emulation so that we can allow migrating Guest/VM across Hosts with
> > different time frequency.
> >
> > Why not through para-virt. From arm's experience[1], para-virt
> > implementation doesn't really solve the problem for the following two main
> reasons:
> > - RDTIME is not only used in Linux, but also in firmware and userspace.
> > - It is difficult to be compatible with nested virtualization.
> 
> I think this approach is rather incomplete. Also, I don't see how para-virt 
> time
> scaling will be difficult for nested virtualization.
> 
> If we trap-n-emulate the TIME CSR for Guest Linux then it will have a
> significant performance impact on systems where the TIME CSR is implemented
> in HW.
> 
> Best approach will be to have VDSO-style para-virt time-scale SBI calls 
> (similar
> to what KVM x86 does). If the Guest software (Linux/Bootloader) does not
> enable para-virt time-scaling then we trap-n-emulate TIME CSR (this series).
> 
> Please propose a VDSO-style para-virt time-scale SBI call and expand this
> series to provide both:
> 1. VDSO-style para-virt time-scaling
> 2. Trap-n-emulation of TIME CSR when #1 is disabled
> 
> Regards,
> Anup
> 

OK, it sounds good. We will look into the para-virt time-scaling for more 
details.

Yifei

> >
> > [1] https://lore.kernel.org/patchwork/cover/1288153/
> >
> > Yifei Jiang (3):
> >   RISC-V: KVM: Change the method of calculating cycles to nanoseconds
> >   RISC-V: KVM: Support dynamic time frequency from userspace
> >   RISC-V: KVM: Implement guest time scaling
> >
> >  arch/riscv/include/asm/csr.h|  3 ++
> >  arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +--
> >  arch/riscv/kvm/vcpu_exit.c  | 35 +
> >  arch/riscv/kvm/vcpu_timer.c | 51
> ++---
> >  4 files changed, 93 insertions(+), 9 deletions(-)
> >
> > --
> > 2.19.1
> >
> >
> > --
> > kvm-riscv mailing list
> > kvm-ri...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kvm-riscv
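
Since the thread is about the concept, a generic illustration (not the KVM
RISC-V code) of what guest time scaling means: converting a counter value
read at the host's timer frequency into the frequency the guest was booted
with. The numbers below are made up for the example.

#include <stdint.h>
#include <stdio.h>

static uint64_t scale_cycles(uint64_t host_cycles, uint64_t host_freq,
			     uint64_t guest_freq)
{
	/* 128-bit intermediate (GCC/Clang extension) avoids overflow. */
	return (uint64_t)(((unsigned __int128)host_cycles * guest_freq) /
			  host_freq);
}

int main(void)
{
	/* A guest expecting a 10 MHz timer migrated onto a 25 MHz host. */
	uint64_t host_cycles = 2500000000ULL;	/* 100 s at 25 MHz */
	uint64_t guest_cycles = scale_cycles(host_cycles, 25000000ULL,
					     10000000ULL);

	printf("guest sees %llu cycles (~%llu s at 10 MHz)\n",
	       (unsigned long long)guest_cycles,
	       (unsigned long long)(guest_cycles / 10000000ULL));
	return 0;
}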


[PATCH v4 7/9] regulator: mt6359: Set the enable time for LDOs

2020-12-15 Thread Hsin-Hsiung Wang
Add the enable time for LDOs.
This patch prepares for adding mt6359p regulator support.

Signed-off-by: Hsin-Hsiung Wang 
Acked-by: Mark Brown 
---
 drivers/regulator/mt6359-regulator.c | 65 ++--
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/drivers/regulator/mt6359-regulator.c 
b/drivers/regulator/mt6359-regulator.c
index 4ac6380f9875..e46fb95b87e2 100644
--- a/drivers/regulator/mt6359-regulator.c
+++ b/drivers/regulator/mt6359-regulator.c
@@ -103,7 +103,7 @@ struct mt6359_regulator_info {
 
 #define MT6359_LDO(match, _name, _volt_table,  \
_enable_reg, _enable_mask, _status_reg, \
-   _vsel_reg, _vsel_mask)  \
+   _vsel_reg, _vsel_mask, _en_delay)   \
 [MT6359_ID_##_name] = {\
.desc = {   \
.name = #_name, \
@@ -118,6 +118,7 @@ struct mt6359_regulator_info {
.vsel_mask = _vsel_mask,\
.enable_reg = _enable_reg,  \
.enable_mask = BIT(_enable_mask),   \
+   .enable_time = _en_delay,   \
},  \
.status_reg = _status_reg,  \
.qi = BIT(0),   \
@@ -466,15 +467,18 @@ static struct mt6359_regulator_info mt6359_regulators[] = 
{
MT6359_LDO("ldo_vsim1", VSIM1, vsim1_voltages,
   MT6359_RG_LDO_VSIM1_EN_ADDR, MT6359_RG_LDO_VSIM1_EN_SHIFT,
   MT6359_DA_VSIM1_B_EN_ADDR, MT6359_RG_VSIM1_VOSEL_ADDR,
-  MT6359_RG_VSIM1_VOSEL_MASK << MT6359_RG_VSIM1_VOSEL_SHIFT),
+  MT6359_RG_VSIM1_VOSEL_MASK << MT6359_RG_VSIM1_VOSEL_SHIFT,
+  480),
MT6359_LDO("ldo_vibr", VIBR, vibr_voltages,
   MT6359_RG_LDO_VIBR_EN_ADDR, MT6359_RG_LDO_VIBR_EN_SHIFT,
   MT6359_DA_VIBR_B_EN_ADDR, MT6359_RG_VIBR_VOSEL_ADDR,
-  MT6359_RG_VIBR_VOSEL_MASK << MT6359_RG_VIBR_VOSEL_SHIFT),
+  MT6359_RG_VIBR_VOSEL_MASK << MT6359_RG_VIBR_VOSEL_SHIFT,
+  240),
MT6359_LDO("ldo_vrf12", VRF12, vrf12_voltages,
   MT6359_RG_LDO_VRF12_EN_ADDR, MT6359_RG_LDO_VRF12_EN_SHIFT,
   MT6359_DA_VRF12_B_EN_ADDR, MT6359_RG_VRF12_VOSEL_ADDR,
-  MT6359_RG_VRF12_VOSEL_MASK << MT6359_RG_VRF12_VOSEL_SHIFT),
+  MT6359_RG_VRF12_VOSEL_MASK << MT6359_RG_VRF12_VOSEL_SHIFT,
+  120),
MT6359_REG_FIXED("ldo_vusb", VUSB, MT6359_RG_LDO_VUSB_EN_0_ADDR,
 MT6359_DA_VUSB_B_EN_ADDR, 300),
MT6359_LDO_LINEAR("ldo_vsram_proc2", VSRAM_PROC2, 50, 1293750, 6250,
@@ -486,11 +490,13 @@ static struct mt6359_regulator_info mt6359_regulators[] = 
{
MT6359_LDO("ldo_vio18", VIO18, volt18_voltages,
   MT6359_RG_LDO_VIO18_EN_ADDR, MT6359_RG_LDO_VIO18_EN_SHIFT,
   MT6359_DA_VIO18_B_EN_ADDR, MT6359_RG_VIO18_VOSEL_ADDR,
-  MT6359_RG_VIO18_VOSEL_MASK << MT6359_RG_VIO18_VOSEL_SHIFT),
+  MT6359_RG_VIO18_VOSEL_MASK << MT6359_RG_VIO18_VOSEL_SHIFT,
+  960),
MT6359_LDO("ldo_vcamio", VCAMIO, volt18_voltages,
   MT6359_RG_LDO_VCAMIO_EN_ADDR, MT6359_RG_LDO_VCAMIO_EN_SHIFT,
   MT6359_DA_VCAMIO_B_EN_ADDR, MT6359_RG_VCAMIO_VOSEL_ADDR,
-  MT6359_RG_VCAMIO_VOSEL_MASK << MT6359_RG_VCAMIO_VOSEL_SHIFT),
+  MT6359_RG_VCAMIO_VOSEL_MASK << MT6359_RG_VCAMIO_VOSEL_SHIFT,
+  1290),
MT6359_REG_FIXED("ldo_vcn18", VCN18, MT6359_RG_LDO_VCN18_EN_ADDR,
 MT6359_DA_VCN18_B_EN_ADDR, 180),
MT6359_REG_FIXED("ldo_vfe28", VFE28, MT6359_RG_LDO_VFE28_EN_ADDR,
@@ -498,19 +504,20 @@ static struct mt6359_regulator_info mt6359_regulators[] = 
{
MT6359_LDO("ldo_vcn13", VCN13, vcn13_voltages,
   MT6359_RG_LDO_VCN13_EN_ADDR, MT6359_RG_LDO_VCN13_EN_SHIFT,
   MT6359_DA_VCN13_B_EN_ADDR, MT6359_RG_VCN13_VOSEL_ADDR,
-  MT6359_RG_VCN13_VOSEL_MASK << MT6359_RG_VCN13_VOSEL_SHIFT),
+  MT6359_RG_VCN13_VOSEL_MASK << MT6359_RG_VCN13_VOSEL_SHIFT,
+  240),
MT6359_LDO("ldo_vcn33_1_bt", VCN33_1_BT, vcn33_voltages,
   MT6359_RG_LDO_VCN33_1_EN_0_ADDR,
   MT6359_RG_LDO_VCN33_1_EN_0_SHIFT,
   MT6359_DA_VCN33_1_B_EN_ADDR, MT6359_RG_VCN33_1_VOSEL_ADDR,
   MT6359_R

Re: [PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM

2020-12-15 Thread Anup Patel
On Thu, Dec 3, 2020 at 5:51 PM Yifei Jiang  wrote:
>
> This series implements guest time scaling based on RDTIME instruction
> emulation so that we can allow migrating Guest/VM across Hosts with
> different time frequency.
>
> Why not through para-virt. From arm's experience[1], para-virt implementation
> doesn't really solve the problem for the following two main reasons:
> - RDTIME is not only used in Linux, but also in firmware and userspace.
> - It is difficult to be compatible with nested virtualization.

I think this approach is rather incomplete. Also, I don't see how para-virt
time scaling will be difficult for nested virtualization.

If we trap-n-emulate the TIME CSR for Guest Linux then it will have a
significant performance impact on systems where the TIME CSR is implemented
in HW.

Best approach will be to have VDSO-style para-virt time-scale SBI calls
(similar to what KVM x86 does). If the Guest software (Linux/Bootloader)
does not enable para-virt time-scaling then we trap-n-emulate TIME CSR
(this series).

Please propose a VDSO-style para-virt time-scale SBI call and expand this
this series to provide both:
1. VDSO-style para-virt time-scaling
2. Trap-n-emulation of TIME CSR when #1 is disabled

Regards,
Anup

>
> [1] https://lore.kernel.org/patchwork/cover/1288153/
>
> Yifei Jiang (3):
>   RISC-V: KVM: Change the method of calculating cycles to nanoseconds
>   RISC-V: KVM: Support dynamic time frequency from userspace
>   RISC-V: KVM: Implement guest time scaling
>
>  arch/riscv/include/asm/csr.h|  3 ++
>  arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +--
>  arch/riscv/kvm/vcpu_exit.c  | 35 +
>  arch/riscv/kvm/vcpu_timer.c | 51 ++---
>  4 files changed, 93 insertions(+), 9 deletions(-)
>
> --
> 2.19.1
>
>
> --
> kvm-riscv mailing list
> kvm-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kvm-riscv


Re: [GIT PULL] time namespace fixes for v5.11

2020-12-14 Thread pr-tracker-bot
The pull request you sent on Mon, 14 Dec 2020 12:57:44 +0100:

> g...@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux 
> tags/time-namespace-v5.11

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6d93a1971a0ded67887eeab8d00a02074490f071

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[PATCH 5.9 040/105] iwlwifi: pcie: limit memory read spin time

2020-12-14 Thread Greg Kroah-Hartman
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
triggering the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index e5160d6208688..6393e895f95c6 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2155,18 +2155,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0
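
As an aside, a userspace analogue (illustration only, not the driver code)
of the pattern the patch introduces: do the work in a tight loop, but check
an absolute deadline only every 128 iterations because reading the clock
itself has a cost.

#include <stdio.h>
#include <time.h>

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	long long deadline = now_ns() + 500 * 1000000LL;	/* 1/2 s budget */
	unsigned long iterations = 0;

	for (;;) {
		iterations++;		/* stands in for one register read */
		if ((iterations % 128) == 0 && now_ns() > deadline)
			break;		/* the driver releases NIC access here */
	}
	printf("did %lu iterations within the budget\n", iterations);
	return 0;
}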





[PATCH 5.9 021/105] ibmvnic: reduce wait for completion time

2020-12-14 Thread Greg Kroah-Hartman
From: Dany Madden 

[ Upstream commit 98c41f04a67abf5e7f7191d55d286e905d1430ef ]

Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.

Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into 
ibmvnic_open")
Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter 
link state")
Signed-off-by: Dany Madden 
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index de25d1860f16f..a1556673300a0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -846,7 +846,7 @@ static void release_napi(struct ibmvnic_adapter *adapter)
 static int ibmvnic_login(struct net_device *netdev)
 {
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-   unsigned long timeout = msecs_to_jiffies(30000);
+   unsigned long timeout = msecs_to_jiffies(20000);
int retry_count = 0;
int retries = 10;
bool retry;
@@ -950,7 +950,7 @@ static void release_resources(struct ibmvnic_adapter 
*adapter)
 static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state)
 {
struct net_device *netdev = adapter->netdev;
-   unsigned long timeout = msecs_to_jiffies(30000);
+   unsigned long timeout = msecs_to_jiffies(20000);
union ibmvnic_crq crq;
bool resend;
int rc;
@@ -5081,7 +5081,7 @@ map_failed:
 static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 {
struct device *dev = &adapter->vdev->dev;
-   unsigned long timeout = msecs_to_jiffies(30000);
+   unsigned long timeout = msecs_to_jiffies(20000);
u64 old_num_rx_queues, old_num_tx_queues;
int rc;
 
-- 
2.27.0





[PATCH 5.4 03/36] iwlwifi: pcie: limit memory read spin time

2020-12-14 Thread Greg Kroah-Hartman
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
triggering the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index c76d26708e659..ef5a8ecabc60a 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2178,18 +2178,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0





[GIT PULL] time namespace fixes for v5.11

2020-12-14 Thread Christian Brauner
Hi Linus,

Here are some time namespace fixes for v5.11.

/* Summary */
When time namespaces were introduced we missed to virtualize the "btime" field
in /proc/stat. This confuses tasks which are in another time namespace with a
virtualized boottime which is common in some container workloads. This pr
contains Michael's series to fix "btime" which Thomas asked me to take through
my tree.
To fix "btime" virtualization we simply subtract the offset of the time
namespace's boottime from btime before printing the stats. Note that since
start_boottime of processes is in seconds since boottime and the boottime stamp
is now shifted according to the time namespace's offset, the offset of the time
namespace also needs to be applied before the process stats are given to
userspace. This avoids processes shown by tools such as "ps" appearing as
time travelers in the corresponding time namespace. Selftests are included to
verify that btime virtualization in /proc/stat works as expected.

The following changes since commit 3cea11cd5e3b00d91caf0b4730194039b45c5891:

  Linux 5.10-rc2 (2020-11-01 14:43:51 -0800)

are available in the Git repository at:

  g...@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux 
tags/time-namespace-v5.11

for you to fetch changes up to 5c62634fc65101d350cbd47722fb76f02693059d:

  namespace: make timens_on_fork() return nothing (2020-11-18 11:06:47 +0100)

/* Testing */
All patches are based on v5.10-rc2. All old and new tests are passing.
Please note that I missed merging these fixes into my for-next branch, so
linux-next has not contained the commits in this pr.
I'm still sending this pr because these are fairly trivial bugfixes and have
seen vetting from multiple people. I have also now merged this tag into my
for-next branch so these commits will show up in linux-next soon. If you feel
more comfortable with this sitting in linux-next for a while please just ignore
this pr and I'll resend after rc1 has been released.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up in a test merge with current
mainline 2c85ebc57b3e ("Linux 5.10").

Please consider pulling these changes from the signed time-namespace-v5.11 tag.

Thanks!
Christian


time-namespace-v5.11


Hui Su (1):
  namespace: make timens_on_fork() return nothing

Michael Weiß (3):
  timens: additional helper functions for boottime offset handling
  fs/proc: apply the time namespace offset to /proc/stat btime
  selftests/timens: added selftest for /proc/stat btime

 fs/proc/array.c |  6 ++--
 fs/proc/stat.c  |  3 ++
 include/linux/time_namespace.h  | 28 ++--
 kernel/nsproxy.c|  7 +---
 kernel/time/namespace.c |  6 ++--
 tools/testing/selftests/timens/procfs.c | 58 -
 6 files changed, 92 insertions(+), 16 deletions(-)
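
A conceptual sketch of the adjustment described above; this is not the code
from fs/proc/stat.c, and the struct and field names are illustrative
assumptions. It only shows the arithmetic: the btime reported to a task is
the host btime shifted by that task's time-namespace boottime offset.

#include <stdio.h>
#include <time.h>

/* Illustrative stand-in for a per-namespace boottime offset, in seconds. */
struct timens_offsets_sketch {
	time_t boottime_sec;
};

static time_t virtualized_btime(time_t host_btime,
				const struct timens_offsets_sketch *ns)
{
	/* /proc/stat "btime" as seen from inside the time namespace. */
	return host_btime - ns->boottime_sec;
}

int main(void)
{
	struct timens_offsets_sketch ns = { .boottime_sec = 3600 };
	time_t host_btime = 1600000000;	/* example boot time, epoch seconds */

	printf("host btime %lld, namespace btime %lld\n",
	       (long long)host_btime,
	       (long long)virtualized_btime(host_btime, &ns));
	return 0;
}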


[tip: core/rcu] torture: Accept time units on kvm.sh --duration argument

2020-12-13 Thread tip-bot2 for Paul E. McKenney
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 7de1ca35269ee20e40c35666c810cbaea528c719
Gitweb:
https://git.kernel.org/tip/7de1ca35269ee20e40c35666c810cbaea528c719
Author:Paul E. McKenney 
AuthorDate:Tue, 22 Sep 2020 17:20:11 -07:00
Committer: Paul E. McKenney 
CommitterDate: Fri, 06 Nov 2020 17:13:55 -08:00

torture: Accept time units on kvm.sh --duration argument

The "--duration " has worked well for a very long time, but
it can be inconvenient to compute the minutes for (say) a 28-hour run.
It can also be annoying to have to let a simple boot test run for a full
minute.  This commit therefore permits an "s" suffix to specify seconds,
"m" to specify minutes (which remains the default), "h" suffix to specify
hours, and "d" to specify days.

With this change, "--duration 5" still specifies that each scenario
run for five minutes, but "--duration 30s" runs for only 30 seconds,
"--duration 8h" runs for eight hours, and "--duration 2d" runs for
two days.

Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/kvm.sh | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh 
b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 5ad3882..c348d96 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -58,7 +58,7 @@ usage () {
echo "   --datestamp string"
echo "   --defconfig string"
echo "   --dryrun sched|script"
-   echo "   --duration minutes"
+   echo "   --duration minutes | s | h | d"
echo "   --gdb"
echo "   --help"
echo "   --interactive"
@@ -128,8 +128,20 @@ do
shift
;;
--duration)
-   checkarg --duration "(minutes)" $# "$2" '^[0-9]*$' '^error'
-   dur=$(($2*60))
+   checkarg --duration "(minutes)" $# "$2" 
'^[0-9][0-9]*\(s\|m\|h\|d\|\)$' '^error'
+   mult=60
+   if echo "$2" | grep -q 's$'
+   then
+   mult=1
+   elif echo "$2" | grep -q 'h$'
+   then
+   mult=3600
+   elif echo "$2" | grep -q 'd$'
+   then
+   mult=86400
+   fi
+   ts=`echo $2 | sed -e 's/[smhd]$//'`
+   dur=$(($ts*mult))
shift
;;
--gdb)


[tip: core/rcu] locktorture: Track time of last ->writeunlock()

2020-12-13 Thread tip-bot2 for Paul E. McKenney
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 3480d6774f07341e3e1cf3114f58bef98ea58ae0
Gitweb:
https://git.kernel.org/tip/3480d6774f07341e3e1cf3114f58bef98ea58ae0
Author:Paul E. McKenney 
AuthorDate:Sun, 30 Aug 2020 21:48:23 -07:00
Committer: Paul E. McKenney 
CommitterDate: Fri, 06 Nov 2020 17:13:29 -08:00

locktorture: Track time of last ->writeunlock()

This commit adds a last_lock_release variable that tracks the time of
the last ->writeunlock() call, which allows easier diagnosing of lock
hangs when using a kernel debugger.

Acked-by: Davidlohr Bueso 
Signed-off-by: Paul E. McKenney 
---
 kernel/locking/locktorture.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 62d215b..316531d 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -60,6 +60,7 @@ static struct task_struct **reader_tasks;
 
 static bool lock_is_write_held;
 static bool lock_is_read_held;
+static unsigned long last_lock_release;
 
 struct lock_stress_stats {
long n_lock_fail;
@@ -632,6 +633,7 @@ static int lock_torture_writer(void *arg)
lwsp->n_lock_acquired++;
cxt.cur_ops->write_delay(&rand);
lock_is_write_held = false;
+   WRITE_ONCE(last_lock_release, jiffies);
cxt.cur_ops->writeunlock();
 
stutter_wait("lock_torture_writer");


Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-10 Thread Rob Herring
On Thu, Dec 10, 2020 at 11:13:50AM +0200, Cristian Ciocaltea wrote:
> Hi Rob,
> 
> On Wed, Dec 09, 2020 at 09:37:08PM -0600, Rob Herring wrote:
> > On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote:
> > > Add a new common property 'reset-time-sec' to be used in conjunction
> > > with the devices supporting the key pressed reset feature.
> > > 
> > > Signed-off-by: Cristian Ciocaltea 
> > > ---
> > > Changes in v3:
> > >  - This patch was not present in v2
> > > 
> > >  Documentation/devicetree/bindings/input/input.yaml | 7 +++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/Documentation/devicetree/bindings/input/input.yaml 
> > > b/Documentation/devicetree/bindings/input/input.yaml
> > > index ab407f266bef..caba93209ae7 100644
> > > --- a/Documentation/devicetree/bindings/input/input.yaml
> > > +++ b/Documentation/devicetree/bindings/input/input.yaml
> > > @@ -34,4 +34,11 @@ properties:
> > >specify this property.
> > >  $ref: /schemas/types.yaml#/definitions/uint32
> > >  
> > > +  reset-time-sec:
> > 
> > Humm, I'm pretty sure we already have something for this. Or maybe just 
> > power off.
> 
> We only have 'power-off-time-sec', so I added 'reset-time-sec' according
> to your review in v2:
> https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/

I'm doing good if I remember reviews from a week ago. From 3 months ago, 
no chance without some reminder.

Reviewed-by: Rob Herring 

Rob


[PATCH 2/2] platform: cros_ec: Call interrupt bottom half at probe time

2020-12-10 Thread Gwendal Grignou
While the AP was powered off, the EC may have sent messages.
If a message is not serviced within 3s, the EC stops sending messages.
Unlock the EC by purging stale messages at probe time.

Signed-off-by: Gwendal Grignou 
---
 drivers/platform/chrome/cros_ec.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/platform/chrome/cros_ec.c 
b/drivers/platform/chrome/cros_ec.c
index 4ac33491d0d18..a45d6a6640d50 100644
--- a/drivers/platform/chrome/cros_ec.c
+++ b/drivers/platform/chrome/cros_ec.c
@@ -252,6 +252,13 @@ int cros_ec_register(struct cros_ec_device *ec_dev)
 
dev_info(dev, "Chrome EC device registered\n");
 
+   /*
+* Unlock the EC, which may be waiting for the AP to process MKBP events.
+* If the AP takes too long to answer, the EC will stop sending events.
+*/
+   if (ec_dev->mkbp_event_supported)
+   cros_ec_irq_thread(0, ec_dev);
+
return 0;
 }
 EXPORT_SYMBOL(cros_ec_register);
-- 
2.29.2.576.ga3fc446d84-goog



Re: [PATCH v2 1/2] Add save/restore of Precision Time Measurement capability

2020-12-10 Thread Bjorn Helgaas
On Mon, Dec 07, 2020 at 02:39:50PM -0800, David E. Box wrote:
> The PCI subsystem does not currently save and restore the configuration
> space for the Precision Time Measurement (PTM) PCIe extended capability
> leading to the possibility of the feature returning disabled on S3 resume.
> This has been observed on Intel Coffee Lake desktops. Add save/restore of
> the PTM control register. This saves the PTM Enable, Root Select, and
> Effective Granularity bits.
> 
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: David E. Box 

Applied both to pci/ptm for v5.11, thanks!

> ---
> 
> Changes from V1:
>   - Move save/restore functions to ptm.c
>   - Move pci_add_ext_cap_save_buffer() to pci_ptm_init in ptm.c
>   
>  drivers/pci/pci.c  |  2 ++
>  drivers/pci/pci.h  |  8 
>  drivers/pci/pcie/ptm.c | 43 ++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e578d34095e9..12ba6351c05b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1566,6 +1566,7 @@ int pci_save_state(struct pci_dev *dev)
>   pci_save_ltr_state(dev);
>   pci_save_dpc_state(dev);
>   pci_save_aer_state(dev);
> + pci_save_ptm_state(dev);
>   return pci_save_vc_state(dev);
>  }
>  EXPORT_SYMBOL(pci_save_state);
> @@ -1677,6 +1678,7 @@ void pci_restore_state(struct pci_dev *dev)
>   pci_restore_vc_state(dev);
>   pci_restore_rebar_state(dev);
>   pci_restore_dpc_state(dev);
> + pci_restore_ptm_state(dev);
>  
>   pci_aer_clear_status(dev);
>   pci_restore_aer_state(dev);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index f86cae9aa1f4..62cdacba5954 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -516,6 +516,14 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
>  
>  #endif /* CONFIG_PCI_IOV */
>  
> +#ifdef CONFIG_PCIE_PTM
> +void pci_save_ptm_state(struct pci_dev *dev);
> +void pci_restore_ptm_state(struct pci_dev *dev);
> +#else
> +static inline void pci_save_ptm_state(struct pci_dev *dev) {}
> +static inline void pci_restore_ptm_state(struct pci_dev *dev) {}
> +#endif
> +
>  unsigned long pci_cardbus_resource_alignment(struct resource *);
>  
>  static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
> diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
> index 357a454cafa0..6b24a1c9327a 100644
> --- a/drivers/pci/pcie/ptm.c
> +++ b/drivers/pci/pcie/ptm.c
> @@ -29,6 +29,47 @@ static void pci_ptm_info(struct pci_dev *dev)
>dev->ptm_root ? " (root)" : "", clock_desc);
>  }
>  
> +void pci_save_ptm_state(struct pci_dev *dev)
> +{
> + int ptm;
> + struct pci_cap_saved_state *save_state;
> + u16 *cap;
> +
> + if (!pci_is_pcie(dev))
> + return;
> +
> + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
> + if (!ptm)
> + return;
> +
> + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM);
> + if (!save_state) {
> + pci_err(dev, "no suspend buffer for PTM\n");
> + return;
> + }
> +
> + cap = (u16 *)&save_state->cap.data[0];
> + pci_read_config_word(dev, ptm + PCI_PTM_CTRL, cap);
> +}
> +
> +void pci_restore_ptm_state(struct pci_dev *dev)
> +{
> + struct pci_cap_saved_state *save_state;
> + int ptm;
> + u16 *cap;
> +
> + if (!pci_is_pcie(dev))
> + return;
> +
> + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM);
> + ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
> + if (!save_state || !ptm)
> + return;
> +
> + cap = (u16 *)&save_state->cap.data[0];
> + pci_write_config_word(dev, ptm + PCI_PTM_CTRL, *cap);
> +}
> +
>  void pci_ptm_init(struct pci_dev *dev)
>  {
>   int pos;
> @@ -65,6 +106,8 @@ void pci_ptm_init(struct pci_dev *dev)
>   if (!pos)
>   return;
>  
> + pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_PTM, sizeof(u16));
> +
>   pci_read_config_dword(dev, pos + PCI_PTM_CAP, &cap);
>   local_clock = (cap & PCI_PTM_GRANULARITY_MASK) >> 8;
>  
> -- 
> 2.20.1
> 


Re: [PATCH rdma-next 0/3] Various fixes collected over time

2020-12-10 Thread Jason Gunthorpe
On Tue, Dec 08, 2020 at 09:35:42AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Hi,
> 
> This is set of various and unrelated fixes that we collected over time.
> 
> Thanks
> 
> Avihai Horon (1):
>   RDMA/uverbs: Fix incorrect variable type
> 
> Jack Morgenstein (2):
>   RDMA/core: Clean up cq pool mechanism
>   RDMA/core: Do not indicate device ready when device enablement fails
> 
>  drivers/infiniband/core/core_priv.h  |  3 +--
>  drivers/infiniband/core/cq.c | 12 ++--
>  drivers/infiniband/core/device.c | 16 ++--
>  .../infiniband/core/uverbs_std_types_device.c| 14 +-
>  include/rdma/uverbs_ioctl.h  | 10 ++
>  5 files changed, 28 insertions(+), 27 deletions(-)

Applied to for-next, thanks

Jason


[PATCH v2 3/7] watchdog/softlockup: Report the overall time of softlockups

2020-12-10 Thread Petr Mladek
The softlockup detector currently shows the time spent since the last
report. As a result it is not clear whether a CPU is infinitely hogged
by a single task or if it is a repeated event.

The situation can be simulated with a simple busy loop:

while (true)
  cpu_relax();

The softlockup detector produces:

[  168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[  196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[  236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865]

But it should be something like:

[  480.372418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4943]
[  508.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [cat:4943]
[  548.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 89s! [cat:4943]
[  576.372351] watchdog: BUG: soft lockup - CPU#2 stuck for 115s! [cat:4943]

For the better output, add an additional timestamp of the last report.
Only this timestamp is reset when the watchdog is intentionally
touched from slow code paths or when printing the report.

Signed-off-by: Petr Mladek 
---
 kernel/watchdog.c | 40 
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7776d53a015c..6259590d6474 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,11 @@ static void lockup_detector_update_enable(void)
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
-#define SOFTLOCKUP_RESET   ULONG_MAX
+/*
+ * Delay the softlockup report when running known slow code.
+ * It does _not_ affect the timestamp of the last successful reschedule.
+ */
+#define SOFTLOCKUP_DELAY_REPORTULONG_MAX
 
 #ifdef CONFIG_SMP
 int __read_mostly sysctl_softlockup_all_cpu_backtrace;
@@ -169,7 +173,10 @@ unsigned int __read_mostly softlockup_panic =
 static bool softlockup_initialized __read_mostly;
 static u64 __read_mostly sample_period;
 
+/* Timestamp taken after the last successful reschedule. */
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
+/* Timestamp of the last softlockup report. */
+static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
 static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
@@ -235,10 +242,16 @@ static void set_sample_period(void)
watchdog_update_hrtimer_threshold(sample_period);
 }
 
+static void update_report_ts(void)
+{
+   __this_cpu_write(watchdog_report_ts, get_timestamp());
+}
+
 /* Commands for resetting the watchdog */
 static void update_touch_ts(void)
 {
__this_cpu_write(watchdog_touch_ts, get_timestamp());
+   update_report_ts();
 }
 
 /**
@@ -252,10 +265,10 @@ static void update_touch_ts(void)
 notrace void touch_softlockup_watchdog_sched(void)
 {
/*
-* Preemption can be enabled.  It doesn't matter which CPU's timestamp
-* gets zeroed here, so use the raw_ operation.
+* Preemption can be enabled.  It doesn't matter which CPU's watchdog
+* report period gets restarted here, so use the raw_ operation.
 */
-   raw_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
+   raw_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT);
 }
 
 notrace void touch_softlockup_watchdog(void)
@@ -279,23 +292,23 @@ void touch_all_softlockup_watchdogs(void)
 * the softlockup check.
 */
for_each_cpu(cpu, &watchdog_allowed_mask)
-   per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
+   per_cpu(watchdog_report_ts, cpu) = SOFTLOCKUP_DELAY_REPORT;
wq_watchdog_touch(-1);
 }
 
 void touch_softlockup_watchdog_sync(void)
 {
__this_cpu_write(softlockup_touch_sync, true);
-   __this_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
+   __this_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT);
 }
 
-static int is_softlockup(unsigned long touch_ts)
+static int is_softlockup(unsigned long touch_ts, unsigned long period_ts)
 {
unsigned long now = get_timestamp();
 
if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
/* Warn about unreasonable delays. */
-   if (time_after(now, touch_ts + get_softlockup_thresh()))
+   if (time_after(now, period_ts + get_softlockup_thresh()))
return now - touch_ts;
}
return 0;
@@ -341,6 +354,7 @@ static int softlockup_fn(void *data)
 static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
+   unsigned long period_ts = __this_cpu_read(watchdog_report_ts);
struct pt_regs *regs = get_irq_regs();
int duration;
int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
@@ -362,7 +376,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 

[PATCH v2 0/7] watchdog/softlockup: Report overall time and some cleanup

2020-12-10 Thread Petr Mladek
I dug deep into the softlockup watchdog history when time permitted
this year, and reworked the patchset that fixed the timestamps and
cleaned up the code [1].

I split it into very small steps and did even more code cleanup.
The result looks quite straightforward and I am pretty confident
in the changes.

[1] v1: https://lore.kernel.org/r/20191024114928.15377-1-pmla...@suse.com

Petr Mladek (7):
  watchdog: Rename __touch_watchdog() to a better descriptive name
  watchdog: Explicitly update timestamp when reporting softlockup
  watchdog/softlockup: Report the overall time of softlockups
  watchdog/softlockup: Remove logic that tried to prevent repeated
reports
  watchdog: Fix barriers when printing backtraces from all CPUs
  watchdog: Cleanup handling of false positives
  Test softlockup

 fs/proc/consoles.c |  5 +++
 fs/proc/version.c  |  7 
 kernel/watchdog.c  | 97 ++
 3 files changed, 66 insertions(+), 43 deletions(-)

-- 
2.26.2



Re: [PATCH] powerpc/time: Remove ifdef in get_vtb()

2020-12-10 Thread Michael Ellerman
On Thu, 1 Oct 2020 10:59:20 + (UTC), Christophe Leroy wrote:
> SPRN_VTB and CPU_FTR_ARCH_207S are always defined,
> no need of an ifdef.

Applied to powerpc/next.

[1/1] powerpc/time: Remove ifdef in get_vtb()
  https://git.kernel.org/powerpc/c/c3cb5dbd85dbd9ae51fadf867782dc34806f04d8

cheers


Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

2020-12-10 Thread Greg Kroah-Hartman
On Wed, Dec 09, 2020 at 12:24:32PM -0800, Saravana Kannan wrote:
> On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> > > The current implementation of fw_devlink is very inefficient because it
> > > tries to get away without creating fwnode links in the name of saving
> > > memory usage. Past attempts to optimize runtime at the cost of memory
> > > usage were blocked with requests for data showing that the optimization
> > > made a significant improvement in real-world scenarios.
> > >
> > > We have those scenarios now. There have been several reports of boot
> > > time increase in the order of seconds in this thread [1]. Several OEMs
> > > and SoC manufacturers have also privately reported significant
> > > (350-400ms) increase in boot time due to all the parsing done by
> > > fw_devlink.
> > >
> > > So this patch series refactors fw_devlink to be more efficient. The key
> > > difference now is the addition of support for fwnode links -- just a few
> > > simple APIs. This also allows most of the code to be moved out of
> > > firmware specific (DT mostly) code into driver core.
> > >
> > > This brings the following benefits:
> > > - Instead of parsing the device tree multiple times (complexity was
> > >   close to O(N^3) where N is the number of properties) during bootup,
> > >   fw_devlink parses each fwnode node/property only once and creates
> > >   fwnode links. The rest of the fw_devlink code then just looks at these
> > >   fwnode links to do rest of the work.
> > >
> > > - Makes it much easier to debug probe issues due to fw_devlink in the
> > >   future. fw_devlink=on blocks the probing of devices if they depend on
> > >   a device that hasn't been added yet. With this refactor, it'll be very
> > >   easy to tell what that device is because we now have a reference to
> > >   the fwnode of the device.
> > >
> > > - Much easier to add fw_devlink support to ACPI and other firmware
> > >   types. A refactor to move the common bits from DT specific code to
> > >   driver core was in my TODO list as a prerequisite to adding ACPI
> > >   support to fw_devlink. This series gets that done.
> > >
> > > Laurent and Grygorii tested the v1 series and they saw boot time
> > > improvement of about 12 seconds and 3 seconds, respectively.
> >
> > Now queued up to my tree.  Note, I had to hand-apply patches 13 and 16
> > due to some reason (for 13, I have no idea, for 16 it was due to a
> > previous patch applied to my tree that I cc:ed you on.)
> >
> > Verifying I got it all correct would be great :)
> 
> A quick diff of drivers/base/core.c between driver-core-testing and my
> local tree doesn't show any major diff (only some unrelated comment
> fixes). So, it looks fine.
> 
> The patch 13 conflict is probably due to having to rebase the v2
> series on top of this:
> https://lore.kernel.org/lkml/20201104205431.3795207-1-sarava...@google.com/
> 
> And looks like Patch 16 was handled fine.

Great, thanks for verifying!

greg k-h
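
For readers who have not seen the series itself, a conceptual sketch of what
a "fwnode link" amounts to; the names below are illustrative, not the
kernel's API. The idea is a consumer-to-supplier edge recorded once while
the firmware nodes are parsed, so later passes walk these edges instead of
re-parsing every property.

#include <stdio.h>
#include <stdlib.h>

struct fw_link;

struct fw_node {
	const char *name;
	struct fw_link *suppliers;	/* edges recorded at parse time */
};

struct fw_link {
	struct fw_node *supplier;
	struct fw_link *next;
};

static int fw_link_add(struct fw_node *consumer, struct fw_node *supplier)
{
	struct fw_link *link = malloc(sizeof(*link));

	if (!link)
		return -1;
	link->supplier = supplier;
	link->next = consumer->suppliers;
	consumer->suppliers = link;
	return 0;
}

int main(void)
{
	struct fw_node gpio = { "gpio0", NULL };
	struct fw_node i2c = { "i2c1", NULL };
	struct fw_node ts = { "touchscreen", NULL };

	/* Recorded once while parsing the firmware description... */
	fw_link_add(&ts, &gpio);
	fw_link_add(&ts, &i2c);

	/* ...and walked later instead of re-parsing properties. */
	for (struct fw_link *l = ts.suppliers; l; l = l->next)
		printf("%s waits for supplier %s\n", ts.name, l->supplier->name);
	return 0;
}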


Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-10 Thread Cristian Ciocaltea
Hi Mani,

On Thu, Dec 10, 2020 at 09:36:44AM +0530, Manivannan Sadhasivam wrote:
> On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote:
> > Add a new common property 'reset-time-sec' to be used in conjunction
> > with the devices supporting the key pressed reset feature.
> > 
> > Signed-off-by: Cristian Ciocaltea 
> > ---
> > Changes in v3:
> >  - This patch was not present in v2
> > 
> >  Documentation/devicetree/bindings/input/input.yaml | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/input/input.yaml 
> > b/Documentation/devicetree/bindings/input/input.yaml
> > index ab407f266bef..caba93209ae7 100644
> > --- a/Documentation/devicetree/bindings/input/input.yaml
> > +++ b/Documentation/devicetree/bindings/input/input.yaml
> > @@ -34,4 +34,11 @@ properties:
> >specify this property.
> >  $ref: /schemas/types.yaml#/definitions/uint32
> >  
> > +  reset-time-sec:
> > +description:
> > +  Duration in seconds which the key should be kept pressed for device 
> > to
> > +  reset automatically. Device with key pressed reset feature can 
> > specify
> > +  this property.
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +
> 
> Why can't you just use "power-off-time-sec"?

I think the common behavior of keeping the power button pressed is to
trigger a power off rather than a reset. Hence, per Rob's suggestion in
the previous revision of this patch series, I added the reset variant:
https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/

Thanks,
Cristi

> Thanks,
> Mani
> 
> >  additionalProperties: true
> > -- 
> > 2.29.2
> > 


Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-10 Thread Cristian Ciocaltea
Hi Rob,

On Wed, Dec 09, 2020 at 09:37:08PM -0600, Rob Herring wrote:
> On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote:
> > Add a new common property 'reset-time-sec' to be used in conjunction
> > with the devices supporting the key pressed reset feature.
> > 
> > Signed-off-by: Cristian Ciocaltea 
> > ---
> > Changes in v3:
> >  - This patch was not present in v2
> > 
> >  Documentation/devicetree/bindings/input/input.yaml | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/input/input.yaml 
> > b/Documentation/devicetree/bindings/input/input.yaml
> > index ab407f266bef..caba93209ae7 100644
> > --- a/Documentation/devicetree/bindings/input/input.yaml
> > +++ b/Documentation/devicetree/bindings/input/input.yaml
> > @@ -34,4 +34,11 @@ properties:
> >specify this property.
> >  $ref: /schemas/types.yaml#/definitions/uint32
> >  
> > +  reset-time-sec:
> 
> Humm, I'm pretty sure we already have something for this. Or maybe just 
> power off.

We only have 'power-off-time-sec', so I added 'reset-time-sec' according
to your review in v2:
https://lore.kernel.org/lkml/20200908214724.GA959481@bogus/

Thanks,
Cristi

> > +description:
> > +  Duration in seconds which the key should be kept pressed for device 
> > to
> > +  reset automatically. Device with key pressed reset feature can 
> > specify
> > +  this property.
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +
> >  additionalProperties: true
> > -- 
> > 2.29.2
> > 


Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-09 Thread Manivannan Sadhasivam
On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote:
> Add a new common property 'reset-time-sec' to be used in conjunction
> with the devices supporting the key pressed reset feature.
> 
> Signed-off-by: Cristian Ciocaltea 
> ---
> Changes in v3:
>  - This patch was not present in v2
> 
>  Documentation/devicetree/bindings/input/input.yaml | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/input/input.yaml 
> b/Documentation/devicetree/bindings/input/input.yaml
> index ab407f266bef..caba93209ae7 100644
> --- a/Documentation/devicetree/bindings/input/input.yaml
> +++ b/Documentation/devicetree/bindings/input/input.yaml
> @@ -34,4 +34,11 @@ properties:
>specify this property.
>  $ref: /schemas/types.yaml#/definitions/uint32
>  
> +  reset-time-sec:
> +description:
> +  Duration in seconds which the key should be kept pressed for device to
> +  reset automatically. Device with key pressed reset feature can specify
> +  this property.
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +

Why can't you just use "power-off-time-sec"?

Thanks,
Mani

>  additionalProperties: true
> -- 
> 2.29.2
> 


Re: [PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-09 Thread Rob Herring
On Sun, Dec 06, 2020 at 03:27:01AM +0200, Cristian Ciocaltea wrote:
> Add a new common property 'reset-time-sec' to be used in conjunction
> with the devices supporting the key pressed reset feature.
> 
> Signed-off-by: Cristian Ciocaltea 
> ---
> Changes in v3:
>  - This patch was not present in v2
> 
>  Documentation/devicetree/bindings/input/input.yaml | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/input/input.yaml 
> b/Documentation/devicetree/bindings/input/input.yaml
> index ab407f266bef..caba93209ae7 100644
> --- a/Documentation/devicetree/bindings/input/input.yaml
> +++ b/Documentation/devicetree/bindings/input/input.yaml
> @@ -34,4 +34,11 @@ properties:
>specify this property.
>  $ref: /schemas/types.yaml#/definitions/uint32
>  
> +  reset-time-sec:

Humm, I'm pretty sure we already have something for this. Or maybe just 
power off.

> +description:
> +  Duration in seconds which the key should be kept pressed for device to
> +  reset automatically. Device with key pressed reset feature can specify
> +  this property.
> +$ref: /schemas/types.yaml#/definitions/uint32
> +
>  additionalProperties: true
> -- 
> 2.29.2
> 


Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

2020-12-09 Thread Saravana Kannan
On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman
 wrote:
>
> On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with requests for data showing that the optimization
> > made a significant improvement in real-world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch series refactors fw_devlink to be more efficient. The key
> > difference now is the addition of support for fwnode links -- just a few
> > simple APIs. This also allows most of the code to be moved out of
> > firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times (complexity was
> >   close to O(N^3) where N is the number of properties) during bootup,
> >   fw_devlink parses each fwnode node/property only once and creates
> >   fwnode links. The rest of the fw_devlink code then just looks at these
> >   fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issues due to fw_devlink in the
> >   future. fw_devlink=on blocks the probing of devices if they depend on
> >   a device that hasn't been added yet. With this refactor, it'll be very
> >   easy to tell what that device is because we now have a reference to
> >   the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> >   types. A refactor to move the common bits from DT specific code to
> >   driver core was in my TODO list as a prerequisite to adding ACPI
> >   support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvement of about 12 seconds and 3 seconds, respectively.
>
> Now queued up to my tree.  Note, I had to hand-apply patches 13 and 16
> due to some reason (for 13, I have no idea, for 16 it was due to a
> previous patch applied to my tree that I cc:ed you on.)
>
> Verifying I got it all correct would be great :)

A quick diff of drivers/base/core.c between driver-core-testing and my
local tree doesn't show any major diff (only some unrelated comment
fixes). So, it looks fine.

The patch 13 conflict is probably due to having to rebase the v2
series on top of this:
https://lore.kernel.org/lkml/20201104205431.3795207-1-sarava...@google.com/

And looks like Patch 16 was handled fine.

Thanks for applying the series.

-Saravana


Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

2020-12-09 Thread Greg Kroah-Hartman
On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> The current implementation of fw_devlink is very inefficient because it
> tries to get away without creating fwnode links in the name of saving
> memory usage. Past attempts to optimize runtime at the cost of memory
> usage were blocked with requests for data showing that the optimization
> made a significant improvement in real-world scenarios.
> 
> We have those scenarios now. There have been several reports of boot
> time increase in the order of seconds in this thread [1]. Several OEMs
> and SoC manufacturers have also privately reported significant
> (350-400ms) increase in boot time due to all the parsing done by
> fw_devlink.
> 
> So this patch series refactors fw_devlink to be more efficient. The key
> difference now is the addition of support for fwnode links -- just a few
> simple APIs. This also allows most of the code to be moved out of
> firmware specific (DT mostly) code into driver core.
> 
> This brings the following benefits:
> - Instead of parsing the device tree multiple times (complexity was
>   close to O(N^3) where N is the number of properties) during bootup,
>   fw_devlink parses each fwnode node/property only once and creates
>   fwnode links. The rest of the fw_devlink code then just looks at these
>   fwnode links to do rest of the work.
> 
> - Makes it much easier to debug probe issues due to fw_devlink in the
>   future. fw_devlink=on blocks the probing of devices if they depend on
>   a device that hasn't been added yet. With this refactor, it'll be very
>   easy to tell what that device is because we now have a reference to
>   the fwnode of the device.
> 
> - Much easier to add fw_devlink support to ACPI and other firmware
>   types. A refactor to move the common bits from DT specific code to
>   driver core was in my TODO list as a prerequisite to adding ACPI
>   support to fw_devlink. This series gets that done.
> 
> Laurent and Grygorii tested the v1 series and they saw boot time
> improvement of about 12 seconds and 3 seconds, respectively.

Now queued up to my tree.  Note, I had to hand-apply patches 13 and 16
due to some reason (for 13, I have no idea, for 16 it was due to a
previous patch applied to my tree that I cc:ed you on.)

Verifying I got it all correct would be great :)

thanks,

greg k-h


[PATCH v16 4/9] time: Add mechanism to recognize clocksource in time_get_snapshot

2020-12-08 Thread Jianyong Wu
From: Thomas Gleixner 

System time snapshots are not conveying information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.

Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows to identify the clocksource.

Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Jianyong Wu 
---
 include/linux/clocksource.h |  6 ++
 include/linux/clocksource_ids.h | 11 +++
 include/linux/timekeeping.h | 12 +++-
 kernel/time/clocksource.c   |  2 ++
 kernel/time/timekeeping.c   |  1 +
 5 files changed, 27 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/clocksource_ids.h

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..1290d0dce840 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -62,6 +63,10 @@ struct module;
  * 400-499: Perfect
  * The ideal clocksource. A must-use where
  * available.
+ * @id:Defaults to CSID_GENERIC. The id value is 
captured
+ * in certain snapshot functions to allow callers to
+ * validate the clocksource from which the snapshot was
+ * taken.
  * @flags: Flags describing special properties
  * @enable:Optional function to enable the clocksource
  * @disable:   Optional function to disable the clocksource
@@ -100,6 +105,7 @@ struct clocksource {
const char  *name;
struct list_headlist;
int rating;
+   enum clocksource_idsid;
enum vdso_clock_modevdso_clock_mode;
unsigned long   flags;
 
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
new file mode 100644
index ..4d8e19e05328
--- /dev/null
+++ b/include/linux/clocksource_ids.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+   CSID_GENERIC= 0,
+   CSID_MAX,
+};
+
+#endif
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index d47009611109..688ec2e1a3bf 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -3,6 +3,7 @@
 #define _LINUX_TIMEKEEPING_H
 
 #include 
+#include 
 
 /* Included from linux/ktime.h */
 
@@ -243,11 +244,12 @@ struct ktime_timestamps {
  * @cs_was_changed_seq:The sequence number of clocksource change events
  */
 struct system_time_snapshot {
-   u64 cycles;
-   ktime_t real;
-   ktime_t raw;
-   unsigned intclock_was_set_seq;
-   u8  cs_was_changed_seq;
+   u64 cycles;
+   ktime_t real;
+   ktime_t raw;
+   enum clocksource_idscs_id;
+   unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
 };
 
 /**
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cce484a2cc7c..4fe1df894ee5 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -920,6 +920,8 @@ int __clocksource_register_scale(struct clocksource *cs, 
u32 scale, u32 freq)
 
clocksource_arch_init(cs);
 
+   if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
+   cs->id = CSID_GENERIC;
if (cs->vdso_clock_mode < 0 ||
cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) {
pr_warn("clocksource %s registered with invalid VDSO mode %d. 
Disabling VDSO support.\n",
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a45cedda93a7..50f08632165c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1049,6 +1049,7 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
do {
seq = read_seqcount_begin(&tk_core.seq);
now = tk_clock_read(&tk->tkr_mono);
+   systime_snapshot->cs_id = tk->tkr_mono.clock->id;
systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
base_real = ktime_add(tk->tkr_mono.base,
-- 
2.17.1
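
A usage sketch, not part of the posted series: how a caller could validate
which clocksource a snapshot came from before trusting the raw cycle value.
With this patch alone only CSID_GENERIC exists, so the id compared against
below is a placeholder for the arch-specific value a follow-up patch would
introduce.

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/timekeeping.h>
#include <linux/clocksource_ids.h>

/* Placeholder: a real caller compares against an arch-specific id. */
#define EXPECTED_CSID	CSID_GENERIC

static int snapshot_cycles_checked(u64 *cycles)
{
	struct system_time_snapshot snap;

	ktime_get_snapshot(&snap);

	/* Refuse to use cycles taken from an unexpected clocksource. */
	if (snap.cs_id != EXPECTED_CSID)
		return -EOPNOTSUPP;

	*cycles = snap.cycles;
	return 0;
}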



Re: [PATCH] soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1

2020-12-08 Thread Carl Philipp Klemm
On Tue,  8 Dec 2020 16:08:02 +0200
Tony Lindgren  wrote:

> We have rst_map_012 used for various accelerators like dsp, ipu and iva.
> For these use cases, we have rstctrl bit 2 control the subsystem module
> reset, and have bits 0 and 1 control the accelerator-specific
> features.
> 
> If the bootloader, or kexec boot, has left any accelerator specific
> reset bits deasserted, deasserting bit 2 reset will potentially enable
> an accelerator with an unconfigured MMU and no firmware. And we may get
> spammed with a lot of warnings on boot with "Data Access in User mode
> during Functional access", or depending on the accelerator, the system
> can also just hang.
> 
> This issue can be quite easily reproduced by setting a rst_map_012 type
> rstctrl register to 0 or 4 in the bootloader, and booting the system.
> 
> Let's just assert all reset bits for rst_map_012 type resets. So far
> it looks like the other rstctrl types don't need this. If it turns out
> that the other type rstctrl bits also need reset on init, we need to
> add an instance specific reset mask for the bits to avoid resetting
> unwanted bits.
> 
> Reported-by: Carl Philipp Klemm 
> Cc: Philipp Zabel 
> Cc: Santosh Shilimkar 
> Cc: Suman Anna 
> Cc: Tero Kristo 
> Signed-off-by: Tony Lindgren 
> ---
>  drivers/soc/ti/omap_prm.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
> --- a/drivers/soc/ti/omap_prm.c
> +++ b/drivers/soc/ti/omap_prm.c
> @@ -860,6 +860,7 @@ static int omap_prm_reset_init(struct platform_device 
> *pdev,
>   const struct omap_rst_map *map;
>   struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev);
>   char buf[32];
> + u32 v;
>  
>   /*
>* Check if we have controllable resets. If either rstctrl is non-zero
> @@ -907,6 +908,16 @@ static int omap_prm_reset_init(struct platform_device 
> *pdev,
>   map++;
>   }
>  
> + /* Quirk handling to assert rst_map_012 bits on reset and avoid errors 
> */
> + if (prm->data->rstmap == rst_map_012) {
> + v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl);
> + if ((v & reset->mask) != reset->mask) {
> + dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v);
> + writel_relaxed(reset->mask, reset->prm->base +
> +reset->prm->data->rstctrl);
> + }
> + }
> +
>   return devm_reset_controller_register(&pdev->dev, &reset->rcdev);
>  }
>  
> -- 
> 2.29.2

Works for me on xt875; idle now also works without the userspace hack.

Tested-by: Carl Philipp Klemm 


Re: [PATCH net v4] bonding: fix feature flag setting at init time

2020-12-08 Thread Jakub Kicinski
On Sat,  5 Dec 2020 12:22:29 -0500 Jarod Wilson wrote:
> Don't try to adjust XFRM support flags if the bond device isn't yet
> registered. Bad things can currently happen when netdev_change_features()
> is called without having wanted_features fully filled in yet. This code
> runs both on post-module-load mode changes, as well as at module init
> time, and when run at module init time, it is before register_netdevice()
> has been called and filled in wanted_features. The empty wanted_features
> led to features also getting emptied out, which was definitely not the
> intended behavior, so prevent that from happening.
> 
> Originally, I'd hoped to stop adjusting wanted_features at all in the
> bonding driver, as it's documented as being something only the network
> core should touch, but we actually do need to do this to properly update
> both the features and wanted_features fields when changing the bond type,
> or we get to a situation where ethtool sees:
> 
> esp-hw-offload: off [requested on]
> 
> I do think we should be using netdev_update_features instead of
> netdev_change_features here though, so we only send notifiers when the
> features actually changed.
> 
> Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
> Reported-by: Ivan Vecera 
> Suggested-by: Ivan Vecera 

Applied, thanks!


[PATCH] soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1

2020-12-08 Thread Tony Lindgren
We have rst_map_012 used for various accelerators like dsp, ipu and iva.
For these use cases, we have rstctrl bit 2 control the subsystem module
reset, and have bits 0 and 1 control the accelerator-specific
features.

If the bootloader, or kexec boot, has left any accelerator specific
reset bits deasserted, deasserting bit 2 reset will potentially enable
an accelerator with unconfigured MMU and no firmware. And we may get
spammed with a lot of warnings on boot with "Data Access in User mode
during Functional access", or depending on the accelerator, the system
can also just hang.

This issue can be quite easily reproduced by setting a rst_map_012 type
rstctrl register to 0 or 4 in the bootloader, and booting the system.

Let's just assert all reset bits for rst_map_012 type resets. So far
it looks like the other rstctrl types don't need this. If it turns out
that the other type rstctrl bits also need reset on init, we need to
add an instance specific reset mask for the bits to avoid resetting
unwanted bits.

Reported-by: Carl Philipp Klemm 
Cc: Philipp Zabel 
Cc: Santosh Shilimkar 
Cc: Suman Anna 
Cc: Tero Kristo 
Signed-off-by: Tony Lindgren 
---
 drivers/soc/ti/omap_prm.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
--- a/drivers/soc/ti/omap_prm.c
+++ b/drivers/soc/ti/omap_prm.c
@@ -860,6 +860,7 @@ static int omap_prm_reset_init(struct platform_device *pdev,
const struct omap_rst_map *map;
struct ti_prm_platform_data *pdata = dev_get_platdata(&pdev->dev);
char buf[32];
+   u32 v;
 
/*
 * Check if we have controllable resets. If either rstctrl is non-zero
@@ -907,6 +908,16 @@ static int omap_prm_reset_init(struct platform_device 
*pdev,
map++;
}
 
+   /* Quirk handling to assert rst_map_012 bits on reset and avoid errors 
*/
+   if (prm->data->rstmap == rst_map_012) {
+   v = readl_relaxed(reset->prm->base + reset->prm->data->rstctrl);
+   if ((v & reset->mask) != reset->mask) {
+   dev_dbg(&pdev->dev, "Asserting all resets: %08x\n", v);
+   writel_relaxed(reset->mask, reset->prm->base +
+  reset->prm->data->rstctrl);
+   }
+   }
+
return devm_reset_controller_register(&pdev->dev, &reset->rcdev);
 }
 
-- 
2.29.2
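
For readers not familiar with the driver, rst_map_012 is the reset map covering rstctrl bits 0, 1 and 2; its definition in drivers/soc/ti/omap_prm.c looks roughly like the following (paraphrased here for context, not part of this patch):

/* Paraphrased sketch of the rst_map_012 table: bits 0 and 1 are the
 * accelerator-specific resets, bit 2 is the subsystem module reset.
 */
static const struct omap_rst_map rst_map_012[] = {
	{ .rst = 0, .st = 0 },
	{ .rst = 1, .st = 1 },
	{ .rst = 2, .st = 2 },
	{ .rst = -1 },
};

This is why the quirk above writes the full reset->mask whenever any of the mapped bits was left deasserted by the bootloader.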


Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time

2020-12-08 Thread Pingfan Liu
On Tue, Dec 8, 2020 at 5:51 PM Marc Zyngier  wrote:
>
> On 2020-12-08 09:43, Pingfan Liu wrote:
> > On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier  wrote:
> >>
> >> On 2020-12-08 09:21, Pingfan Liu wrote:
> >> > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had
> >> > better
> >> > do the check during built time, and associate these related code
> >> > together.
> >> >
> >> > Signed-off-by: Pingfan Liu 
> >> > Cc: Catalin Marinas 
> >> > Cc: Will Deacon 
> >> > Cc: Thomas Gleixner 
> >> > Cc: Jason Cooper 
> >> > Cc: Marc Zyngier 
> >> > Cc: Mark Rutland 
> >> > To: linux-arm-ker...@lists.infradead.org
> >> > Cc: linux-kernel@vger.kernel.org
> >> > ---
> >> >  arch/arm64/kernel/smp.c| 2 ++
> >> >  drivers/irqchip/irq-gic-v3.c   | 2 +-
> >> >  drivers/irqchip/irq-gic.c  | 2 +-
> >> >  include/linux/irqchip/arm-gic-common.h | 2 ++
> >> >  4 files changed, 6 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >> > index 18e9727..9fc383c 100644
> >> > --- a/arch/arm64/kernel/smp.c
> >> > +++ b/arch/arm64/kernel/smp.c
> >> > @@ -33,6 +33,7 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> >
> >> >  #include 
> >> >  #include 
> >> > @@ -76,6 +77,7 @@ enum ipi_msg_type {
> >> >   IPI_WAKEUP,
> >> >   NR_IPI
> >> >  };
> >> > +static_assert(NR_IPI <= MAX_SGI_NUM);
> >>
> >> I am trying *very hard* to remove dependencies between the
> >> architecture
> >> code and random drivers, so this kind of check really is
> >> counter-productive.
> >>
> >> Driver code should not have to know the number of IPIs, because there
> >> is
> >> no requirement that all IPIs should map 1:1 to SGIs. Conflating the
> >> two
> >
> > Just curious about this. Is there an IPI which is not implemented by
> > SGI? Or mapping several IPIs to a single SGI, and scatter out due to a
> > global variable value?
>
> We currently have a single NS SGI left, and I'd like to move some of the
> non-critical IPIs over to dispatching mechanism (the two "CPU stop" IPIs
> definitely are candidate for merging). That's not implemented yet, but
> I don't see a need to add checks that would otherwise violate this
> IPI/SGI distinction.

Got it. Thanks for your detailed explanation.

Regards,
Pingfan


Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time

2020-12-08 Thread Marc Zyngier

On 2020-12-08 09:43, Pingfan Liu wrote:
> On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier  wrote:
>>
>> On 2020-12-08 09:21, Pingfan Liu wrote:
>> > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had better
>> > do the check during built time, and associate these related code together.
>> >
>> > Signed-off-by: Pingfan Liu 
>> > Cc: Catalin Marinas 
>> > Cc: Will Deacon 
>> > Cc: Thomas Gleixner 
>> > Cc: Jason Cooper 
>> > Cc: Marc Zyngier 
>> > Cc: Mark Rutland 
>> > To: linux-arm-ker...@lists.infradead.org
>> > Cc: linux-kernel@vger.kernel.org
>> > ---
>> >  arch/arm64/kernel/smp.c| 2 ++
>> >  drivers/irqchip/irq-gic-v3.c   | 2 +-
>> >  drivers/irqchip/irq-gic.c  | 2 +-
>> >  include/linux/irqchip/arm-gic-common.h | 2 ++
>> >  4 files changed, 6 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> > index 18e9727..9fc383c 100644
>> > --- a/arch/arm64/kernel/smp.c
>> > +++ b/arch/arm64/kernel/smp.c
>> > @@ -33,6 +33,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >
>> >  #include 
>> >  #include 
>> > @@ -76,6 +77,7 @@ enum ipi_msg_type {
>> >   IPI_WAKEUP,
>> >   NR_IPI
>> >  };
>> > +static_assert(NR_IPI <= MAX_SGI_NUM);
>>
>> I am trying *very hard* to remove dependencies between the architecture
>> code and random drivers, so this kind of check really is
>> counter-productive.
>>
>> Driver code should not have to know the number of IPIs, because there is
>> no requirement that all IPIs should map 1:1 to SGIs. Conflating the two
>
> Just curious about this. Is there an IPI which is not implemented by
> SGI? Or mapping several IPIs to a single SGI, and scatter out due to a
> global variable value?

We currently have a single NS SGI left, and I'd like to move some of the
non-critical IPIs over to a dispatching mechanism (the two "CPU stop" IPIs
definitely are candidates for merging). That's not implemented yet, but
I don't see a need to add checks that would otherwise violate this
IPI/SGI distinction.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time

2020-12-08 Thread Pingfan Liu
On Tue, Dec 8, 2020 at 5:31 PM Marc Zyngier  wrote:
>
> On 2020-12-08 09:21, Pingfan Liu wrote:
> > Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had
> > better
> > do the check during built time, and associate these related code
> > together.
> >
> > Signed-off-by: Pingfan Liu 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: Thomas Gleixner 
> > Cc: Jason Cooper 
> > Cc: Marc Zyngier 
> > Cc: Mark Rutland 
> > To: linux-arm-ker...@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  arch/arm64/kernel/smp.c| 2 ++
> >  drivers/irqchip/irq-gic-v3.c   | 2 +-
> >  drivers/irqchip/irq-gic.c  | 2 +-
> >  include/linux/irqchip/arm-gic-common.h | 2 ++
> >  4 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 18e9727..9fc383c 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -33,6 +33,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -76,6 +77,7 @@ enum ipi_msg_type {
> >   IPI_WAKEUP,
> >   NR_IPI
> >  };
> > +static_assert(NR_IPI <= MAX_SGI_NUM);
>
> I am trying *very hard* to remove dependencies between the architecture
> code and random drivers, so this kind of check really is
> counter-productive.
>
> Driver code should not have to know the number of IPIs, because there is
> no requirement that all IPIs should map 1:1 to SGIs. Conflating the two

Just curious about this. Is there an IPI which is not implemented by
SGI? Or mapping several IPIs to a single SGI, and scatter out due to a
global variable value?

Thanks,
Pingfan

> is already wrong, and I really don't want to add more of that.
>
> Thanks,
>
>  M.
> --
> Jazz is not dead. It just smells funny...


Re: [PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time

2020-12-08 Thread Marc Zyngier

On 2020-12-08 09:21, Pingfan Liu wrote:
> Although there is a runtime WARN_ON() when NR_IPR > max SGI, it had better
> do the check during built time, and associate these related code together.
>
> Signed-off-by: Pingfan Liu 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Mark Rutland 
> To: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm64/kernel/smp.c| 2 ++
>  drivers/irqchip/irq-gic-v3.c   | 2 +-
>  drivers/irqchip/irq-gic.c  | 2 +-
>  include/linux/irqchip/arm-gic-common.h | 2 ++
>  4 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 18e9727..9fc383c 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -76,6 +77,7 @@ enum ipi_msg_type {
>   IPI_WAKEUP,
>   NR_IPI
>  };
> +static_assert(NR_IPI <= MAX_SGI_NUM);

I am trying *very hard* to remove dependencies between the architecture
code and random drivers, so this kind of check really is
counter-productive.

Driver code should not have to know the number of IPIs, because there is
no requirement that all IPIs should map 1:1 to SGIs. Conflating the two
is already wrong, and I really don't want to add more of that.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...


[PATCH] arm64/irq: report bug if NR_IPI greater than max SGI during compile time

2020-12-08 Thread Pingfan Liu
Although there is a runtime WARN_ON() when NR_IPI > max SGI, it is better
to do the check at build time, and to associate the related code together.

Signed-off-by: Pingfan Liu 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Thomas Gleixner 
Cc: Jason Cooper 
Cc: Marc Zyngier 
Cc: Mark Rutland 
To: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm64/kernel/smp.c| 2 ++
 drivers/irqchip/irq-gic-v3.c   | 2 +-
 drivers/irqchip/irq-gic.c  | 2 +-
 include/linux/irqchip/arm-gic-common.h | 2 ++
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 18e9727..9fc383c 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -76,6 +77,7 @@ enum ipi_msg_type {
IPI_WAKEUP,
NR_IPI
 };
+static_assert(NR_IPI <= MAX_SGI_NUM);
 
 static int ipi_irq_base __read_mostly;
 static int nr_ipi __read_mostly = NR_IPI;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 16fecc0..ee13f85 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1162,7 +1162,7 @@ static void __init gic_smp_init(void)
  gic_starting_cpu, NULL);
 
/* Register all 8 non-secure SGIs */
-   base_sgi = __irq_domain_alloc_irqs(gic_data.domain, -1, 8,
+   base_sgi = __irq_domain_alloc_irqs(gic_data.domain, -1, MAX_SGI_NUM,
   NUMA_NO_NODE, &sgi_fwspec,
   false, NULL);
if (WARN_ON(base_sgi <= 0))
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 6053245..07d36de 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -845,7 +845,7 @@ static __init void gic_smp_init(void)
  "irqchip/arm/gic:starting",
  gic_starting_cpu, NULL);
 
-   base_sgi = __irq_domain_alloc_irqs(gic_data[0].domain, -1, 8,
+   base_sgi = __irq_domain_alloc_irqs(gic_data[0].domain, -1, MAX_SGI_NUM,
   NUMA_NO_NODE, &sgi_fwspec,
   false, NULL);
if (WARN_ON(base_sgi <= 0))
diff --git a/include/linux/irqchip/arm-gic-common.h 
b/include/linux/irqchip/arm-gic-common.h
index fa8c045..7e45a9f 100644
--- a/include/linux/irqchip/arm-gic-common.h
+++ b/include/linux/irqchip/arm-gic-common.h
@@ -16,6 +16,8 @@
(GICD_INT_DEF_PRI << 8) |\
GICD_INT_DEF_PRI)
 
+#define MAX_SGI_NUM8
+
 enum gic_type {
GIC_V2,
GIC_V3,
-- 
2.7.5



[PATCH rdma-next 0/3] Various fixes collected over time

2020-12-07 Thread Leon Romanovsky
From: Leon Romanovsky 

Hi,

This is set of various and unrelated fixes that we collected over time.

Thanks

Avihai Horon (1):
  RDMA/uverbs: Fix incorrect variable type

Jack Morgenstein (2):
  RDMA/core: Clean up cq pool mechanism
  RDMA/core: Do not indicate device ready when device enablement fails

 drivers/infiniband/core/core_priv.h  |  3 +--
 drivers/infiniband/core/cq.c | 12 ++--
 drivers/infiniband/core/device.c | 16 ++--
 .../infiniband/core/uverbs_std_types_device.c| 14 +-
 include/rdma/uverbs_ioctl.h  | 10 ++
 5 files changed, 28 insertions(+), 27 deletions(-)

--
2.28.0



Re: Ftrace startup test and boot-time tracing

2020-12-07 Thread Steven Rostedt
On Tue, 8 Dec 2020 08:26:49 +0900
Masami Hiramatsu  wrote:

> Hi Steve,
> 
> On Mon, 7 Dec 2020 15:25:40 -0500
> Steven Rostedt  wrote:
> 
> > On Mon, 7 Dec 2020 23:02:59 +0900
> > Masami Hiramatsu  wrote:
> >   
> > > There will be the 2 options, one is to change kconfig so that user can not
> > > select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide
> > > a flag from trace_boot and all tests checks the flag at runtime.
> > > (moreover, that flag will be good to be set from other command-line 
> > > options)
> > > What would you think?  
> > 
> > Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And
> > something that could also be on the kernel command line itself :-)
> > 
> >  "disabe_ftrace_startup_tests"
> > 
> > Sometimes when debugging something, I don't want the tests running, even
> > though the config has them, and I don't want to change the config.  
> 
> OK, BTW, I found tracing_selftest_disabled, it seemed what we need.
> 

Yeah, I thought we had something like this. It's getting hard to keep track
of ;-)

-- Steve
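
For reference, a rough sketch of how the kprobe startup test could honour that flag; this is only an illustration of the idea, not the actual fix, and assumes the existing tracing_selftest_disabled declaration in kernel/trace/trace.h:

/* Illustration only: bail out of the startup self-test when self-tests
 * are disabled, e.g. because boot-time tracing has already defined
 * kprobe events that the test does not expect.
 */
static __init int kprobe_trace_self_tests_init(void)
{
	if (tracing_selftest_disabled)
		return 0;

	/* ... existing test body that creates and checks test events ... */
	return 0;
}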


Re: Ftrace startup test and boot-time tracing

2020-12-07 Thread Masami Hiramatsu
Hi Steve,

On Mon, 7 Dec 2020 15:25:40 -0500
Steven Rostedt  wrote:

> On Mon, 7 Dec 2020 23:02:59 +0900
> Masami Hiramatsu  wrote:
> 
> > There will be the 2 options, one is to change kconfig so that user can not
> > select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide
> > a flag from trace_boot and all tests checks the flag at runtime.
> > (moreover, that flag will be good to be set from other command-line options)
> > What would you think?
> 
> Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And
> something that could also be on the kernel command line itself :-)
> 
>  "disabe_ftrace_startup_tests"
> 
> Sometimes when debugging something, I don't want the tests running, even
> though the config has them, and I don't want to change the config.

OK, BTW, I found tracing_selftest_disabled, it seemed what we need.

Thank you,
-- 
Masami Hiramatsu 


[PATCH v2 1/2] Add save/restore of Precision Time Measurement capability

2020-12-07 Thread David E. Box
The PCI subsystem does not currently save and restore the configuration
space for the Precision Time Measurement (PTM) PCIe extended capability,
leading to the possibility of the feature coming back disabled on S3 resume.
This has been observed on Intel Coffee Lake desktops. Add save/restore of
the PTM control register. This saves the PTM Enable, Root Select, and
Effective Granularity bits.

Suggested-by: Rafael J. Wysocki 
Signed-off-by: David E. Box 
---

Changes from V1:
- Move save/restore functions to ptm.c
- Move pci_add_ext_cap_save_buffer() to pci_ptm_init in ptm.c

 drivers/pci/pci.c  |  2 ++
 drivers/pci/pci.h  |  8 
 drivers/pci/pcie/ptm.c | 43 ++
 3 files changed, 53 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e578d34095e9..12ba6351c05b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1566,6 +1566,7 @@ int pci_save_state(struct pci_dev *dev)
pci_save_ltr_state(dev);
pci_save_dpc_state(dev);
pci_save_aer_state(dev);
+   pci_save_ptm_state(dev);
return pci_save_vc_state(dev);
 }
 EXPORT_SYMBOL(pci_save_state);
@@ -1677,6 +1678,7 @@ void pci_restore_state(struct pci_dev *dev)
pci_restore_vc_state(dev);
pci_restore_rebar_state(dev);
pci_restore_dpc_state(dev);
+   pci_restore_ptm_state(dev);
 
pci_aer_clear_status(dev);
pci_restore_aer_state(dev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f86cae9aa1f4..62cdacba5954 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -516,6 +516,14 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 
 #endif /* CONFIG_PCI_IOV */
 
+#ifdef CONFIG_PCIE_PTM
+void pci_save_ptm_state(struct pci_dev *dev);
+void pci_restore_ptm_state(struct pci_dev *dev);
+#else
+static inline void pci_save_ptm_state(struct pci_dev *dev) {}
+static inline void pci_restore_ptm_state(struct pci_dev *dev) {}
+#endif
+
 unsigned long pci_cardbus_resource_alignment(struct resource *);
 
 static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
index 357a454cafa0..6b24a1c9327a 100644
--- a/drivers/pci/pcie/ptm.c
+++ b/drivers/pci/pcie/ptm.c
@@ -29,6 +29,47 @@ static void pci_ptm_info(struct pci_dev *dev)
 dev->ptm_root ? " (root)" : "", clock_desc);
 }
 
+void pci_save_ptm_state(struct pci_dev *dev)
+{
+   int ptm;
+   struct pci_cap_saved_state *save_state;
+   u16 *cap;
+
+   if (!pci_is_pcie(dev))
+   return;
+
+   ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
+   if (!ptm)
+   return;
+
+   save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM);
+   if (!save_state) {
+   pci_err(dev, "no suspend buffer for PTM\n");
+   return;
+   }
+
+   cap = (u16 *)&save_state->cap.data[0];
+   pci_read_config_word(dev, ptm + PCI_PTM_CTRL, cap);
+}
+
+void pci_restore_ptm_state(struct pci_dev *dev)
+{
+   struct pci_cap_saved_state *save_state;
+   int ptm;
+   u16 *cap;
+
+   if (!pci_is_pcie(dev))
+   return;
+
+   save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_PTM);
+   ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
+   if (!save_state || !ptm)
+   return;
+
+   cap = (u16 *)&save_state->cap.data[0];
+   pci_write_config_word(dev, ptm + PCI_PTM_CTRL, *cap);
+}
+
 void pci_ptm_init(struct pci_dev *dev)
 {
int pos;
@@ -65,6 +106,8 @@ void pci_ptm_init(struct pci_dev *dev)
if (!pos)
return;
 
+   pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_PTM, sizeof(u16));
+
pci_read_config_dword(dev, pos + PCI_PTM_CAP, &cap);
local_clock = (cap & PCI_PTM_GRANULARITY_MASK) >> 8;
 
-- 
2.20.1
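
As a sanity check on top of this, one could read the control register back after resume; the sketch below is only an illustration (not part of the patch) and assumes the PCI_PTM_CTRL and PCI_PTM_CTRL_ENABLE definitions from include/uapi/linux/pci_regs.h:

#include <linux/pci.h>

/* Debug-only sketch: report whether PTM is still enabled after resume. */
static void example_check_ptm_enabled(struct pci_dev *dev)
{
	int ptm = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PTM);
	u16 ctrl;

	if (!ptm)
		return;

	pci_read_config_word(dev, ptm + PCI_PTM_CTRL, &ctrl);
	pci_info(dev, "PTM %s after resume\n",
		 (ctrl & PCI_PTM_CTRL_ENABLE) ? "enabled" : "disabled");
}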



Re: Ftrace startup test and boot-time tracing

2020-12-07 Thread Steven Rostedt
On Mon, 7 Dec 2020 23:02:59 +0900
Masami Hiramatsu  wrote:

> There will be the 2 options, one is to change kconfig so that user can not
> select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y, another is to provide
> a flag from trace_boot and all tests checks the flag at runtime.
> (moreover, that flag will be good to be set from other command-line options)
> What would you think?

Yeah, a "disable_ftrace_startup_tests" flag should be implemented. And
something that could also be on the kernel command line itself :-)

 "disabe_ftrace_startup_tests"

Sometimes when debugging something, I don't want the tests running, even
though the config has them, and I don't want to change the config.

-- Steve


Ftrace startup test and boot-time tracing

2020-12-07 Thread Masami Hiramatsu
Hi Steve,

I found that if I enabled CONFIG_FTRACE_STARTUP_TEST=y and booted the
kernel with kprobe-events defined by boot-time tracing, the following warning was output:

[   59.803496] trace_kprobe: Testing kprobe tracing: 
[   59.804258] [ cut here ]
[   59.805682] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:1987 
kprobe_trace_self_tests_ib
[   59.806944] Modules linked in:
[   59.807335] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7+ #172
[   59.808029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-1ubuntu1 04/01/204
[   59.808999] RIP: 0010:kprobe_trace_self_tests_init+0x5f/0x42b
[   59.809696] Code: e8 03 00 00 48 c7 c7 30 8e 07 82 e8 6d 3c 46 ff 48 c7 c6 
00 b2 1a 81 48 c7 c7 7
[   59.812439] RSP: 0018:c9013e78 EFLAGS: 00010282
[   59.813038] RAX: ffef RBX:  RCX: 00049443
[   59.813780] RDX: 00049403 RSI: 00049403 RDI: 0002deb0
[   59.814589] RBP: c9013e90 R08: 0001 R09: 0001
[   59.815349] R10: 0001 R11:  R12: ffef
[   59.816138] R13: 888004613d80 R14: 82696940 R15: 888004429138
[   59.816877] FS:  () GS:88807dcc() 
knlGS:
[   59.817772] CS:  0010 DS:  ES:  CR0: 80050033
[   59.818395] CR2: 01a8dd38 CR3: 0000 CR4: 06a0
[   59.819144] Call Trace:
[   59.819469]  ? init_kprobe_trace+0x6b/0x6b
[   59.819948]  do_one_initcall+0x5f/0x300
[   59.820392]  ? rcu_read_lock_sched_held+0x4f/0x80
[   59.820916]  kernel_init_freeable+0x22a/0x271
[   59.821416]  ? rest_init+0x241/0x241
[   59.821841]  kernel_init+0xe/0x10f
[   59.822251]  ret_from_fork+0x22/0x30
[   59.822683] irq event stamp: 16403349
[   59.823121] hardirqs last  enabled at (16403359): [] 
console_unlock+0x48e/0x580
[   59.824074] hardirqs last disabled at (16403368): [] 
console_unlock+0x3f6/0x580
[   59.825036] softirqs last  enabled at (16403200): [] 
__do_softirq+0x33a/0x484
[   59.825982] softirqs last disabled at (16403087): [] 
asm_call_irq_on_stack+0x10
[   59.827034] ---[ end trace 200c544775cdfeb3 ]---
[   59.827635] trace_kprobe: error on probing function entry.

This is actually similar issue which you had fixed with commit b6399cc78934
("tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline").

Fixing this kprobes warning is easy (see attached below), but I think this
has to be fixed more widely, because other testcases also change the boot-time
tracing results or may not work correctly with it.

There are two options: one is to change kconfig so that the user cannot
select FTRACE_STARTUP_TEST if BOOTTIME_TRACING=y; the other is to provide
a flag from trace_boot and have all tests check the flag at runtime.
(Moreover, it would be good if that flag could also be set from other
command-line options.)
What do you think?

Thank you,

>From 00037083baca07a8705da39852480f6f53a8297c Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu 
Date: Mon, 7 Dec 2020 22:53:16 +0900
Subject: [PATCH] tracing/kprobes: Fix to skip kprobe-events startup test if
 kprobe-events is used

commit b6399cc78934 ("tracing/kprobe: Do not run kprobe boot tests
if kprobe_event is on cmdline") had fixed the same issue with
kprobe-events on kernel cmdline, but boot-time tracing re-introduce
similar issue.

When the boot-time tracing uses kprobe-events with ftrace startup
test, it produced a warning on the kprobe-events startup test
because the testcase doesn't expect any kprobe events exists.

To mitigate the warning, skip the kprobe-events startup test
if any kprobe-event is defined before starting the test.

Fixes: 4d655281eb1b ("tracing/boot Add kprobe event support")
Cc: sta...@vger.kernel.org
Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b911e9f6d9f5..515e139236f2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -25,7 +25,6 @@
 
 /* Kprobe early definition from command line */
 static char kprobe_boot_events_buf[COMMAND_LINE_SIZE] __initdata;
-static bool kprobe_boot_events_enabled __initdata;
 
 static int __init set_kprobe_boot_events(char *str)
 {
@@ -1887,8 +1886,6 @@ static __init void setup_boot_kprobe_events(void)
ret = trace_run_command(cmd, create_or_delete_trace_kprobe);
if (ret)
pr_warn("Failed to add event(%d): %s\n", ret, cmd);
-   else
-   kprobe_boot_events_enabled = true;
 
cmd = p;
}
@@ -1959,6 +1956,20 @@ find_trace_probe_file(struct trace_kprobe *tk, struct 
trace_array *tr)
return NULL;
 }
 
+static __init int trace_kprobe_exist(void)
+{
+   struct trace_kprobe *tk;

[PATCH v3 1/7] dt-bindings: input: Add reset-time-sec common property

2020-12-05 Thread Cristian Ciocaltea
Add a new common property 'reset-time-sec' to be used in conjunction
with the devices supporting the key pressed reset feature.

Signed-off-by: Cristian Ciocaltea 
---
Changes in v3:
 - This patch was not present in v2

 Documentation/devicetree/bindings/input/input.yaml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/input/input.yaml 
b/Documentation/devicetree/bindings/input/input.yaml
index ab407f266bef..caba93209ae7 100644
--- a/Documentation/devicetree/bindings/input/input.yaml
+++ b/Documentation/devicetree/bindings/input/input.yaml
@@ -34,4 +34,11 @@ properties:
   specify this property.
 $ref: /schemas/types.yaml#/definitions/uint32
 
+  reset-time-sec:
+description:
+  Duration in seconds which the key should be kept pressed for device to
+  reset automatically. Device with key pressed reset feature can specify
+  this property.
+$ref: /schemas/types.yaml#/definitions/uint32
+
 additionalProperties: true
-- 
2.29.2
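
For illustration, a consumer driver could read the new property through the generic device-property API; the snippet below is a hypothetical example (the default value and helper name are assumptions, not part of this binding patch):

#include <linux/device.h>
#include <linux/property.h>

/* Hypothetical probe fragment: read the optional reset-time-sec property
 * and fall back to a driver default when it is absent.
 */
static u32 example_get_reset_time(struct device *dev)
{
	u32 reset_time_sec = 8;	/* assumed default, device specific */

	device_property_read_u32(dev, "reset-time-sec", &reset_time_sec);
	return reset_time_sec;
}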



Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-05 Thread Jarod Wilson
On Thu, Dec 3, 2020 at 11:45 AM Jakub Kicinski  wrote:
...
> nit: let's narrow down the ifdef-enery
>
> no need for the ifdef here, if the helper looks like this:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +#ifdef CONFIG_XFRM_OFFLOAD
> +   if (mode == BOND_MODE_ACTIVEBACKUP)
> +   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +   else
> +   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +   netdev_update_features(bond_dev);
> +#endif /* CONFIG_XFRM_OFFLOAD */
> +}
>
> Even better:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +   if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
> +   return;
> +
> +   if (mode == BOND_MODE_ACTIVEBACKUP)
> +   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +   else
> +   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +   netdev_update_features(bond_dev);
> +}
>
> (Assuming BOND_XFRM_FEATURES doesn't itself hide under an ifdef.)

It is, but doesn't need to be. I can mix these changes in as well.

-- 
Jarod Wilson
ja...@redhat.com



[PATCH net v4] bonding: fix feature flag setting at init time

2020-12-05 Thread Jarod Wilson
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes, as well as at module init
time, and when run at module init time, it is before register_netdevice()
has been called and filled in wanted_features. The empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto
v4: drop hunk not directly related to fix, clean up ifdeffery

 drivers/net/bonding/bond_options.c | 22 +++---
 include/net/bonding.h  |  2 --
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..a4e4e15f574d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,19 @@ const struct bond_option *bond_opt_get(unsigned int option)
return &bond_opts[option];
 }
 
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+   if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
+   return;
+
+   if (mode == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+   netdev_update_features(bond_dev);
+}
+
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
@@ -767,13 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
if (newval->value == BOND_MODE_ALB)
bond->params.tlb_dynamic_lb = 1;
 
-#ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
-#endif /* CONFIG_XFRM_OFFLOAD */
+   if (bond->dev->reg_state == NETREG_REGISTERED)
+   bond_set_xfrm_features(bond->dev, newval->value);
 
/* don't cache arp_validate between modes */
bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index d9d0ff3b0ad3..adc3da776970 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -86,10 +86,8 @@
 #define bond_for_each_slave_rcu(bond, pos, iter) \
netdev_for_each_lower_private_rcu((bond)->dev, pos, iter)
 
-#ifdef CONFIG_XFRM_OFFLOAD
 #define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \
NETIF_F_GSO_ESP)
-#endif /* CONFIG_XFRM_OFFLOAD */
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 extern atomic_t netpoll_block_tx;
-- 
2.28.0



Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-04 Thread Jakub Kicinski
On Thu, 3 Dec 2020 22:14:12 -0500 Jarod Wilson wrote:
> On Thu, Dec 3, 2020 at 11:50 AM Jakub Kicinski  wrote:
> >
> > On Wed,  2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:  
> > >   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> > > -#ifdef CONFIG_XFRM_OFFLOAD
> > > - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > > -#endif /* CONFIG_XFRM_OFFLOAD */
> > >   bond_dev->features |= bond_dev->hw_features;
> > >   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | 
> > > NETIF_F_HW_VLAN_STAG_TX;
> > >  #ifdef CONFIG_XFRM_OFFLOAD
> > > - /* Disable XFRM features if this isn't an active-backup config */
> > > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> > > - bond_dev->features &= ~BOND_XFRM_FEATURES;
> > > + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > > + /* Only enable XFRM features if this is an active-backup config */
> > > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> > > + bond_dev->features |= BOND_XFRM_FEATURES;
> > >  #endif /* CONFIG_XFRM_OFFLOAD */  
> >
> > This makes no functional change, or am I reading it wrong?  
> 
> You are correct, there's ultimately no functional change there, it
> primarily just condenses the code down to a single #ifdef block, and
> doesn't add and then remove BOND_XFRM_FEATURES from
> bond_dev->features, instead omitting it initially and only adding it
> when in AB mode. I'd poked at the code in that area while trying to
> get to the bottom of this, thought it made it more understandable, so
> I left it in, but ultimately, it's not necessary to fix the problem
> here.

Makes sense, but please split it out and send separately to net-next.


Re: [PATCH v2 1/2] KVM: arm64: Some fixes of PV-time interface document

2020-12-03 Thread zhukeqian



On 2020/12/3 23:04, Marc Zyngier wrote:
> On 2020-08-17 12:07, Keqian Zhu wrote:
>> Rename PV_FEATURES to PV_TIME_FEATURES.
>>
>> Signed-off-by: Keqian Zhu 
>> Reviewed-by: Andrew Jones 
>> Reviewed-by: Steven Price 
>> ---
>>  Documentation/virt/kvm/arm/pvtime.rst | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.rst
>> b/Documentation/virt/kvm/arm/pvtime.rst
>> index 687b60d..94bffe2 100644
>> --- a/Documentation/virt/kvm/arm/pvtime.rst
>> +++ b/Documentation/virt/kvm/arm/pvtime.rst
>> @@ -3,7 +3,7 @@
>>  Paravirtualized time support for arm64
>>  ==
>>
>> -Arm specification DEN0057/A defines a standard for paravirtualised time
>> +Arm specification DEN0057/A defines a standard for paravirtualized time
>>  support for AArch64 guests:
> 
> nit: I do object to this change (some of us are British! ;-).
Oh, I will pay attention to this. Thanks!

Keqian
> 
> M.


Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-03 Thread Jarod Wilson
On Thu, Dec 3, 2020 at 11:50 AM Jakub Kicinski  wrote:
>
> On Wed,  2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:
> >   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> > -#ifdef CONFIG_XFRM_OFFLOAD
> > - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > -#endif /* CONFIG_XFRM_OFFLOAD */
> >   bond_dev->features |= bond_dev->hw_features;
> >   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | 
> > NETIF_F_HW_VLAN_STAG_TX;
> >  #ifdef CONFIG_XFRM_OFFLOAD
> > - /* Disable XFRM features if this isn't an active-backup config */
> > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> > - bond_dev->features &= ~BOND_XFRM_FEATURES;
> > + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > + /* Only enable XFRM features if this is an active-backup config */
> > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> > + bond_dev->features |= BOND_XFRM_FEATURES;
> >  #endif /* CONFIG_XFRM_OFFLOAD */
>
> This makes no functional change, or am I reading it wrong?

You are correct, there's ultimately no functional change there, it
primarily just condenses the code down to a single #ifdef block, and
doesn't add and then remove BOND_XFRM_FEATURES from
bond_dev->features, instead omitting it initially and only adding it
when in AB mode. I'd poked at the code in that area while trying to
get to the bottom of this, thought it made it more understandable, so
I left it in, but ultimately, it's not necessary to fix the problem
here.

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

2020-12-03 Thread Saravana Kannan
On Tue, Nov 24, 2020 at 12:29 AM 'Tomi Valkeinen' via kernel-team
 wrote:
>
> Hi,
>
> On 21/11/2020 04:02, Saravana Kannan wrote:
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with request for data showing that the optimization
> > made significant improvement for real world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch series refactors fw_devlink to be more efficient. The key
> > difference now is the addition of support for fwnode links -- just a few
> > simple APIs. This also allows most of the code to be moved out of
> > firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times (complexity was
> >   close to O(N^3) where N in the number of properties) during bootup,
> >   fw_devlink parses each fwnode node/property only once and creates
> >   fwnode links. The rest of the fw_devlink code then just looks at these
> >   fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issue due to fw_devlink in the
> >   future. fw_devlink=on blocks the probing of devices if they depend on
> >   a device that hasn't been added yet. With this refactor, it'll be very
> >   easy to tell what that device is because we now have a reference to
> >   the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> >   types. A refactor to move the common bits from DT specific code to
> >   driver core was in my TODO list as a prerequisite to adding ACPI
> >   support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvment of about 12 seconds and 3 seconds, respectively.
>
> Tested v2 on OMAP4 SDP. With my particular config, boot time to starting init 
> went from 18.5 seconds
> to 12.5 seconds.
>
>  Tomi

Rafael,

Friendly reminder for a review.

-Saravana


[for-next][PATCH 3/3] ring-buffer: Add test to validate the time stamp deltas

2020-12-03 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

While debugging a situation where a delta for an event was calculated wrong,
I realized there was nothing making sure that the deltas of events are
correct. If a single event has an incorrect delta, then all events after it
will also have one. If the discrepancy gets large enough, it could cause
the time stamps to go backwards when crossing sub-buffers, which record a
full 64-bit time stamp to which the new deltas are then added.

Add a way to validate the time stamp deltas at most events and when crossing a buffer
page. This will help make sure that the deltas are always correct. This test
will detect if they are ever corrupted.

The test adds a high overhead to the ring buffer recording, as it does the
audit for almost every event, and should only be used for testing the ring
buffer.

This will catch the bug that is fixed by commit 55ea4cf40380 ("ring-buffer:
Update write stamp with the correct ts"), which is not applied when this
commit is applied.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Kconfig   |  20 +
 kernel/trace/ring_buffer.c | 150 +
 2 files changed, 170 insertions(+)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index c9b64dea1216..fe60f9d7a0e6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -845,6 +845,26 @@ config RING_BUFFER_STARTUP_TEST
 
 If unsure, say N
 
+config RING_BUFFER_VALIDATE_TIME_DELTAS
+   bool "Verify ring buffer time stamp deltas"
+   depends on RING_BUFFER
+   help
+     This will audit the time stamps on the ring buffer sub
+ buffer to make sure that all the time deltas for the
+ events on a sub buffer matches the current time stamp.
+ This audit is performed for every event that is not
+ interrupted, or interrupting another event. A check
+ is also made when traversing sub buffers to make sure
+ that all the deltas on the previous sub buffer do not
+ add up to be greater than the current time stamp.
+
+ NOTE: This adds significant overhead to recording of events,
+ and should only be used to test the logic of the ring buffer.
+ Do not use it on production systems.
+
+ Only say Y if you understand what this does, and you
+ still want it enabled. Otherwise say N
+
 config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index ab68f28b8f4b..7cd888ee9ac7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3193,6 +3193,153 @@ int ring_buffer_unlock_commit(struct trace_buffer 
*buffer,
 }
 EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit);
 
+/* Special value to validate all deltas on a page. */
+#define CHECK_FULL_PAGE1L
+
+#ifdef CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS
+static void dump_buffer_page(struct buffer_data_page *bpage,
+struct rb_event_info *info,
+unsigned long tail)
+{
+   struct ring_buffer_event *event;
+   u64 ts, delta;
+   int e;
+
+   ts = bpage->time_stamp;
+   pr_warn("  [%lld] PAGE TIME STAMP\n", ts);
+
+   for (e = 0; e < tail; e += rb_event_length(event)) {
+
+   event = (struct ring_buffer_event *)(bpage->data + e);
+
+   switch (event->type_len) {
+
+   case RINGBUF_TYPE_TIME_EXTEND:
+   delta = ring_buffer_event_time_stamp(event);
+   ts += delta;
+   pr_warn("  [%lld] delta:%lld TIME EXTEND\n", ts, delta);
+   break;
+
+   case RINGBUF_TYPE_TIME_STAMP:
+   delta = ring_buffer_event_time_stamp(event);
+   ts = delta;
+   pr_warn("  [%lld] absolute:%lld TIME STAMP\n", ts, 
delta);
+   break;
+
+   case RINGBUF_TYPE_PADDING:
+   ts += event->time_delta;
+   pr_warn("  [%lld] delta:%d PADDING\n", ts, 
event->time_delta);
+   break;
+
+   case RINGBUF_TYPE_DATA:
+   ts += event->time_delta;
+   pr_warn("  [%lld] delta:%d\n", ts, event->time_delta);
+   break;
+
+   default:
+   break;
+   }
+   }
+}
+
+static DEFINE_PER_CPU(atomic_t, checking);
+static atomic_t ts_dump;
+
+/*
+ * Check if the current event time stamp matches the deltas on
+ * the buffer page.
+ */
+static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer,
+struct rb_event_info *info,
+unsigned long tail)
+{
+   struct ring_buffer_event *event;
+   struct buffer_

Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-03 Thread Jakub Kicinski
On Wed,  2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:
>   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> -#ifdef CONFIG_XFRM_OFFLOAD
> - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>   bond_dev->features |= bond_dev->hw_features;
>   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> - /* Disable XFRM features if this isn't an active-backup config */
> - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> - bond_dev->features &= ~BOND_XFRM_FEATURES;
> + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> + /* Only enable XFRM features if this is an active-backup config */
> + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> + bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */

This makes no functional change, or am I reading it wrong?


Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-03 Thread Jakub Kicinski
On Wed,  2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:
> Don't try to adjust XFRM support flags if the bond device isn't yet
> registered. Bad things can currently happen when netdev_change_features()
> is called without having wanted_features fully filled in yet. This code
> runs on post-module-load mode changes, as well as at module init time
> and new bond creation time, and in the latter two scenarios, it is
> running prior to register_netdevice() having been called and
> subsequently filling in wanted_features. The empty wanted_features led
> to features also getting emptied out, which was definitely not the
> intended behavior, so prevent that from happening.
> 
> Originally, I'd hoped to stop adjusting wanted_features at all in the
> bonding driver, as it's documented as being something only the network
> core should touch, but we actually do need to do this to properly update
> both the features and wanted_features fields when changing the bond type,
> or we get to a situation where ethtool sees:
> 
> esp-hw-offload: off [requested on]
> 
> I do think we should be using netdev_update_features instead of
> netdev_change_features here though, so we only send notifiers when the
> features actually changed.
> 
> v2: rework based on further testing and suggestions from ivecera
> v3: add helper function, remove goto, fix problem description
> 
> Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
> Reported-by: Ivan Vecera 
> Suggested-by: Ivan Vecera 
> Cc: Jay Vosburgh 
> Cc: Veaceslav Falico 
> Cc: Andy Gospodarek 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Thomas Davis 
> Cc: net...@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
>  drivers/net/bonding/bond_main.c| 10 --
>  drivers/net/bonding/bond_options.c | 19 ++-
>  2 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 47afc5938c26..7905534a763b 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4747,15 +4747,13 @@ void bond_setup(struct net_device *bond_dev)
>   NETIF_F_HW_VLAN_CTAG_FILTER;
>  
>   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> -#ifdef CONFIG_XFRM_OFFLOAD
> - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>   bond_dev->features |= bond_dev->hw_features;
>   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> - /* Disable XFRM features if this isn't an active-backup config */
> - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> - bond_dev->features &= ~BOND_XFRM_FEATURES;
> + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> + /* Only enable XFRM features if this is an active-backup config */
> + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> + bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  }
>  
> diff --git a/drivers/net/bonding/bond_options.c 
> b/drivers/net/bonding/bond_options.c
> index 9abfaae1c6f7..1ae0e5ab8c67 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -745,6 +745,18 @@ const struct bond_option *bond_opt_get(unsigned int 
> option)
>   return &bond_opts[option];
>  }
>  
> +#ifdef CONFIG_XFRM_OFFLOAD
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> + if (mode == BOND_MODE_ACTIVEBACKUP)
> + bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> + else
> + bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> + netdev_update_features(bond_dev);
> +}
> +#endif /* CONFIG_XFRM_OFFLOAD */
> +
>  static int bond_option_mode_set(struct bonding *bond,
>   const struct bond_opt_value *newval)
>  {
> @@ -768,11 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
>   bond->params.tlb_dynamic_lb = 1;
>  
>  #ifdef CONFIG_XFRM_OFFLOAD
> - if (newval->value == BOND_MODE_ACTIVEBACKUP)
> - bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> - else
> - bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> - netdev_change_features(bond->dev);
> + if (bond->dev->reg_state == NETREG_REGISTERED)
> + bond_set_xfrm_features(bond->dev, newval->value);
>  #endif /* CONFIG_XFRM_OFFLOAD */

nit: let's narrow down the ifdef-enery

no need for the ifdef here,

Re: [PATCH v2 1/2] KVM: arm64: Some fixes of PV-time interface document

2020-12-03 Thread Marc Zyngier

On 2020-08-17 12:07, Keqian Zhu wrote:
> Rename PV_FEATURES to PV_TIME_FEATURES.
>
> Signed-off-by: Keqian Zhu 
> Reviewed-by: Andrew Jones 
> Reviewed-by: Steven Price 
> ---
>  Documentation/virt/kvm/arm/pvtime.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/arm/pvtime.rst
> b/Documentation/virt/kvm/arm/pvtime.rst
> index 687b60d..94bffe2 100644
> --- a/Documentation/virt/kvm/arm/pvtime.rst
> +++ b/Documentation/virt/kvm/arm/pvtime.rst
> @@ -3,7 +3,7 @@
>  Paravirtualized time support for arm64
>  ==
>
> -Arm specification DEN0057/A defines a standard for paravirtualised time
> +Arm specification DEN0057/A defines a standard for paravirtualized time
>  support for AArch64 guests:

nit: I do object to this change (some of us are British! ;-).

        M.
--
Jazz is not dead. It just smells funny...


[PATCH 0/1] Fix for a recent regression in kvm/queue (guest using 100% cpu time)

2020-12-03 Thread Maxim Levitsky
I did a quick bisect yesterday after noticing that my VMs started
to take 100% cpu time.

Looks like we don't ignore SIPIs that are received while the CPU isn't
waiting for them, and that makes KVM think that the CPU always has
pending events, which makes it never enter an idle state.

Best regards,
Maxim Levitsky

Maxim Levitsky (1):
  KVM: x86: ignore SIPIs that are received while not in wait-for-sipi
state

 arch/x86/kvm/lapic.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

-- 
2.26.2




[RFC PATCH 00/10] Reduce time complexity of select_idle_sibling

2020-12-03 Thread Mel Gorman
This is an early prototype that has not been tested heavily. While parts
of it may stand on its own, the motivation to release early is Aubrey
Li's series on using an idle cpumask to optimise the search and Barry
Song's series on representing clusters on die. The series is based on
tip/sched/core rebased to 5.10-rc6.

Patches 1-2 add schedstats to track the search efficiency of
select_idle_sibling. They can be dropped from the final version but
are useful when looking at select_idle_sibling in general. MMTests
can already parse the stats and generate useful data including
graphs over time.

Patch 3 kills SIS_AVG_CPU but is partially reintroduced later in the
context of SIS_PROP.

Patch 4 notes that select_idle_core() can find an idle CPU that is
not a free core, yet it is ignored and a second search is conducted
in select_idle_cpu(), which is wasteful. Note that this patch
will definitely change in the final version.

Patch 5 adjusts p->recent_used_cpu so that it has a higher success rate
and avoids searching the domain in some cases.

Patch 6 notes that select_idle_* always starts with a CPU that is
definitely not idle and fixes that.

Patch 7 notes that SIS_PROP is only partially accounting for search
costs. While this might be accidentally beneficial, it makes it
much harder to reason about the effectiveness of SIS_PROP.

Patch 8 uses similar logic to SIS_AVG_CPU but in the context of
SIS_PROP to throttle the search depth.

Patches 9 and 10 are stupid in the context of this series. They
are included even though it makes no sense to use SIS_PROP logic in
select_idle_core() as it already has throttling logic. The point
is to illustrate that the select_idle_mask can be initialised
at the start of a domain search and used to mask out CPUs that have
already been visited.

In the context of Aubrey's and Barry's work, select_idle_mask would
be initialised *after* select_idle_core as select_idle_core uses
select_idle_mask for its own purposes. In Aubrey's case, the next
step would be to scan idle_cpus_span as those CPUs may still be idle
and bias the search towards likely idle candidates. If that fails,
select_idle_mask clears all the bits set in idle_cpus_span and then
scans the remainder. Similar observations apply to Barry's work, scan the
local domain first, mask out those bits then scan the remaining CPUs in
the cluster.
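
To make the masking idea concrete, here is a rough sketch (not taken from any of the series; the function and parameter names are illustrative) of how a later scan step could seed select_idle_mask from the domain span and strike out CPUs that an earlier step already visited:

/* Sketch of the "mask out already-visited CPUs" idea, intended to sit
 * next to the existing scan helpers in kernel/sched/fair.c.  'visited'
 * stands in for e.g. the mask scanned by select_idle_core() or an
 * idle_cpus_span-style mask from the other series.
 */
static int example_scan_remaining(struct task_struct *p,
				  struct sched_domain *sd,
				  const struct cpumask *visited,
				  int target)
{
	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
	int cpu;

	/* Restrict the domain span to where the task is allowed to run */
	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	/* Drop everything a previous step has already looked at */
	cpumask_andnot(cpus, cpus, visited);

	for_each_cpu_wrap(cpu, cpus, target) {
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}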

The final version of this series will drop patches 1-2 unless there is
demand and definitely drop patches 9-10. However, all 4 patches may be
useful in the context of Aubrey's and Barry's work. Patches 1-2 would
give more precise results on exactly how much they are improving "SIS
Domain Search Efficiency" which may be more illustrative than just the
headline performance figures of a given workload. The final version of
this series will also adjust patch 4. If select_idle_core() runs at all
then it definitely should return a CPU -- either an idle CPU or the target
as it has already searched the entire domain and no further searching
should be conducted. Barry might change that back so that a cluster can
be scanned but it would be done in the context of the cluster series.

-- 
2.26.2



[PATCH AUTOSEL 4.19 01/14] iwlwifi: pcie: limit memory read spin time

2020-12-03 Thread Sasha Levin
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 24da496151353..f48c7cac122e9 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2121,18 +2121,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0



[PATCH AUTOSEL 4.14 1/9] iwlwifi: pcie: limit memory read spin time

2020-12-03 Thread Sasha Levin
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 8a074a516fb26..910edd034fe3a 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1927,18 +1927,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0



[PATCH AUTOSEL 4.9 1/5] iwlwifi: pcie: limit memory read spin time

2020-12-03 Thread Sasha Levin
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index e7b873018dca6..e1287c3421165 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1904,18 +1904,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0



[PATCH AUTOSEL 5.4 01/23] iwlwifi: pcie: limit memory read spin time

2020-12-03 Thread Sasha Levin
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index c76d26708e659..ef5a8ecabc60a 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2178,18 +2178,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0



[PATCH AUTOSEL 5.9 03/39] iwlwifi: pcie: limit memory read spin time

2020-12-03 Thread Sasha Levin
From: Johannes Berg 

[ Upstream commit 04516706bb99889986ddfa3a769ed50d2dc7ac13 ]

When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.

As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
trigger the soft lockup detector.

Clearly, it is unreasonable to spin here for such extended periods of
time.

To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.

This will keep (interrupt) latencies on the CPU down to a reasonable
time.

Signed-off-by: Johannes Berg 
Signed-off-by: Mordechay Goodstein 
Signed-off-by: Luca Coelho 
Signed-off-by: Kalle Valo 
Link: 
https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
Signed-off-by: Sasha Levin 
---
 .../net/wireless/intel/iwlwifi/pcie/trans.c   | 36 ++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index e5160d6208688..6393e895f95c6 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2155,18 +2155,36 @@ static int iwl_trans_pcie_read_mem(struct iwl_trans 
*trans, u32 addr,
   void *buf, int dwords)
 {
unsigned long flags;
-   int offs, ret = 0;
+   int offs = 0;
u32 *vals = buf;
 
-   if (iwl_trans_grab_nic_access(trans, &flags)) {
-   iwl_write32(trans, HBUS_TARG_MEM_RADDR, addr);
-   for (offs = 0; offs < dwords; offs++)
-   vals[offs] = iwl_read32(trans, HBUS_TARG_MEM_RDAT);
-   iwl_trans_release_nic_access(trans, &flags);
-   } else {
-   ret = -EBUSY;
+   while (offs < dwords) {
+   /* limit the time we spin here under lock to 1/2s */
+   ktime_t timeout = ktime_add_us(ktime_get(), 500 * 
USEC_PER_MSEC);
+
+   if (iwl_trans_grab_nic_access(trans, &flags)) {
+   iwl_write32(trans, HBUS_TARG_MEM_RADDR,
+   addr + 4 * offs);
+
+   while (offs < dwords) {
+   vals[offs] = iwl_read32(trans,
+   HBUS_TARG_MEM_RDAT);
+   offs++;
+
+   /* calling ktime_get is expensive so
+* do it once in 128 reads
+*/
+   if (offs % 128 == 0 && ktime_after(ktime_get(),
+  timeout))
+   break;
+   }
+   iwl_trans_release_nic_access(trans, &flags);
+   } else {
+   return -EBUSY;
+   }
}
-   return ret;
+
+   return 0;
 }
 
 static int iwl_trans_pcie_write_mem(struct iwl_trans *trans, u32 addr,
-- 
2.27.0



[PATCH RFC 3/3] RISC-V: KVM: Implement guest time scaling

2020-12-03 Thread Yifei Jiang
When the time frequency needs to be scaled, the RDTIME/RDTIMEH instructions
executed in the guest don't work correctly, because they still use the host's
time frequency.

To read correct time, the RDTIME/RDTIMEH instruction executed by guest
should trap to HS-mode. The TM bit of HCOUNTEREN CSR could control whether
these instructions are trapped to HS-mode. Therefore, we can implement guest
time scaling by setting TM bit in kvm_riscv_vcpu_timer_restore() and emulating
RDTIME/RDTIMEH instruction in system_opcode_insn().

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 arch/riscv/include/asm/csr.h|  3 +++
 arch/riscv/include/asm/kvm_vcpu_timer.h |  1 +
 arch/riscv/kvm/vcpu_exit.c  | 35 +
 arch/riscv/kvm/vcpu_timer.c | 10 +++
 4 files changed, 49 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index bc825693e0e3..a4d8ca76cf1d 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -241,6 +241,9 @@
 #define IE_TIE (_AC(0x1, UL) << RV_IRQ_TIMER)
 #define IE_EIE (_AC(0x1, UL) << RV_IRQ_EXT)
 
+/* The counteren flag */
+#define CE_TM  1
+
 #ifndef __ASSEMBLY__
 
 #define csr_swap(csr, val) \
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h 
b/arch/riscv/include/asm/kvm_vcpu_timer.h
index 41b5503de9e4..61384eb57334 100644
--- a/arch/riscv/include/asm/kvm_vcpu_timer.h
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -41,6 +41,7 @@ int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
 int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
 int kvm_riscv_guest_timer_init(struct kvm *kvm);
+u64 kvm_riscv_read_guest_time(struct kvm_vcpu *vcpu);
 
 static inline bool kvm_riscv_need_scale(struct kvm_guest_timer *gt)
 {
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index f054406792a6..4beb9d25049a 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -18,6 +18,10 @@
 
 #define INSN_MASK_WFI  0xff00
 #define INSN_MATCH_WFI 0x1050
+#define INSN_MASK_RDTIME   0xfff03000
+#define INSN_MATCH_RDTIME  0xc0102000
+#define INSN_MASK_RDTIMEH  0xfff03000
+#define INSN_MATCH_RDTIMEH 0xc8102000
 
 #define INSN_MATCH_LB  0x3
 #define INSN_MASK_LB   0x707f
@@ -138,6 +142,34 @@ static int truly_illegal_insn(struct kvm_vcpu *vcpu,
return 1;
 }
 
+static int system_opcode_insn_rdtime(struct kvm_vcpu *vcpu,
+struct kvm_run *run,
+ulong insn)
+{
+#ifdef CONFIG_64BIT
+   if ((insn & INSN_MASK_RDTIME) == INSN_MATCH_RDTIME) {
+   u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+   SET_RD(insn, &vcpu->arch.guest_context, guest_time);
+   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+   return 1;
+   }
+#else
+   if ((insn & INSN_MASK_RDTIME) == INSN_MATCH_RDTIME) {
+   u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+   SET_RD(insn, &vcpu->arch.guest_context, (u32)guest_time);
+   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+   return 1;
+   }
+   if ((insn & INSN_MASK_RDTIMEH) == INSN_MATCH_RDTIMEH) {
+   u64 guest_time = kvm_riscv_read_guest_time(vcpu);
+   SET_RD(insn, &vcpu->arch.guest_context, (u32)(guest_time >> 
32));
+   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+   return 1;
+   }
+#endif
+   return 0;
+}
+
 static int system_opcode_insn(struct kvm_vcpu *vcpu,
  struct kvm_run *run,
  ulong insn)
@@ -154,6 +186,9 @@ static int system_opcode_insn(struct kvm_vcpu *vcpu,
return 1;
}
 
+   if (system_opcode_insn_rdtime(vcpu, run, insn))
+   return 1;
+
return truly_illegal_insn(vcpu, run, insn);
 }
 
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
index 2d203660a7e9..2040dbe57ee6 100644
--- a/arch/riscv/kvm/vcpu_timer.c
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -49,6 +49,11 @@ static u64 kvm_riscv_current_cycles(struct kvm_guest_timer 
*gt)
return kvm_riscv_scale_time(gt, host_time) + gt->time_delta;
 }
 
+u64 kvm_riscv_read_guest_time(struct kvm_vcpu *vcpu)
+{
+   return kvm_riscv_current_cycles(&vcpu->kvm->arch.timer);
+}
+
 static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
 struct kvm_guest_timer *gt,
 struct kvm_vcpu_timer *t)
@@ -241,6 +246,11 @@ void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu)
csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
 #endif

[PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM

2020-12-03 Thread Yifei Jiang
This series implements guest time scaling based on RDTIME instruction
emulation so that we can allow migrating a Guest/VM across Hosts with
different time frequencies.

Why not through para-virt? From arm's experience[1], a para-virt implementation
doesn't really solve the problem, for two main reasons:
- RDTIME is not only used in Linux, but also in firmware and userspace.
- It is difficult to be compatible with nested virtualization.

[1] https://lore.kernel.org/patchwork/cover/1288153/

Yifei Jiang (3):
  RISC-V: KVM: Change the method of calculating cycles to nanoseconds
  RISC-V: KVM: Support dynamic time frequency from userspace
  RISC-V: KVM: Implement guest time scaling

 arch/riscv/include/asm/csr.h|  3 ++
 arch/riscv/include/asm/kvm_vcpu_timer.h | 13 +--
 arch/riscv/kvm/vcpu_exit.c  | 35 +
 arch/riscv/kvm/vcpu_timer.c | 51 ++---
 4 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.19.1



[PATCH RFC 2/3] RISC-V: KVM: Support dynamic time frequency from userspace

2020-12-03 Thread Yifei Jiang
This patch implements KVM_SET/GET_ONE_REG for the time frequency to support
setting a dynamic time frequency from userspace. When the time frequency
specified by userspace differs from the host's 'riscv_timebase',
scale_mult and scale_shift are used to calculate the scaled guest time.
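
As a rough, standalone illustration of that fixed-point math (scale_shift is
48 in the patch; the frequencies below are made-up example values, and
unsigned __int128 stands in for the kernel's mul_u64_u32_div() /
mul_u64_u64_shr() helpers):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        const uint64_t host_freq   = 10000000;  /* e.g. a 10 MHz riscv_timebase   */
        const uint64_t guest_freq  =  1000000;  /* e.g. 1 MHz requested via ONE_REG */
        const uint64_t host_cycles = 123456789;

        /* scale_mult = (guest_freq << 48) / host_freq */
        uint64_t scale_mult =
                (uint64_t)(((unsigned __int128)guest_freq << 48) / host_freq);

        /* scaled ~= host_cycles * guest_freq / host_freq */
        uint64_t scaled =
                (uint64_t)(((unsigned __int128)host_cycles * scale_mult) >> 48);

        /* Prints scaled=12345678, i.e. 123456789 / 10 truncated. */
        printf("scale_mult=%llu scaled=%llu\n",
               (unsigned long long)scale_mult, (unsigned long long)scaled);
        return 0;
}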

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 arch/riscv/include/asm/kvm_vcpu_timer.h |  9 ++
 arch/riscv/kvm/vcpu_timer.c | 40 +
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h 
b/arch/riscv/include/asm/kvm_vcpu_timer.h
index 87e00d878999..41b5503de9e4 100644
--- a/arch/riscv/include/asm/kvm_vcpu_timer.h
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -12,6 +12,10 @@
 #include 
 
 struct kvm_guest_timer {
+   u64 frequency;
+   bool need_scale;
+   u64 scale_mult;
+   u64 scale_shift;
    /* Time delta value */
u64 time_delta;
 };
@@ -38,4 +42,9 @@ int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
 int kvm_riscv_guest_timer_init(struct kvm *kvm);
 
+static inline bool kvm_riscv_need_scale(struct kvm_guest_timer *gt)
+{
+   return gt->need_scale;
+}
+
 #endif
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
index f6b35180199a..2d203660a7e9 100644
--- a/arch/riscv/kvm/vcpu_timer.c
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -15,9 +15,38 @@
 #include 
 #include 
 
+#define SCALE_SHIFT_VALUE 48
+#define SCALE_TOLERANCE_HZ 1000
+
+static void kvm_riscv_set_time_freq(struct kvm_guest_timer *gt, u64 freq)
+{
+   /*
+    * Guest time frequency and Host time frequency are identical
+* if the error between them is limited within SCALE_TOLERANCE_HZ.
+*/
+   u64 diff = riscv_timebase > freq ?
+  riscv_timebase - freq : freq - riscv_timebase;
+   gt->need_scale = (diff >= SCALE_TOLERANCE_HZ);
+   if (gt->need_scale) {
+   gt->scale_shift = SCALE_SHIFT_VALUE;
+   gt->scale_mult = mul_u64_u32_div(1ULL << gt->scale_shift,
+freq, riscv_timebase);
+   }
+   gt->frequency = freq;
+}
+
+static u64 kvm_riscv_scale_time(struct kvm_guest_timer *gt, u64 time)
+{
+   if (kvm_riscv_need_scale(gt))
+   return mul_u64_u64_shr(time, gt->scale_mult, gt->scale_shift);
+
+   return time;
+}
+
 static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
 {
-   return get_cycles64() + gt->time_delta;
+   u64 host_time = get_cycles64();
+   return kvm_riscv_scale_time(gt, host_time) + gt->time_delta;
 }
 
 static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
@@ -33,7 +62,7 @@ static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
cycles_delta = cycles - cycles_now;
else
cycles_delta = 0;
-   delta_ns = mul_u64_u64_div_u64(cycles_delta, NSEC_PER_SEC, 
riscv_timebase);
+   delta_ns = mul_u64_u64_div_u64(cycles_delta, NSEC_PER_SEC, 
gt->frequency);
local_irq_restore(flags);
 
return delta_ns;
@@ -106,7 +135,7 @@ int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
 
switch (reg_num) {
case KVM_REG_RISCV_TIMER_REG(frequency):
-   reg_val = riscv_timebase;
+   reg_val = gt->frequency;
    break;
case KVM_REG_RISCV_TIMER_REG(time):
reg_val = kvm_riscv_current_cycles(gt);
@@ -150,10 +179,10 @@ int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
 
switch (reg_num) {
case KVM_REG_RISCV_TIMER_REG(frequency):
-   ret = -EOPNOTSUPP;
+   kvm_riscv_set_time_freq(gt, reg_val);
    break;
case KVM_REG_RISCV_TIMER_REG(time):
-   gt->time_delta = reg_val - get_cycles64();
+   gt->time_delta = reg_val - kvm_riscv_scale_time(gt, 
get_cycles64());
break;
case KVM_REG_RISCV_TIMER_REG(compare):
t->next_cycles = reg_val;
@@ -219,6 +248,7 @@ int kvm_riscv_guest_timer_init(struct kvm *kvm)
struct kvm_guest_timer *gt = &kvm->arch.timer;
 
gt->time_delta = -get_cycles64();
+   gt->frequency = riscv_timebase;
 
return 0;
 }
-- 
2.19.1



[PATCH net-next v5 8/9] net: dsa: microchip: ksz9477: remaining hardware time stamping support

2020-12-03 Thread Christian Eggers
Add data path routines required for TX hardware time stamping.

PTP mode is enabled depending on the filter setup (changes tail tag). TX
time stamps are reported via an interrupt / device registers whilst RX
time stamps are reported via an additional tail tag.

One step TX time stamping of PDelay_Resp requires the RX time stamp from
the associated PDelay_Req message. The user space PTP stack assumes that
the RX time stamp has already been subtracted from the PDelay_Req
correction field (as done by the ZHAW InES PTP time stamping core). It
will echo back the value of the correction field in the PDelay_Resp
message.

In order to be compatible to this already established interface, the
KSZ9563 code emulates this behavior. When processing the PDelay_Resp
message, the time stamp is moved back from the correction field to the
tail tag, as the hardware generates an invalid UDP checksum if this
field is negative.

Of course, the UDP checksums (if any) have to be corrected after this
(for both directions).
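
To make that correction-field shuffle concrete, a hedged and much simplified
sketch (this is not the driver code, whose diff is truncated below; the
struct, helper name and tail-tag field are placeholders -- only the
"correctionField is nanoseconds << 16, big endian" part is standard PTP):

/* Outgoing one-step PDelay_Resp: the stack has echoed -t2 (the PDelay_Req
 * RX timestamp) in correctionField. Move it into the tail tag and zero the
 * header field so the hardware never sees a negative correction.
 */
struct example_ptp_hdr {
        /* ...preceding PTP header fields elided... */
        __be64 correction;      /* nanoseconds << 16, big endian */
};

static void example_pull_rx_tstamp(struct example_ptp_hdr *hdr, u32 *tag_ns)
{
        s64 corr = (s64)be64_to_cpu(hdr->correction);

        if (corr < 0) {
                *tag_ns = (u32)((-corr) >> 16); /* ns << 16  ->  ns */
                hdr->correction = cpu_to_be64(0);
        }
        /* Any UDP checksum covering the PTP payload has to be fixed up
         * after this, for both directions.
         */
}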

Everything has been tested on a Microchip KSZ9563 switch.

Signed-off-by: Christian Eggers 
---
Changes in v5:
--
- Fix compile error reported by kernel test robot
  (NET_DSA_TAG_KSZ must select NET_PTP_CLASSIFY)

Changes in v4:
--
- s/low active/active low/
- 80 chars per line
- Use IEEE 802.1AS mode (to suppress forwarding of PDelay messages)
- Enable/disable hardware timestamping at runtime (port_hwtstamp_set)
- Use mutex in port_hwtstamp_set
- Don't use port specific struct hwtstamp_config
- removed #ifdefs from tag_ksz.c
- Set port's tx_latency and rx_latency to 0
- added include/linux/dsa/ksz_common.h to MAINTAINERS

On Saturday, 21 November 2020, 02:26:11 CET, Vladimir Oltean wrote:
> If you don't like the #ifdef's, I am not in love with them either. But
> maybe Christian is just optimizing too aggressively, and doesn't actually
> need to put those #ifdef's there and provide stub implementations, but
> could actually just leave the ksz9477_rcv_timestamp and ksz9477_xmit_timestamp
> always compiled-in, and "dead at runtime" in the case there is no PTP.
I removed the #ifdefs.

> [...]
> The thing is, ptp4l already has ingressLatency and egressLatency
> settings, and I would not be surprised if those config options would get
> extended to cover values at multiple link speeds.
> 
> In the general case, the ksz9477 MAC could be attached to any external
> PHY, having its own propagation delay characteristics, or any number of
> other things that cause clock domain crossings. I'm not sure how feasible
> it is for the kernel to abstract this away completely, and adjust
> timestamps automatically based on any and all combinations of MAC and
> PHY. Maybe this is just wishful thinking.
> 
> Oh, and by the way, Christian, I'm not even sure if you aren't in fact
> just beating around the bush with these tstamp_rx_latency_ns and
> tstamp_tx_latency_ns values? I mean, the switch adds the latency value
> to the timestamps. And you, from the driver, read the value of the
> register, so you can subtract the value from the timestamp, to
> compensate for its correction. So, all in all, there is no net latency
> compensation seen by the outside world?! If that is the case, can't you
> just set the latency registers to zero, do your compensation from the
> application stack and call it a day?
At first I thought that I had to move these values to ptp4l.conf. But after
setting the hardware registers to zero, it turned out that I also have to
use zero values in ptp4l.conf. So you are right.


On Monday, 23 November 2020, 13:09:38 CET, Vladimir Oltean wrote:
> On Mon, Nov 23, 2020 at 12:32:33PM +0100, Christian Eggers wrote:
> > please let me know, how I shall finally implement this. Enabling the PTP 
> > mode
> > on the switch and sending the extra 4 byte tail on tx must be done in sync.
> > Currently, both simply depends on the PTP define.
> 
> I, too, would prefer that the reconfiguration is done at ioctl time.
> Distributions typically enable whatever kernel config options they can.
> However, for users, the behavior should not change. Therefore the tail
> tag should remain small even though the PTP kernel config option is
> enabled, as long as hardware timestamping has not been explicitly
> enabled.
I moved this to port_hwtstamp_set. But I am not sure whether enabling PTP mode
should depend on tx_type or rx_filter.

> [...]
> When forwarding what packet? What profile are you testing with?
> What commands do you run?
> A P2P capable switch should not forward Peer delay messages.
With the 802.1AS settings, no SYNC/Announce messages are forwarded anymore.
Peer delay messages have never been forwarded.

 MAINTAINERS  |   1 +
 drivers/net/dsa/microchip/ksz9477_main.c |  12 +-
 drive

[PATCH net-next v5 7/9] net: dsa: microchip: ksz9477: initial hardware time stamping support

2020-12-03 Thread Christian Eggers
Add control routines required for TX hardware time stamping.

The KSZ9563 only supports one step time stamping
(HWTSTAMP_TX_ONESTEP_P2P), which requires linuxptp-2.0 or later.

Currently, only P2P delay measurement is supported. See patchwork
discussion and comments in ksz9477_ptp_init() for details:
https://patchwork.ozlabs.org/project/netdev/patch/20201019172435.4416-8-cegg...@arri.de/

Signed-off-by: Christian Eggers 
Reviewed-by: Vladimir Oltean 
---
Changes in v4:
--
- Remove useless case statement
- Reviewed-by: Vladimir Oltean 

 drivers/net/dsa/microchip/ksz9477_main.c |   6 +
 drivers/net/dsa/microchip/ksz9477_ptp.c  | 186 +++
 drivers/net/dsa/microchip/ksz9477_ptp.h  |  21 +++
 drivers/net/dsa/microchip/ksz_common.h   |   4 +
 4 files changed, 217 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz9477_main.c 
b/drivers/net/dsa/microchip/ksz9477_main.c
index 2cb33e9beb4c..0ade40bf27c7 100644
--- a/drivers/net/dsa/microchip/ksz9477_main.c
+++ b/drivers/net/dsa/microchip/ksz9477_main.c
@@ -1387,6 +1387,7 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
.phy_read   = ksz9477_phy_read16,
.phy_write  = ksz9477_phy_write16,
.phylink_mac_link_down  = ksz_mac_link_down,
+   .get_ts_info= ksz9477_ptp_get_ts_info,
.port_enable= ksz_enable_port,
.get_strings= ksz9477_get_strings,
.get_ethtool_stats  = ksz_get_ethtool_stats,
@@ -1407,6 +1408,11 @@ static const struct dsa_switch_ops ksz9477_switch_ops = {
.port_mdb_del   = ksz9477_port_mdb_del,
.port_mirror_add= ksz9477_port_mirror_add,
.port_mirror_del= ksz9477_port_mirror_del,
+   .port_hwtstamp_get  = ksz9477_ptp_port_hwtstamp_get,
+   .port_hwtstamp_set  = ksz9477_ptp_port_hwtstamp_set,
+   .port_txtstamp  = NULL,
+   /* never defer rx delivery, tstamping is done via tail tagging */
+   .port_rxtstamp  = NULL,
 };
 
 static u32 ksz9477_get_port_addr(int port, int offset)
diff --git a/drivers/net/dsa/microchip/ksz9477_ptp.c 
b/drivers/net/dsa/microchip/ksz9477_ptp.c
index 0ffc4504a290..a1ca1923ec0c 100644
--- a/drivers/net/dsa/microchip/ksz9477_ptp.c
+++ b/drivers/net/dsa/microchip/ksz9477_ptp.c
@@ -218,6 +218,18 @@ static int ksz9477_ptp_enable(struct ptp_clock_info *ptp,
return -EOPNOTSUPP;
 }
 
+static long ksz9477_ptp_do_aux_work(struct ptp_clock_info *ptp)
+{
+   struct ksz_device *dev = container_of(ptp, struct ksz_device, ptp_caps);
+   struct timespec64 ts;
+
+   mutex_lock(&dev->ptp_mutex);
+   _ksz9477_ptp_gettime(dev, &ts);
+   mutex_unlock(&dev->ptp_mutex);
+
+   return HZ;  /* reschedule in 1 second */
+}
+
 static int ksz9477_ptp_start_clock(struct ksz_device *dev)
 {
u16 data;
@@ -257,6 +269,54 @@ static int ksz9477_ptp_stop_clock(struct ksz_device *dev)
return ksz_write16(dev, REG_PTP_CLK_CTRL, data);
 }
 
+/* device attributes */
+
+enum ksz9477_ptp_tcmode {
+   KSZ9477_PTP_TCMODE_E2E,
+   KSZ9477_PTP_TCMODE_P2P,
+};
+
+static int ksz9477_ptp_tcmode_set(struct ksz_device *dev,
+ enum ksz9477_ptp_tcmode tcmode)
+{
+   u16 data;
+   int ret;
+
+   ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data);
+   if (ret)
+   return ret;
+
+   if (tcmode == KSZ9477_PTP_TCMODE_P2P)
+   data |= PTP_TC_P2P;
+   else
+   data &= ~PTP_TC_P2P;
+
+   return ksz_write16(dev, REG_PTP_MSG_CONF1, data);
+}
+
+enum ksz9477_ptp_ocmode {
+   KSZ9477_PTP_OCMODE_SLAVE,
+   KSZ9477_PTP_OCMODE_MASTER,
+};
+
+static int ksz9477_ptp_ocmode_set(struct ksz_device *dev,
+ enum ksz9477_ptp_ocmode ocmode)
+{
+   u16 data;
+   int ret;
+
+   ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data);
+   if (ret)
+   return ret;
+
+   if (ocmode == KSZ9477_PTP_OCMODE_MASTER)
+   data |= PTP_MASTER;
+   else
+   data &= ~PTP_MASTER;
+
+   return ksz_write16(dev, REG_PTP_MSG_CONF1, data);
+}
+
 int ksz9477_ptp_init(struct ksz_device *dev)
 {
int ret;
@@ -282,6 +342,7 @@ int ksz9477_ptp_init(struct ksz_device *dev)
dev->ptp_caps.gettime64   = ksz9477_ptp_gettime;
dev->ptp_caps.settime64   = ksz9477_ptp_settime;
dev->ptp_caps.enable  = ksz9477_ptp_enable;
+   dev->ptp_caps.do_aux_work = ksz9477_ptp_do_aux_work;
 
/* Start hardware counter (will overflow after 136 years) */
ret = ksz9477_ptp_start_clock(dev);
@@ -294,8 +355,31 @@ int ksz9477_ptp_init(struct ksz_device *dev)
goto error_stop_clock;
}
 
+   /* Currently, only P2P delay measurement is supported.  Setting ocmode
+* to slave will work independently of actually being master or slave.
+

Re: [PATCH v2 net-next] net: ipa: fix build-time bug in ipa_hardware_config_qsb()

2020-12-02 Thread Jakub Kicinski
On Wed,  2 Dec 2020 08:15:02 -0600 Alex Elder wrote:
> Jon Hunter reported observing a build bug in the IPA driver:
>   
> https://lore.kernel.org/netdev/5b5d9d40-94d5-5dad-b861-fd9bef826...@nvidia.com
> 
> The problem is that the QMB0 max read value set for IPA v4.5 (16) is
> too large to fit in the 4-bit field.
> 
> The actual value we want is 0, which requests that the hardware use
> the maximum it is capable of.
> 
> Reported-by: Jon Hunter 
> Tested-by: Jon Hunter 
> Signed-off-by: Alex Elder 

Applied, thanks!


[PATCH net v3] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs on post-module-load mode changes, as well as at module init time
and new bond creation time, and in the latter two scenarios, it is
running prior to register_netdevice() having been called and
subsequently filling in wanted_features. The empty wanted_features led
to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto, fix problem description

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 10 --
 drivers/net/bonding/bond_options.c | 19 ++-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47afc5938c26..7905534a763b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4747,15 +4747,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..1ae0e5ab8c67 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,18 @@ const struct bond_option *bond_opt_get(unsigned int option)
return &bond_opts[option];
 }
 
+#ifdef CONFIG_XFRM_OFFLOAD
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+   if (mode == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+   netdev_update_features(bond_dev);
+}
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
@@ -768,11 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   if (bond->dev->reg_state == NETREG_REGISTERED)
+   bond_set_xfrm_features(bond->dev, newval->value);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
/* don't cache arp_validate between modes */
-- 
2.28.0



Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
On Wed, Dec 2, 2020 at 3:17 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh  
> >wrote:
> >>
> >> Jarod Wilson  wrote:
> >>
> >> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >> >registered. Bad things can currently happen when netdev_change_features()
> >> >is called without having wanted_features fully filled in yet. Basically,
> >> >this code was racing against register_netdevice() filling in
> >> >wanted_features, and when it got there first, the empty wanted_features
> >> >led to features also getting emptied out, which was definitely not the
> >> >intended behavior, so prevent that from happening.
> >>
> >> Is this an actual race?  Reading Ivan's prior message, it sounds
> >> like it's an ordering problem (in that bond_newlink calls
> >> register_netdevice after bond_changelink).
> >
> >Sorry, yeah, this is not actually a race condition, just an ordering
> >issue, bond_check_params() gets called at init time, which leads to
> >bond_option_mode_set() being called, and does so prior to
> >bond_create() running, which is where we actually call
> >register_netdevice().
>
> So this only happens if there's a "mode" module parameter?  That
> doesn't sound like the call path that Ivan described (coming in via
> bond_newlink).

Ah. I think there are actually two different pathways that can trigger
this. The first is for bonds created at module load time, which I was
describing, the second is for a new bond created via bond_newlink()
after the bonding module is already loaded, as described by Ivan. Both
have the problem of bond_option_mode_set() running prior to
register_netdevice(). Of course, that would suggest every bond
currently comes up with unintentionally neutered flags, which I
neglected to catch in earlier testing and development.

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jay Vosburgh
Jarod Wilson  wrote:

>On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh  
>wrote:
>>
>> Jarod Wilson  wrote:
>>
>> >Don't try to adjust XFRM support flags if the bond device isn't yet
>> >registered. Bad things can currently happen when netdev_change_features()
>> >is called without having wanted_features fully filled in yet. Basically,
>> >this code was racing against register_netdevice() filling in
>> >wanted_features, and when it got there first, the empty wanted_features
>> >led to features also getting emptied out, which was definitely not the
>> >intended behavior, so prevent that from happening.
>>
>> Is this an actual race?  Reading Ivan's prior message, it sounds
>> like it's an ordering problem (in that bond_newlink calls
>> register_netdevice after bond_changelink).
>
>Sorry, yeah, this is not actually a race condition, just an ordering
>issue, bond_check_params() gets called at init time, which leads to
>bond_option_mode_set() being called, and does so prior to
>bond_create() running, which is where we actually call
>register_netdevice().

So this only happens if there's a "mode" module parameter?  That
doesn't sound like the call path that Ivan described (coming in via
bond_newlink).

-J

>> The change to bond_option_mode_set tests against reg_state, so
>> presumably it wants to skip the first(?) time through, before the
>> register_netdevice call; is that right?
>
>Correct. Later on, when the bonding driver is already loaded, and
>parameter changes are made, bond_option_mode_set() gets called and if
>the mode changes to or from active-backup, we do need/want this code
>to run to update wanted and features flags properly.
>
>
>-- 
>Jarod Wilson
>ja...@redhat.com

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
On Wed, Dec 2, 2020 at 2:23 PM Jakub Kicinski  wrote:
>
> On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski  wrote:
> > >
> > > On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > > > + if (bond->dev->reg_state != NETREG_REGISTERED)
> > > > + goto noreg;
> > > > +
> > > >   if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > > >   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > > >   else
> > > >   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > > - netdev_change_features(bond->dev);
> > > > + netdev_update_features(bond->dev);
> > > > +noreg:
> > >
> > > Why the goto?
> >
> > Seemed cleaner to prevent an extra level of indentation of the code
> > following the goto and before the label, but I'm not that attached to
> > it if it's not wanted for coding style reasons.
>
> Yes, please don't use gotos where a normal if statement is sufficient.
> If you must avoid the indentation move the code to a helper.
>
> Also - this patch did not apply to net, please make sure you're
> developing on the correct base.

Argh, I must have been working in net-next instead of net, apologies.
Okay, I'll clarify the description per what Jay pointed out and adjust
the code to not include a goto, then make it on the right branch.

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting

2020-12-02 Thread Christian Borntraeger



On 02.12.20 12:57, Frederic Weisbecker wrote:
> account_irq_enter_time() and account_irq_exit_time() are not called
> from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
> cputime accounting functions called from there.
> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Peter Zijlstra 
> Cc: Tony Luck 
> Cc: Fenghua Yu 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> ---
>  arch/s390/kernel/vtime.c | 10 +-
>  kernel/sched/cputime.c   |  2 --
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
> index 8df10d3c8f6c..f9f2a11958a5 100644
> --- a/arch/s390/kernel/vtime.c
> +++ b/arch/s390/kernel/vtime.c
> @@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
>   * Update process times based on virtual cpu times stored by entry.S
>   * to the lowcore fields user_timer, system_timer & steal_clock.
>   */
> -void vtime_account_irq_enter(struct task_struct *tsk)
> +void vtime_account_kernel(struct task_struct *tsk)
>  {
>   u64 timer;
>  
> @@ -245,12 +245,12 @@ void vtime_account_irq_enter(struct task_struct *tsk)
>  
>   virt_timer_forward(timer);
>  }
> -EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
> -
> -void vtime_account_kernel(struct task_struct *tsk)
> -__attribute__((alias("vtime_account_irq_enter")));
>  EXPORT_SYMBOL_GPL(vtime_account_kernel);
>  
> +void vtime_account_irq_enter(struct task_struct *tsk)
> +__attribute__((alias("vtime_account_kernel")));
> +
> +

One new line is enough I think. Apart from that this looks sane from an s390 
perspective.
Acked-by: Christian Borntraeger 


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >registered. Bad things can currently happen when netdev_change_features()
> >is called without having wanted_features fully filled in yet. Basically,
> >this code was racing against register_netdevice() filling in
> >wanted_features, and when it got there first, the empty wanted_features
> >led to features also getting emptied out, which was definitely not the
> >intended behavior, so prevent that from happening.
>
> Is this an actual race?  Reading Ivan's prior message, it sounds
> like it's an ordering problem (in that bond_newlink calls
> register_netdevice after bond_changelink).

Sorry, yeah, this is not actually a race condition, just an ordering
issue, bond_check_params() gets called at init time, which leads to
bond_option_mode_set() being called, and does so prior to
bond_create() running, which is where we actually call
register_netdevice().

> The change to bond_option_mode_set tests against reg_state, so
> presumably it wants to skip the first(?) time through, before the
> register_netdevice call; is that right?

Correct. Later on, when the bonding driver is already loaded, and
parameter changes are made, bond_option_mode_set() gets called and if
the mode changes to or from active-backup, we do need/want this code
to run to update wanted and features flags properly.


-- 
Jarod Wilson
ja...@redhat.com



[tip: irq/core] sched/vtime: Consolidate IRQ time accounting

2020-12-02 Thread tip-bot2 for Frederic Weisbecker
The following commit has been merged into the irq/core branch of tip:

Commit-ID: 8a6a5920d3286eb0eae9f36a4ec4fc9df511eccb
Gitweb:
https://git.kernel.org/tip/8a6a5920d3286eb0eae9f36a4ec4fc9df511eccb
Author:Frederic Weisbecker 
AuthorDate:Wed, 02 Dec 2020 12:57:30 +01:00
Committer: Thomas Gleixner 
CommitterDate: Wed, 02 Dec 2020 20:20:05 +01:00

sched/vtime: Consolidate IRQ time accounting

The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
all have their own version of irq time accounting that dispatch the
cputime to the appropriate index: hardirq, softirq, system, idle,
guest... from an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201202115732.27827-4-frede...@kernel.org

---
 arch/ia64/kernel/time.c| 20 +
 arch/powerpc/kernel/time.c | 56 ++---
 arch/s390/kernel/vtime.c   | 45 +-
 include/linux/vtime.h  | 16 ---
 kernel/sched/cputime.c | 13 ++---
 5 files changed, 102 insertions(+), 48 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 7abc5f3..733e0e3 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -138,12 +138,8 @@ void vtime_account_kernel(struct task_struct *tsk)
struct thread_info *ti = task_thread_info(tsk);
__u64 stime = vtime_delta(tsk);
 
-   if ((tsk->flags & PF_VCPU) && !irq_count())
+   if (tsk->flags & PF_VCPU)
ti->gtime += stime;
-   else if (hardirq_count())
-   ti->hardirq_time += stime;
-   else if (in_serving_softirq())
-   ti->softirq_time += stime;
else
ti->stime += stime;
 }
@@ -156,6 +152,20 @@ void vtime_account_idle(struct task_struct *tsk)
ti->idle_time += vtime_delta(tsk);
 }
 
+void vtime_account_softirq(struct task_struct *tsk)
+{
+   struct thread_info *ti = task_thread_info(tsk);
+
+   ti->softirq_time += vtime_delta(tsk);
+}
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+   struct thread_info *ti = task_thread_info(tsk);
+
+   ti->hardirq_time += vtime_delta(tsk);
+}
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 static irqreturn_t
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46..cf3f8db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -311,12 +311,11 @@ static unsigned long vtime_delta_scaled(struct 
cpu_accounting_data *acct,
return stime_scaled;
 }
 
-static unsigned long vtime_delta(struct task_struct *tsk,
+static unsigned long vtime_delta(struct cpu_accounting_data *acct,
 unsigned long *stime_scaled,
 unsigned long *steal_time)
 {
unsigned long now, stime;
-   struct cpu_accounting_data *acct = get_accounting(tsk);
 
WARN_ON_ONCE(!irqs_disabled());
 
@@ -331,29 +330,30 @@ static unsigned long vtime_delta(struct task_struct *tsk,
return stime;
 }
 
+static void vtime_delta_kernel(struct cpu_accounting_data *acct,
+  unsigned long *stime, unsigned long 
*stime_scaled)
+{
+   unsigned long steal_time;
+
+   *stime = vtime_delta(acct, stime_scaled, &steal_time);
+   *stime -= min(*stime, steal_time);
+   acct->steal_time += steal_time;
+}
+
 void vtime_account_kernel(struct task_struct *tsk)
 {
-   unsigned long stime, stime_scaled, steal_time;
struct cpu_accounting_data *acct = get_accounting(tsk);
+   unsigned long stime, stime_scaled;
 
-   stime = vtime_delta(tsk, &stime_scaled, &steal_time);
-
-   stime -= min(stime, steal_time);
-   acct->steal_time += steal_time;
+   vtime_delta_kernel(acct, &stime, &stime_scaled);
 
-   if ((tsk->flags & PF_VCPU) && !irq_count()) {
+   if (tsk->flags & PF_VCPU) {
acct->gtime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
acct->utime_scaled += stime_scaled;
 #endif
} else {
-   if (hardirq_count())
-   acct->hardirq_time += stime;
-   else if (in_serving_softirq())
-   acct->softirq_time += stime;
-   else
-   acct->stime += stime;
-
+   acct->stime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
acct->stime_scaled += stime_scaled;
 #endif
@@ -366,10 +366,34 @@ void vtime_account_idle(struct task_struct *tsk)
unsigned long stime, stime_scaled, steal_time;
struct cpu_accounting_data *acct = get_accounting(tsk);

[tip: irq/core] sched/cputime: Remove symbol exports from IRQ time accounting

2020-12-02 Thread tip-bot2 for Frederic Weisbecker
The following commit has been merged into the irq/core branch of tip:

Commit-ID: 7197688b2006357da75a014e0a76be89ca9c2d46
Gitweb:
https://git.kernel.org/tip/7197688b2006357da75a014e0a76be89ca9c2d46
Author:Frederic Weisbecker 
AuthorDate:Wed, 02 Dec 2020 12:57:28 +01:00
Committer: Thomas Gleixner 
CommitterDate: Wed, 02 Dec 2020 20:20:04 +01:00

sched/cputime: Remove symbol exports from IRQ time accounting

account_irq_enter_time() and account_irq_exit_time() are not called
from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
cputime accounting functions called from there.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201202115732.27827-2-frede...@kernel.org

---
 arch/s390/kernel/vtime.c | 10 +-
 kernel/sched/cputime.c   |  2 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 8df10d3..f9f2a11 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_kernel(struct task_struct *tsk)
 {
u64 timer;
 
@@ -245,12 +245,12 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 
virt_timer_forward(timer);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
-
-void vtime_account_kernel(struct task_struct *tsk)
-__attribute__((alias("vtime_account_irq_enter")));
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
+void vtime_account_irq_enter(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
+
+
 /*
  * Sorted add to a list. List is linear searched until first bigger
  * element is found.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5a55d23..61ce9f9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -71,7 +71,6 @@ void irqtime_account_irq(struct task_struct *curr)
else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
-EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
 static u64 irqtime_tick_accounted(u64 maxtime)
 {
@@ -434,7 +433,6 @@ void vtime_account_irq_enter(struct task_struct *tsk)
else
vtime_account_kernel(tsk);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jakub Kicinski
On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski  wrote:
> >
> > On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:  
> > > + if (bond->dev->reg_state != NETREG_REGISTERED)
> > > + goto noreg;
> > > +
> > >   if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > >   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > >   else
> > >   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > - netdev_change_features(bond->dev);
> > > + netdev_update_features(bond->dev);
> > > +noreg:  
> >
> > Why the goto?  
> 
> Seemed cleaner to prevent an extra level of indentation of the code
> following the goto and before the label, but I'm not that attached to
> it if it's not wanted for coding style reasons.

Yes, please don't use gotos where a normal if statement is sufficient.
If you must avoid the indentation move the code to a helper.

Also - this patch did not apply to net, please make sure you're
developing on the correct base.


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski  wrote:
>
> On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > + if (bond->dev->reg_state != NETREG_REGISTERED)
> > + goto noreg;
> > +
> >   if (newval->value == BOND_MODE_ACTIVEBACKUP)
> >   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> >   else
> >   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > - netdev_change_features(bond->dev);
> > + netdev_update_features(bond->dev);
> > +noreg:
>
> Why the goto?

Seemed cleaner to prevent an extra level of indentation of the code
following the goto and before the label, but I'm not that attached to
it if it's not wanted for coding style reasons.

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jay Vosburgh
Jarod Wilson  wrote:

>Don't try to adjust XFRM support flags if the bond device isn't yet
>registered. Bad things can currently happen when netdev_change_features()
>is called without having wanted_features fully filled in yet. Basically,
>this code was racing against register_netdevice() filling in
>wanted_features, and when it got there first, the empty wanted_features
>led to features also getting emptied out, which was definitely not the
>intended behavior, so prevent that from happening.

Is this an actual race?  Reading Ivan's prior message, it sounds
like it's an ordering problem (in that bond_newlink calls
register_netdevice after bond_changelink).

The change to bond_option_mode_set tests against reg_state, so
presumably it wants to skip the first(?) time through, before the
register_netdevice call; is that right?

-J

>Originally, I'd hoped to stop adjusting wanted_features at all in the
>bonding driver, as it's documented as being something only the network
>core should touch, but we actually do need to do this to properly update
>both the features and wanted_features fields when changing the bond type,
>or we get to a situation where ethtool sees:
>
>esp-hw-offload: off [requested on]
>
>I do think we should be using netdev_update_features instead of
>netdev_change_features here though, so we only send notifiers when the
>features actually changed.
>
>v2: rework based on further testing and suggestions from ivecera
>
>Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
>Reported-by: Ivan Vecera 
>Suggested-by: Ivan Vecera 
>Cc: Jay Vosburgh 
>Cc: Veaceslav Falico 
>Cc: Andy Gospodarek 
>Cc: "David S. Miller" 
>Cc: Jakub Kicinski 
>Cc: Thomas Davis 
>Cc: net...@vger.kernel.org
>Signed-off-by: Jarod Wilson 
>---
> drivers/net/bonding/bond_main.c| 10 --
> drivers/net/bonding/bond_options.c |  6 +-
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e0880a3840d7..5fe5232cc3f3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
>   NETIF_F_HW_VLAN_CTAG_FILTER;
> 
>   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
>-#ifdef CONFIG_XFRM_OFFLOAD
>-  bond_dev->hw_features |= BOND_XFRM_FEATURES;
>-#endif /* CONFIG_XFRM_OFFLOAD */
>   bond_dev->features |= bond_dev->hw_features;
>   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
> #ifdef CONFIG_XFRM_OFFLOAD
>-  /* Disable XFRM features if this isn't an active-backup config */
>-  if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>-  bond_dev->features &= ~BOND_XFRM_FEATURES;
>+  bond_dev->hw_features |= BOND_XFRM_FEATURES;
>+  /* Only enable XFRM features if this is an active-backup config */
>+  if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
>+  bond_dev->features |= BOND_XFRM_FEATURES;
> #endif /* CONFIG_XFRM_OFFLOAD */
> }
> 
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 9abfaae1c6f7..19205cfac751 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
>   bond->params.tlb_dynamic_lb = 1;
> 
> #ifdef CONFIG_XFRM_OFFLOAD
>+  if (bond->dev->reg_state != NETREG_REGISTERED)
>+  goto noreg;
>+
>   if (newval->value == BOND_MODE_ACTIVEBACKUP)
>   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>   else
>   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
>-  netdev_change_features(bond->dev);
>+  netdev_update_features(bond->dev);
>+noreg:
>
> #endif /* CONFIG_XFRM_OFFLOAD */
> 
>   /* don't cache arp_validate between modes */
>-- 
>2.28.0
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jakub Kicinski
On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> + if (bond->dev->reg_state != NETREG_REGISTERED)
> + goto noreg;
> +
>   if (newval->value == BOND_MODE_ACTIVEBACKUP)
>   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>   else
>   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> - netdev_change_features(bond->dev);
> + netdev_update_features(bond->dev);
> +noreg:

Why the goto?


Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Ivan Vecera
On Wed,  2 Dec 2020 12:30:53 -0500
Jarod Wilson  wrote:

> Don't try to adjust XFRM support flags if the bond device isn't yet
> registered. Bad things can currently happen when netdev_change_features()
> is called without having wanted_features fully filled in yet. Basically,
> this code was racing against register_netdevice() filling in
> wanted_features, and when it got there first, the empty wanted_features
> led to features also getting emptied out, which was definitely not the
> intended behavior, so prevent that from happening.
> 
> Originally, I'd hoped to stop adjusting wanted_features at all in the
> bonding driver, as it's documented as being something only the network
> core should touch, but we actually do need to do this to properly update
> both the features and wanted_features fields when changing the bond type,
> or we get to a situation where ethtool sees:
> 
> esp-hw-offload: off [requested on]
> 
> I do think we should be using netdev_update_features instead of
> netdev_change_features here though, so we only send notifiers when the
> features actually changed.
> 
> v2: rework based on further testing and suggestions from ivecera
> 
> Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
> Reported-by: Ivan Vecera 
> Suggested-by: Ivan Vecera 
> Cc: Jay Vosburgh 
> Cc: Veaceslav Falico 
> Cc: Andy Gospodarek 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Thomas Davis 
> Cc: net...@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
>  drivers/net/bonding/bond_main.c    | 10 ++++------
>  drivers/net/bonding/bond_options.c |  6 +++++-
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e0880a3840d7..5fe5232cc3f3 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
>   NETIF_F_HW_VLAN_CTAG_FILTER;
>  
>   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
> -#ifdef CONFIG_XFRM_OFFLOAD
> - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> -#endif /* CONFIG_XFRM_OFFLOAD */
>   bond_dev->features |= bond_dev->hw_features;
>   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
>  #ifdef CONFIG_XFRM_OFFLOAD
> - /* Disable XFRM features if this isn't an active-backup config */
> - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> - bond_dev->features &= ~BOND_XFRM_FEATURES;
> + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> + /* Only enable XFRM features if this is an active-backup config */
> + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> + bond_dev->features |= BOND_XFRM_FEATURES;
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  }
>  
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 9abfaae1c6f7..19205cfac751 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
>   bond->params.tlb_dynamic_lb = 1;
>  
>  #ifdef CONFIG_XFRM_OFFLOAD
> + if (bond->dev->reg_state != NETREG_REGISTERED)
> + goto noreg;
> +
>   if (newval->value == BOND_MODE_ACTIVEBACKUP)
>   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
>   else
>   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> - netdev_change_features(bond->dev);
> + netdev_update_features(bond->dev);
> +noreg:
>  #endif /* CONFIG_XFRM_OFFLOAD */
>  
>   /* don't cache arp_validate between modes */

Tested-by: Ivan Vecera 



[PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson
Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. Basically,
this code was racing against register_netdevice() filling in
wanted_features, and when it got there first, the empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.
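
For context, the step in register_netdevice() that this ordering interacts
with is roughly the following (simplified sketch from net/core/dev.c):

	/* register_netdevice() seeds wanted_features from the features the
	 * driver has already set up; if the mode option runs before this
	 * and pushes a still-empty wanted_features through the feature
	 * update machinery, the user-toggleable features get cleared.
	 */
	dev->wanted_features = dev->features & dev->hw_features;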

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.
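
For reference, the distinction is roughly the following (sketch of the two
helpers from net/core/dev.c, export annotations omitted):

void netdev_update_features(struct net_device *dev)
{
	/* recompute features and notify only if something changed */
	if (__netdev_update_features(dev))
		netdev_features_change(dev);
}

void netdev_change_features(struct net_device *dev)
{
	/* recompute features and always send the notification */
	__netdev_update_features(dev);
	netdev_features_change(dev);
}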

v2: rework based on further testing and suggestions from ivecera

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c    | 10 ++++------
 drivers/net/bonding/bond_options.c |  6 +++++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..5fe5232cc3f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..19205cfac751 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
+   if (bond->dev->reg_state != NETREG_REGISTERED)
+   goto noreg;
+
if (newval->value == BOND_MODE_ACTIVEBACKUP)
bond->dev->wanted_features |= BOND_XFRM_FEATURES;
else
bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   netdev_update_features(bond->dev);
+noreg:
 #endif /* CONFIG_XFRM_OFFLOAD */
 
/* don't cache arp_validate between modes */
-- 
2.28.0


