Re: [PATCH v2 1/2] certs: Trigger creation of RSA module signing key if it's not an RSA key

2021-04-08 Thread Mimi Zohar
On Thu, 2021-04-08 at 11:24 -0400, Stefan Berger wrote:
> Address a kbuild issue where a developer created an ECDSA key for signing
> kernel modules and then builds an older version of the kernel, when bi-
> secting the kernel for example, that does not support ECDSA keys.
> 
> Trigger the creation of an RSA module signing key if it is not an RSA key.
> 
> Fixes: cfc411e7fff3 ("Move certificate handling to its own directory")
> Signed-off-by: Stefan Berger 

Thanks, Stefan.

Reviewed-by: Mimi Zohar 



Re: [PATCH v3 1/2] kunit: add a KUnit test for SLUB debugging functionality

2021-04-08 Thread Daniel Latypov
On Thu, Apr 8, 2021 at 3:30 AM Marco Elver  wrote:
>
> On Tue, 6 Apr 2021 at 12:57, Vlastimil Babka  wrote:
> >
> >
> > On 4/1/21 11:24 PM, Marco Elver wrote:
> > > On Thu, 1 Apr 2021 at 21:04, Daniel Latypov  wrote:
> > >> > }
> > >> > #else
> > >> > static inline bool slab_add_kunit_errors(void) { return false; 
> > >> > }
> > >> > #endif
> > >> >
> > >> > And anywhere you want to increase the error count, you'd call
> > >> > slab_add_kunit_errors().
> > >> >
> > >> > Another benefit of this approach is that if KUnit is disabled, there is
> > >> > zero overhead and no additional code generated (vs. the current
> > >> > approach).
> > >>
> > >> The resource approach looks really good, but...
> > >> You'd be picking up a dependency on
> > >> https://lore.kernel.org/linux-kselftest/20210311152314.3814916-2-dlaty...@google.com/
> > >> current->kunit_test will always be NULL unless CONFIG_KASAN=y &&
> > >> CONFIG_KUNIT=y at the moment.
> > >> My patch drops the CONFIG_KASAN requirement and opens it up to all tests.
> > >
> > > Oh, that's a shame, but hopefully it'll be in -next soon.
> > >
> > >> At the moment, it's just waiting another look over from Brendan or David.
> > >> Any ETA on that, folks? :)
> > >>
> > >> So if you don't want to get blocked on that for now, I think it's fine 
> > >> to add:
> > >>   #ifdef CONFIG_SLUB_KUNIT_TEST
> > >>   int errors;
> > >>   #endif
> > >
> > > Until kunit fixes setting current->kunit_test, a cleaner workaround
> > > that would allow to do the patch with kunit_resource, is to just have
> > > an .init/.exit function that sets it ("current->kunit_test = test;").
> > > And then perhaps add a note ("FIXME: ...") to remove it once the above
> > > patch has landed.
> > >
> > > At least that way we get the least intrusive change for mm/slub.c, and
> > > the test is the only thing that needs a 2-line patch to clean up
> > > later.
> >
> > So when testing internally Oliver's new version with your suggestions 
> > (thanks
> > again for those), I got lockdep splats because slab_add_kunit_errors is 
> > called
> > also from irq disabled contexts, and kunit_find_named_resource will call
> > spin_lock(>lock) that's not irq safe. Can we make the lock irq safe? I
> > tried the change below and it makde the problem go away. If you agree, the
> > question is how to proceed - make it part of Oliver's patch series and let
> > Andrew pick it all with eventually kunit team's acks on this patch, or 
> > whatnot.
>
> From what I can tell it should be fine to make it irq safe (ack for
> your patch below). Regarding patch logistics, I'd probably add it to
> the series. If that ends up not working, we'll find out sooner or
> later.
>
> (FYI, the prerequisite patch for current->kunit_test is in -next now.)

Yep.
There's also two follow-up patches in
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/log/?h=kunit

>
> KUnit maintainers, do you have any preferences?

Poked offline and Brendan and David seemed fine either way.
So probably just include it in this patch series for convenience.

Brendan also mentioned KUnit used to use spin_lock_irqsave/restore()
but had been told to not use it until necessary.
See 
https://lore.kernel.org/linux-kselftest/20181016235120.138227-3-brendanhigg...@google.com/
So I think there's no objections to the patch itself either.

But I'd wait for Brendan to chime in to confirm.



>
> > 8<
> >
> > commit ab28505477892e9824c57ac338c88aec2ec0abce
> > Author: Vlastimil Babka 
> > Date:   Tue Apr 6 12:28:07 2021 +0200
> >
> > kunit: make test->lock irq safe
> >
> > diff --git a/include/kunit/test.h b/include/kunit/test.h
> > index 49601c4b98b8..524d4789af22 100644
> > --- a/include/kunit/test.h
> > +++ b/include/kunit/test.h
> > @@ -515,8 +515,9 @@ kunit_find_resource(struct kunit *test,
> > void *match_data)
> >  {
> > struct kunit_resource *res, *found = NULL;
> > +   unsigned long flags;
> >
> > -   spin_lock(>lock);
> > +   spin_lock_irqsave(>lock, flags);
> >
> > list_for_each_entry_reverse(res, >resources, node) {
> > if (match(test, res, (void *)match_data)) {
> > @@ -526,7 +527,7 @@ kunit_find_resource(struct kunit *test,
> > }
> > }
> >
> > -   spin_unlock(>lock);
> > +   spin_unlock_irqrestore(>lock, flags);
> >
> > return found;
> >  }
> > diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> > index ec9494e914ef..2c62eeb45b82 100644
> > --- a/lib/kunit/test.c
> > +++ b/lib/kunit/test.c
> > @@ -442,6 +442,7 @@ int kunit_add_resource(struct kunit *test,
> >void *data)
> >  {
> > int ret = 0;
> > +   unsigned long flags;
> >
> > res->free = free;
> > kref_init(>refcount);
> > @@ -454,10 +455,10 @@ int kunit_add_resource(struct kunit *test,
> > res->data = data;
> > }
> >
> > -   spin_lock(>lock);
> > +   

Re: [PATCH 1/1] x86/kvm/svm: Implement support for PSFD

2021-04-08 Thread Tom Lendacky
On 4/7/21 2:45 PM, Ramakrishna Saripalli wrote:
> From: Ramakrishna Saripalli 
> 
> Expose Predictive Store Forwarding capability to guests.
> Guests enable or disable PSF via SPEC_CTRL MSR.
> 
> Signed-off-by: Ramakrishna Saripalli 
> ---
>  arch/x86/kvm/cpuid.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6bd2f8b830e4..9c4af0fef6d7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -448,6 +448,8 @@ void kvm_set_cpu_caps(void)
>   kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
>   if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
>   kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
> + if (boot_cpu_has(X86_FEATURE_AMD_PSFD))
> + kvm_cpu_cap_set(X86_FEATURE_AMD_PSFD);
>  
>   kvm_cpu_cap_mask(CPUID_7_1_EAX,
>   F(AVX_VNNI) | F(AVX512_BF16)
> @@ -482,7 +484,7 @@ void kvm_set_cpu_caps(void)
>   kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
>   F(CLZERO) | F(XSAVEERPTR) |
>   F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | 
> F(VIRT_SSBD) |
> - F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON)
> + F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) | 
> F(AMD_PSFD)

Please note that this patch has a pre-req against the PSFD support that
defines this feature:

https://lore.kernel.org/lkml/20210406155004.230790-2-rsari...@amd.com/#t

Thanks,
Tom

>   );
>  
>   /*
> 


Re: [Intel-gfx] [PATCH v2] drm/i915: Fix invalid access to ACPI _DSM objects

2021-04-08 Thread Takashi Iwai
On Thu, 08 Apr 2021 18:56:06 +0200,
Ville Syrjälä wrote:
> 
> On Thu, Apr 08, 2021 at 06:34:06PM +0200, Takashi Iwai wrote:
> > On Thu, 08 Apr 2021 09:51:18 +0200,
> > Takashi Iwai wrote:
> > > 
> > > On Wed, 07 Apr 2021 23:28:48 +0200,
> > > Ville Syrjälä wrote:
> > > > 
> > > > Oh, could you ask the bug reporter to attach an acpidump to the
> > > > bug? Might be good to have that stuff on record somewhere if/when
> > > > someone wants to actually figure out what's going on here.
> > > 
> > > OK, I'll ask.
> > 
> > Available at
> >   acpidump: https://susepaste.org/59777448
> >   hwinfo: https://susepaste.org/86284020
> 
> Those will apparently expire real soon. I took local copies out
> of morbid curiosity, but that's not going to help anyone else 
> in the future. I rather wish bug reporting tools would flat out
> refuse to accept any pastebin links.

Yeah, don't worry, I'll upload them to Bugzilla later in anyway.


Takashi


Re: [PATCH v2] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky
On 4/8/21 12:10 PM, Sean Christopherson wrote:
> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> Access to the GHCB is mainly in the VMGEXIT path and it is known that the
>> GHCB will be mapped. But there are two paths where it is possible the GHCB
>> might not be mapped.
>>
>> The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
>> the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
>> However, if a SIPI is performed without a corresponding AP Reset Hold,
>> then the GHCB might not be mapped (depending on the previous VMEXIT),
>> which will result in a NULL pointer dereference.
>>
>> The svm_complete_emulated_msr() routine will update the GHCB to inform
>> the caller of a RDMSR/WRMSR operation about any errors. While it is likely
>> that the GHCB will be mapped in this situation, add a safe guard
>> in this path to be certain a NULL pointer dereference is not encountered.
>>
>> Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts 
>> under SEV-ES")
>> Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES 
>> guest")
>> Signed-off-by: Tom Lendacky 
>>
>> ---
>>
>> Changes from v1:
>> - Added the svm_complete_emulated_msr() path as suggested by Sean
>>   Christopherson
>> - Add a WARN_ON_ONCE() to the sev_vcpu_deliver_sipi_vector() path
>> ---
>>  arch/x86/kvm/svm/sev.c | 3 +++
>>  arch/x86/kvm/svm/svm.c | 2 +-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 83e00e524513..7ac67615c070 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
>> *vcpu, u8 vector)
>>   * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>>   * non-zero value.
>>   */
>> +if (WARN_ON_ONCE(!svm->ghcb))
> 
> Isn't this guest triggerable?  I.e. send a SIPI without doing the reset hold?
> If so, this should not WARN.

Yes, it is a guest triggerable event. But a guest shouldn't be doing that,
so I thought adding the WARN_ON_ONCE() just to detect it wasn't bad.
Definitely wouldn't want a WARN_ON().

Thanks,
Tom

> 
>> +return;
>> +
>>  ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>>  }
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 271196400495..534e52ba6045 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2759,7 +2759,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
>> msr_data *msr_info)
>>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>>  {
>>  struct vcpu_svm *svm = to_svm(vcpu);
>> -if (!sev_es_guest(vcpu->kvm) || !err)
>> +if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
>>  return kvm_complete_insn_gp(vcpu, err);
>>  
>>  ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>> -- 
>> 2.31.0
>>


Re: [PATCH 5/7] cxl/mem: Move device register setup

2021-04-08 Thread Jonathan Cameron
On Wed, 7 Apr 2021 15:26:23 -0700
Ben Widawsky  wrote:

> Support expansion of register block types that the driver will attempt
> to recognize by pulling the code up into the register block scanning
> loop. Subsequent code can easily add in new register block types with
> this.
> 
> Signed-off-by: Ben Widawsky 

Acked-by: Jonathan Cameron >

> ---
>  drivers/cxl/mem.c | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 60b95c524c3e..49f651694cb0 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -1020,6 +1020,15 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
>   base = cxl_mem_map_regblock(cxlm, reg_lo, reg_hi);
>   if (IS_ERR(base))
>   return PTR_ERR(base);
> +
> + cxl_setup_device_regs(dev, base, >device_regs);
> + if (!regs->status || !regs->mbox || !regs->memdev) {
> + dev_err(dev, "registers not found: %s%s%s\n",
> + !regs->status ? "status " : "",
> + !regs->mbox ? "mbox " : "",
> + !regs->memdev ? "memdev" : "");
> + return -ENXIO;
> + }
>   break;
>   }
>   }
> @@ -1029,16 +1038,6 @@ static int cxl_mem_setup_regs(struct cxl_mem *cxlm)
>   return -ENXIO;
>   }
>  
> - cxl_setup_device_regs(dev, base, >device_regs);
> -
> - if (!regs->status || !regs->mbox || !regs->memdev) {
> - dev_err(dev, "registers not found: %s%s%s\n",
> - !regs->status ? "status " : "",
> - !regs->mbox ? "mbox " : "",
> - !regs->memdev ? "memdev" : "");
> - return -ENXIO;
> - }
> -
>   return 0;
>  }
>  



Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-08 Thread James Bottomley
On Fri, 2021-04-02 at 16:20 +0200, Paolo Bonzini wrote:
> On 02/04/21 13:58, Ashish Kalra wrote:
> > Hi Nathan,
> > 
> > Will you be posting a corresponding Qemu patch for this ?
> 
> Hi Ashish,
> 
> as far as I know IBM is working on QEMU patches for guest-based 
> migration helpers.

Yes, that's right, we'll take on this part.

> However, it would be nice to collaborate on the low-level (SEC/PEI) 
> firmware patches to detect whether a CPU is part of the primary VM
> or the mirror.  If Google has any OVMF patches already done for that,
> it would be great to combine it with IBM's SEV migration code and
> merge it into upstream OVMF.

We've reached the stage with our prototyping where not having the OVMF
support is blocking us from working on QEMU.  If we're going to have to
reinvent the wheel in OVMF because Google is unwilling to publish the
patches, can you at least give some hints about how you did it?

Thanks,

James




Re: [PATCH 22/24] ARM: at91: sama7: introduce sama7 SoC family

2021-04-08 Thread Alexandre Belloni
On 08/04/2021 17:24:39+0200, Nicolas Ferre wrote:
> On 01/04/2021 at 12:24, Claudiu Beznea - M18063 wrote:
> > On 01.04.2021 12:38, Claudiu Beznea - M18063 wrote:
> > > On 31.03.2021 19:01, Alexandre Belloni wrote:
> > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> > > > the content is safe
> > > > 
> > > > On 31/03/2021 13:59:06+0300, Claudiu Beznea wrote:
> > > > > From: Eugen Hristev 
> > > > > 
> > > > > Introduce new family of SoCs, sama7, and first SoC, sama7g5.
> > > > > 
> > > > > Signed-off-by: Eugen Hristev 
> > > > > Signed-off-by: Claudiu Beznea 
> > > > > ---
> > > > >   arch/arm/mach-at91/Makefile |  1 +
> > > > >   arch/arm/mach-at91/sama7.c  | 48 
> > > > > +
> > > > >   2 files changed, 49 insertions(+)
> > > > >   create mode 100644 arch/arm/mach-at91/sama7.c
> > > > > 
> > > > > diff --git a/arch/arm/mach-at91/Makefile b/arch/arm/mach-at91/Makefile
> > > > > index f565490f1b70..6cc6624cddac 100644
> > > > > --- a/arch/arm/mach-at91/Makefile
> > > > > +++ b/arch/arm/mach-at91/Makefile
> > > > > @@ -9,6 +9,7 @@ obj-$(CONFIG_SOC_AT91SAM9)+= at91sam9.o
> > > > >   obj-$(CONFIG_SOC_SAM9X60)+= sam9x60.o
> > > > >   obj-$(CONFIG_SOC_SAMA5)  += sama5.o
> > > > >   obj-$(CONFIG_SOC_SAMV7)  += samv7.o
> > > > > +obj-$(CONFIG_SOC_SAMA7)  += sama7.o
> > > > > 
> > > > >   # Power Management
> > > > >   obj-$(CONFIG_ATMEL_PM)   += pm.o pm_suspend.o
> > > > > diff --git a/arch/arm/mach-at91/sama7.c b/arch/arm/mach-at91/sama7.c
> > > > > new file mode 100644
> > > > > index ..e04cadb569ad
> > > > > --- /dev/null
> > > > > +++ b/arch/arm/mach-at91/sama7.c
> > > > > @@ -0,0 +1,48 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > +/*
> > > > > + * Setup code for SAMA7
> > > > > + *
> > > > > + * Copyright (C) 2021 Microchip Technology, Inc. and its subsidiaries
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +#include 
> > > > > +#include 
> > > > > +
> > > > > +#include 
> > > > > +#include 
> > > > > +
> > > > > +#include "generic.h"
> > > > > +
> > > > > +static void __init sama7_common_init(void)
> > > > > +{
> > > > > + of_platform_default_populate(NULL, NULL, NULL);
> > > > 
> > > > Is this necessary? This is left as a workaround for the old SoCs using
> > > > pinctrl-at91. I guess this will be using pio4 so this has to be removed.
> > > 
> > > OK, I'll have a look. BTW, SAMA5D2 which is also using PIO4 calls
> > > of_platform_default_populate(NULL, NULL, NULL);
> > 
> > Without this call the PM code (arch/arm/mach-at/pm.c) is not able to locate
> > proper DT nodes:
> > 
> > [0.194615] at91_pm_backup_init: failed to find securam device!
> > [0.201393] at91_pm_sram_init: failed to find sram device!
> > [0.207449] AT91: PM not supported, due to no SRAM allocated
> 
> Okay, so we can't afford removing these calls to sama5d2 and upcoming
> sama7g5 right now.
> 
> Is it a common pattern to have to reach DT content in the early stages that
> explicit call to of_platform_default_populate() tries to solve?
> 

That's fine, I didn't remember about that one, we can keep the call.


-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH net v1] Revert "lan743x: trim all 4 bytes of the FCS; not just 2"

2021-04-08 Thread Sven Van Asbroeck
Hi Heiner,

On Thu, Apr 8, 2021 at 1:49 PM Heiner Kallweit  wrote:
>
> Can't we use frame_length - ETH_FCS_LEN direcctly here?

If the hard-coded "4" refers to ETH_FCS_LEN, then yes, good point. I'd
love to find out first why George and I need different patches to make
the driver work in our use case, though.


[PATCH] MIPS: uaccess: Reduce number of nested macros

2021-04-08 Thread Thomas Bogendoerfer
Clean up macros even further after removal get_fs/set_fs.

Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/include/asm/uaccess.h | 157 +++-
 1 file changed, 71 insertions(+), 86 deletions(-)

diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index 91bc7fb7dca1..e0dedd47e4e6 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -102,8 +102,15 @@ static inline int __access_ok(const void __user *p, 
unsigned long size)
  *
  * Returns zero on success, or -EFAULT on error.
  */
-#define put_user(x,ptr) \
-   __put_user_check((x), (ptr), sizeof(*(ptr)))
+#define put_user(x, ptr)   \
+({ \
+   __typeof__(*(ptr)) __user *__p = (ptr); \
+   \
+   might_fault();  \
+   access_ok(__p, sizeof(*__p)) ?  \
+   __put_user((x), __p) :  \
+   -EFAULT;\
+})
 
 /*
  * get_user: - Get a simple variable from user space.
@@ -123,8 +130,15 @@ static inline int __access_ok(const void __user *p, 
unsigned long size)
  * Returns zero on success, or -EFAULT on error.
  * On error, the variable @x is set to zero.
  */
-#define get_user(x,ptr) \
-   __get_user_check((x), (ptr), sizeof(*(ptr)))
+#define get_user(x, ptr)   \
+({ \
+   const __typeof__(*(ptr)) __user *__p = (ptr);   \
+   \
+   might_fault();  \
+   access_ok(__p, sizeof(*__p)) ?  \
+   __get_user((x), __p) :  \
+   ((x) = 0, -EFAULT); \
+})
 
 /*
  * __put_user: - Write a simple value into user space, with less checking.
@@ -146,8 +160,32 @@ static inline int __access_ok(const void __user *p, 
unsigned long size)
  *
  * Returns zero on success, or -EFAULT on error.
  */
-#define __put_user(x,ptr) \
-   __put_user_nocheck((x), (ptr), sizeof(*(ptr)))
+#define __put_user(x, ptr) \
+({ \
+   __typeof__(*(ptr)) __user *__pu_ptr = (ptr);\
+   __typeof__(*(ptr)) __pu_val = (x);  \
+   int __pu_err = 0;   \
+   \
+   __chk_user_ptr(__pu_ptr);   \
+   switch (sizeof(*__pu_ptr)) {\
+   case 1: \
+   __put_data_asm(user_sb, __pu_ptr);  \
+   break;  \
+   case 2: \
+   __put_data_asm(user_sh, __pu_ptr);  \
+   break;  \
+   case 4: \
+   __put_data_asm(user_sw, __pu_ptr);  \
+   break;  \
+   case 8: \
+   __PUT_DW(user_sd, __pu_ptr);\
+   break;  \
+   default:\
+   BUILD_BUG();\
+   }   \
+   \
+   __pu_err;   \
+})
 
 /*
  * __get_user: - Get a simple variable from user space, with less checking.
@@ -170,8 +208,31 @@ static inline int __access_ok(const void __user *p, 
unsigned long size)
  * Returns zero on success, or -EFAULT on error.
  * On error, the variable @x is set to zero.
  */
-#define __get_user(x,ptr) \
-   __get_user_nocheck((x), (ptr), sizeof(*(ptr)))
+#define __get_user(x, ptr) \
+({ \
+   const __typeof__(*(ptr)) __user *__gu_ptr = (ptr);  \
+   int __gu_err = 0;  

[PATCH v2] PCI: merge slot and bus reset implementations

2021-04-08 Thread Raphael Norwitz
Slot resets are bus resets with additional logic to prevent a device
from being removed during the reset. Currently slot and bus resets have
separate implementations in pci.c, complicating higher level logic. As
discussed on the mailing list, they should be combined into a generic
function which performs an SBR. This change adds a function,
pci_reset_bus_function(), which first attempts a slot reset and then
attempts a bus reset if -ENOTTY is returned, such that there is now a
single device agnostic function to perform an SBR.

This new function is also needed to add SBR reset quirks and therefore
is exposed in pci.h.

Link: https://lkml.org/lkml/2021/3/23/911

Suggested-by: Alex Williamson 
Signed-off-by: Amey Narkhede 
Signed-off-by: Raphael Norwitz 
---
 drivers/pci/pci.c   | 19 +++
 include/linux/pci.h |  1 +
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 16a17215f633..a8f8dd588d15 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4982,6 +4982,15 @@ static int pci_dev_reset_slot_function(struct pci_dev 
*dev, int probe)
return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
 }
 
+int pci_reset_bus_function(struct pci_dev *dev, int probe)
+{
+   int rc = pci_dev_reset_slot_function(dev, probe);
+
+   if (rc != -ENOTTY)
+   return rc;
+   return pci_parent_bus_reset(dev, probe);
+}
+
 static void pci_dev_lock(struct pci_dev *dev)
 {
pci_cfg_access_lock(dev);
@@ -5102,10 +5111,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
rc = pci_pm_reset(dev, 0);
if (rc != -ENOTTY)
return rc;
-   rc = pci_dev_reset_slot_function(dev, 0);
-   if (rc != -ENOTTY)
-   return rc;
-   return pci_parent_bus_reset(dev, 0);
+   return pci_reset_bus_function(dev, 0);
 }
 EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
 
@@ -5135,13 +5141,10 @@ int pci_probe_reset_function(struct pci_dev *dev)
if (rc != -ENOTTY)
return rc;
rc = pci_pm_reset(dev, 1);
-   if (rc != -ENOTTY)
-   return rc;
-   rc = pci_dev_reset_slot_function(dev, 1);
if (rc != -ENOTTY)
return rc;
 
-   return pci_parent_bus_reset(dev, 1);
+   return pci_reset_bus_function(dev, 1);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c97b77..979d54335ac1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1228,6 +1228,7 @@ int pci_probe_reset_bus(struct pci_bus *bus);
 int pci_reset_bus(struct pci_dev *dev);
 void pci_reset_secondary_bus(struct pci_dev *dev);
 void pcibios_reset_secondary_bus(struct pci_dev *dev);
+int pci_reset_bus_function(struct pci_dev *dev, int probe);
 void pci_update_resource(struct pci_dev *dev, int resno);
 int __must_check pci_assign_resource(struct pci_dev *dev, int i);
 int __must_check pci_reassign_resource(struct pci_dev *dev, int i, 
resource_size_t add_size, resource_size_t align);
-- 
2.20.1


Re: [PATCH V5 04/25] perf/x86/intel: Hybrid PMU support for perf capabilities

2021-04-08 Thread Liang, Kan




On 4/8/2021 9:40 AM, Peter Zijlstra wrote:

@@ -4330,7 +4347,7 @@ static int intel_pmu_check_period(struct perf_event 
*event, u64 value)
  
  static int intel_pmu_aux_output_match(struct perf_event *event)

  {
-   if (!x86_pmu.intel_cap.pebs_output_pt_available)
+   if (!intel_pmu_has_cap(event, PERF_CAP_PT_IDX))
return 0;
  
  	return is_intel_pt_event(event);

these sites can have !event->pmu ?



I don't think the event->pmu can be NULL, but it could be pt_pmu.pmu.
If so, it should be a problem.

I think I will still use the x86_pmu.intel_cap.pebs_output_pt_available 
here in V6. The worst case is that we lost the PEBS via PT support on 
the small core for now.


I guess Alexander may provide a separate patch later to enable the PEBS 
via PT support on the ADL small core.


Thanks,
Kan


Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate

2021-04-08 Thread Luis Chamberlain
On Thu, Apr 08, 2021 at 10:55:53AM +0200, Greg KH wrote:
> So to add crazy complexity to the kernel, 

I agree that this can be tricky. However, driver developers are going to
open code this either way. The problem with that as well, and one of my
own reasons for striving for at least *trying* for a generic solution,
was that I am aware that driver developers may be trying a busy solution
rather than the try method. The busy approach means you could also end
up in a situation where userspace can prevent full undoing / removal
of a driver. The try method is best effort in the sense that if the
driver is going it won't be allowed.

So a sensible consideration I think is we at least choose one of these:

 a) We allow driver developers to open code it on drivers which need it on
each and every single sysfs knob on the driver where its needed, and
accept folks might do it wrong

 b) Provide a macro which is opt-in, ie, not doing it for all
attributes, but perhaps one which the driver author *is* aware to
try / put of the driver method.

 c) Document this race and other races / failings so that driver
authors who do care about module removal are aware and know what
to do.

In this thread two races were exposed on syfs:

  * deadlock when a sysfs attribute uses a lock which is also used on
module __exit

  * possible race against the device's private data, and this is type
specific. I think many people probably missed the last hunks of my
proposed patch which added dev_type_get() which were not part of the
deadlock fix. At least I showed how attributes for all block devices
have an exposed race, which can crash if removal of a block device
with del_gendisk() happens while a sysfs attribute is about to be
used.
 
I don't think either race is trivial for a driver developer to assume a
solution for. Most focus on this thread was about the first race, the
seconod however is also very real, and likely can cause more issues on
rolling backs on error codes unrelated to rmmod...

> for an operation that can only
> be triggered manually by a root user, is not worth it in my opinion, as
> the maintainer of that code the complexity was asked to be made to.

Three things:

1) Many driver maintainers *do* care that rmmod works well. To the point
that if it does not, they feel ashamed. Reason for this is that in some
subsystems this *is* a typical test case. So although rmmod may be
a vodoo thing for core, many driver developers do want this to work
well.

As someone who also works on many test cases to expose odd issues in the
kernel unrelated to module removal, I can also say that module removal
does also expose other possible races which would otherwise be difficult
to trigger. So it is also a helfup aid as a developer working on
differen areas of the kernel.

2) Folks have pointed out that this is not just about rmmod, rolling
back removal of sysfs attributes due to error code paths is another
real scenario to consider. I don't think we have rules to tell userspace
to not muck with sysfs files after they are exposed. In fact those
uevents we send to userspace allow userspace to know exactly when to
muck with them.

3) Sadly, some sysfs attributes also purposely do things which *also*
mimic what is typically done on module removal, such as removal of an
interface, or block device. That was the case with zram, it allows
remvoal / reset of a device... Yes these are odd things to do, but we
allow for it. And sysfs users *do* need to be aware of these corner
cases if they want to support them.

There **may** be some severity to some of the issues mentioned above, to
allow really bad things to be done in userspace even without module
removal... but I didn't have time yet to expose a pattern with coccinelle
yet to see how commonplace some of these issue are. I was focusing at
first more for a generic solution if possible, as I thought that would
be better first evaluated, and to let others slowly become aware of the
issue.

  Luis


Re: [PATCH v3 02/12] buildid: Stash away kernels build ID on init

2021-04-08 Thread Stephen Boyd
Quoting Jessica Yu (2021-04-08 05:05:33)
> +++ Stephen Boyd [30/03/21 20:05 -0700]:
> >+/**
> >+ * init_vmlinux_build_id - Get the running kernel's build ID
> >+ *
> >+ * Return: Running kernel's build ID
> >+ */
> 
> Hm, init_vmlinux_build_id() doesn't return anything, so this comment is
> not accurate - maybe "Get the running kernel's build ID and store it in
> vmlinux_build_id"?
> 
> >+void __init init_vmlinux_build_id(void)

Thanks! I've fixed it for the next round.


Problem in pfmemalloc skb handling in net/core/dev.c

2021-04-08 Thread Xie He
Hi Mel Gorman,

I may have found a problem in pfmemalloc skb handling in
net/core/dev.c. I see there are "if" conditions checking for
"sk_memalloc_socks() && skb_pfmemalloc(skb)", and when the condition
is true, the skb is handled specially as a pfmemalloc skb, otherwise
it is handled as a normal skb.

However, if "sk_memalloc_socks()" is false and "skb_pfmemalloc(skb)"
is true, the skb is still handled as a normal skb. Is this correct?
This might happen if "sk_memalloc_socks()" was originally true and has
just turned into false before the check. Can this happen?

I found the original commit that added the "if" conditions:
commit b4b9e3558508 ("netvm: set PF_MEMALLOC as appropriate during SKB
processing")
The commit message clearly indicates pfmemalloc skbs shouldn't be
delivered to taps (or protocols that don't support pfmemalloc skbs).
However, if they are incorrectly handled as normal skbs, they could be
delivered to those places.

I'm not sure if my understanding is correct. Could you please help? Thank you!


Re: [PATCH v4 18/20] x86: Convert to GENERIC_CMDLINE

2021-04-08 Thread Rob Herring
On Fri, Apr 02, 2021 at 03:18:20PM +, Christophe Leroy wrote:
> This converts the architecture to GENERIC_CMDLINE.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/x86/Kconfig| 45 ++---
>  arch/x86/kernel/setup.c | 17 ++--
>  2 files changed, 4 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a20684d56b4b..66b384228ca3 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -104,6 +104,7 @@ config X86
>   select ARCH_USE_QUEUED_SPINLOCKS
>   select ARCH_USE_SYM_ANNOTATIONS
>   select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> + select ARCH_WANT_CMDLINE_PREPEND_BY_DEFAULT

Seems to be non-existent kconfig option.

>   select ARCH_WANT_DEFAULT_BPF_JITif X86_64
>   select ARCH_WANTS_DYNAMIC_TASK_STRUCT
>   select ARCH_WANT_HUGE_PMD_SHARE
> @@ -118,6 +119,7 @@ config X86
>   select EDAC_SUPPORT
>   select GENERIC_CLOCKEVENTS_BROADCASTif X86_64 || (X86_32 && 
> X86_LOCAL_APIC)
>   select GENERIC_CLOCKEVENTS_MIN_ADJUST
> + select GENERIC_CMDLINE
>   select GENERIC_CMOS_UPDATE
>   select GENERIC_CPU_AUTOPROBE
>   select GENERIC_CPU_VULNERABILITIES
> @@ -2358,49 +2360,6 @@ choice
>  
>  endchoice
>  
> -config CMDLINE_BOOL
> - bool "Built-in kernel command line"
> - help
> -   Allow for specifying boot arguments to the kernel at
> -   build time.  On some systems (e.g. embedded ones), it is
> -   necessary or convenient to provide some or all of the
> -   kernel boot arguments with the kernel itself (that is,
> -   to not rely on the boot loader to provide them.)
> -
> -   To compile command line arguments into the kernel,
> -   set this option to 'Y', then fill in the
> -   boot arguments in CONFIG_CMDLINE.
> -
> -   Systems with fully functional boot loaders (i.e. non-embedded)
> -   should leave this option set to 'N'.
> -
> -config CMDLINE
> - string "Built-in kernel command string"
> - depends on CMDLINE_BOOL
> - default ""
> - help
> -   Enter arguments here that should be compiled into the kernel
> -   image and used at boot time.  If the boot loader provides a
> -   command line at boot time, it is appended to this string to
> -   form the full kernel command line, when the system boots.
> -
> -   However, you can use the CONFIG_CMDLINE_FORCE option to
> -   change this behavior.
> -
> -   In most cases, the command line (whether built-in or provided
> -   by the boot loader) should specify the device for the root
> -   file system.
> -
> -config CMDLINE_FORCE
> - bool "Built-in command line overrides boot loader arguments"
> - depends on CMDLINE_BOOL && CMDLINE != ""
> - help
> -   Set this option to 'Y' to have the kernel ignore the boot loader
> -   command line, and use ONLY the built-in command line.
> -
> -   This is used to work around broken boot loaders.  This should
> -   be set to 'N' under normal conditions.
> -
>  config MODIFY_LDT_SYSCALL
>   bool "Enable the LDT (local descriptor table)" if EXPERT
>   default y
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6f2de58eeb54..3f274b02e51c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -5,6 +5,7 @@
>   * This file contains the setup_arch() code, which handles the 
> architecture-dependent
>   * parts of early kernel initialization.
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -161,9 +162,6 @@ unsigned long saved_video_mode;
>  #define RAMDISK_LOAD_FLAG0x4000
>  
>  static char __initdata command_line[COMMAND_LINE_SIZE];
> -#ifdef CONFIG_CMDLINE_BOOL
> -static char __initdata builtin_cmdline[COMMAND_LINE_SIZE] = CONFIG_CMDLINE;
> -#endif
>  
>  #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
>  struct edd edd;
> @@ -883,18 +881,7 @@ void __init setup_arch(char **cmdline_p)
>   bss_resource.start = __pa_symbol(__bss_start);
>   bss_resource.end = __pa_symbol(__bss_stop)-1;
>  
> -#ifdef CONFIG_CMDLINE_BOOL
> -#ifdef CONFIG_CMDLINE_FORCE
> - strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE);
> -#else
> - if (builtin_cmdline[0]) {
> - /* append boot loader cmdline to builtin */
> - strlcat(builtin_cmdline, " ", COMMAND_LINE_SIZE);
> - strlcat(builtin_cmdline, boot_command_line, COMMAND_LINE_SIZE);
> - strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE);
> - }
> -#endif
> -#endif
> + cmdline_build(boot_command_line, boot_command_line);
>  
>   strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
>   *cmdline_p = command_line;

Once this is all done, I wonder if we can get rid of the strlcpy and 
perhaps also cmdline_p.

Rob


Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Song Liu



> On Apr 8, 2021, at 11:50 AM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Thu, Apr 08, 2021 at 08:24:47PM +0200, Jiri Olsa escreveu:
>> On Thu, Apr 08, 2021 at 06:08:20PM +, Song Liu wrote:
>>> 
>>> 
 On Apr 8, 2021, at 10:45 AM, Jiri Olsa  wrote:
 
 On Thu, Apr 08, 2021 at 05:28:10PM +, Song Liu wrote:
> 
> 
>> On Apr 8, 2021, at 10:20 AM, Jiri Olsa  wrote:
>> 
>> On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
>>> 
>>> 
 On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
 
 On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
> Currently, to use BPF to aggregate perf event counters, the user uses
> --bpf-counters option. Enable "use bpf by default" events with a 
> config
> option, stat.bpf-counter-events. This is limited to hardware events in
> evsel__hw_names.
> 
> This also enables mixed BPF event and regular event in the same 
> sesssion.
> For example:
> 
> perf config stat.bpf-counter-events=instructions
> perf stat -e instructions,cs
> 
 
 so if we are mixing events now, how about uing modifier for bpf 
 counters,
 instead of configuring .perfconfig list we could use:
 
 perf stat -e instructions:b,cs
 
 thoughts?
 
 the change below adds 'b' modifier and sets 'evsel::bpf_counter',
 feel free to use it
>>> 
>>> I think we will need both 'b' modifier and .perfconfig configuration. 
>>> For systems with BPF-managed perf events running in the background, 
>> 
>> hum, I'm not sure I understand what that means.. you mean there
>> are tools that run perf stat so you don't want to change them?
> 
> We have tools that do perf_event_open(). I will change them to use 
> BPF managed perf events for "cycles" and "instructions". Since these 
> tools are running 24/7, perf-stat on the system should use BPF managed
> "cycles" and "instructions" by default. 
 
 well if you are already changing the tools why not change them to add
 modifier.. but I don't mind adding that .perfconfig stuff if you need
 that
>>> 
>>> The tools I mentioned here don't use perf-stat, they just use 
>>> perf_event_open() and read the perf events fds. We want a config to make
>> 
>> just curious, how those tools use perf_event_open?
> 
> I.e. do they use tools/lib/perf/? :-)

Not right now. I do hope we can eventually let them use libperf. But I 
haven't figured out the best path forward. 

> 
> I guess they will use it now for getting that "struct 
> perf_event_attr_map_entry" and
> the map name define.
> 
>>> "cycles" to use BPF by default, so that when the user (not these tools)
>>> runs perf-stat, it will share PMCs with those events by default. 
> 
>> I'm sorry but I still don't see the usecase.. if you need to change both 
>> tools,
>> you can change them to use bpf-managed event, why bother with the list?
> 
> He wants users not to bother if they are using bpf based counters, this will 
> happen
> automagically after they set their ~/.perfconfig with some command line Song 
> provides.
> 
> Then they will be using bpf counters that won't get exclusive access to those
> scarce counters, the tooling they are using will use bpf-counters and all will
> be well.
> 
> Right Song?

Yes, exactly. The config automatically switches ad-hoc perf-stat runs (for 
debug, 
performance tuning, etc.) to bpf managed counters. 

Thanks,
Song



Re: [PATCH v3 03/12] dump_stack: Add vmlinux build ID to stack traces

2021-04-08 Thread Stephen Boyd
Quoting Petr Mladek (2021-04-08 03:13:20)
> It helped with the vmlinux buildid. I see the following:
> 
> [  551.435942][ T1803] test_printf: loaded.
> [  551.436667][ T1803] [ cut here ]
> [  551.437561][ T1803] kernel BUG at lib/test_printf.c:689!
> [  551.438352][ T1803] invalid opcode:  [#1] SMP NOPTI
> [  551.438359][ T1803] CPU: 3 PID: 1803 Comm: modprobe Kdump: loaded Tainted: 
> GE 5.12.0-rc6-default+ #176 
> e51781e52aaf4d6dfea7a18574c104c8bfd7c37f
> [  551.438363][ T1803] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
> BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [  551.438365][ T1803] RIP: 0010:test_printf_init+0x561/0xc99 [test_printf 
> c2388ff0552611501b4d2ad58d8e5ca441d9a350]

It shows it for the test module here.

> [  551.443090][ T1803] Code: 00 48 c7 c7 b8 36 1b c0 e8 19 f9 ff ff b9 ab 00 
> 00 00 48 c7 c2 93 36 1b c0 be 08 00 00 00 48 c7 c7 af 36 1b c0 e8 fc f8 ff ff 
> <0f> 0b 8b 05 44 07 00 00 8b 35 3a 07 00 00 8b 1d 3c 07 00 00 85 c0
> [  551.443094][ T1803] RSP: 0018:b62c0039bc78 EFLAGS: 00010282
> [  551.443096][ T1803] RAX:  RBX: b62c0039bc80 RCX: 
> d62bffc00b70
> [  551.443098][ T1803] RDX:  RSI:  RDI: 
> a0352fd5
> [  551.443099][ T1803] RBP: c01b7367 R08: 0001 R09: 
> 0001
> [  551.443100][ T1803] R10:  R11: 0001 R12: 
> 9bc08c87c820
> [  551.443101][ T1803] R13: 0001 R14: 9bc0d2798480 R15: 
> b62c0039be90
> [  551.443102][ T1803] FS:  7f5767485b80() GS:9bc0ffc0() 
> knlGS:
> [  551.443103][ T1803] CS:  0010 DS:  ES:  CR0: 80050033
> [  551.443105][ T1803] CR2: 7f5766b36ef0 CR3: 000100368004 CR4: 
> 00370ee0
> [  551.443108][ T1803] DR0:  DR1:  DR2: 
> 
> [  551.443108][ T1803] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  551.443109][ T1803] Call Trace:
> [  551.443113][ T1803]  ? __test+0x13c/0x149 [test_printf]

But not here. I missed a place in the x86 code, printk_stack_address()
uses %pB, so I'll need to introduce %pBb to indicate that we're printing
a backtrace with a build ID, oof!

It must be obvious by now but I didn't test on x86. Let me go scrounge
for some hardware...

> [  551.443116][ T1803]  ? rcu_read_lock_sched_held+0x52/0x80
> [  551.443120][ T1803]  do_one_initcall+0x5b/0x2d0
> [  551.443125][ T1803]  do_init_module+0x5b/0x21c
> [  551.443127][ T1803]  load_module+0x1eaa/0x23c0
> [  551.443130][ T1803]  ? show_modinfo_version+0x30/0x30
> [  551.443134][ T1803]  ? __do_sys_finit_module+0xad/0x110
> [  551.443135][ T1803]  __do_sys_finit_module+0xad/0x110
> [  551.443138][ T1803]  do_syscall_64+0x33/0x40
> [  551.443139][ T1803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  551.443143][ T1803] RIP: 0033:0x7f5766b5b2a9
> [
> 
> Note that it still does not show the build id for the module. It fails
> in the module init call and the build id should be already initialized
> at this stage.
> 
> One more thing. I am not familiar with the elf-related code.
> Is it safe to access (nhdr + 1)? Do we need a check that
> it is still withing the given section?

Should be safe given that the elf note header is prepended to the name buffer
and the descriptor buffer. The 'n_namesz' member of the header tells us how
many bytes after the header is reserved for the name and the 'n_descsz' member
of the header tells us how many bytes after the name is reserved for the
description (where the build ID data is). I did the nhdr + 1 thing to make the
pointer point to the name directly after the header. The name is NUL terminated
per the elf spec. See the man page[1] for elf and the section about notes:

"""
Notes (Nhdr)
   ELF notes allow for appending arbitrary information for the
   system to use.  They are largely used by core files (e_type of
   ET_CORE), but many projects define their own set of extensions.
   For example, the GNU tool chain uses ELF notes to pass
   information from the linker to the C library.

   Note sections contain a series of notes (see the struct
   definitions below).  Each note is followed by the name field
   (whose length is defined in n_namesz) and then by the descriptor
   field (whose length is defined in n_descsz) and whose starting
   address has a 4 byte alignment.  Neither field is defined in the
   note struct due to their arbitrary lengths.
"""

[1] https://man7.org/linux/man-pages/man5/elf.5.html


Can you try this patch for x86? I'll dig up some hardware in the meantime.

-8<
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 7ad5eea99b2b..be2de39bf16f 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -69,7 +69,7 @@ static void printk_stack_address(unsigned long address, int 

Re: [PATCH] block: Disable -Walign-mismatch for blk-mq.c

2021-04-08 Thread Guenter Roeck
On 4/8/21 12:44 PM, Nathan Chancellor wrote:
> LLVM 13 adds a new warning, -Walign-mismatch, which has an instance in
> blk_mq_complete_send_ipi():
> 
> block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to
> 32-byte aligned parameter 2 of 'smp_call_function_single_async' may
> result in an unaligned pointer access [-Walign-mismatch]
> smp_call_function_single_async(cpu, >csd);
> ^
> 1 warning generated.
> 
> This is expected after commit 4ccafe032005 ("block: unalign
> call_single_data in struct request"), which purposefully unaligned the
> structure to save space. Given that there is no real alignment
> requirement and there have been no reports of issues since that change,
> it should be safe to disable the warning for this one translation unit.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1328
> Link: 
> https://lore.kernel.org/r/20210310182307.zzcbi5w5jrmveld4@archlinux-ax161/
> Link: https://lore.kernel.org/r/20210330230249.709221-1-jian...@google.com/
> Signed-off-by: Nathan Chancellor 

Reviewed-by: Guenter Roeck 

> ---
>  block/Makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/Makefile b/block/Makefile
> index 8d841f5f986f..d69ac0bd8e61 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o 
> blk-sysfs.o \
>   blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>   genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o
>  
> +CFLAGS_blk-mq.o := $(call cc-disable-warning, align-mismatch)
>  obj-$(CONFIG_BOUNCE) += bounce.o
>  obj-$(CONFIG_BLK_SCSI_REQUEST)   += scsi_ioctl.o
>  obj-$(CONFIG_BLK_DEV_BSG)+= bsg.o
> 
> base-commit: e49d033bddf5b565044e2abe4241353959bc9120
> 



Re: [PATCH 01/10] staging: rtl8188eu: remove unused macros

2021-04-08 Thread Martin Kaiser
Thus wrote Greg Kroah-Hartman (gre...@linuxfoundation.org):

> Wow, that's there for a really old kernel version and should not be
> needed anymore at all.  I'll take this, but please remove the other ones
> here, they are not necessary.

Ok, I see what you mean. New patch is on the way.

Best regards,
Martin


Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots

2021-04-08 Thread Sean Christopherson
On Thu, Apr 08, 2021, Paolo Bonzini wrote:
> On 08/04/21 13:15, Wanpeng Li wrote:
> > I saw this splatting:
> > 
> >   BUG: sleeping function called from invalid context at
> > arch/x86/kvm/kvm_cache_regs.h:115
> >kvm_pdptr_read+0x20/0x60 [kvm]
> >kvm_mmu_load+0x3bd/0x540 [kvm]
> > 
> > There is a might_sleep() in kvm_pdptr_read(), however, the original
> > commit didn't explain more. I can send a formal one if the below fix
> > is acceptable.

We don't want to drop mmu_lock, even temporarily.  The reason for holding it
across the entire sequence is to ensure kvm_mmu_available_pages() isn't 
violated.

> I think we can just push make_mmu_pages_available down into
> kvm_mmu_load's callees.  This way it's not necessary to hold the lock
> until after the PDPTR check:

...

> @@ -4852,14 +4868,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>   r = mmu_alloc_special_roots(vcpu);
>   if (r)
>   goto out;
> - write_lock(>kvm->mmu_lock);
> - if (make_mmu_pages_available(vcpu))
> - r = -ENOSPC;
> - else if (vcpu->arch.mmu->direct_map)
> + if (vcpu->arch.mmu->direct_map)
>   r = mmu_alloc_direct_roots(vcpu);
>   else
>   r = mmu_alloc_shadow_roots(vcpu);
> - write_unlock(>kvm->mmu_lock);
>   if (r)
>   goto out;

Freaking PDPTRs.  I was really hoping we could keep the lock and 
pages_available()
logic outside of the helpers.  What if kvm_mmu_load() reads the PDPTRs and
passes them into mmu_alloc_shadow_roots()?  Or is that too ugly?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f31e80a..e3c4938cd665 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3275,11 +3275,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
return 0;
 }

-static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu, u64 pdptrs[4])
 {
struct kvm_mmu *mmu = vcpu->arch.mmu;
-   u64 pdptrs[4], pm_mask;
gfn_t root_gfn, root_pgd;
+   u64 pm_mask;
hpa_t root;
int i;

@@ -3291,11 +3291,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)

if (mmu->root_level == PT32E_ROOT_LEVEL) {
for (i = 0; i < 4; ++i) {
-   pdptrs[i] = mmu->get_pdptr(vcpu, i);
-   if (!(pdptrs[i] & PT_PRESENT_MASK))
-   continue;
-
-   if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+   if ((pdptrs[i] & PT_PRESENT_MASK) &&
+   mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
return 1;
}
}
@@ -4844,21 +4841,33 @@ EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);

 int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
-   int r;
+   struct kvm_mmu *mmu = vcpu->arch.mmu;
+   u64 pdptrs[4];
+   int r, i;

-   r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+   r = mmu_topup_memory_caches(vcpu, !mmu->direct_map);
if (r)
goto out;
r = mmu_alloc_special_roots(vcpu);
if (r)
goto out;
+
+   /*
+* On SVM, reading PDPTRs might access guest memory, which might fault
+* and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+*/
+   if (!mmu->direct_map && mmu->root_level == PT32E_ROOT_LEVEL) {
+   for (i = 0; i < 4; ++i)
+   pdptrs[i] = mmu->get_pdptr(vcpu, i);
+   }
+
write_lock(>kvm->mmu_lock);
if (make_mmu_pages_available(vcpu))
r = -ENOSPC;
else if (vcpu->arch.mmu->direct_map)
r = mmu_alloc_direct_roots(vcpu);
else
-   r = mmu_alloc_shadow_roots(vcpu);
+   r = mmu_alloc_shadow_roots(vcpu, pdptrs);
write_unlock(>kvm->mmu_lock);
if (r)
goto out;


Re: [PATCH v5 1/4] dt-bindings: serial: 8250: deprecate aspeed, sirq-polarity-sense

2021-04-08 Thread Rob Herring
On Wed, 07 Apr 2021 20:16:34 -0500, Zev Weiss wrote:
> This property ties SIRQ polarity to SCU register bits that don't
> necessarily have any direct relationship to it; the only use of it was
> removed in commit c82bf6e133d3 ("ARM: aspeed: g5: Do not set sirq
> polarity").
> 
> Signed-off-by: Zev Weiss 
> Reviewed-by: Joel Stanley 
> ---
>  Documentation/devicetree/bindings/serial/8250.yaml | 1 +
>  1 file changed, 1 insertion(+)
> 

Acked-by: Rob Herring 


Re: [RFC PATCH] vdpa: mandate 1.0 device

2021-04-08 Thread Michael S. Tsirkin
On Thu, Apr 08, 2021 at 04:26:48PM +0800, Jason Wang wrote:
> This patch mandates 1.0 for vDPA devices. The goal is to have the
> semantic of normative statement in the virtio spec and eliminate the
> burden of transitional device for both vDPA bus and vDPA parent.
> 
> uAPI seems fine since all the vDPA parent mandates
> VIRTIO_F_ACCESS_PLATFORM which implies 1.0 devices.
> 
> For legacy guests, it can still work since Qemu will mediate when
> necessary (e.g doing the endian conversion).
> 
> Signed-off-by: Jason Wang 

Hmm. If we do this, don't we still have a problem with
legacy drivers which don't ack 1.0?
Note 1.0 affects ring endianness which is not mediated in QEMU
so QEMU can't pretend to device guest is 1.0.





> ---
>  include/linux/vdpa.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 0fefeb976877..cfde4ec999b4 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /**
>   * vDPA callback definition.
> @@ -317,6 +318,11 @@ static inline int vdpa_set_features(struct vdpa_device 
> *vdev, u64 features)
>  {
>  const struct vdpa_config_ops *ops = vdev->config;
>  
> +/* Mandating 1.0 to have semantics of normative statements in
> + * the spec. */
> +if (!(features & BIT_ULL(VIRTIO_F_VERSION_1)))
> + return -EINVAL;
> +
>   vdev->features_valid = true;
>  return ops->set_features(vdev, features);
>  }
> -- 
> 2.25.1



Re: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky



On 4/7/21 4:07 PM, Sean Christopherson wrote:
> On Wed, Apr 07, 2021, Tom Lendacky wrote:
>> On 4/7/21 3:08 PM, Sean Christopherson wrote:
>>> On Wed, Apr 07, 2021, Tom Lendacky wrote:
 From: Tom Lendacky 

 The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
 the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
 However, if a SIPI is performed without a corresponding AP Reset Hold,
 then the GHCB may not be mapped, which will result in a NULL pointer
 dereference.

 Check that the GHCB is mapped before attempting the update.
>>>
>>> It's tempting to say the ghcb_set_*() helpers should guard against this, but
>>> that would add a lot of pollution and the vast majority of uses are very 
>>> clearly
>>> in the vmgexit path.  svm_complete_emulated_msr() is the only other case 
>>> that
>>> is non-obvious; would it make sense to sanity check svm->ghcb there as well?
>>
>> Hmm... I'm not sure if we can get here without having taken the VMGEXIT
>> path to start, but it certainly couldn't hurt to add it.
> 
> Yeah, AFAICT it should be impossible to reach the callback without a valid 
> ghcb,
> it'd be purely be a sanity check.
>  
>> I can submit a v2 with that unless you want to submit it (with one small
>> change below).
> 
> I'd say just throw it into v2.
> 
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index 019ac836dcd0..abe9c765628f 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -2728,7 +2728,8 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
>>> msr_data *msr_info)
>>>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>>>  {
>>> struct vcpu_svm *svm = to_svm(vcpu);
>>> -   if (!sev_es_guest(vcpu->kvm) || !err)
>>> +
>>> +   if (!err || !sev_es_guest(vcpu->kvm) || !WARN_ON_ONCE(svm->ghcb))
>>
>> This should be WARN_ON_ONCE(!svm->ghcb), otherwise you'll get the right
>> result, but get a stack trace immediately.
> 
> Doh, yep.

Actually, because of the "or's", this needs to be:

if (!err || !sev_es_guest(vcpu->kvm) || (sev_es_guest(vcpu->kvm) && 
WARN_ON_ONCE(!svm->ghcb)))

Thanks,
Tom

> 


Re: [PATCH v5 1/1] use crc32 instead of md5 for hibernation e820 integrity check

2021-04-08 Thread Rafael J. Wysocki
On Thu, Apr 8, 2021 at 5:26 PM Eric Biggers  wrote:
>
> On Thu, Apr 08, 2021 at 03:32:38PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Apr 8, 2021 at 3:15 PM Chris von Recklinghausen
> >  wrote:
> > >
> > > Suspend fails on a system in fips mode because md5 is used for the e820
> > > integrity check and is not available. Use crc32 instead.
> > >
> > > This patch changes the integrity check algorithm from md5 to
> > > crc32. This integrity check is used only to verify accidental
> > > corruption of the hybernation data
> >
> > It isn't used for that.
> >
> > In fact, it is used to detect differences between the memory map used
> > before hibernation and the one made available by the BIOS during the
> > subsequent resume.  And the check is there, because it is generally
> > unsafe to load the hibernation image into memory if the current memory
> > map doesn't match the one used when the image was created.
>
> So what types of "differences" are you trying to detect?  If you need to 
> detect
> differences caused by someone who maliciously made changes ("malicious" 
> implies
> they may try to avoid detection), then you need to use a cryptographic hash
> function (or a cryptographic MAC if the hash value isn't stored separately).  
> If
> you only need to detect non-malicious changes (normally these would be called
> "accidental" changes, but sure, it could be changes that are "intentionally"
> made provided that the other side can be trusted to not try to avoid
> detection...)

That's the case here.

> then a non-cryptographic checksum would be sufficient.


Re: [PATCH] KVM: vmx: add mismatched size in vmcs_check32

2021-04-08 Thread Sean Christopherson
On Thu, Apr 08, 2021, lihaiwei.ker...@gmail.com wrote:
> From: Haiwei Li 
> 
> vmcs_check32 misses the check for 64-bit and 64-bit high.

Can you clarify in the changelog that, while it is architecturally legal to
access 64-bit and 64-bit high fields with a 32-bit read/write in 32-bit mode,
KVM should never do partial accesses to VMCS fields.  And/or note that the
32-bit accesses are done in vmcs_{read,write}64() when necessary?  Hmm, maybe:

  Add compile-time assertions in vmcs_check32() to disallow accesses to
  64-bit and 64-bit high fields via vmcs_{read,write}32().  Upper level
  KVM code should never do partial accesses to VMCS fields.  KVM handles
  the split accesses automatically in vmcs_{read,write}64() when running
  as a 32-bit kernel.

With something along those lines:

Reviewed-and-tested-by: Sean Christopherson  

> Signed-off-by: Haiwei Li 
> ---
>  arch/x86/kvm/vmx/vmx_ops.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
> index 692b0c3..164b64f 100644
> --- a/arch/x86/kvm/vmx/vmx_ops.h
> +++ b/arch/x86/kvm/vmx/vmx_ops.h
> @@ -37,6 +37,10 @@ static __always_inline void vmcs_check32(unsigned long 
> field)
>  {
>   BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0,
>"32-bit accessor invalid for 16-bit field");
> + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
> 0x2000,
> +  "32-bit accessor invalid for 64-bit field");
> + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
> 0x2001,
> +  "32-bit accessor invalid for 64-bit high field");
>   BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
> 0x6000,
>"32-bit accessor invalid for natural width field");
>  }
> -- 
> 1.8.3.1
> 


Re: [PATCH v4 2/6] dt-bindings: PCI: Add bindings for Brcmstb endpoint device voltage regulators

2021-04-08 Thread Rob Herring
On Tue, Apr 06, 2021 at 02:25:49PM -0400, Jim Quinlan wrote:
> On Tue, Apr 6, 2021 at 1:32 PM Mark Brown  wrote:
> >
> > On Tue, Apr 06, 2021 at 01:26:51PM -0400, Jim Quinlan wrote:
> > > On Tue, Apr 6, 2021 at 12:47 PM Mark Brown  wrote:
> >
> > > > No great problem with having these in the controller node (assming it
> > > > accurately describes the hardware) but I do think we ought to also be
> > > > able to describe these per slot.

PCIe is effectively point to point, so there's only 1 slot unless 
there's a PCIe switch in the middle. If that's the case, then it's all 
more complicated.

> > > Can you explain what you think that would look like in the DT?
> >
> > I *think* that's just some properties on the nodes for the endpoints,
> > note that the driver could just ignore them for now.  Not sure where or
> > if we document any extensions but child nodes are in section 4 of the
> > v2.1 PCI bus binding.
> 
> Hi Mark,
> 
> I'm a little confused -- here is how I remember the chronology of the
> "DT bindings" commit reviews, please correct me if I'm wrong:
> 
> o JimQ submitted a pullreq for using voltage regulators in the same
> style as the existing "rockport" PCIe driver.
> o After some deliberation, RobH preferred that the voltage regulators
> should go into the PCIe subnode device's DT node.

IIRC, that's because you said there isn't a standard slot.

> o JimQ put the voltage regulators in the subnode device's DT node.
> o MarkB didn't like the fact that the code did a global search for the
> regulator since it could not provide the owning struct device* handle.
> o RobH relented, and said that if it is just two specific and standard
> voltage regulators, perhaps they can go in the parent DT node after
> all.
> o JimQ put the regulators back in the PCIe node.
> o MarkB now wants the regulators to go back into the child node again?
> 
> Folks, please advise.
> 
> Regards,
> Jim Quinlan
> Broadcom STB


Re: [PATCH v2] ACPI / hotplug / PCI: fix memory leak in enable_slot()

2021-04-08 Thread Bjorn Helgaas
On Thu, Mar 25, 2021 at 03:26:00PM +0800, Zhiqiang Liu wrote:
> From: Feilong Lin 
> 
> In enable_slot() in drivers/pci/hotplug/acpiphp_glue.c, if pci_get_slot()
> will return NULL, we will do not set SLOT_ENABLED flag of slot. if one
> device is found by calling pci_get_slot(), its reference count will be
> increased. In this case, we did not call pci_dev_put() to decrement the
> its reference count, the memory of the device (struct pci_dev type) will
> leak.
> 
> Fix it by calling pci_dev_put() to decrement its reference count after that
> pci_get_slot() returns a PCI device.
> 
> Signed-off-by: Feilong Lin 
> Signed-off-by: Zhiqiang Liu 

Applied with Rafael's reviewed-by to pci/hotplug for v5.13, thanks!

> --
> v2: rewrite subject and commit log as suggested by Bjorn Helgaas.
> ---
>  drivers/pci/hotplug/acpiphp_glue.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
> b/drivers/pci/hotplug/acpiphp_glue.c
> index 3365c93abf0e..f031302ad401 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -533,6 +533,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool 
> bridge)
>   slot->flags &= ~SLOT_ENABLED;
>   continue;
>   }
> + pci_dev_put(dev);
>   }
>  }
> 
> -- 
> 2.19.1
> 


Re: [PATCH 1/2 v2] staging: media: hantro: Align line break to the open parenthesis in file hantro_hw.h

2021-04-08 Thread Ezequiel Garcia
Ola Aline,

Welcome to the kernel community. Hope you enjoy some of this
Outreachy adventures.

Normally, when you submit a v2, we want to know what changed
between the first submission and v2.

If you are subscribed to linux-media, you can read some
of the series with a vN+1 and look how it's done. Examples:

https://www.spinics.net/lists/linux-media/msg190043.html

https://www.spinics.net/lists/linux-media/msg189923.html

I'm sure your Outreachy mentors can tell you more.

On Thu, 2021-04-08 at 11:07 -0300, Aline Santana Cordeiro wrote:
> Aligns line break with the remaining function arguments
> to the open parenthesis. Issue found by checkpatch.
> 
> Signed-off-by: Aline Santana Cordeiro 
> ---
>  drivers/staging/media/hantro/hantro_hw.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_hw.h 
> b/drivers/staging/media/hantro/hantro_hw.h
> index 34c9e46..a650b9c 100644
> --- a/drivers/staging/media/hantro/hantro_hw.h
> +++ b/drivers/staging/media/hantro/hantro_hw.h
> @@ -207,7 +207,7 @@ hantro_h264_mv_size(unsigned int width, unsigned int 
> height)
>  void hantro_g1_mpeg2_dec_run(struct hantro_ctx *ctx);
>  void rk3399_vpu_mpeg2_dec_run(struct hantro_ctx *ctx);
>  void hantro_mpeg2_dec_copy_qtable(u8 *qtable,
> -   const struct v4l2_ctrl_mpeg2_quantization *ctrl);
> + const struct v4l2_ctrl_mpeg2_quantization 
> *ctrl);
>  int hantro_mpeg2_dec_init(struct hantro_ctx *ctx);
>  void hantro_mpeg2_dec_exit(struct hantro_ctx *ctx);
>  




Re: [PATCH v5 0/6] Add SiFive FU740 PCIe host controller driver support

2021-04-08 Thread Lorenzo Pieralisi
On Tue, Apr 06, 2021 at 05:26:28PM +0800, Greentime Hu wrote:
> This patchset includes SiFive FU740 PCIe host controller driver. We also
> add pcie_aux clock and pcie_power_on_reset controller to prci driver for
> PCIe driver to use it.
> 
> This is tested with e1000e: Intel(R) PRO/1000 Network Card, AMD Radeon R5
> 230 graphics card and SP M.2 PCIe Gen 3 SSD in SiFive Unmatched based on
> v5.11 Linux kernel.
> 
> Changes in v5:
>  - Fix typo in comments
>  - Keep comments style consistent
>  - Refine some error handling codes
>  - Remove unneeded header file including
>  - Merge fu740_pcie_ltssm_enable implementation to fu740_pcie_start_link
> 
> Changes in v4:
>  - Fix Wunused-but-set-variable warning in prci driver
> 
> Changes in v3:
>  - Remove items that has been defined
>  - Refine format of sifive,fu740-pcie.yaml
>  - Replace perstn-gpios with the common one
>  - Change DBI mapping space to 2GB from 4GB
>  - Refine drivers/reset/Kconfig
> 
> Changes in v2:
>  - Refine codes based on reviewers' feedback
>  - Remove define and use the common one
>  - Replace __raw_writel with writel_relaxed
>  - Split fu740_phyregreadwrite to write function
>  - Use readl_poll_timeout in stead of while loop checking
>  - Use dwc common codes
>  - Use gpio descriptors and the gpiod_* api.
>  - Replace devm_ioremap_resource with devm_platform_ioremap_resource_byname
>  - Replace devm_reset_control_get with devm_reset_control_get_exclusive
>  - Add more comments for delay and sleep
>  - Remove "phy ? x : y" expressions
>  - Refine code logic to remove possible infinite loop
>  - Replace magic number with meaningful define
>  - Remove fu740_pcie_pm_ops
>  - Use builtin_platform_driver
> 
> Greentime Hu (5):
>   clk: sifive: Add pcie_aux clock in prci driver for PCIe driver
>   clk: sifive: Use reset-simple in prci driver for PCIe driver
>   MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver
>   dt-bindings: PCI: Add SiFive FU740 PCIe host controller
>   riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC
> 
> Paul Walmsley (1):
>   PCI: fu740: Add SiFive FU740 PCIe host controller driver

I can pull the patches above into the PCI tree (but will drop patch 6 -
dts changes), is it OK for you ? Please let me know how you would like
to upstream it.

Lorenzo

>  .../bindings/pci/sifive,fu740-pcie.yaml   | 113 +++
>  MAINTAINERS   |   8 +
>  arch/riscv/boot/dts/sifive/fu740-c000.dtsi|  33 ++
>  drivers/clk/sifive/Kconfig|   2 +
>  drivers/clk/sifive/fu740-prci.c   |  11 +
>  drivers/clk/sifive/fu740-prci.h   |   2 +-
>  drivers/clk/sifive/sifive-prci.c  |  54 +++
>  drivers/clk/sifive/sifive-prci.h  |  13 +
>  drivers/pci/controller/dwc/Kconfig|   9 +
>  drivers/pci/controller/dwc/Makefile   |   1 +
>  drivers/pci/controller/dwc/pcie-fu740.c   | 308 ++
>  drivers/reset/Kconfig |   1 +
>  include/dt-bindings/clock/sifive-fu740-prci.h |   1 +
>  13 files changed, 555 insertions(+), 1 deletion(-)
>  create mode 100644 
> Documentation/devicetree/bindings/pci/sifive,fu740-pcie.yaml
>  create mode 100644 drivers/pci/controller/dwc/pcie-fu740.c
> 
> -- 
> 2.30.2
> 


Re: [PATCH v2 07/17] KVM: x86/mmu: Check PDPTRs before allocating PAE roots

2021-04-08 Thread Sean Christopherson
On Thu, Apr 08, 2021, Paolo Bonzini wrote:
> On 08/04/21 17:48, Sean Christopherson wrote:
> > Freaking PDPTRs.  I was really hoping we could keep the lock and 
> > pages_available()
> > logic outside of the helpers.  What if kvm_mmu_load() reads the PDPTRs and
> > passes them into mmu_alloc_shadow_roots()?  Or is that too ugly?
> 
> The patch I have posted (though untested) tries to do that in a slightly
> less ugly way by pushing make_mmu_pages_available down to mmu_alloc_*_roots.

Yeah, I agree it's less ugly.  It would be nice to not duplicate that code, but
it's probably not worth the ugliness.  :-/

For your approach, can we put the out label after the success path?  Setting
mmu->root_pgd isn't wrong per se, but doing so might mislead future readers into
thinking that it's functionally necessary. 


diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f31e80a..93f97d0a9e2e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3244,6 +3244,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
u8 shadow_root_level = mmu->shadow_root_level;
hpa_t root;
unsigned i;
+   int r;
+
+   write_lock(>kvm->mmu_lock);
+
+   r = make_mmu_pages_available(vcpu);
+   if (r)
+   goto out_unlock;

if (is_tdp_mmu_enabled(vcpu->kvm)) {
root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
@@ -3252,8 +3259,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
mmu->root_hpa = root;
} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
-   if (WARN_ON_ONCE(!mmu->pae_root))
-   return -EIO;
+   if (WARN_ON_ONCE(!mmu->pae_root)) {
+   r = -EIO;
+   goto out_unlock;
+   }

for (i = 0; i < 4; ++i) {
WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
@@ -3266,13 +3275,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
mmu->root_hpa = __pa(mmu->pae_root);
} else {
WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
-   return -EIO;
+   r = -EIO;
+   goto out_unlock;
}

/* root_pgd is ignored for direct MMUs. */
mmu->root_pgd = 0;
-
-   return 0;
+out_unlock:
+   write_unlock(>kvm->mmu_lock);
+   return r;
 }

 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
@@ -3281,7 +3292,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
u64 pdptrs[4], pm_mask;
gfn_t root_gfn, root_pgd;
hpa_t root;
-   int i;
+   int i, r;

root_pgd = mmu->get_guest_pgd(vcpu);
root_gfn = root_pgd >> PAGE_SHIFT;
@@ -3289,6 +3300,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
if (mmu_check_root(vcpu, root_gfn))
return 1;

+   /*
+* On SVM, reading PDPTRs might access guest memory, which might fault
+* and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+*/
if (mmu->root_level == PT32E_ROOT_LEVEL) {
for (i = 0; i < 4; ++i) {
pdptrs[i] = mmu->get_pdptr(vcpu, i);
@@ -3300,6 +3315,12 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
}

+   write_lock(>kvm->mmu_lock);
+
+   r = make_mmu_pages_available(vcpu);
+   if (r)
+   goto out_unlock;
+
/*
 * Do we shadow a long mode page table? If so we need to
 * write-protect the guests page table root.
@@ -3311,8 +3332,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
goto set_root_pgd;
}

-   if (WARN_ON_ONCE(!mmu->pae_root))
-   return -EIO;
+   if (WARN_ON_ONCE(!mmu->pae_root)) {
+   r = -EIO;
+   goto out_unlock;
+   }

/*
 * We shadow a 32 bit page table. This may be a legacy 2-level
@@ -3323,8 +3346,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;

-   if (WARN_ON_ONCE(!mmu->lm_root))
-   return -EIO;
+   if (WARN_ON_ONCE(!mmu->lm_root)) {
+   r = -EIO;
+   goto out_unlock;
+   }

mmu->lm_root[0] = __pa(mmu->pae_root) | pm_mask;
}
@@ -3352,8 +3377,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)

 set_root_pgd:
mmu->root_pgd = root_pgd;
-
-   return 0;
+out_unlock:
+   write_unlock(>kvm->mmu_lock);
+   return r;
 }

 static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
@@ -4852,14 +4878,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
r = 

Re: [PATCH -next] ext4: fix error return code in ext4_fc_perform_commit()

2021-04-08 Thread harshad shirwadkar
Thanks, this looks good.

Reviewed-by: Harshad Shirwadkar 

On Thu, Apr 8, 2021 at 12:00 AM Xu Yihang  wrote:
>
> In case of if not ext4_fc_add_tlv branch, an error return code is missing.
>
> Fixes: aa75f4d3daae ("ext4: main fast-commit commit path")
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Yihang 
> ---
>  fs/ext4/fast_commit.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 7541d0b5d706..312273ed8a9f 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1088,8 +1088,10 @@ static int ext4_fc_perform_commit(journal_t *journal)
> head.fc_tid = cpu_to_le32(
> sbi->s_journal->j_running_transaction->t_tid);
> if (!ext4_fc_add_tlv(sb, EXT4_FC_TAG_HEAD, sizeof(head),
> -   (u8 *), ))
> +   (u8 *), )) {
> +   ret = -ENOSPC;
> goto out;
> +   }
> }
>
> spin_lock(>s_fc_lock);
> --
> 2.17.1
>


RE: [PATCH v2 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-08 Thread Haiyang Zhang


> -Original Message-
> From: Randy Dunlap 
> Sent: Thursday, April 8, 2021 12:23 PM
> To: Dexuan Cui ; da...@davemloft.net;
> k...@kernel.org; KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger
> ; wei@kernel.org; Wei Liu
> ; net...@vger.kernel.org; l...@kernel.org;
> and...@lunn.ch; be...@petrovitsch.priv.at
> Cc: linux-kernel@vger.kernel.org; linux-hyp...@vger.kernel.org
> Subject: Re: [PATCH v2 net-next] net: mana: Add a driver for Microsoft Azure
> Network Adapter (MANA)
> 
> On 4/8/21 2:15 AM, Dexuan Cui wrote:
> > diff --git a/drivers/net/ethernet/microsoft/Kconfig
> b/drivers/net/ethernet/microsoft/Kconfig
> > new file mode 100644
> > index ..12ef6b581566
> > --- /dev/null
> > +++ b/drivers/net/ethernet/microsoft/Kconfig
> > @@ -0,0 +1,30 @@
> > +#
> > +# Microsoft Azure network device configuration
> > +#
> > +
> > +config NET_VENDOR_MICROSOFT
> > +   bool "Microsoft Azure Network Device"
> 
> Seems to me that should be generalized, more like:
> 
>   bool "Microsoft Network Devices"
This device is planned for Azure cloud at this time.
We will update the wording if things change.

> 
> 
> > +   default y
> > +   help
> > + If you have a network (Ethernet) device belonging to this class, say 
> > Y.
> > +
> > + Note that the answer to this question doesn't directly affect the
> > + kernel: saying N will just cause the configurator to skip the
> > + question about Microsoft Azure network device. If you say Y, you
> 
>  about Microsoft networking devices.
(ditto)

> 
> > + will be asked for your specific device in the following question.
> > +
> > +if NET_VENDOR_MICROSOFT
> > +
> > +config MICROSOFT_MANA
> > +   tristate "Microsoft Azure Network Adapter (MANA) support"
> > +   default m
> 
> Please drop the default m. We don't randomly add drivers to be built.
We will.

> 
> Or leave this as is and change NET_VENDOR_MICROSOFT to be default n.
> 
> 
> > +   depends on PCI_MSI && X86_64
> > +   select PCI_HYPERV
> > +   help
> > + This driver supports Microsoft Azure Network Adapter (MANA).
> > + So far, the driver is only validated on X86_64.
> 
> validated how?
On our pre-released HW.

Thanks,
- Haiyang


Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Song Liu



> On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> 
> On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
>> Currently, to use BPF to aggregate perf event counters, the user uses
>> --bpf-counters option. Enable "use bpf by default" events with a config
>> option, stat.bpf-counter-events. This is limited to hardware events in
>> evsel__hw_names.
>> 
>> This also enables mixed BPF event and regular event in the same sesssion.
>> For example:
>> 
>>   perf config stat.bpf-counter-events=instructions
>>   perf stat -e instructions,cs
>> 
> 
> so if we are mixing events now, how about uing modifier for bpf counters,
> instead of configuring .perfconfig list we could use:
> 
>  perf stat -e instructions:b,cs
> 
> thoughts?
> 
> the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> feel free to use it

I think we will need both 'b' modifier and .perfconfig configuration. 
For systems with BPF-managed perf events running in the background, 
.perfconfig makes sure perf-stat sessions will share PMCs with these 
background monitoring tools. 'b' modifier, on the other hand, is useful
when the user knows there is opportunity to share the PMCs. 

Does this make sense? 

Thanks,
Song

> 
> jirka
> 
> 
> ---
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index ca52581f1b17..c55e4e58d1dc 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -82,6 +82,7 @@ struct evsel {
>   boolauto_merge_stats;
>   boolcollect_stat;
>   boolweak_group;
> + boolbpf_counter;
>   int bpf_fd;
>   struct bpf_object   *bpf_obj;
>   };
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 9ecb45bea948..b5850f1ea90b 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1801,6 +1801,7 @@ struct event_modifier {
>   int pinned;
>   int weak;
>   int exclusive;
> + int bpf_counter;
> };
> 
> static int get_event_modifier(struct event_modifier *mod, char *str,
> @@ -1821,6 +1822,7 @@ static int get_event_modifier(struct event_modifier 
> *mod, char *str,
>   int exclude = eu | ek | eh;
>   int exclude_GH = evsel ? evsel->exclude_GH : 0;
>   int weak = 0;
> + int bpf_counter = 0;
> 
>   memset(mod, 0, sizeof(*mod));
> 
> @@ -1864,6 +1866,8 @@ static int get_event_modifier(struct event_modifier 
> *mod, char *str,
>   exclusive = 1;
>   } else if (*str == 'W') {
>   weak = 1;
> + } else if (*str == 'b') {
> + bpf_counter = 1;
>   } else
>   break;
> 
> @@ -1895,6 +1899,7 @@ static int get_event_modifier(struct event_modifier 
> *mod, char *str,
>   mod->sample_read = sample_read;
>   mod->pinned = pinned;
>   mod->weak = weak;
> + mod->bpf_counter = bpf_counter;
>   mod->exclusive = exclusive;
> 
>   return 0;
> @@ -1909,7 +1914,7 @@ static int check_modifier(char *str)
>   char *p = str;
> 
>   /* The sizeof includes 0 byte as well. */
> - if (strlen(str) > (sizeof("ukhGHpppPSDIWe") - 1))
> + if (strlen(str) > (sizeof("ukhGHpppPSDIWeb") - 1))
>   return -1;
> 
>   while (*p) {
> @@ -1950,6 +1955,7 @@ int parse_events__modifier_event(struct list_head 
> *list, char *str, bool add)
>   evsel->sample_read = mod.sample_read;
>   evsel->precise_max = mod.precise_max;
>   evsel->weak_group  = mod.weak;
> + evsel->bpf_counter = mod.bpf_counter;
> 
>   if (evsel__is_group_leader(evsel)) {
>   evsel->core.attr.pinned = mod.pinned;
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 0b36285a9435..fb8646cc3e83 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -210,7 +210,7 @@ name_tag  
> [\'][a-zA-Z_*?\[\]][a-zA-Z0-9_*?\-,\.\[\]:=]*[\']
> name_minus[a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
> drv_cfg_term  [a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
> /* If you add a modifier you need to update check_modifier() */
> -modifier_event   [ukhpPGHSDIWe]+
> +modifier_event   [ukhpPGHSDIWeb]+
> modifier_bp   [rwx]{1,3}
> 
> %%
> 



[PATCH v1] ata: ahci_tegra: call tegra_powergate_power_off only when PM domain is not present

2021-04-08 Thread Sowjanya Komatineni
This patch adds a check on present of PM domain and calls legacy power
domain API tegra_powergate_power_off() only when PM domain is not present.

This is a follow-up patch to Tegra186 AHCI support patch series
https://lore.kernel.org/patchwork/cover/1408752/

Signed-off-by: Sowjanya Komatineni 

---
 drivers/ata/ahci_tegra.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_tegra.c b/drivers/ata/ahci_tegra.c
index 56612af..bd484dd 100644
--- a/drivers/ata/ahci_tegra.c
+++ b/drivers/ata/ahci_tegra.c
@@ -287,7 +287,8 @@ static void tegra_ahci_power_off(struct ahci_host_priv 
*hpriv)
reset_control_assert(tegra->sata_cold_rst);
 
clk_disable_unprepare(tegra->sata_clk);
-   tegra_powergate_power_off(TEGRA_POWERGATE_SATA);
+   if (!tegra->pdev->dev.pm_domain)
+   tegra_powergate_power_off(TEGRA_POWERGATE_SATA);
 
regulator_bulk_disable(tegra->soc->num_supplies, tegra->supplies);
 }
-- 
2.7.4



[PATCH v1] Follow up patch to Tegra186 AHCI support patch series

2021-04-08 Thread Sowjanya Komatineni
This includes a follow up patch to Tegra186 AHCI support patch series
https://lore.kernel.org/patchwork/cover/1408752/


Sowjanya Komatineni (1):
  ata: ahci_tegra: call tegra_powergate_power_off only when PM domain is
not present

 drivers/ata/ahci_tegra.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.7.4



Re: [Intel-gfx] [PATCH v2] drm/i915: Fix invalid access to ACPI _DSM objects

2021-04-08 Thread Ville Syrjälä
On Thu, Apr 08, 2021 at 06:34:06PM +0200, Takashi Iwai wrote:
> On Thu, 08 Apr 2021 09:51:18 +0200,
> Takashi Iwai wrote:
> > 
> > On Wed, 07 Apr 2021 23:28:48 +0200,
> > Ville Syrjälä wrote:
> > > 
> > > Oh, could you ask the bug reporter to attach an acpidump to the
> > > bug? Might be good to have that stuff on record somewhere if/when
> > > someone wants to actually figure out what's going on here.
> > 
> > OK, I'll ask.
> 
> Available at
>   acpidump: https://susepaste.org/59777448
>   hwinfo: https://susepaste.org/86284020

Those will apparently expire real soon. I took local copies out
of morbid curiosity, but that's not going to help anyone else 
in the future. I rather wish bug reporting tools would flat out
refuse to accept any pastebin links.

-- 
Ville Syrjälä
Intel


Re: [PATCH] ASoC: fsl: sunxi: remove redundant dev_err call

2021-04-08 Thread Mark Brown
On Wed, 7 Apr 2021 14:56:34 +0500, Muhammad Usama Anjum wrote:
> devm_ioremap_resource() prints error message in itself. Remove the
> dev_err call to avoid redundant error message.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl: sunxi: remove redundant dev_err call
  commit: a93799d55fd479f540ed97066e69114aa7709787

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH -next] ASoC: codecs: wsa881x: constify static struct snd_soc_dai_ops

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:27:00 +0800, Ye Bin wrote:
> The snd_soc_dai_ops structures is only stored in the ops field of a
> snd_soc_dai_driver structure, so make the snd_soc_dai_ops structure
> const to allow the compiler to put it in read-only memory.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: codecs: wsa881x: constify static struct snd_soc_dai_ops
  commit: f985838003ee618daba7a38da3efe27c639575e2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH -next] ASoC: rt1019: constify static struct snd_soc_dai_ops

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:27:01 +0800, Ye Bin wrote:
> The snd_soc_dai_ops structures is only stored in the ops field of a
> snd_soc_dai_driver structure, so make the snd_soc_dai_ops structure
> const to allow the compiler to put it in read-only memory.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: rt1019: constify static struct snd_soc_dai_ops
  commit: 5e71e9c14db4e49cca56354c95ce10e0e00214d1

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH] [v2] ASoC: codecs: Fix runtime PM imbalance in tas2552_probe

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:40:34 +0800, Dinghao Liu wrote:
> There is a rumtime PM imbalance between the error handling path
> after devm_snd_soc_register_component() and all other error
> handling paths. Add a PM runtime increment to balance refcount.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: codecs: Fix runtime PM imbalance in tas2552_probe
  commit: 7b3f5b207da5116add56c335c5fb92cee140dc63

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH -next] ASoC: tas2770: Constify static struct snd_soc_dai_ops

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:26:46 +0800, Ye Bin wrote:
> The snd_soc_dai_ops structures is only stored in the ops field of a
> snd_soc_dai_driver structure, so make the snd_soc_dai_ops structure
> const to allow the compiler to put it in read-only memory.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: tas2770: Constify static struct snd_soc_dai_ops
  commit: f2ec1ebb257155fb534cad390575d696dfd567fb

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH -next] ASoC: tas2764: constify static struct snd_soc_dai_ops

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:26:43 +0800, Ye Bin wrote:
> The snd_soc_dai_ops structures is only stored in the ops field of a
> snd_soc_dai_driver structure, so make the snd_soc_dai_ops structure
> const to allow the compiler to put it in read-only memory.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: tas2764: constify static struct snd_soc_dai_ops
  commit: b186e7c17d9f2c2bc9cd0bd362402eddbea7749b

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH -next] ASoC: cx2072x: constify static struct snd_soc_dai_ops

2021-04-08 Thread Mark Brown
On Thu, 8 Apr 2021 14:26:56 +0800, Ye Bin wrote:
> The snd_soc_dai_ops structures is only stored in the ops field of a
> snd_soc_dai_driver structure, so make the snd_soc_dai_ops structure
> const to allow the compiler to put it in read-only memory.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: cx2072x: constify static struct snd_soc_dai_ops
  commit: e9a216d8f14ac4d926078885e7e772db08e6aad9

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH V4 XRT Alveo 08/20] fpga: xrt: platform driver infrastructure

2021-04-08 Thread Max Zhen

Hi Tom,


On 3/31/21 5:50 AM, Tom Rix wrote:

Several just for debugging items, consider adding a CONFIG_XRT_DEBUGGING



I'd like to clarify what "only for debugging" means here. It actually 
means that the content of the msg/output only makes sense to a 
developer, v.s. end user. It does not mean that only developer will ever 
execute this code path which triggers the debugging code.


We have msg from print functions like this, and we have output from 
sysfs node like this. We can't just disable all of them by default 
because the content only makes sense to a developer. In some cases, 
requiring a recompilation of the driver to enable the debugging code is 
very difficult to do. E.g., when we need to debug a customer issue and 
we do not have access to the system. It's a big ask for our customer to 
recompile, reload the driver and reproduce the issue for us (v.s. just 
collect and send us the msg/output).




On 3/23/21 10:29 PM, Lizhi Hou wrote:

Infrastructure code providing APIs for managing leaf driver instance
groups, facilitating inter-leaf driver calls and root calls.

Signed-off-by: Sonal Santan
Signed-off-by: Max Zhen
Signed-off-by: Lizhi Hou
---
  drivers/fpga/xrt/lib/subdev.c | 865 ++
  1 file changed, 865 insertions(+)
  create mode 100644 drivers/fpga/xrt/lib/subdev.c

diff --git a/drivers/fpga/xrt/lib/subdev.c b/drivers/fpga/xrt/lib/subdev.c
new file mode 100644
index ..6428b183fee3
--- /dev/null
+++ b/drivers/fpga/xrt/lib/subdev.c
@@ -0,0 +1,865 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020-2021 Xilinx, Inc.
+ *
+ * Authors:
+ *   Cheng Zhen
+ */
+
+#include 
+#include 
+#include 
+#include "xleaf.h"
+#include "subdev_pool.h"
+#include "lib-drv.h"
+#include "metadata.h"
+
+#define IS_ROOT_DEV(dev) ((dev)->bus == _bus_type)

for readablity, add a new line here



Will do.



+static inline struct device *find_root(struct platform_device *pdev)
+{
+ struct device *d = DEV(pdev);
+
+ while (!IS_ROOT_DEV(d))
+ d = d->parent;
+ return d;
+}
+
+/*
+ * It represents a holder of a subdev. One holder can repeatedly hold a subdev
+ * as long as there is a unhold corresponding to a hold.
+ */
+struct xrt_subdev_holder {
+ struct list_head xsh_holder_list;
+ struct device *xsh_holder;
+ int xsh_count;
+ struct kref xsh_kref;
+};
+
+/*
+ * It represents a specific instance of platform driver for a subdev, which
+ * provides services to its clients (another subdev driver or root driver).
+ */
+struct xrt_subdev {
+ struct list_head xs_dev_list;
+ struct list_head xs_holder_list;
+ enum xrt_subdev_id xs_id;   /* type of subdev */
+ struct platform_device *xs_pdev;/* a particular subdev inst */
+ struct completion xs_holder_comp;
+};
+
+static struct xrt_subdev *xrt_subdev_alloc(void)
+{
+ struct xrt_subdev *sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);

ok

+
+ if (!sdev)
+ return NULL;
+
+ INIT_LIST_HEAD(>xs_dev_list);
+ INIT_LIST_HEAD(>xs_holder_list);
+ init_completion(>xs_holder_comp);
+ return sdev;
+}
+
+static void xrt_subdev_free(struct xrt_subdev *sdev)
+{
+ kfree(sdev);

Abstraction for a single function is not needed, use kfree directly.



Will do.



+}
+
+int xrt_subdev_root_request(struct platform_device *self, u32 cmd, void *arg)
+{
+ struct device *dev = DEV(self);
+ struct xrt_subdev_platdata *pdata = DEV_PDATA(self);
+
+ WARN_ON(!pdata->xsp_root_cb);

ok

+ return (*pdata->xsp_root_cb)(dev->parent, pdata->xsp_root_cb_arg, cmd, 
arg);
+}
+
+/*
+ * Subdev common sysfs nodes.
+ */
+static ssize_t holders_show(struct device *dev, struct device_attribute *attr, 
char *buf)
+{
+ ssize_t len;
+ struct platform_device *pdev = to_platform_device(dev);
+ struct xrt_root_get_holders holders = { pdev, buf, 1024 };

Since 1024 is config, #define it somewhere so it can be tweeked later



Will do.



+
+ len = xrt_subdev_root_request(pdev, XRT_ROOT_GET_LEAF_HOLDERS, );
+ if (len >= holders.xpigh_holder_buf_len)
+ return len;
+ buf[len] = '\n';
+ return len + 1;
+}
+static DEVICE_ATTR_RO(holders);
+
+static struct attribute *xrt_subdev_attrs[] = {
+ _attr_holders.attr,
+ NULL,
+};
+
+static ssize_t metadata_output(struct file *filp, struct kobject *kobj,
+struct bin_attribute *attr, char *buf, loff_t off, 
size_t count)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct platform_device *pdev = to_platform_device(dev);
+ struct xrt_subdev_platdata *pdata = DEV_PDATA(pdev);
+ unsigned char *blob;
+ unsigned long  size;
+ ssize_t ret = 0;
+
+ blob = pdata->xsp_dtb;
+ size = xrt_md_size(dev, blob);
+ if (size == XRT_MD_INVALID_LENGTH) {
+ ret = -EINVAL;
+ goto failed;
+ }
+
+ if (off >= size)
+ goto failed;

if this and next are used for debugging, 

Re: [PATCH 1/7] cxl/mem: Use dev instead of pdev->dev

2021-04-08 Thread Jonathan Cameron
On Wed, 7 Apr 2021 15:26:19 -0700
Ben Widawsky  wrote:

> Trivial cleanup.

Obviously correct :)

> 
> Signed-off-by: Ben Widawsky 
FWIW
Acked-by: Jonathan Cameron 

> ---
>  drivers/cxl/mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index b6fe4e81d38a..99534260034e 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -935,7 +935,7 @@ static struct cxl_mem *cxl_mem_create(struct pci_dev 
> *pdev, u32 reg_lo,
>   u8 bar;
>   int rc;
>  
> - cxlm = devm_kzalloc(>dev, sizeof(*cxlm), GFP_KERNEL);
> + cxlm = devm_kzalloc(dev, sizeof(*cxlm), GFP_KERNEL);
>   if (!cxlm) {
>   dev_err(dev, "No memory available\n");
>   return NULL;



[PATCH 7/7] ARM: configs: qcom_defconfig: Reduce CMA size to 64MB

2021-04-08 Thread Manivannan Sadhasivam
Not all platforms are able to allocate CMA size of 256MB. One such
platform is SDX55. Hence, use the standard 64MB size for CMA.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 4e4c49c29aa5..26353cbfa968 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -296,7 +296,7 @@ CONFIG_NLS_ASCII=y
 CONFIG_NLS_ISO8859_1=y
 CONFIG_NLS_UTF8=y
 CONFIG_DMA_CMA=y
-CONFIG_CMA_SIZE_MBYTES=256
+CONFIG_CMA_SIZE_MBYTES=64
 CONFIG_PRINTK_TIME=y
 CONFIG_DYNAMIC_DEBUG=y
 CONFIG_DEBUG_INFO=y
-- 
2.25.1



[PATCH 4/7] ARM: configs: qcom_defconfig: Enable Q6V5_PAS remoteproc driver

2021-04-08 Thread Manivannan Sadhasivam
Enable the Qualcomm Q6V5_PAS (Peripheral Authentication Service)
remoteproc driver to manage the modem co-processor in SDX55 platform.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 47343d0ea586..695612829503 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -237,6 +237,7 @@ CONFIG_MAILBOX=y
 CONFIG_QCOM_APCS_IPC=y
 CONFIG_REMOTEPROC=y
 CONFIG_QCOM_ADSP_PIL=y
+CONFIG_QCOM_Q6V5_PAS=y
 CONFIG_QCOM_Q6V5_PIL=y
 CONFIG_QCOM_WCNSS_PIL=y
 CONFIG_RPMSG_CHAR=y
-- 
2.25.1



[PATCH 0/7] SDX55 defconfig updates for v5.13

2021-04-08 Thread Manivannan Sadhasivam
Hi Bjorn,

This series updates the qcom_defconfig by enabling the drivers required
for the SDX55 platform.

Please consider merging!

Thanks,
Mani

Manivannan Sadhasivam (7):
  ARM: configs: qcom_defconfig: Enable APCS IPC mailbox driver
  ARM: configs: qcom_defconfig: Enable SDX55 A7 PLL and APCS clock
driver
  ARM: configs: qcom_defconfig: Enable CPUFreq support
  ARM: configs: qcom_defconfig: Enable Q6V5_PAS remoteproc driver
  ARM: configs: qcom_defconfig: Enable SDX55 interconnect driver
  ARM: configs: qcom_defconfig: Enable GLINK SMEM driver
  ARM: configs: qcom_defconfig: Reduce CMA size to 64MB

 arch/arm/configs/qcom_defconfig | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.25.1



[PATCH 1/7] ARM: configs: qcom_defconfig: Enable APCS IPC mailbox driver

2021-04-08 Thread Manivannan Sadhasivam
Enable Qualcomm APCS IPC mailbox driver for IPC communication between
application processor and other masters in platforms like SDX55.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 3f36887e8333..0b9da27f923a 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -232,6 +232,7 @@ CONFIG_ARM_SMMU=y
 CONFIG_HWSPINLOCK=y
 CONFIG_HWSPINLOCK_QCOM=y
 CONFIG_MAILBOX=y
+CONFIG_QCOM_APCS_IPC=y
 CONFIG_REMOTEPROC=y
 CONFIG_QCOM_ADSP_PIL=y
 CONFIG_QCOM_Q6V5_PIL=y
-- 
2.25.1



[PATCH 2/7] ARM: configs: qcom_defconfig: Enable SDX55 A7 PLL and APCS clock driver

2021-04-08 Thread Manivannan Sadhasivam
Enable A7 PLL driver and APCS clock driver on SDX55 platform.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 0b9da27f923a..02f6185f31a6 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -215,6 +215,8 @@ CONFIG_DMADEVICES=y
 CONFIG_QCOM_BAM_DMA=y
 CONFIG_STAGING=y
 CONFIG_COMMON_CLK_QCOM=y
+CONFIG_QCOM_A7PLL=y
+CONFIG_QCOM_CLK_APCS_SDX55=y
 CONFIG_QCOM_CLK_RPM=y
 CONFIG_QCOM_CLK_RPMH=y
 CONFIG_QCOM_CLK_SMD_RPM=y
-- 
2.25.1



[PATCH 6/7] ARM: configs: qcom_defconfig: Enable GLINK SMEM driver

2021-04-08 Thread Manivannan Sadhasivam
Enable the Qualcomm GLINK SMEM driver to support GLINK protocol over
shared memory.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 5955aeb0646e..4e4c49c29aa5 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -241,6 +241,7 @@ CONFIG_QCOM_Q6V5_PAS=y
 CONFIG_QCOM_Q6V5_PIL=y
 CONFIG_QCOM_WCNSS_PIL=y
 CONFIG_RPMSG_CHAR=y
+CONFIG_RPMSG_QCOM_GLINK_SMEM=y
 CONFIG_RPMSG_QCOM_SMD=y
 CONFIG_QCOM_COMMAND_DB=y
 CONFIG_QCOM_GSBI=y
-- 
2.25.1



[PATCH 5/7] ARM: configs: qcom_defconfig: Enable SDX55 interconnect driver

2021-04-08 Thread Manivannan Sadhasivam
Enable interconnect driver for SDX55 platform to manage the interconnect
providers.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 695612829503..5955aeb0646e 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -277,6 +277,7 @@ CONFIG_QCOM_QFPROM=y
 CONFIG_INTERCONNECT=y
 CONFIG_INTERCONNECT_QCOM=y
 CONFIG_INTERCONNECT_QCOM_MSM8974=m
+CONFIG_INTERCONNECT_QCOM_SDX55=m
 CONFIG_EXT2_FS=y
 CONFIG_EXT2_FS_XATTR=y
 CONFIG_EXT3_FS=y
-- 
2.25.1



[PATCH 3/7] ARM: configs: qcom_defconfig: Enable CPUFreq support

2021-04-08 Thread Manivannan Sadhasivam
Enable CPUFreq and CPUFreq DT drivers to carry out CPU Frequency scaling
duties on platforms like SDX55.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm/configs/qcom_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/qcom_defconfig b/arch/arm/configs/qcom_defconfig
index 02f6185f31a6..47343d0ea586 100644
--- a/arch/arm/configs/qcom_defconfig
+++ b/arch/arm/configs/qcom_defconfig
@@ -302,3 +302,5 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_WATCHDOG=y
 CONFIG_QCOM_WDT=y
 CONFIG_ARM_PSCI=y
+CONFIG_CPU_FREQ=y
+CONFIG_CPUFREQ_DT=y
-- 
2.25.1



Re: [PATCH -next] PM: runtime: Replace inline function pm_runtime_callbacks_present()

2021-04-08 Thread Rafael J. Wysocki
On Fri, Apr 2, 2021 at 8:14 AM YueHaibing  wrote:
>
> commit 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()")
> forget to change the inline version.
>
> Fixes: 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()")
> Signed-off-by: YueHaibing 
> ---
>  include/linux/pm_runtime.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index b492ae00cc90..6c08a085367b 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -265,7 +265,7 @@ static inline void pm_runtime_no_callbacks(struct device 
> *dev) {}
>  static inline void pm_runtime_irq_safe(struct device *dev) {}
>  static inline bool pm_runtime_is_irq_safe(struct device *dev) { return 
> false; }
>
> -static inline bool pm_runtime_callbacks_present(struct device *dev) { return 
> false; }
> +static inline bool pm_runtime_has_no_callbacks(struct device *dev) { return 
> false; }
>  static inline void pm_runtime_mark_last_busy(struct device *dev) {}
>  static inline void __pm_runtime_use_autosuspend(struct device *dev,
> bool use) {}
> --

Applied as 5.13 material, thanks!


Re: [PATCH v2 01/24] x86/resctrl: Split struct rdt_resource

2021-04-08 Thread James Morse
Hi Reinette,

On 07/04/2021 00:42, Reinette Chatre wrote:
> On 4/6/2021 10:13 AM, James Morse wrote:
>> On 31/03/2021 22:35, Reinette Chatre wrote:
>>> On 3/12/2021 9:58 AM, James Morse wrote:
 resctrl is the defacto Linux ABI for SoC resource partitioning features.
 To support it on another architecture, it needs to be abstracted from
 the features provided by Intel RDT and AMD PQoS, and moved to /fs/.

 Start by splitting struct rdt_resource, (the name is kept to keep the noise
 down), and add some type-trickery to keep the foreach helpers working.

 Move everything that is particular to resctrl into a new header
 file, keeping the x86 hardware accessors where they are. resctrl code
 paths touching a 'hw' struct indicates where an abstraction is needed.
>>>
>>> This establishes the significance of this patch. Here the rdt_resource 
>>> struct is split up
>>> and it is this split that guides the subsequent abstraction. Considering 
>>> this I find that
>>> this description does not explain the resulting split sufficiently.
>>>
>>> Specifically, after reading the above summary I expect fs information in 
>>> rdt_resource and
>>> hw information in rdt_hw_resource but that does not seem to be the case. 
>>> For example,
>>> num_rmid is a property obtained from hardware but is found in rdt_resource 
>>> while other
>>> hardware properties initialized at the same time are found in 
>>> rdt_hw_resource. It is
>>> interesting to look at when the hardware is discovered (for example, 
>>> functions like
>>> cache_alloc_hsw_probe(), __get_mem_config_intel(), 
>>> __rdt_get_mem_config_amd(),
>>> rdt_get_cache_alloc_cfg()). Note how some of the discovered values end up 
>>> in rdt_resource
>>> and some in rdt_hw_resource.
>>
>>> I was expecting these properties discovered from hardware to
>>> be in rdt_hw_resource.
>>
>> Not all values discovered from the hardware are private to the architecture. 
>> They only
>> need to be private if there is some further abstraction involved.

> ok, but rdt_hw_resource is described as "hw attributes of a resctrl resource" 
> so this can
> be very confusing if rdt_hw_resource does _not_ actually contain (all of) the 
> hw
> attributes of a resctrl resource.

Aha, right. I'm bad at naming things. This started as untangling the hardware 
(cough:
arch) specific bits, but some things have migrated back the other way.

Do you think either of arch_rdt_resource or rdt_priv_resource are clearer?


> Could you please expand the kernel doc for rdt_hw_resource to explain that, 
> apart from
> @resctrl (that I just noticed is missing a description),

I'll add one for mbm_width too,

> it contains attributes needing
> abstraction for different architectures as opposed to the actual hardware 
> attributes?

|/**
| * struct rdt_hw_resource - arch private attributes of a resctrl resource
| * @resctrl:   Attributes of the resource used directly by resctrl.
| * @num_closid:Number of CLOSIDs available.
| * @msr_base:  Base MSR address for CBMs
| * @msr_update:Function pointer to update QOS MSRs
| * @mon_scale: cqm counter * mon_scale = occupancy in bytes
| * @mbm_width: Monitor width, to detect and correct for overflow.
| *
| * Members of this structure are either private to the architecture
| * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
| * msr_update and msr_base.
| */


>> On your specific example: the resctrl filesystem code allocates from 
>> num_rmid. Its meaning
>> doesn't change. num_closid on the other hand changes depending on whether 
>> CDP is in use.
>>
>> Putting num_closid in resctrl's struct rdt_resource would work, but the 
>> value is wrong
>> once CDP is enabled. This would be annoying to debug, hiding the hardware 
>> value and
>> providing it via a helper avoids this, as by the end of the series there is 
>> only one
>> consumer: schemata_list_create().
>>
>> For MPAM, the helper would return arm64's version of rdt_min_closid as there 
>> is only one
>> 'num_closid' for the system, regardless of the resource. The driver has to 
>> duplicate the
>> logic in closid_init() to find the minimum common value of all the 
>> resources, as not all
>> the resources are exposed to resctrl, and an out-of-range closid value 
>> triggers an error
>> interrupt.
>>
>>
>>> It is also not clear to me how these structures are intended to be used for 
>>> related
>>> hardware properties. For example, rdt_resource keeps the properties
>>> alloc_capable/alloc_enabled/mon_capable/mon_enabled - but in this series 
>>> companion
>>> properties of cdp_capable/cdp_enabled are introduced and placed in 
>>> rdt_hw_resource.
>>
>> There needs to be further abstraction around cdp_enabled. For Arm's MPAM CDP 
>> is emulated
>> by providing different closid for data-access and instruction-fetch. This is 
>> done in the
>> equivalent to IA32_PQR_ASSOC, so it affects all the resources.
>>
>> For MPAM all 

Re: [PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-04-08 Thread James Morse
Hi Babu,

On 06/04/2021 22:37, Babu Moger wrote:
> On 4/6/21 12:19 PM, James Morse wrote:
>> On 30/03/2021 21:36, Babu Moger wrote:
>>> On 3/12/21 11:58 AM, James Morse wrote:
 This series re-folds the resctrl code so the CDP resources (L3CODE et al)
 behaviour is all contained in the filesystem parts, with a minimum amount
 of arch specific code.

 This series collapses the CODE/DATA resources, moving all the user-visible
 resctrl ABI into what becomes the filesystem code. CDP becomes the type of
 configuration being applied to a cache. This is done by adding a
 struct resctrl_schema to the parts of resctrl that will move to fs. This
 holds the arch-code resource that is in use for this schema, along with
 other properties like the name, and whether the configuration being applied
 is CODE/DATA/BOTH.
>>
>>
>>> I applied your patches on my AMD box.
>>
>> Great! Thanks for taking a look,
>>
>>
>>> Seeing some difference in the behavior.
>>
>> Ooer,
>>
>>
>>> Before these patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [   13.076973] resctrl: L3 allocation detected
>>> [   13.087835] resctrl: L3DATA allocation detected
>>> [   13.092886] resctrl: L3CODE allocation detected
>>> [   13.097936] resctrl: MB allocation detected
>>> [   13.102599] resctrl: L3 monitoring detected
>>>
>>>
>>> After the patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [   13.076973] resctrl: L3 allocation detected
>>> [   13.097936] resctrl: MB allocation detected
>>> [   13.102599] resctrl: L3 monitoring detected
>>>
>>> You can see that L3DATA and L3CODE disappeared. I think we should keep the
>>> behavior same for x86(at least).
>>
>> This is the kernel log ... what user-space software is parsing that for an 
>> expected value?
>> What happens if the resctrl strings have been overwritten by more kernel log?
>>
>> I don't think user-space should be relying on this. I'd argue any user-space 
>> doing this is
>> already broken. Is it just the kernel selftest's filter_dmesg()? It doesn't 
>> seem to do
>> anything useful
>>
>> Whether resctrl is support can be read from /proc/filesystems. CDP is 
>> probably a
>> try-it-and-see. User-space could parse /proc/cpuinfo, but its probably not a 
>> good idea.

> Yes. Agree. Looking at the dmesg may no be right way to figure out all the
> support details. As a normal practice, I searched for these texts and
> noticed difference. That is why I felt it is best to keep those texts same
> as before.

>> Its easy to fix, but it seems odd that the kernel has to print things for 
>> user-space to
>> try and parse. (I'd like to point at the user-space software that depends on 
>> this)
> 
> I dont think there is any software that parses the dmesg for these
> details. These are info messages for the developers.

The kernel log changes all the time, messages at boot aren't something you can 
depend on
seeing later. Unless there is some user-space software broken by this, I'm 
afraid I don't
think its a good idea to add extra code to keep it the same.

Printing 'CDP supported by Lx' would be more useful to developers perusing the 
log. Even
more useful would be exposing feature attributes via sysfs to say what resctrl 
supports
without having to mount-it-and-see. This can then be used by user-space too.
e.g.:
| cat /sys/fs/ext4/features/fast_commit


>>> I am still not clear why we needed resctrl_conf_type
>>>
>>> enum resctrl_conf_type {
>>> CDP_BOTH,
>>> CDP_CODE,
>>> CDP_DATA,
>>> };
>>>
>>> Right now, I see all the resources are initialized as CDP_BOTH.
>>>
>>>  [RDT_RESOURCE_L3] =
>>> {
>>> .conf_type  = CDP_BOTH,
>>>  [RDT_RESOURCE_L2] =
>>> {
>>> .conf_type  = CDP_BOTH,
>>>  [RDT_RESOURCE_MBA] =
>>> {
>>> .conf_type  = CDP_BOTH,
>>
>> Ah, those should have been removed in patch 24. Once all the resources are 
>> the same, the
>> resource doesn't need to describe what kind it is.
>>
>>
>>> If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
>>> CDP_DATA?
>>
>> The filesystem code for resctrl that will eventually move out of arch/x86 
>> needs to be able
>> to describe the type of configuration change being made back to the arch 
>> code. The enum
>> gets used for that.
>>
>> x86 needs this as it affects which MSRs the configuration value is written 
>> to.
>>
>>
>>> Are these going to be different for ARM?
>>
>> Nope. Arm's MPAM ends up emulating CDP with the closid values that get 
>> applied to
>> transactions.
>>
>>
>>> Also initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I dont
>>> think there will CDP support in MBA in future.
>>
>> Its not code or data, which makes it both. 'BOTH' is more of a 'do nothing 
>> special', there
>> may be a better name, but I'm not very good at naming things. (any 
>> suggestions?)

> Do you think CDP_NONE will make some sense?

If you 

Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Jiri Olsa
On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
> 
> 
> > On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> > 
> > On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
> >> Currently, to use BPF to aggregate perf event counters, the user uses
> >> --bpf-counters option. Enable "use bpf by default" events with a config
> >> option, stat.bpf-counter-events. This is limited to hardware events in
> >> evsel__hw_names.
> >> 
> >> This also enables mixed BPF event and regular event in the same sesssion.
> >> For example:
> >> 
> >>   perf config stat.bpf-counter-events=instructions
> >>   perf stat -e instructions,cs
> >> 
> > 
> > so if we are mixing events now, how about uing modifier for bpf counters,
> > instead of configuring .perfconfig list we could use:
> > 
> >  perf stat -e instructions:b,cs
> > 
> > thoughts?
> > 
> > the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> > feel free to use it
> 
> I think we will need both 'b' modifier and .perfconfig configuration. 
> For systems with BPF-managed perf events running in the background, 

hum, I'm not sure I understand what that means.. you mean there
are tools that run perf stat so you don't want to change them?

> .perfconfig makes sure perf-stat sessions will share PMCs with these 
> background monitoring tools. 'b' modifier, on the other hand, is useful
> when the user knows there is opportunity to share the PMCs. 
> 
> Does this make sense? 

if there's reason for that then sure.. but let's not limit that just
on HARDWARE events only.. there are RAW events with the same demand
for this feature.. why don't we let user define any event for this?

jirka



Re: [PATCH v2 02/24] x86/resctrl: Split struct rdt_domain

2021-04-08 Thread James Morse
Hi Reinette,

On 31/03/2021 22:36, Reinette Chatre wrote:
> On 3/12/2021 9:58 AM, James Morse wrote:
>> resctrl is the defacto Linux ABI for SoC resource partitioning features.
>> To support it on another architecture, it needs to be abstracted from
>> the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
>>
>> Split struct rdt_domain up too. Move everything that that is particular
> 
> s/that that/that/
> 
>> to resctrl into a new header file. resctrl code paths touching a 'hw'
>> struct indicates where an abstraction is needed.
> 
> Similar to previous patch it would help to explain how this split was chosen. 
> For example,
> why are the CPUs to which a resource is associated not considered a hardware 
> property?

Similarly, because the meaning of those CPUs doesn't differ or change between 
architectures.

I've expanded the middle paragraph in the commit message to explain why the 
arch specific
things are arch specific:
| Continue by splitting struct rdt_domain, into an architecture private
| 'hw' struct, which contains the common resctrl structure that would be
| used by any architecture.
|
| The hardware values in ctrl_val and mbps_val need to be accessed via
| helpers to allow another architecture to convert these into a different
| format if necessary.
|
| After this split, filesystem code code paths touching a 'hw' struct
| indicates where an abstraction is needed.

and similarly changed the kernel doc comment.


Let me know if you prefer some other struct name.


Thanks,

James


Re: [PATCH 0/4] Add support for XMM fast hypercalls

2021-04-08 Thread Siddharth Chandrasekaran
On Thu, Apr 08, 2021 at 04:30:18PM +, Wei Liu wrote:
> On Thu, Apr 08, 2021 at 05:54:43PM +0200, Siddharth Chandrasekaran wrote:
> > On Thu, Apr 08, 2021 at 05:48:19PM +0200, Paolo Bonzini wrote:
> > > On 08/04/21 17:40, Siddharth Chandrasekaran wrote:
> > > > > > > Although the Hyper-v TLFS mentions that a guest cannot use this 
> > > > > > > feature
> > > > > > > unless the hypervisor advertises support for it, some hypercalls 
> > > > > > > which
> > > > > > > we plan on upstreaming in future uses them anyway.
> > > > > > No, please don't do this. Check the feature bit(s) before you issue
> > > > > > hypercalls which rely on the extended interface.
> > > > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being
> > > > > buggy and using XMM arguments unconditionally.
> > > > The guest is at fault here as it expects Hyper-V to consume arguments
> > > > from XMM registers for certain hypercalls (that we are working) even if
> > > > we didn't expose the feature via CPUID bits.
> > >
> > > What guest is that?
> >
> > It is a Windows Server 2016.
> 
> Can you be more specific? Are you implementing some hypercalls from
> TLFS? If so, which ones?

Yes all of them are from TLFS. We are implementing VSM and there are a
bunch of hypercalls that we have implemented to manage VTL switches,
memory protection and virtual interrupts.

The following 3 hypercalls that use the XMM fast hypercalls are relevant
to this patch set:

HvCallModifyVtlProtectionMask
HvGetVpRegisters 
HvSetVpRegisters 

~ Sid.



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Jiri Olsa
On Thu, Apr 08, 2021 at 05:28:10PM +, Song Liu wrote:
> 
> 
> > On Apr 8, 2021, at 10:20 AM, Jiri Olsa  wrote:
> > 
> > On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
> >> 
> >> 
> >>> On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> >>> 
> >>> On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
>  Currently, to use BPF to aggregate perf event counters, the user uses
>  --bpf-counters option. Enable "use bpf by default" events with a config
>  option, stat.bpf-counter-events. This is limited to hardware events in
>  evsel__hw_names.
>  
>  This also enables mixed BPF event and regular event in the same sesssion.
>  For example:
>  
>   perf config stat.bpf-counter-events=instructions
>   perf stat -e instructions,cs
>  
> >>> 
> >>> so if we are mixing events now, how about uing modifier for bpf counters,
> >>> instead of configuring .perfconfig list we could use:
> >>> 
> >>> perf stat -e instructions:b,cs
> >>> 
> >>> thoughts?
> >>> 
> >>> the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> >>> feel free to use it
> >> 
> >> I think we will need both 'b' modifier and .perfconfig configuration. 
> >> For systems with BPF-managed perf events running in the background, 
> > 
> > hum, I'm not sure I understand what that means.. you mean there
> > are tools that run perf stat so you don't want to change them?
> 
> We have tools that do perf_event_open(). I will change them to use 
> BPF managed perf events for "cycles" and "instructions". Since these 
> tools are running 24/7, perf-stat on the system should use BPF managed
> "cycles" and "instructions" by default. 

well if you are already changing the tools why not change them to add
modifier.. but I don't mind adding that .perfconfig stuff if you need
that

> 
> > 
> >> .perfconfig makes sure perf-stat sessions will share PMCs with these 
> >> background monitoring tools. 'b' modifier, on the other hand, is useful
> >> when the user knows there is opportunity to share the PMCs. 
> >> 
> >> Does this make sense? 
> > 
> > if there's reason for that then sure.. but let's not limit that just
> > on HARDWARE events only.. there are RAW events with the same demand
> > for this feature.. why don't we let user define any event for this?
> 
> I haven't found a good way to config RAW events. I guess RAW events 
> could use 'b' modifier? 

any event uing the pmu notation like cpu/instructions/

we can allow any event to be BPF-managed, right? IIUC we don't care,
the code will work with any event

jirka



Re: [PATCH] doc/zh_CN: Clean zh_CN translation maintainer

2021-04-08 Thread Jonathan Corbet
Wu XiangCheng  writes:

> Remove Harry Wei and  from
> MAINTAINERS Chinese Translation.
>
> According to git logs, Harry Wei (aka WeiWei Jia)
> * last submitted at 2012-05-07
> commit a9e73211fb0f ("Fix a mistake sentence in the file 
> 'Documentation/zh_CN/magic-number.txt'")
> * last Reviewed-by at 2016-02-16
> commit 45c73ea7a785 ("Documentation: Chinese translation of 
> arm64/silicon-errata.txt")
> * last Signed-off-by at 2019-03-13 (pick by Alex Shi)
> commit 95dcdb6e125f ("docs/zh_CN: rename magic-numbers as rst doc")
>
> According to mail list archives, Harry Wei
> * last replied at 2016-02-15
> 
> * last appeared at 2018-05-12
> 
>
> He/She did not maintain zh_CN translations for a long time.
>  is a maillist for Linux group of
> Xi'an University of Posts and Telecommunications, not special for zh_CN
> translation work.
>
> Anyway, many thanks him/her and Xiyou for their contributions to the early
> Chinese translation work!
>
> Signed-off-by: Wu XiangCheng 

I've applied this, thanks.

jon


Re: [PATCH 13/13] tty: clean include/linux/tty.h up

2021-04-08 Thread Greg Kroah-Hartman
On Thu, Apr 08, 2021 at 02:51:34PM +0200, Greg Kroah-Hartman wrote:
> There are a lot of tty-core-only functions that are listed in
> include/linux/tty.h.  Move them to drivers/tty/tty.h so that no one else
> can accidentally call them or think that they are public functions.
> 
> Cc: Jiri Slaby 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  drivers/tty/n_gsm.c  |  1 +
>  drivers/tty/n_hdlc.c |  1 +
>  drivers/tty/tty.h| 37 +
>  include/linux/tty.h  | 34 --
>  4 files changed, 39 insertions(+), 34 deletions(-)

This needs a "tty.h" inclusion into drivers/tty/tty_baudrate.c,
otherwise it's a build warning, I missed that, sorry.  Will add that to
the next revision if it's needed, or just fix it up when committing it.

thanks,

greg k-h


Re: [PATCH] blk-mq: fix alignment mismatch.

2021-04-08 Thread Jian Cai
So this issue is blocking the LLVM upgrading on ChromeOS. Nathan, do
you mind sending out the smaller patch like Nick suggested just to see
what feedback we could get? I could send it for you if you are busy,
and please let me know what tags I should use in that case.

Thanks,
Jian

On Wed, Mar 31, 2021 at 3:06 PM Nick Desaulniers
 wrote:
>
> On Wed, Mar 31, 2021 at 2:58 PM Nathan Chancellor  wrote:
> >
> > On Wed, Mar 31, 2021 at 02:27:03PM -0700, Jian Cai wrote:
> > >
> > > I just realized you already proposed solutions for skipping the check
> > > in 
> > > https://lore.kernel.org/linux-block/20210310225240.4epj2mdmzt4vurr3@archlinux-ax161/#t.
> > > Do you have any plans to send them for review?
> >
> > I was hoping to gather some feedback on which option would be preferred
> > by Jens and the other ClangBuiltLinux folks before I sent them along. I
> > can send the first just to see what kind of feedback I can gather.
>
> Either approach is fine by me. The smaller might be easier to get
> accepted into stable. The larger approach will probably become more
> useful in the future (having the diag infra work properly with clang).
> I think the ball is kind of in Jens' court to decide.  Would doing
> both be appropriate, Jens? Have the smaller patch tagged for stable
> disabling it for the whole file, then another commit on top not tagged
> for stable that adds the diag infra, and a third on top replacing the
> file level warning disablement with local diags to isolate this down
> to one case?  It's a fair but small amount of churn IMO; but if Jens
> is not opposed it seems fine?
> --
> Thanks,
> ~Nick Desaulniers


Re: [PATCH 05/13] tty: remove tty_warn()

2021-04-08 Thread Greg Kroah-Hartman
On Thu, Apr 08, 2021 at 10:47:21PM +0900, Tetsuo Handa wrote:
> On 2021/04/08 21:51, Greg Kroah-Hartman wrote:
> > Remove users of tty_warn() and replace them with calls to dev_warn()
> > which provides more information about the tty that has the error and
> > uses the standard formatting logic.
> 
> Ouch. This series would be good for clean up, but this series might be
> bad for handling lockdep warning syzbot is reporting.

Again, we can worry about lockdep stuff for the real places where it
matters, which should not have been the same place as all of these were
used (they were used very infrequently.)

> Since tty_warn() is using plain printk(), we can avoid lockdep warning by
> using printk_deferred(). If we use dev_warn() instead, we need to modify
> __dev_printk() to use printk_deferred(), which means that all dev_*() users
> are affected by this change.

I don't want to use printk_deffered() if at all possible, let's let the
printk developers fix up their implementation which should make that
change not needed.

And worst case, take the few places that really really really need it,
and call printk_deferred() so it's obvious what we are doing.

> Also, we need to modify dev_printk_emit()/dev_vprintk_emit() callers to embed
> loglevel into the format string so that we pass LOGLEVEL_SCHED to 
> vprintk_emit() ...
> maybe just change from "if (!in_sched)" to "if (!in_sched && !dev_info)" 
> instead ?

Huh?  No.

> Also, dev_vprintk_emit() need to start calling defer_console_output()
> after returning from vprintk_emit() in order to behave like printk_deferred().

Again, no.  If we really need to deferr a printk, let's call that, but
that should not be the case for all of the places these macros were
used.

thanks,

greg k-h


Re: [PATCH v7 5/8] pwm: core: Support new PWM_STAGGERING_ALLOWED flag

2021-04-08 Thread Clemens Gruber
On Thu, Apr 08, 2021 at 07:36:37PM +0200, Uwe Kleine-König wrote:
> On Thu, Apr 08, 2021 at 05:51:36PM +0200, Clemens Gruber wrote:
> > On Thu, Apr 08, 2021 at 02:50:40PM +0200, Thierry Reding wrote:
> > > Yes, I think that's basically what this is saying. I think we're perhaps
> > > getting hung up on the terminology here. PWM_STAGGERING_ALLOWED gives
> > > the impression that we're dealing with some provider-specific feature,
> > > whereas what we really want to express is that the PWM doesn't care
> > > exactly when the active cycle starts and based on that a provider that
> > > can support it may optimize the EMI behavior.
> > > 
> > > Maybe we can find a better name for this? Ultimately what this means is
> > > that the consumer is primarily interested in the power output of the PWM
> > > rather than the exact shape of the signal. So perhaps something like
> > > PWM_USAGE_POWER would be more appropriate.
> > 
> > Yes, although it would then no longer be obvious that this feature leads
> > to improved EMI behavior, as long as we mention that in the docs, I
> > think it's a good idea
> > 
> > Maybe document it as follows?
> > PWM_USAGE_POWER - Allow the driver to delay the start of the cycle
> > for EMI improvements, as long as the power output stays the same
> 
> I don't like both names, because for someone who is only halfway into
> PWM stuff it is not understandable. Maybe ALLOW_PHASE_SHIFT?

Sounds good to me.

> When a consumer is only interested in the power output than
> 
>   .period = 20
>   .duty_cycle = 5
> 
> would also be an allowed response for the request
> 
>   .period = 200
>   .duty_cycle = 50
> 
> and this is not what is in the focus here.

Right.

If Thierry agrees, I can spin up a new revision.

Maybe we can get it into 5.13 after all.

Thanks,
Clemens


Re: [PATCH net v1] Revert "lan743x: trim all 4 bytes of the FCS; not just 2"

2021-04-08 Thread Sven Van Asbroeck
Hi Heiner,

On Thu, Apr 8, 2021 at 2:22 PM Heiner Kallweit  wrote:
>
> Just an idea:
> RX_HEAD_PADDING is an alias for NET_IP_ALIGN that can have two values:
> 0 and 2
> The two systems you use may have different NET_IP_ALIGN values.
> This could explain the behavior. Then what I proposed should work
> for both of you: frame_length - ETH_FCS_LEN

Yes, good point! I was thinking the exact same thing just now.
Subtracting 2 + RX_HEAD_PADDING from the frame length made no sense.

George, I will send a patch for you to try shortly. Except if you're
already ahead :)


Re: rtlwifi/rtl8192cu AP mode broken with PS STA

2021-04-08 Thread Maciej S. Szmigiero

On 08.04.2021 06:42, Pkshih wrote:

-Original Message-
From: Maciej S. Szmigiero [mailto:m...@maciej.szmigiero.name]
Sent: Thursday, April 08, 2021 4:53 AM
To: Larry Finger; Pkshih
Cc: linux-wirel...@vger.kernel.org; net...@vger.kernel.org; 
linux-kernel@vger.kernel.org;
johan...@sipsolutions.net; kv...@codeaurora.org
Subject: Re: rtlwifi/rtl8192cu AP mode broken with PS STA


(...)

Maceij,

Does this patch fix the problem?


The beacon seems to be updating now and STAs no longer get stuck in PS
mode.
Although sometimes (every 2-3 minutes with continuous 1s interval pings)
there is around 5s delay in updating the transmitted beacon - don't know
why, maybe the NIC hardware still has the old version in queue?


Since USB device doesn't update every beacon, dtim_count isn't updated neither.
It leads STA doesn't awake properly. Please try to fix dtim_period=1 in
hostapd.conf, which tells STA awakes every beacon interval.


The situation is the same with dtim_period=1.

When I look at a packet trace (captured from a different NIC running in
monitor mode) it's obvious that the pinged STA wakes up almost
immediately once it sees its DTIM bit set in a beacon.

But many beacons pass (the network has beacon interval of 0.1s) between
where a ping request (ICMP echo request) is buffered on the AP and where
the transmitted beacon actually starts to have DTIM bit set.

I am observing delays up to 9 seconds, which means a delay up to 90
beacons between when DTIM bit should be set and when it actually gets
set.

As I said above, this happens only sometimes (every 2-3 minutes with
continuous 1s interval pings) but there is definitely something wrong
here.

My wild guess would be that if the NIC is prevented from sending a beacon
due to, for example, its radio channel being busy it keeps that beacon
in queue for the next transmission attempt and only then sends it and
removes from that queue.
Even thought there might be newer beacon data already available for
transmission.

And then these "unsent" beacons pile up in queue and the delays I am
observing are simply old queued beacons (that don't have the DTIM bit
set yet) being transmitted.

But that's just my hypothesis - I don't know how rtl8192cu hardware
actually works.


--
Ping-Ke


Thanks,
Maciej


Re: [PATCH v4 19/20] mips: Convert to GENERIC_CMDLINE

2021-04-08 Thread Rob Herring
On Tue, Apr 06, 2021 at 10:38:36AM -0700, Daniel Walker wrote:
> On Fri, Apr 02, 2021 at 03:18:21PM +, Christophe Leroy wrote:
> > -config CMDLINE_BOOL
> > -   bool "Built-in kernel command line"
> > -   help
> > - For most systems, it is firmware or second stage bootloader that
> > - by default specifies the kernel command line options.  However,
> > - it might be necessary or advantageous to either override the
> > - default kernel command line or add a few extra options to it.
> > - For such cases, this option allows you to hardcode your own
> > - command line options directly into the kernel.  For that, you
> > - should choose 'Y' here, and fill in the extra boot arguments
> > - in CONFIG_CMDLINE.
> > -
> > - The built-in options will be concatenated to the default command
> > - line if CMDLINE_OVERRIDE is set to 'N'. Otherwise, the default
> > - command line will be ignored and replaced by the built-in string.
> > -
> > - Most MIPS systems will normally expect 'N' here and rely upon
> > - the command line from the firmware or the second-stage bootloader.
> > -
> 
> 
> See how you complained that I have CMDLINE_BOOL in my changed, and you think 
> it
> shouldn't exist.
> 
> Yet here mips has it, and you just deleted it with no feature parity in your
> changes for this.

AFAICT, CMDLINE_BOOL equates to a non-empty or empty CONFIG_CMDLINE. You 
seem to need it just because you have CMDLINE_PREPEND and 
CMDLINE_APPEND. If that's not it, what feature is missing? CMDLINE_BOOL 
is not a feature, but an implementation detail.

Rob


[PATCH] PCI: Delay after FLR of Intel DC P4510 NVMe

2021-04-08 Thread Raphael Norwitz
Like the Intel DC P3700 NVMe, the Intel P4510 NVMe exhibits a timeout
failure when the driver tries to interact with the device to soon after
an FLR. The same reset quirk the P3700 uses also resolves the failure
for the P4510, so this change introduces the same reset quirk for the
P4510.

Reviewed-by: Alex Williamson 
Signed-off-by: Alay Shah 
Signed-off-by: Suresh Gumpula 
Signed-off-by: Raphael Norwitz 
---
 drivers/pci/quirks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..5a8c059b848d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3922,6 +3922,7 @@ static const struct pci_dev_reset_methods 
pci_dev_reset_methods[] = {
reset_ivb_igd },
{ PCI_VENDOR_ID_SAMSUNG, 0xa804, nvme_disable_and_flr },
{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
+   { PCI_VENDOR_ID_INTEL, 0x0a54, delay_250ms_after_flr },
{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
reset_chelsio_generic_dev },
{ 0 }
-- 
2.20.1


RE: [PATCH net-next v2 2/2] misc: Add Renesas Synchronization Management Unit (SMU) support

2021-04-08 Thread Min Li
> 
> Please provide some sort of documentation and at the least, a pointer to the
> software that uses this so that we can see how it all ties together.
> 
Hi Greg, I will withdraw this review for misc and focus on MFD part review for 
now. I will re-submit the misc review after mfd change is in.

Thanks

Min


Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Song Liu



> On Apr 8, 2021, at 11:24 AM, Jiri Olsa  wrote:
> 
> On Thu, Apr 08, 2021 at 06:08:20PM +, Song Liu wrote:
>> 
>> 
>>> On Apr 8, 2021, at 10:45 AM, Jiri Olsa  wrote:
>>> 
>>> On Thu, Apr 08, 2021 at 05:28:10PM +, Song Liu wrote:
 
 
> On Apr 8, 2021, at 10:20 AM, Jiri Olsa  wrote:
> 
> On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
>> 
>> 
>>> On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
>>> 
>>> On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
 Currently, to use BPF to aggregate perf event counters, the user uses
 --bpf-counters option. Enable "use bpf by default" events with a config
 option, stat.bpf-counter-events. This is limited to hardware events in
 evsel__hw_names.
 
 This also enables mixed BPF event and regular event in the same 
 sesssion.
 For example:
 
 perf config stat.bpf-counter-events=instructions
 perf stat -e instructions,cs
 
>>> 
>>> so if we are mixing events now, how about uing modifier for bpf 
>>> counters,
>>> instead of configuring .perfconfig list we could use:
>>> 
>>> perf stat -e instructions:b,cs
>>> 
>>> thoughts?
>>> 
>>> the change below adds 'b' modifier and sets 'evsel::bpf_counter',
>>> feel free to use it
>> 
>> I think we will need both 'b' modifier and .perfconfig configuration. 
>> For systems with BPF-managed perf events running in the background, 
> 
> hum, I'm not sure I understand what that means.. you mean there
> are tools that run perf stat so you don't want to change them?
 
 We have tools that do perf_event_open(). I will change them to use 
 BPF managed perf events for "cycles" and "instructions". Since these 
 tools are running 24/7, perf-stat on the system should use BPF managed
 "cycles" and "instructions" by default. 
>>> 
>>> well if you are already changing the tools why not change them to add
>>> modifier.. but I don't mind adding that .perfconfig stuff if you need
>>> that
>> 
>> The tools I mentioned here don't use perf-stat, they just use 
>> perf_event_open() and read the perf events fds. We want a config to make
> 
> just curious, how those tools use perf_event_open?
> 
>> "cycles" to use BPF by default, so that when the user (not these tools)
>> runs perf-stat, it will share PMCs with those events by default. 
> 
> I'm sorry but I still don't see the usecase.. if you need to change both 
> tools,
> you can change them to use bpf-managed event, why bother with the list?
> 
>>> 
 
> 
>> .perfconfig makes sure perf-stat sessions will share PMCs with these 
>> background monitoring tools. 'b' modifier, on the other hand, is useful
>> when the user knows there is opportunity to share the PMCs. 
>> 
>> Does this make sense? 
> 
> if there's reason for that then sure.. but let's not limit that just
> on HARDWARE events only.. there are RAW events with the same demand
> for this feature.. why don't we let user define any event for this?
 
 I haven't found a good way to config RAW events. I guess RAW events 
 could use 'b' modifier? 
>>> any event uing the pmu notation like cpu/instructions/
>> 
>> Can we do something like "perf config stat.bpf-counter-events=cpu/*" means 
>> all "cpu/xx" events use BPF by default?
> 
> I think there's misundestanding, all I'm saying is that IIUC you check
> events stat.bpf-counter-events to be HARDWARE type, which I don't think
> is necessary and we can allow any event in there

>From what I see, most of the opportunity of sharing comes from a few common
events, like cycles, instructions. The second reason is that, the config 
implementation is easy and straightforward. We sure can extend the config 
to other events. Before that, 'b' modifier should be good for these use
cases. 

Thanks,
Song



[GIT PULL] Apple M1 SoC platform bring-up for 5.13

2021-04-08 Thread Hector Martin

Hi Arnd and all,

Here's the final version of the M1 SoC bring-up series, based on
v4 which was reviewed here:

https://lore.kernel.org/linux-arm-kernel/20210402090542.131194-1-mar...@marcan.st/T/#u

Changes since v4 as reviewed:

* Sort DT soc bus nodes by address (NFC)
* Introduce defines to better represent the meaning of hwirq IDs in
  the AIC driver (NFC)
* Update stale comments in AIC (NFC)
* Make of_mmio_is_nonposted static and not exported (export change only)
* Rewrite pci_remap_cfgspace() more succintly using ?: operator (NFC)
* Update FIQ series merge to arm64/for-next/fiq
* Remove the nVHE series (we will let this go through amd64 on its own)

The public key that signed the tag is available here:

https://mrcn.st/pub

Or pull e22a629a4c515dd5 from keys.gnupg.net or pgp.mit.edu.

Cheers,
Hector

The following changes since commit 847bea3d08af9158ae9e17b43632d6aa4f1702a0:

  Merge remote-tracking branch 'arm64/for-next/fiq' (2021-04-08 19:21:57 +0900)

are available in the Git repository at:

  https://github.com/AsahiLinux/linux.git tags/m1-soc-bringup-v5

for you to fetch changes up to 7d2d16ccf15d8eb84accfaf44a0b324f36e39588:

  arm64: apple: Add initial Apple Mac mini (M1, 2020) devicetree (2021-04-08 
20:18:41 +0900)


Apple M1 SoC platform bring-up

This series brings up initial support for the Apple M1 SoC, used in the
2020 Mac Mini, MacBook Pro, and MacBook Air models.

The following features are supported in this initial port:

- UART (samsung-style) with earlycon support
- Interrupts, including affinity and IPIs (Apple Interrupt Controller)
- SMP (through standard spin-table support)
- simplefb-based framebuffer
- Devicetree for the Mac Mini (should work for the others too at this
  stage)

== Merge notes ==

This tag is based on v5.12-rc3 and includes the following two
dependencies merged in:

* Tip of arm64/for-next/fiq: 3889ba70102e
  This is a hard (build) dependency that adds support for FIQ
  interrupts, which is required for this SoC and the included AIC
  irqchip driver. It is already merged in the arm64 tree.

* From tty/tty-next: 71b25f4df984
  This commit includes the Samsung UART changes that have already
  been merged into the tty tree. It is nominally a soft dependency,
  but if this series is merged first it would trigger devicetree
  validation failures as the DT included in it depends on bindings
  introduced in the tty tree.

  There was a merge conflict here. It has been resolved the same
  way gregkh resolved it in a later tty merge, and both tty-next
  and torvalds/master merge cleanly with this series at this time.

This series additionally depends on the nVHE changes in [1] to boot,
but we are letting those get merged through arm64.

[1] 
https://lore.kernel.org/linux-arm-kernel/20210408131010.1109027-1-...@kernel.org/T/#u

== Testing notes ==

This has been tested on an Apple M1 Mac Mini booting to a framebuffer
and serial console, with SMP and KASLR, with an arm64 defconfig
(+ CONFIG_FB_SIMPLE for the fb). In addition, the AIC driver now
supports running in EL1, tested in UP mode only.

== About the hardware ==

These machines officially support booting unsigned/user-provided
XNU-like kernels, with a very different boot protocol and devicetree
format. We are developing an initial bootloader, m1n1 [1], to take care
of as many hardware peculiarities as possible and present a standard
Linux arm64 boot protocol and device tree. In the future, I expect that
production setups will add U-Boot and perhaps GRUB into the boot chain,
to make the boot process similar to other ARM64 platforms.

The machines expose their debug UART over USB Type C, triggered with
vendor-specific USB-PD commands. Currently, the easiest way to get a
serial console on these machines is to use a second M1 box and a simple
USB C cable [2]. You can also build a DIY interface using an Arduino, a
FUSB302 chip or board, and a 1.2V UART-TTL adapter [3]. In the coming
weeks we will be designing an open hardware project to provide
serial/debug connectivity to these machines (and, hopefully, also
support other UART-over-Type C setups from other vendors). Please
contact me privately if you are interested in getting an early prototype
version of one of these devices.

We also have WIP/not merged yet support for loading kernels and
interacting via dwc3 usb-gadget, which works with a standard C-C or C-A
cable and any Linux host.

A quickstart guide to booting Linux kernels on these machines is
available at [4], and we are documenting the hardware at [5].

[1] https://github.com/AsahiLinux/m1n1/
[2] https://github.com/AsahiLinux/macvdmtool/
[3] https://github.com/AsahiLinux/vdmtool/
[4] https://github.com/AsahiLinux/docs/wiki/Developer-Quickstart
[5] https://github.com/AsahiLinux/docs/wiki

== Project Blurb ==

Asahi Linux is an open community project dedicated to developing and
maintaining mainline support for Apple Silicon on Linux. Feel free 

Re: [PATCH] SUNRPC: Add a check for gss_release_msg

2021-04-08 Thread Trond Myklebust
On Thu, 2021-04-08 at 11:24 -0400, Olga Kornievskaia wrote:
> On Thu, Apr 8, 2021 at 11:01 AM Trond Myklebust <
> tron...@hammerspace.com> wrote:
> > 
> > On Tue, 2021-04-06 at 19:16 -0500, Aditya Pakki wrote:
> > > In gss_pipe_destroy_msg(), in case of error in msg,
> > > gss_release_msg
> > > deletes gss_msg. The patch adds a check to avoid a potential
> > > double
> > > free.
> > > 
> > > Signed-off-by: Aditya Pakki 
> > > ---
> > >  net/sunrpc/auth_gss/auth_gss.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/auth_gss/auth_gss.c
> > > b/net/sunrpc/auth_gss/auth_gss.c
> > > index 5f42aa5fc612..eb52eebb3923 100644
> > > --- a/net/sunrpc/auth_gss/auth_gss.c
> > > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > > @@ -848,7 +848,8 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg
> > > *msg)
> > >     warn_gssd();
> > >     gss_release_msg(gss_msg);
> > >     }
> > > -   gss_release_msg(gss_msg);
> > > +   if (gss_msg)
> > > +   gss_release_msg(gss_msg);
> > >  }
> > > 
> > >  static void gss_pipe_dentry_destroy(struct dentry *dir,
> > 
> > 
> > NACK. There's no double free there.
> 
> I disagree that there is no double free, the wording of the commit
> describes the problem in the error case is that we call
> gss_release_msg() and then we call it again but the 1st one released
> the gss_msg. However, I think the fix should probably be instead:
> diff --git a/net/sunrpc/auth_gss/auth_gss.c
> b/net/sunrpc/auth_gss/auth_gss.c
> index 5f42aa5fc612..e8aae617e981 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -846,7 +846,7 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg *msg)
>     gss_unhash_msg(gss_msg);
>     if (msg->errno == -ETIMEDOUT)
>     warn_gssd();
> -   gss_release_msg(gss_msg);
> +   return gss_release_msg(gss_msg);
>     }
>     gss_release_msg(gss_msg);
>  }
> 
Please look one line further up: there is a refcount_inc() that matches
that first gss_release_msg(). Removing that call results in a
reintroduction of the leak that Neil fixed in commit 1cded9d2974fe
("SUNRPC: fix refcounting problems with auth_gss messages.").

We might, however, instead opt to remove both the refcount_inc() and
matching gss_release_msg(). Those do seem to be redundant, given that
the caller also holds a refcount.

> > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.mykleb...@hammerspace.com
> > 
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.mykleb...@hammerspace.com




[RFC PATCH hyperv-next] scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs

2021-04-08 Thread Andrea Parri (Microsoft)
Use blk_mq_unique_tag() to generate requestIDs for StorVSC, avoiding
all issues with allocating enough entries in the VMbus requestor.

Suggested-by: Michael Kelley 
Signed-off-by: Andrea Parri (Microsoft) 
---
 drivers/hv/channel.c  | 14 +++---
 drivers/hv/ring_buffer.c  | 12 ++---
 drivers/net/hyperv/netvsc.c   |  8 ++--
 drivers/net/hyperv/rndis_filter.c |  2 +
 drivers/scsi/storvsc_drv.c| 73 ++-
 include/linux/hyperv.h| 13 +-
 6 files changed, 92 insertions(+), 30 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index db30be8f9ccea..f78e02ace51e8 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1121,15 +1121,14 @@ EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
  * vmbus_next_request_id - Returns a new request id. It is also
  * the index at which the guest memory address is stored.
  * Uses a spin lock to avoid race conditions.
- * @rqstor: Pointer to the requestor struct
+ * @channel: Pointer to the VMbus channel struct
  * @rqst_add: Guest memory address to be stored in the array
  */
-u64 vmbus_next_request_id(struct vmbus_requestor *rqstor, u64 rqst_addr)
+u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr)
 {
+   struct vmbus_requestor *rqstor = >requestor;
unsigned long flags;
u64 current_id;
-   const struct vmbus_channel *channel =
-   container_of(rqstor, const struct vmbus_channel, requestor);
 
/* Check rqstor has been initialized */
if (!channel->rqstor_size)
@@ -1163,16 +1162,15 @@ EXPORT_SYMBOL_GPL(vmbus_next_request_id);
 /*
  * vmbus_request_addr - Returns the memory address stored at @trans_id
  * in @rqstor. Uses a spin lock to avoid race conditions.
- * @rqstor: Pointer to the requestor struct
+ * @channel: Pointer to the VMbus channel struct
  * @trans_id: Request id sent back from Hyper-V. Becomes the requestor's
  * next request id.
  */
-u64 vmbus_request_addr(struct vmbus_requestor *rqstor, u64 trans_id)
+u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
 {
+   struct vmbus_requestor *rqstor = >requestor;
unsigned long flags;
u64 req_addr;
-   const struct vmbus_channel *channel =
-   container_of(rqstor, const struct vmbus_channel, requestor);
 
/* Check rqstor has been initialized */
if (!channel->rqstor_size)
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index ecd82ebfd5bc4..46d8e038e4ee1 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -310,10 +310,12 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
 */
 
if (desc->flags == VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED) {
-   rqst_id = vmbus_next_request_id(>requestor, requestid);
-   if (rqst_id == VMBUS_RQST_ERROR) {
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-   return -EAGAIN;
+   if (channel->next_request_id_callback != NULL) {
+   rqst_id = channel->next_request_id_callback(channel, 
requestid);
+   if (rqst_id == VMBUS_RQST_ERROR) {
+   
spin_unlock_irqrestore(_info->ring_lock, flags);
+   return -EAGAIN;
+   }
}
}
desc = hv_get_ring_buffer(outring_info) + old_write;
@@ -341,7 +343,7 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
if (channel->rescind) {
if (rqst_id != VMBUS_NO_RQSTOR) {
/* Reclaim request ID to avoid leak of IDs */
-   vmbus_request_addr(>requestor, rqst_id);
+   channel->request_addr_callback(channel, rqst_id);
}
return -ENODEV;
}
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index c64cc7639c39c..1a221ce2d6fdc 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -730,7 +730,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
int queue_sends;
u64 cmd_rqst;
 
-   cmd_rqst = vmbus_request_addr(>requestor, (u64)desc->trans_id);
+   cmd_rqst = channel->request_addr_callback(channel, (u64)desc->trans_id);
if (cmd_rqst == VMBUS_RQST_ERROR) {
netdev_err(ndev, "Incorrect transaction id\n");
return;
@@ -790,8 +790,8 @@ static void netvsc_send_completion(struct net_device *ndev,
 
/* First check if this is a VMBUS completion without data payload */
if (!msglen) {
-   cmd_rqst = vmbus_request_addr(_channel->requestor,
- (u64)desc->trans_id);
+   cmd_rqst = 
incoming_channel->request_addr_callback(incoming_channel,
+  
(u64)desc->trans_id);
if 

Re: [PATCH] KVM: vmx: add mismatched size in vmcs_check32

2021-04-08 Thread Paolo Bonzini

On 08/04/21 18:05, Sean Christopherson wrote:

   Add compile-time assertions in vmcs_check32() to disallow accesses to
   64-bit and 64-bit high fields via vmcs_{read,write}32().  Upper level
   KVM code should never do partial accesses to VMCS fields.  KVM handles
   the split accesses automatically in vmcs_{read,write}64() when running
   as a 32-bit kernel.


KVM also uses raw vmread/vmwrite (__vmcs_readl/__vmcs_writel) when 
copying to and from the shadow VMCS, so that path will not go through 
vmcs_check32 either.


Paolo



Re: [PATCH v13 03/18] arm64: hyp-stub: Move el1_sync into the vectors

2021-04-08 Thread Pavel Tatashin
> > Thank you for noticing this. Not sure how this missmerge happened. I
> > have added the missing case, and VHE is initialized correctly during
> > boot.
> > [   14.698175] kvm [1]: VHE mode initialized successfully
> >
> > During normal boot, kexec reboot, and kdump reboot. I will respin the
> > series and send the version 14 soon.
>
> Please give people a chance to review this lot first. This isn't code
> that is easy to digest, and immediate re-spinning does more harm than
> good (this isn't targeting 5.13, I would assume).
>

There are people who are testing this series, this is why I wanted to
respin. But, I will wait for review comments before sending the next
version. In the meantime I will send a fixed version of this patch as
a reply to this thread instead.

Thanks,
Pasha


Re: [PATCH 3/4] docs: Add documentation for HiSilicon PTT device driver

2021-04-08 Thread Bjorn Helgaas
On Thu, Apr 08, 2021 at 09:22:52PM +0800, Yicong Yang wrote:
> On 2021/4/8 2:55, Bjorn Helgaas wrote:
> > On Tue, Apr 06, 2021 at 08:45:53PM +0800, Yicong Yang wrote:

> >> +On Kunpeng 930 SoC, the PCIe root complex is composed of several
> >> +PCIe cores.
> 
> > Can you connect "Kunpeng 930" to something in the kernel tree?
> > "git grep -i kunpeng" shows nothing that's obviously relevant.
> > I assume there's a related driver in drivers/pci/controller/?
> 
> Kunpeng 930 is the product name of Hip09 platform. The PCIe
> controller uses the generic PCIe driver based on ACPI.

I guess I'm just looking for a hint to help users know when to enable
the Kconfig for this.  Maybe the "HiSilicon" in the Kconfig help is
enough?  Maybe "Kunpeng 930" is not even necessary?  If "Kunpeng 930"
*is* necessary, there should be some way to relate it to something
else.

> >> +from the file, and the desired value written to the file to tune.
> > 
> >> +Tuning multiple events at the same time is not permitted, which means
> >> +you cannot read or write more than one tune file at one time.
> > 
> > I think this is obvious from the model, so the sentence doesn't really
> > add anything.  Each event is a separate file, and it's obvious that
> > there's no way to write to multiple files simultaneously.
> 
> from the usage we shown below this situation won't happen. I just worry
> that users may have a program to open multiple files at the same time and
> read/write simultaneously, so add this line here to mention the restriction.

How is this possible?  I don't think "writing multiple files
simultaneously" is even possible in the Linux syscall model.  I don't
think a user will do anything differently after reading "you cannot
read or write more than one tune file at one time."

> >> +- tx_path_rx_req_alloc_buf_level: watermark of RX requested
> >> +- tx_path_tx_req_alloc_buf_level: watermark of TX requested
> >> +
> >> +These events influence the watermark of the buffer allocated for each
> >> +type. RX means the inbound while Tx means outbound. For a busy
> >> +direction, you should increase the related buffer watermark to enhance
> >> +the performance.
> > 
> > Based on what you have written here, I would just write 2 to both
> > files to enhance the performance in both directions.  But obviously
> > there must be some tradeoff here, e.g., increasing Rx performance
> > comes at the cost of Tx performane.
> 
> the Rx buffer and Tx buffer are separate, so they won't influence
> each other.

Why would I write anything other than 2 to these files?  That's the
question I think this paragraph should answer.

> >> +9. data_format
> >> +--
> >> +
> >> +File to indicate the format of the traced TLP headers. User can also
> >> +specify the desired format of traced TLP headers. Available formats
> >> +are 4DW, 8DW which indicates the length of each TLP headers traced.
> >> +::
> >> +$ cat data_format
> >> +[4DW]8DW
> >> +$ echo 8 > data_format
> >> +$ cat data_format
> >> +4DW [8DW]
> >> +
> >> +The traced TLP header format is different from the PCIe standard.
> > 
> > I'm confused.  Below you say the fields of the traced TLP header are
> > defined by the PCIe spec.  But here you say the format is *different*.
> > What exactly is different?
> 
> For the Request Header Format for 64-bit addressing of Memory, defind in
> PCIe spec 4.0, Figure 2-15, the 1st DW is like:
> 
> Byte 0 > [Fmt] [Type] [T9] [Tc] [T8] [Attr] [LN] [TH] ... [Length]
> 
> some are recorded in our traced header like below, which some are not.
> that's what I mean the format of the header are different. But for a
> certain field like 'Fmt', the meaning keeps same with what Spec defined.
> that's what I mean the fields definition of our traced header keep same
> with the Spec.

Ah, that helps a lot, thank you.  Maybe you could say something along
the lines of this:

  When using the 8DW data format, the entire TLP header is logged.
  For example, the TLP header for Memory Reads with 64-bit addresses
  is shown in PCIe r5.0, Figure 2-17; the header for Configuration
  Requests is shown in Figure 2.20, etc.

  In addition, 8DW trace buffer entries contain a timestamp and
  possibly a prefix, e.g., a PASID TLP prefix (see Figure 6-20).  TLPs
  may include more than one prefix, but only one can be logged in
  trace buffer entries.

  When using the 4DW data format, DW0 of the trace buffer entry
  contains selected fields of DW0 of the TLP, together with a
  timestamp.  DW1-DW3 of the trace buffer entry contain DW1-DW3
  directly from the TLP header.

This looks like a really cool device.  I wish we had this for more
platforms.

Bjorn


[PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static

2021-04-08 Thread Logan Gunthorpe
pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it will need to determine the mapping type
ahead of actually doing the mapping to create the actual iommu mapping.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 34 +++---
 include/linux/pci-p2pdma.h | 15 +++
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bcb1a6d6119d..38c93f57a941 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,13 +20,6 @@
 #include 
 #include 
 
-enum pci_p2pdma_map_type {
-   PCI_P2PDMA_MAP_UNKNOWN = 0,
-   PCI_P2PDMA_MAP_NOT_SUPPORTED,
-   PCI_P2PDMA_MAP_BUS_ADDR,
-   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
@@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool 
publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-   struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ * a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ * @dma_attrs: the attributes passed to the dma_map operation --
+ * this is so they can be checked to ensure P2PDMA pages were not
+ * introduced into an incorrect interface (like dma_map_sg). *
+ *
+ * Returns one of:
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+   struct device *dev, unsigned long dma_attrs)
 {
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
enum pci_p2pdma_map_type ret;
struct pci_dev *client;
 
+   WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
+ "PCI P2PDMA pages were mapped with dma_map_sg!");
+
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
@@ -879,7 +889,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
struct pci_p2pdma_pagemap *p2p_pgmap =
to_p2p_pgmap(sg_page(sg)->pgmap);
 
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
+   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+   __DMA_ATTR_PCI_P2PDMA)) {
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -904,7 +915,8 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 {
enum pci_p2pdma_map_type map_type;
 
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
+   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+  __DMA_ATTR_PCI_P2PDMA);
 
if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..a06072ac3a52 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -16,6 +16,13 @@
 struct block_device;
 struct scatterlist;
 
+enum pci_p2pdma_map_type {
+   PCI_P2PDMA_MAP_UNKNOWN = 0,
+   PCI_P2PDMA_MAP_NOT_SUPPORTED,
+   PCI_P2PDMA_MAP_BUS_ADDR,
+   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset);
@@ -30,6 +37,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+   struct device *dev, unsigned long dma_attrs);
 int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -83,6 +92,12 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
+static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
+   struct dev_pagemap *pgmap, struct device *dev,
+   unsigned long dma_attrs)
+{
+   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
 static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int 

[PATCH 00/16] Add new DMA mapping operation for P2PDMA

2021-04-08 Thread Logan Gunthorpe
Hi,

This patchset continues my work to to add P2PDMA support to the common
dma map operations. This allows for creating SGLs that have both P2PDMA
and regular pages which is a necessary step to allowing P2PDMA pages in
userspace.

The earlier RFC[1] generated a lot of great feedback and I heard no show
stopping objections. Thus, I've incorporated all the feedback and have
decided to post this as a proper patch series with hopes of eventually
getting it in mainline.

I'm happy to do a few more passes if anyone has any further feedback
or better ideas.

This series is based on v5.12-rc6 and a git branch can be found here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_map_ops_v1

Thanks,

Logan

[1] 
https://lore.kernel.org/linux-block/20210311233142.7900-1-log...@deltatee.com/


Changes since the RFC:
 * Added comment and fixed up the pci_get_slot patch. (per Bjorn)
 * Fixed glaring sg_phys() double offset bug. (per Robin)
 * Created a new map operation (dma_map_sg_p2pdma()) with a new calling
   convention instead of modifying the calling convention of
   dma_map_sg(). (per Robin)
 * Integrated the two similar pci_p2pdma_dma_map_type() and
   pci_p2pdma_map_type() functions into one (per Ira)
 * Reworked some of the logic in the map_sg() implementations into
   helpers in the p2pdma code. (per Christoph)
 * Dropped a bunch of unnecessary symbol exports (per Christoph)
 * Expanded the code in dma_pci_p2pdma_supported() for clarity. (per
   Ira and Christoph)
 * Finished off using the new dma_map_sg_p2pdma() call in rdma_rw
   and removed the old pci_p2pdma_[un]map_sg(). (per Jason)

--

Logan Gunthorpe (16):
  PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device
  dma-mapping: Introduce dma_map_sg_p2pdma()
  lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  nvme-rdma: Ensure dma support when using p2pdma
  RDMA/rw: use dma_map_sg_p2pdma()
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

 drivers/infiniband/core/rw.c |  50 +++---
 drivers/iommu/dma-iommu.c|  66 ++--
 drivers/nvme/host/core.c |   3 +-
 drivers/nvme/host/nvme.h |   2 +-
 drivers/nvme/host/pci.c  |  39 
 drivers/nvme/target/rdma.c   |   3 +-
 drivers/pci/Kconfig  |   2 +-
 drivers/pci/p2pdma.c | 188 +++
 include/linux/dma-map-ops.h  |   3 +
 include/linux/dma-mapping.h  |  20 
 include/linux/pci-p2pdma.h   |  53 ++
 include/linux/scatterlist.h  |  49 -
 include/rdma/ib_verbs.h  |  32 ++
 kernel/dma/direct.c  |  25 -
 kernel/dma/mapping.c |  70 +++--
 15 files changed, 416 insertions(+), 189 deletions(-)


base-commit: e49d033bddf5b565044e2abe4241353959bc9120
--
2.20.1


[PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma()

2021-04-08 Thread Logan Gunthorpe
Drop the use of pci_p2pdma_map_sg() in favour of dma_map_sg_p2pdma().

The new interface allows mapping scatterlists that mix both regular
and P2PDMA pages and will verify that the dma device can communicate
with the device the pages are on.

Signed-off-by: Logan Gunthorpe 
---
 drivers/infiniband/core/rw.c | 50 ++--
 include/rdma/ib_verbs.h  | 32 +++
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 31156e22d3e7..0c6213d9b044 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -273,26 +273,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg)))
-   pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-   else
-   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
- u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg))) {
-   if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-   return 0;
-   return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
-   }
-   return ib_dma_map_sg(dev, sg, sg_cnt, dir);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:   context to initialize
@@ -315,9 +295,9 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u8 port_num,
struct ib_device *dev = qp->pd->device;
int ret;
 
-   ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-   if (!ret)
-   return -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+   if (ret < 0)
+   return ret;
sg_cnt = ret;
 
/*
@@ -354,7 +334,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u8 port_num,
return ret;
 
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -394,17 +374,15 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return -EINVAL;
}
 
-   ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-   if (!ret)
-   return -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+   if (ret < 0)
+   return ret;
sg_cnt = ret;
 
if (prot_sg_cnt) {
-   ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
-   if (!ret) {
-   ret = -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, prot_sg, prot_sg_cnt, dir);
+   if (ret < 0)
goto out_unmap_sg;
-   }
prot_sg_cnt = ret;
}
 
@@ -469,9 +447,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 out_unmap_prot_sg:
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -603,7 +581,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct 
ib_qp *qp, u8 port_num,
break;
}
 
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -631,8 +609,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ca28fca5736b..a541ed1702f5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4028,6 +4028,17 @@ static inline int ib_dma_map_sg_attrs(struct ib_device 
*dev,
dma_attrs);
 }
 
+static inline int ib_dma_map_sg_p2pdma_attrs(struct ib_device *dev,
+struct scatterlist *sg, int nents,
+enum dma_data_direction direction,
+unsigned long dma_attrs)
+{
+   if (ib_uses_virt_dma(dev))
+   return 

[PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-04-08 Thread Logan Gunthorpe
In order to call upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. The only reason it does sleep is to allocate the seqbuf
to print which devices are within the ACS path.

Switch the kmalloc call to use a passed in gfp_mask and don't print that
message if the buffer fails to be allocated.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 196382630363..bd89437faf06 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
 
 static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
 {
-   if (!buf)
+   if (!buf || !buf->buffer)
return;
 
seq_buf_printf(buf, "%s;", pci_name(pdev));
@@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct 
pci_dev *client,
 
 static enum pci_p2pdma_map_type
 upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
- int *dist)
+ int *dist, gfp_t gfp_mask)
 {
struct seq_buf acs_list;
bool acs_redirects;
int ret;
 
-   seq_buf_init(_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-   if (!acs_list.buffer)
-   return -ENOMEM;
+   seq_buf_init(_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
 
ret = upstream_bridge_distance(provider, client, dist, _redirects,
   _list);
if (acs_redirects) {
pci_warn(client, "ACS redirect is set between the client and 
provider (%s)\n",
 pci_name(provider));
-   /* Drop final semicolon */
-   acs_list.buffer[acs_list.len-1] = 0;
-   pci_warn(client, "to disable ACS redirect for this path, add 
the kernel parameter: pci=disable_acs_redir=%s\n",
-acs_list.buffer);
+
+   if (acs_list.buffer) {
+   /* Drop final semicolon */
+   acs_list.buffer[acs_list.len - 1] = 0;
+   pci_warn(client, "to disable ACS redirect for this 
path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+acs_list.buffer);
+   }
}
 
if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
@@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, 
struct device **clients,
 
if (verbose)
ret = upstream_bridge_distance_warn(provider,
-   pci_client, );
+   pci_client, , GFP_KERNEL);
else
ret = upstream_bridge_distance(provider, pci_client,
   , NULL, NULL);
-- 
2.20.1



[PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

2021-04-08 Thread Logan Gunthorpe
Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes an scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoid needing to lookup into the xarray for
every page.

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 65 ++
 include/linux/pci-p2pdma.h | 21 
 2 files changed, 86 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 38c93f57a941..44ad7664e875 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared on the stack outside of
+ * the for_each_sg() loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ * @attrs: dma mapping attributes
+ *
+ * This is a helper to be used by non-iommu dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns 1 if the segment was mapped, 0 if the segment should be mapped
+ * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
+ * be mapped at all.
+ */
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+  struct device *dev, struct scatterlist *sg,
+  unsigned long dma_attrs)
+{
+   if (state->pgmap != sg_page(sg)->pgmap) {
+   state->pgmap = sg_page(sg)->pgmap;
+   state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
+   state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+   }
+
+   switch (state->map) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   sg->dma_address = sg_phys(sg) + state->bus_off;
+   sg_dma_len(sg) = sg->length;
+   sg_mark_pci_p2pdma(sg);
+   return 1;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   return 0;
+   default:
+   return -EREMOTEIO;
+   }
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
+ * be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a dma address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the dma address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+   dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+   sg_dma_len(dma_sg) = pg_sg->length;
+   sg_mark_pci_p2pdma(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  * to enable p2pdma
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index a06072ac3a52..49e7679403cf 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -13,6 +13,12 @@
 
 #include 
 
+struct pci_p2pdma_map_state {
+   struct dev_pagemap *pgmap;
+   int map;
+   u64 bus_off;
+};
+
 struct block_device;
 struct scatterlist;
 
@@ -43,6 +49,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+   struct device *dev, struct scatterlist *sg,
+   unsigned long dma_attrs);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg);
 int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
   

[PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages

2021-04-08 Thread Logan Gunthorpe
Convert to using dma_map_sg_p2pdma() for PCI p2pdma pages.

This should be equivalent but allows for heterogeneous scatterlists
with both P2PDMA and regular pages. However, P2PDMA support will be
slightly more restricted (only dma-direct and dma-iommu are currently
supported).

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/pci.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 14f092973792..a1ed07ff38b7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -577,17 +577,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
 
 }
 
-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
-   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-   rq_dma_dir(req));
-   else
-   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -600,7 +589,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
 
WARN_ON_ONCE(!iod->nents);
 
-   nvme_unmap_sg(dev, req);
+   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
  iod->first_dma);
@@ -868,14 +857,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
 
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
-   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
-   else
-   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-rq_dma_dir(req), DMA_ATTR_NO_WARN);
-   if (!nr_mapped)
+   nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
+rq_dma_dir(req), DMA_ATTR_NO_WARN);
+   if (nr_mapped < 0) {
+   if (nr_mapped != -ENOMEM)
+   ret = BLK_STS_TARGET;
goto out_free_sg;
+   }
 
iod->use_sgl = nvme_pci_use_sgls(dev, req);
if (iod->use_sgl)
@@ -887,7 +875,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
return BLK_STS_OK;
 
 out_unmap_sg:
-   nvme_unmap_sg(dev, req);
+   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
 out_free_sg:
mempool_free(iod->sg, dev->iod_mempool);
return ret;
-- 
2.20.1



[PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-04-08 Thread Logan Gunthorpe
In order to use upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
might sleep.

In order to avoid this, try to get the host bridge's device from
bus->self, and if that is not set, just get the first element in the
device list. It should be impossible for the host bridge's device to
go away while references are held on child devices, so the first element
should not be able to change and, thus, this should be safe.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bd89437faf06..473a08940fbc 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
 static bool __host_bridge_whitelist(struct pci_host_bridge *host,
bool same_host_bridge)
 {
-   struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
const struct pci_p2pdma_whitelist_entry *entry;
+   struct pci_dev *root = host->bus->self;
unsigned short vendor, device;
 
+   /*
+* This makes the assumption that the first device on the bus is the
+* bridge itself and it has the devfn of 00.0. This assumption should
+* hold for the devices in the white list above, and if there are cases
+* where this isn't true they will have to be dealt with when such a
+* case is added to the whitelist.
+*/
if (!root)
+   root = list_first_entry_or_null(>bus->devices,
+   struct pci_dev, bus_list);
+
+   if (!root || root->devfn)
return false;
 
vendor = root->vendor;
device = root->device;
-   pci_dev_put(root);
 
for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
if (vendor != entry->vendor || device != entry->device)
-- 
2.20.1



[PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma

2021-04-08 Thread Logan Gunthorpe
Ensure the dma operations support p2pdma before using the RDMA
device for P2PDMA. This allows switching the RDMA driver from
pci_p2pdma_map_sg() to dma_map_sg_p2pdma().

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/target/rdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 6c1f3ab7649c..3ec7e77e5416 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device 
*ndev,
if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
goto out_free_rsp;
 
-   if (!ib_uses_virt_dma(ndev->device))
+   if (!ib_uses_virt_dma(ndev->device) &&
+   dma_pci_p2pdma_supported(>device->dev))
r->req.p2p_client = >device->dev;
r->send_sge.length = sizeof(*r->req.cqe);
r->send_sge.lkey = ndev->pd->local_dma_lkey;
-- 
2.20.1



[PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA

2021-04-08 Thread Logan Gunthorpe
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 11 +--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..223419454516 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3907,7 +3907,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
-   if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+   if (ctrl->ops->supports_pci_p2pdma &&
+   ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
 
ns->queue->queuedata = ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..9c04df982d2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -473,7 +473,6 @@ struct nvme_ctrl_ops {
unsigned int flags;
 #define NVME_F_FABRICS (1 << 0)
 #define NVME_F_METADATA_SUPPORTED  (1 << 1)
-#define NVME_F_PCI_P2PDMA  (1 << 2)
int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -481,6 +480,7 @@ struct nvme_ctrl_ops {
void (*submit_async_event)(struct nvme_ctrl *ctrl);
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+   bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
 };
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7249ae74f71f..14f092973792 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2759,17 +2759,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, 
char *buf, int size)
return snprintf(buf, size, "%s\n", dev_name(>dev));
 }
 
+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+   struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+   return dma_pci_p2pdma_supported(dev->dev);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
-   .flags  = NVME_F_METADATA_SUPPORTED |
- NVME_F_PCI_P2PDMA,
+   .flags  = NVME_F_METADATA_SUPPORTED,
.reg_read32 = nvme_pci_reg_read32,
.reg_write32= nvme_pci_reg_write32,
.reg_read64 = nvme_pci_reg_read64,
.free_ctrl  = nvme_pci_free_ctrl,
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
+   .supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.20.1



[PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-04-08 Thread Logan Gunthorpe
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is
set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
---
 drivers/iommu/dma-iommu.c | 66 ++-
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..ef49635f9819 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
+   if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = >iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
-   int i;
+   int i, ret = 0;
 
if (static_branch_unlikely(_deferred_attach_enabled) &&
iommu_deferred_attach(dev, domain))
@@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
 
+   if (is_pci_p2pdma_page(sg_page(s))) {
+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev,
+  attrs);
+   }
+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected
+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
+   if (!iova_len)
+   return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova)
goto out_restore_sg;
@@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
iommu_dma_free_iova(cookie, iova, iova_len, NULL);
 out_restore_sg:
__invalidate_sg(sg, nents);
-   return 0;
+   return ret;
 }
 
 static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-   dma_addr_t start, end;
+   dma_addr_t end, start = DMA_MAPPING_ERROR;
struct scatterlist *tmp;
int i;
 
@@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, 
struct scatterlist *sg,
 * The scatterlist segments are mapped into a single
 * contiguous IOVA allocation, so this is incredibly easy.
 */
-   start = sg_dma_address(sg);
- 

[PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

2021-04-08 Thread Logan Gunthorpe
This interface is superseded by the new dma_map_sg_p2pdma() interface
which supports heterogeneous scatterlists. There are no longer
any users, so remove it.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 67 --
 include/linux/pci-p2pdma.h | 27 ---
 2 files changed, 94 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 44ad7664e875..2f2adcccfa11 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -856,73 +856,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
 GFP_ATOMIC);
 }
 
-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
-   struct device *dev, struct scatterlist *sg, int nents)
-{
-   struct scatterlist *s;
-   int i;
-
-   for_each_sg(sg, s, nents, i) {
-   s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
-   sg_dma_len(s) = s->length;
-   }
-
-   return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   struct pci_p2pdma_pagemap *p2p_pgmap =
-   to_p2p_pgmap(sg_page(sg)->pgmap);
-
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-   __DMA_ATTR_PCI_P2PDMA)) {
-   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-   return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
-   case PCI_P2PDMA_MAP_BUS_ADDR:
-   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
-   default:
-   return 0;
-   }
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- * mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   enum pci_p2pdma_map_type map_type;
-
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-  __DMA_ATTR_PCI_P2PDMA);
-
-   if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
-   dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
 /**
  * pci_p2pdma_map_segment - map an sg segment determining the mapping type
  * @state: State structure that should be declared on the stack outside of
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 49e7679403cf..2ec9c75fa097 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -45,10 +45,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct 
scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
struct device *dev, unsigned long dma_attrs);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
 int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
struct device *dev, struct scatterlist *sg,
unsigned long dma_attrs);
@@ -109,17 +105,6 @@ static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
 {
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-}
 static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
struct device *dev, struct scatterlist *sg,
unsigned long dma_attrs)
@@ -155,16 +140,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct 
device *client)
return pci_p2pmem_find_many(, 1);
 }
 
-static inline int 

[PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support

2021-04-08 Thread Logan Gunthorpe
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.

Signed-off-by: Logan Gunthorpe 
---
 include/linux/dma-map-ops.h |  3 +++
 include/linux/dma-mapping.h |  5 +
 kernel/dma/mapping.c| 18 ++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 51872e736e7b..481892822104 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -12,6 +12,9 @@
 struct cma;
 
 struct dma_map_ops {
+   unsigned int flags;
+#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0)
+
void *(*alloc)(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 50b8f586cf59..c31980ecca62 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 
mask)
 {
return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   return 0;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 923089c4267b..ce44a0fcc4e5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   /* if ops is not set, dma direct will be used which supports P2PDMA */
+   if (!ops)
+   return true;
+
+   /*
+* Note: dma_ops_bypass is not checked here because P2PDMA should
+* not be used with dma mapping ops that do not have support even
+* if the specific device is bypassing them.
+*/
+
+   return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.20.1



[PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-04-08 Thread Logan Gunthorpe
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

SGL segments that contain PCI bus addresses are marked with
sg_mark_pci_p2pdma() and are ignored when unmapped.

Signed-off-by: Logan Gunthorpe 
---
 kernel/dma/direct.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..108dfb4ecbd5 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "direct.h"
 
 /*
@@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct 
scatterlist *sgl,
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgl, sg, nents, i)
+   for_each_sg(sgl, sg, nents, i) {
+   if (sg_is_pci_p2pdma(sg)) {
+   sg_unmark_pci_p2pdma(sg);
+   continue;
+   }
+
dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 attrs);
+   }
 }
 #endif
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
 {
-   int i;
+   struct pci_p2pdma_map_state p2pdma_state = {};
struct scatterlist *sg;
+   int i, ret = 0;
 
for_each_sg(sgl, sg, nents, i) {
+   if (is_pci_p2pdma_page(sg_page(sg))) {
+   ret = pci_p2pdma_map_segment(_state, dev, sg,
+attrs);
+   if (ret < 0) {
+   goto out_unmap;
+   } else if (ret) {
+   ret = 0;
+   continue;
+   }
+   }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR)
@@ -411,7 +430,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
 
 out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-   return 0;
+   return ret;
 }
 
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-- 
2.20.1



Re: [RFC PATCH v1 2/2] docs: reporting-issues: make everyone CC the regressions list

2021-04-08 Thread Jonathan Corbet
Thorsten Leemhuis  writes:

> +In case you performed a successful bisection, use the title of the change 
> that
> +introduced the regression as the second part of your subject. Make the report
> +also mention the commit id of the culprit. For tracking purposes, add a line
> +like the following that contains both pieces of information, but with the
> +commit id shortened to 12 characters::
> +
> +#regzb introduced: 94a632d91ad1 ("usc: xhbi-foo: check bar_params 
> earlier")
> +
> +In case of an unsuccessful bisection, make your report mention the latest 
> tested
> +version that's working fine (say 5.7) and the oldest where the issue occurs 
> (say
> +5.8-rc1). For tracking purposes add a line expressing it like this::
> +
> +#regzb introduced: v5.7..v5.8-rc1

I kind of share Greg's concern that people aren't going to want to do
this; it could even be seen as an impediment to reporting problems in
general.  If you *really* want random users to input this sort of
information, you may well end up creating some sort of web page to step
them through it.

Also, though, as I understand it the system that will interpret these
lines does not yet exist.  Experience tells me that, as this system
comes into existence, you will have a good chance of deciding that you
want the syntax to look different anyway.  So I would personally hold
off on telling people to include directives like this until you have
something that works with them.

But that's just me... if this is the way it's going to work then the
docs should of course reflect that.

Thanks,

jon


Re: [PATCH v2 1/2] USB:ehci:Add a whitelist for EHCI controllers

2021-04-08 Thread kernel test robot
Hi Longfang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on v5.12-rc6 next-20210408]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Longfang-Liu/USB-ehci-fix-the-no-SRBN-register-problem/20210408-215249
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git 
usb-testing
config: x86_64-randconfig-s022-20210408 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-279-g6d5d9b42-dirty
# 
https://github.com/0day-ci/linux/commit/01b93fbbf8fb6137c7779062232c0fe8c1592940
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Longfang-Liu/USB-ehci-fix-the-no-SRBN-register-problem/20210408-215249
git checkout 01b93fbbf8fb6137c7779062232c0fe8c1592940
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> drivers/usb/host/ehci-pci.c:57:10: sparse: sparse: incorrect type in 
>> initializer (different base types) @@ expected unsigned short [usertype] 
>> vendor @@ got void * @@
   drivers/usb/host/ehci-pci.c:57:10: sparse: expected unsigned short 
[usertype] vendor
   drivers/usb/host/ehci-pci.c:57:10: sparse: got void *
>> drivers/usb/host/ehci-pci.c:57:16: sparse: sparse: incorrect type in 
>> initializer (different base types) @@ expected unsigned short [usertype] 
>> device @@ got void * @@
   drivers/usb/host/ehci-pci.c:57:16: sparse: expected unsigned short 
[usertype] device
   drivers/usb/host/ehci-pci.c:57:16: sparse: got void *

vim +57 drivers/usb/host/ehci-pci.c

49  
50  static const struct usb_nosbrn_whitelist_entry {
51  u16 vendor;
52  u16 device;
53  } usb_nosbrn_whitelist[] = {
54  /* STMICRO ConneXT has no sbrn register */
55  {PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_USB_HOST},
56  /* End of list */
  > 57  {NULL, NULL}
58  };
59  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock

2021-04-08 Thread Mel Gorman
On Thu, Apr 08, 2021 at 12:52:07PM +0200, Peter Zijlstra wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a68bacddcae0..e9e60d1a85d4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -112,6 +112,13 @@ typedef int __bitwise fpi_t;
> >  static DEFINE_MUTEX(pcp_batch_high_lock);
> >  #define MIN_PERCPU_PAGELIST_FRACTION   (8)
> >  
> > +struct pagesets {
> > +   local_lock_t lock;
> > +};
> > +static DEFINE_PER_CPU(struct pagesets, pagesets) = {
> > +   .lock = INIT_LOCAL_LOCK(lock),
> > +};
> 
> So why isn't the local_lock_t in struct per_cpu_pages ? That seems to be
> the actual object that is protected by it and is already per-cpu.
> 
> Is that because you want to avoid the duplication across zones? Is that
> worth the effort?

When I wrote the patch, the problem was that zone_pcp_reset freed the
per_cpu_pages structure and it was "protected" by local_irq_save(). If
that was converted to local_lock_irq then the structure containing the
lock is freed before it is released which is obviously bad.

Much later when trying to make the allocator RT-safe in general, I realised
that locking was broken and fixed it in patch 3 of this series. With that,
the local_lock could potentially be embedded within per_cpu_pages safely
at the end of this series.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v4 2/6] dt-bindings: PCI: Add bindings for Brcmstb endpoint device voltage regulators

2021-04-08 Thread Rob Herring
On Thu, Apr 08, 2021 at 12:58:05PM -0400, Jim Quinlan wrote:
> On Thu, Apr 8, 2021 at 12:20 PM Rob Herring  wrote:
> >
> > On Tue, Apr 06, 2021 at 02:25:49PM -0400, Jim Quinlan wrote:
> > > On Tue, Apr 6, 2021 at 1:32 PM Mark Brown  wrote:
> > > >
> > > > On Tue, Apr 06, 2021 at 01:26:51PM -0400, Jim Quinlan wrote:
> > > > > On Tue, Apr 6, 2021 at 12:47 PM Mark Brown  wrote:
> > > >
> > > > > > No great problem with having these in the controller node (assming 
> > > > > > it
> > > > > > accurately describes the hardware) but I do think we ought to also 
> > > > > > be
> > > > > > able to describe these per slot.
> >
> > PCIe is effectively point to point, so there's only 1 slot unless
> > there's a PCIe switch in the middle. If that's the case, then it's all
> > more complicated.
> >
> > > > > Can you explain what you think that would look like in the DT?
> > > >
> > > > I *think* that's just some properties on the nodes for the endpoints,
> > > > note that the driver could just ignore them for now.  Not sure where or
> > > > if we document any extensions but child nodes are in section 4 of the
> > > > v2.1 PCI bus binding.
> > >
> > > Hi Mark,
> > >
> > > I'm a little confused -- here is how I remember the chronology of the
> > > "DT bindings" commit reviews, please correct me if I'm wrong:
> > >
> > > o JimQ submitted a pullreq for using voltage regulators in the same
> > > style as the existing "rockport" PCIe driver.
> > > o After some deliberation, RobH preferred that the voltage regulators
> > > should go into the PCIe subnode device's DT node.
> >
> > IIRC, that's because you said there isn't a standard slot.
> Admittedly, I'm not exactly sure what you mean by a "standard slot".
> Our PCIIe HW does not support  hotplug or have a presence bit or
> anything like that.  Our root complex has one port; it can only
> directly connect to a single switch or endpoint. This connection shows
> up as slot0.  The voltage regulator(s) involved depend on a GPIO that
> turns the power  on/off for the connected device/chip.  The gpio pin
> can vary from board to board but this is easily handled in our DT.
> Some boards have regulators that are always on and not associated with
> a GPIO pin -- these have no representation in our DT.

By standard slot, I mean you have standard voltage rails 12V and 3.3V 
(or 1.5 and 3.3 for mini PCIe) and PERST# signal, no other extra 
things to make a device discoverable, and the timing for 
those rails and PERST# follow what the spec defines.

There's also CLKREQ, WAKE, and hotplug detect signals, but I think those 
are all optional and could be tied off. I think most PCI h/w is not 
hotplug capable.

Rob


CAN I TRUST YOU?

2021-04-08 Thread Mr Suleman Bello
Dear Friend,

Please I want you to read this letter very carefully and I must
apologize for berging this message into your mailbox without any
formal introduction due to the urgency and confidentiality of this
issue and I know that this message will come to you as a surprise.
Please, I urge you to handle this message as a top secret and I will
not like you to joke with it.

I am Mr.Suleman Bello, a staff in Bank of Burkina faso (Ouagadougou)
As a branch internal auditor of the bank, I discovered an existing
dormant account for years without any banking activity going on in it;
neither deposits nor withdrawals from this account for this long
period. And according to the rules guiding our bank, any unserviceable
account for more than (7) seven years will be delisted, and the funds,
no matter the amount, will be transferred to the national treasury as
an unclaimed fund.

I hope you know the meaning of unclaimed funds; an abandoned fund you
can’t locate the owner, probably because the owner is dead by
sickness, accident, plane crash, natural disaster or murdered without
leaving information about the funds to any relative or friend.  Or
funds looted and abandoned by politicians for the fear of  being
prosecuted by their government. That's why you have many unclaimed
funds littered in different banks everywhere. And It is only staff
like in my position as an internal auditor of a bank that often come
across such accounts and their details.

I hope that you will not expose or betray this trust and confidence
that I am about to establish with you for the mutual benefit of you
and I.  Please, I need  us to work together in order to move the funds
out immediately to your possession so that we can share it.  I will
assist you to come up with a beneficiary claim to enable the
transferring of the said sum of $10.5 million United States Dollar
into your account within 7 banking days. Once it is done, I will
resign from my work and use my share for better and lucrative
businesses,  and to continue my life happily in another beautiful and
cozy foreign country of my choice, with my new family.

As an insider in the bank that you can trust, I can leak-out  or
secretly reveal to you every information and document about the
account including the account statement to enable us  facilitate it.
The truth is that this account and funds in it have been dormant for
years in our Bank, and the request for a  foreigner to be involved in
this deal becomes necessary because our late customer was a foreigner
and a burkinabe cannot stand as Next of Kin to a foreigner. Therefore,
because of the nature of this transaction, I want you to stand as the
Next of kin so that our bank will accord you the recognition and have
the fund transferred to your account. And I promised that I will be
guiding you until the end to avoid any mistake.

Upon your response, I shall then provide you with further information,
including verifiable documents linked to the account and more details
that will help you understand the transaction. I am expecting your
urgent response to enable me inform you how this deal will be
executed.

Please I would like you to keep this transaction confidential and as a
top secret or delete if you are not interested.

With many thanks,
Suleman Bello.


Re: [PATCH] blk-mq: fix alignment mismatch.

2021-04-08 Thread Nathan Chancellor
Hi Jian,

On Thu, Apr 08, 2021 at 10:57:54AM -0700, Jian Cai wrote:
> So this issue is blocking the LLVM upgrading on ChromeOS. Nathan, do
> you mind sending out the smaller patch like Nick suggested just to see
> what feedback we could get? I could send it for you if you are busy,
> and please let me know what tags I should use in that case.
> 
> Thanks,
> Jian

I will go ahead and send the smaller patch at some point today.

For what it's worth, Nick brought up https://reviews.llvm.org/D100037,
which may be relevant here.

Cheers,
Nathan

> On Wed, Mar 31, 2021 at 3:06 PM Nick Desaulniers
>  wrote:
> >
> > On Wed, Mar 31, 2021 at 2:58 PM Nathan Chancellor  wrote:
> > >
> > > On Wed, Mar 31, 2021 at 02:27:03PM -0700, Jian Cai wrote:
> > > >
> > > > I just realized you already proposed solutions for skipping the check
> > > > in 
> > > > https://lore.kernel.org/linux-block/20210310225240.4epj2mdmzt4vurr3@archlinux-ax161/#t.
> > > > Do you have any plans to send them for review?
> > >
> > > I was hoping to gather some feedback on which option would be preferred
> > > by Jens and the other ClangBuiltLinux folks before I sent them along. I
> > > can send the first just to see what kind of feedback I can gather.
> >
> > Either approach is fine by me. The smaller might be easier to get
> > accepted into stable. The larger approach will probably become more
> > useful in the future (having the diag infra work properly with clang).
> > I think the ball is kind of in Jens' court to decide.  Would doing
> > both be appropriate, Jens? Have the smaller patch tagged for stable
> > disabling it for the whole file, then another commit on top not tagged
> > for stable that adds the diag infra, and a third on top replacing the
> > file level warning disablement with local diags to isolate this down
> > to one case?  It's a fair but small amount of churn IMO; but if Jens
> > is not opposed it seems fine?
> > --
> > Thanks,
> > ~Nick Desaulniers


Re: [PATCH] export: Make CRCs robust to symbol trimming

2021-04-08 Thread Greg KH
On Thu, Apr 08, 2021 at 06:01:05PM +, Quentin Perret wrote:
> The CRC calculation done by genksyms is triggered when the parser hits
> EXPORT_SYMBOL*() macros. At this point, genksyms recursively expands the
> types, and uses that as the input for the CRC calculation. In the case
> of forward-declared structs, the type expands to 'UNKNOWN'. Next, the
> result of the expansion of each type is cached, and is re-used when/if
> the same type is seen again for another exported symbol in the file.
> 
> Unfortunately, this can cause CRC 'stability' issues when a struct
> definition becomes visible in the middle of a C file. For example, let's
> assume code with the following pattern:
> 
> struct foo;
> 
> int bar(struct foo *arg)
> {
>   /* Do work ... */
> }
> EXPORT_SYMBOL_GPL(bar);
> 
> /* This contains struct foo's definition */
> #include "foo.h"
> 
> int baz(struct foo *arg)
> {
>   /* Do more work ... */
> }
> EXPORT_SYMBOL_GPL(baz);
> 
> Here, baz's CRC will be computed using the expansion of struct foo that
> was cached after bar's CRC calculation ('UNKOWN' here). But if
> EXPORT_SYMBOL_GPL(bar) is removed from the file (because of e.g. symbol
> trimming using CONFIG_TRIM_UNUSED_KSYMS), struct foo will be expanded
> late, during baz's CRC calculation, which now has visibility over the
> full struct definition, hence resulting in a different CRC for baz.
> 
> This can cause annoying issues for distro kernel (such as the Android
> Generic Kernel Image) which use CONFIG_UNUSED_KSYMS_WHITELIST. Indeed,
> as per the above, adding a symbol to the whitelist can change the CRC of
> symbols that are already kept exported. As such, modules built against a
> kernel with a trimmed ABI may not load against the same kernel built
> with an extended whitelist, even though they are still strictly binary
> compatible. While rebuilding the modules would obviously solve the
> issue, I believe this classifies as an odd genksyms corner case, and it
> gets in the way of kernel updates in the GKI context.
> 
> To work around the issue, make sure to keep issuing the
> __GENKSYMS_EXPORT_SYMBOL macros for all trimmed symbols, hence making
> the genksyms parsing insensitive to symbol trimming.
> 
> Signed-off-by: Quentin Perret 
> ---
>  include/linux/export.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/export.h b/include/linux/export.h
> index 6271a5d9c988..27d848712b90 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -140,7 +140,12 @@ struct kernel_symbol {
>  #define ___cond_export_sym(sym, sec, ns, enabled)\
>   __cond_export_sym_##enabled(sym, sec, ns)
>  #define __cond_export_sym_1(sym, sec, ns) ___EXPORT_SYMBOL(sym, sec, ns)
> +
> +#ifdef __GENKSYMS__
> +#define __cond_export_sym_0(sym, sec, ns) __GENKSYMS_EXPORT_SYMBOL(sym)
> +#else
>  #define __cond_export_sym_0(sym, sec, ns) /* nothing */
> +#endif
>  
>  #else
>  

Anything to help make these symbol values more "stable" is good, they
drive me crazy...

Acked-by: Greg Kroah-Hartman 


Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Jiri Olsa
On Thu, Apr 08, 2021 at 06:08:20PM +, Song Liu wrote:
> 
> 
> > On Apr 8, 2021, at 10:45 AM, Jiri Olsa  wrote:
> > 
> > On Thu, Apr 08, 2021 at 05:28:10PM +, Song Liu wrote:
> >> 
> >> 
> >>> On Apr 8, 2021, at 10:20 AM, Jiri Olsa  wrote:
> >>> 
> >>> On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
>  
>  
> > On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> > 
> > On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
> >> Currently, to use BPF to aggregate perf event counters, the user uses
> >> --bpf-counters option. Enable "use bpf by default" events with a config
> >> option, stat.bpf-counter-events. This is limited to hardware events in
> >> evsel__hw_names.
> >> 
> >> This also enables mixed BPF event and regular event in the same 
> >> sesssion.
> >> For example:
> >> 
> >> perf config stat.bpf-counter-events=instructions
> >> perf stat -e instructions,cs
> >> 
> > 
> > so if we are mixing events now, how about uing modifier for bpf 
> > counters,
> > instead of configuring .perfconfig list we could use:
> > 
> > perf stat -e instructions:b,cs
> > 
> > thoughts?
> > 
> > the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> > feel free to use it
>  
>  I think we will need both 'b' modifier and .perfconfig configuration. 
>  For systems with BPF-managed perf events running in the background, 
> >>> 
> >>> hum, I'm not sure I understand what that means.. you mean there
> >>> are tools that run perf stat so you don't want to change them?
> >> 
> >> We have tools that do perf_event_open(). I will change them to use 
> >> BPF managed perf events for "cycles" and "instructions". Since these 
> >> tools are running 24/7, perf-stat on the system should use BPF managed
> >> "cycles" and "instructions" by default. 
> > 
> > well if you are already changing the tools why not change them to add
> > modifier.. but I don't mind adding that .perfconfig stuff if you need
> > that
> 
> The tools I mentioned here don't use perf-stat, they just use 
> perf_event_open() and read the perf events fds. We want a config to make

just curious, how those tools use perf_event_open?

> "cycles" to use BPF by default, so that when the user (not these tools)
> runs perf-stat, it will share PMCs with those events by default. 

I'm sorry but I still don't see the usecase.. if you need to change both tools,
you can change them to use bpf-managed event, why bother with the list?

> > 
> >> 
> >>> 
>  .perfconfig makes sure perf-stat sessions will share PMCs with these 
>  background monitoring tools. 'b' modifier, on the other hand, is useful
>  when the user knows there is opportunity to share the PMCs. 
>  
>  Does this make sense? 
> >>> 
> >>> if there's reason for that then sure.. but let's not limit that just
> >>> on HARDWARE events only.. there are RAW events with the same demand
> >>> for this feature.. why don't we let user define any event for this?
> >> 
> >> I haven't found a good way to config RAW events. I guess RAW events 
> >> could use 'b' modifier? 
> > any event uing the pmu notation like cpu/instructions/
> 
> Can we do something like "perf config stat.bpf-counter-events=cpu/*" means 
> all "cpu/xx" events use BPF by default?

I think there's misundestanding, all I'm saying is that IIUC you check
events stat.bpf-counter-events to be HARDWARE type, which I don't think
is necessary and we can allow any event in there

jirka



Re: [PATCH 2/3] fpga: dfl: Add DFL bus driver for Altera SPI Master

2021-04-08 Thread Moritz Fischer
On Thu, Apr 08, 2021 at 09:20:19AM +, Wu, Hao wrote:
> > On Thu, Apr 08, 2021 at 03:30:15PM +0800, Wu, Hao wrote:
> > > > > On Mon, 5 Apr 2021, Moritz Fischer wrote:
> > > > >
> > > > > > Hi Matthew,
> > > > > >
> > > > > > On Mon, Apr 05, 2021 at 04:53:00PM -0700,
> > > > matthew.gerl...@linux.intel.com wrote:
> > > > > > > From: Matthew Gerlach 
> > > > > > >
> > > > > > > This patch adds DFL bus driver for the Altera SPI Master
> > > > > > > controller.  The SPI master is connected to an Intel SPI Slave to
> > > > > > > Avalon Master Bridge, inside an Intel MAX10 BMC Chip.
> > > > > > >
> > > > > > > Signed-off-by: Matthew Gerlach 
> > > > > > > ---
> > > > > > >  drivers/fpga/Kconfig  |   9 ++
> > > > > > >  drivers/fpga/Makefile |   1 +
> > > > > > >  drivers/fpga/dfl-spi-altera.c | 221
> > > > ++
> > > > > > >  3 files changed, 231 insertions(+)
> > > > > > >  create mode 100644 drivers/fpga/dfl-spi-altera.c
> > > > > > >
> > > > > > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > > > > > > index d591dd9..0a86994 100644
> > > > > > > --- a/drivers/fpga/Kconfig
> > > > > > > +++ b/drivers/fpga/Kconfig
> > > > > > > @@ -210,6 +210,15 @@ config FPGA_DFL_NIOS_INTEL_PAC_N3000
> > > > > > >the card. It also instantiates the SPI master (spi-altera) for
> > > > > > >the card's BMC (Board Management Controller).
> > > > > > >
> > > > > > > +config FPGA_DFL_SPI_ALTERA
> > > > > > > +tristate "FPGA DFL Altera SPI Master Driver"
> > > > > > > +depends on FPGA_DFL
> > > > > > > +select REGMAP
> > > > > > > +help
> > > > > > > +  This is a DFL bus driver for the Altera SPI master controller.
> > > > > > > +  The SPI master is connected to a SPI slave to Avalon Master
> > > > > > > +  bridge in a Intel MAX BMC.
> > > > > > > +
> > > > > > >  config FPGA_DFL_PCI
> > > > > > >  tristate "FPGA DFL PCIe Device Driver"
> > > > > > >  depends on PCI && FPGA_DFL
> > > > > > > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > > > > > > index 18dc9885..58a42ad 100644
> > > > > > > --- a/drivers/fpga/Makefile
> > > > > > > +++ b/drivers/fpga/Makefile
> > > > > > > @@ -45,6 +45,7 @@ dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o 
> > > > > > > dfl-
> > > > afu-dma-region.o
> > > > > > >  dfl-afu-objs += dfl-afu-error.o
> > > > > > >
> > > > > > >  obj-$(CONFIG_FPGA_DFL_NIOS_INTEL_PAC_N3000)+= dfl-n3000-
> > > > nios.o
> > > > > > > +obj-$(CONFIG_FPGA_DFL_SPI_ALTERA)+= dfl-spi-altera.o
> > > > > > >
> > > > > > >  # Drivers for FPGAs which implement DFL
> > > > > > >  obj-$(CONFIG_FPGA_DFL_PCI)+= dfl-pci.o
> > > > > > > diff --git a/drivers/fpga/dfl-spi-altera.c 
> > > > > > > b/drivers/fpga/dfl-spi-altera.c
> > > > > > > new file mode 100644
> > > > > > > index 000..9bec25fd
> > > > > > > --- /dev/null
> > > > > > > +++ b/drivers/fpga/dfl-spi-altera.c
> > > > > > > @@ -0,0 +1,221 @@
> > > > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > > > +/*
> > > > > > > + * DFL bus driver for Altera SPI Master
> > > > > > > + *
> > > > > > > + * Copyright (C) 2020 Intel Corporation, Inc.
> > > > > > > + *
> > > > > > > + * Authors:
> > > > > > > + *   Matthew Gerlach 
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +#include 
> > > > > > > +
> > > > > > > +struct dfl_altera_spi {
> > > > > > > +void __iomem *base;
> > > > > > > +struct regmap *regmap;
> > > > > > > +struct device *dev;
> > > > > > > +struct platform_device *altr_spi;
> > > > > > > +};
> > > > > > > +
> > > > > > > +#define SPI_CORE_PARAMETER  0x8
> > > > > > > +#define SHIFT_MODE  BIT_ULL(1)
> > > > > > > +#define SHIFT_MODE_MSB  0
> > > > > > > +#define SHIFT_MODE_LSB  1
> > > > > > > +#define DATA_WIDTH  GENMASK_ULL(7, 2)
> > > > > > > +#define NUM_CHIPSELECT  GENMASK_ULL(13, 8)
> > > > > > > +#define CLK_POLARITYBIT_ULL(14)
> > > > > > > +#define CLK_PHASE   BIT_ULL(15)
> > > > > > > +#define PERIPHERAL_ID   GENMASK_ULL(47, 32)
> > > > > > > +#define SPI_CLK GENMASK_ULL(31, 22)
> > > > > > > +#define SPI_INDIRECT_ACC_OFST   0x10
> > > > > > > +
> > > > > > > +#define INDIRECT_ADDR   (SPI_INDIRECT_ACC_OFST+0x0)
> > > > > > > +#define INDIRECT_WR BIT_ULL(8)
> > > > > > > +#define INDIRECT_RD BIT_ULL(9)
> > > > > > > +#define INDIRECT_RD_DATA(SPI_INDIRECT_ACC_OFST+0x8)
> > > > > > > +#define INDIRECT_DATA_MASK  GENMASK_ULL(31, 0)
> > > > > > > +#define INDIRECT_DEBUG  BIT_ULL(32)
> > > > > > > +#define INDIRECT_WR_DATA(SPI_INDIRECT_ACC_OFST+0x10)
> > > > > > > +#define INDIRECT_TIMEOUT1
> > > > > > > 

<    1   2   3   4   5   6   7   8   9   10   >