Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
Hi Sean,

I noticed this commit in linux-next.

On Tue, 9 Apr 2024 10:51:05 -0700 Sean Christopherson wrote:
>
> Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
> with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
> states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
> mitigations by default.
>
>   │ If you say N, all mitigations will be disabled. You really
>   │ should know what you are doing to say so.
>
> As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
> some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
>
> Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sean Christopherson
> ---
>  kernel/cpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 8f6affd051f7..07ad53b7f119 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -3207,7 +3207,8 @@ enum cpu_mitigations {
>  };
>
>  static enum cpu_mitigations cpu_mitigations __ro_after_init =
> -	CPU_MITIGATIONS_AUTO;
> +	IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
> +						     CPU_MITIGATIONS_OFF;
>
>  static int __init mitigations_parse_cmdline(char *arg)
>  {
> --
> 2.44.0.478.gd926399ef9-goog
>

I noticed because it turned off all mitigations for my PowerPC qemu boot
tests - probably because CONFIG_SPECULATION_MITIGATIONS only exists in
arch/x86/Kconfig ... thus for other architectures that have cpu
mitigations, this will always default them to off, right?

--
Cheers,
Stephen Rothwell
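A side note on the mechanics (a sketch, not from the thread): IS_ENABLED() evaluates to 0 not only for CONFIG_SPECULATION_MITIGATIONS=n but also when the symbol is never defined at all, which is why every architecture that lacks the x86-only Kconfig entry now silently gets CPU_MITIGATIONS_OFF:

	/* IS_ENABLED(CONFIG_FOO) is 1 only for =y/=m; it is 0 both for =n
	 * and for symbols that do not exist, as on non-x86 architectures. */
	static enum cpu_mitigations cpu_mitigations __ro_after_init =
		IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
							     CPU_MITIGATIONS_OFF;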
Re: [PATCH v12 8/8] PCI: endpoint: Remove "core_init_notifier" flag
On Wed, Mar 27, 2024 at 02:43:37PM +0530, Manivannan Sadhasivam wrote:
> "core_init_notifier" flag is set by the glue drivers requiring refclk from
> the host to complete the DWC core initialization. Also, those drivers will
> send a notification to the EPF drivers once the initialization is fully
> completed using the pci_epc_init_notify() API. Only then, the EPF drivers
> will start functioning.
>
> For the rest of the drivers generating refclk locally, EPF drivers will
> start functioning post binding with them. EPF drivers rely on the
> 'core_init_notifier' flag to differentiate between the drivers.
> Unfortunately, this creates two different flows for the EPF drivers.
>
> So to avoid that, let's get rid of the "core_init_notifier" flag and follow
> a single initialization flow for the EPF drivers. This is done by calling
> the dw_pcie_ep_init_notify() from all glue drivers after the completion of
> dw_pcie_ep_init_registers() API. This will allow all the glue drivers to
> send the notification to the EPF drivers once the initialization is fully
> completed.

Thanks for doing this! I think this is a significantly nicer solution
than core_init_notifier was.

One question: both qcom and tegra194 call dw_pcie_ep_init_registers()
from an interrupt handler, but they register that handler in a different
order with respect to dw_pcie_ep_init(). I don't know what actually
starts the process that leads to the interrupt, but if it's
dw_pcie_ep_init(), then one of these (qcom, I think) must be racy:

  qcom_pcie_ep_probe
    dw_pcie_ep_init                                             <- A
    qcom_pcie_ep_enable_irq_resources
      devm_request_threaded_irq(qcom_pcie_ep_perst_irq_thread)  <- B

  qcom_pcie_ep_perst_irq_thread
    qcom_pcie_perst_deassert
      dw_pcie_ep_init_registers

  tegra_pcie_dw_probe
    tegra_pcie_config_ep
      devm_request_threaded_irq(tegra_pcie_ep_pex_rst_irq)      <- B
    dw_pcie_ep_init                                             <- A

  tegra_pcie_ep_pex_rst_irq
    pex_ep_event_pex_rst_deassert
      dw_pcie_ep_init_registers

Whatever the right answer is, I think qcom and tegra194 should both
order dw_pcie_ep_init() and the devm_request_threaded_irq() the same
way.

Bjorn
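A minimal sketch of the "A before B" ordering described above; the driver name and variables here are hypothetical, not taken from the qcom or tegra194 drivers, and which ordering is actually correct is left open in the review:

	/* Sketch: finish dw_pcie_ep_init() (A) before registering the PERST#
	 * handler (B) whose thread calls dw_pcie_ep_init_registers(). */
	ret = dw_pcie_ep_init(&pci->ep);				/* A */
	if (ret)
		return ret;

	ret = devm_request_threaded_irq(dev, perst_irq, NULL,		/* B */
					foo_pcie_ep_perst_irq_thread,
					IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
					"foo_pcie_ep_perst", ep);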
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On 11.04.24 18:55, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote:
>> Paolo,
>>
>> I may miss a bunch of details here (as I still remember some change_pte
>> patches previously on the list..), however not sure whether we
>> considered enable it? Asked because I remember Andrea used to have a
>> custom tree maintaining that part:
>>
>> https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968
>
> The patch enables it only for KSM, so it would still require a bunch of
> cleanups, for example I also would still use set_pte_at() in all the
> places that are not KSM. This would at least fix the issue with the
> poor documentation of where to use set_pte_at_notify() vs set_pte_at().
>
> With regard to the implementation, I like the idea of disabling the
> invalidation on the MMU notifier side, but I would rather have
> MMU_NOTIFIER_CHANGE_PTE as a separate field in the range instead of
> overloading the event field.
>
>> Maybe it can't be enabled for some reason that I overlooked in the
>> current tree, or we just decided to not to?
>
> I have just learnt about the patch, nobody had ever mentioned it even
> though it's almost 2 years old... It's a lot of code though and no one

I assume Andrea used it on his tree where he also has a version of
"randprotect" (even included in that commit subject) to mitigate a KSM
security issue that was reported by some security researchers [1] a
while ago. From what I recall, the industry did not end up caring about
that security issue that much.

IIUC, with "randprotect" we get a lot more R/O protection even when not
de-duplicating a page -- thus the name. Likely, the reporter mentioned
in the commit is a researcher that played with Andrea's fix for the
security issue. But I'm just speculating at this point :)

> has ever reported an issue for over 10 years, so I think it's easiest
> to just rip the code out.

Yes. Can always be readded in a possibly cleaner fashion (like you note
above), when deemed necessary and we are willing to support it.

[1] https://gruss.cc/files/remote_dedup.pdf

--
Cheers,

David / dhildenb
Re: [PATCH v12 2/8] PCI: dwc: ep: Add Kernel-doc comments for APIs
On Wed, Mar 27, 2024 at 02:43:31PM +0530, Manivannan Sadhasivam wrote:
> All of the APIs are missing the Kernel-doc comments. Hence, add them.

> + * dw_pcie_ep_reset_bar - Reset endpoint BAR

Apparently this resets @bar for every function of the device, so it's
not just a single BAR?

> + * dw_pcie_ep_raise_intx_irq - Raise INTx IRQ to the host
> + * @ep: DWC EP device
> + * @func_no: Function number of the endpoint
> + *
> + * Return: 0 if success, errono otherwise.

s/errono/errno/ (another instance below)

Bjorn
[RFC PATCH] powerpc: Optimise barriers for fully ordered atomics
"Fully ordered" atomics (RMW that return a value) are said to have a full barrier before and after the atomic operation. This is implemented as: hwsync larx ... stcx. bne- hwsync This is slow on POWER processors because hwsync and stcx. require a round-trip to the nest (~= L2 cache). The hwsyncs can be avoided with the sequence: lwsync larx ... stcx. bne- isync lwsync prevents all reorderings except store/load reordering, so the larx could be execued ahead of a prior store becoming visible. However the stcx. is a store, so it is ordered by the lwsync against all prior access and if the value in memory had been modified since the larx, it will fail. So the point at which the larx executes is not a concern because the stcx. always verifies the memory was unchanged. The isync prevents subsequent instructions being executed before the stcx. executes, and stcx. is necessarily visible to the system after it executes, so there is no opportunity for it (or prior stores, thanks to lwsync) to become visible after a subsequent load or store. This sequence requires only one L2 round-trip and so is around 2x faster measured on a POWER10 with back-to-back atomic ops on cached memory. [ Remains to be seen if this is always faster when there is other activity going on, and if it's faster on non-POEWR CPUs or perhaps older ones like 970 that might not optimise isync so much. ] Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/synch.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h index b0b4c64870d7..0b1718eb9a40 100644 --- a/arch/powerpc/include/asm/synch.h +++ b/arch/powerpc/include/asm/synch.h @@ -60,8 +60,8 @@ static inline void ppc_after_tlbiel_barrier(void) MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup); #define PPC_ACQUIRE_BARRIER "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER) #define PPC_RELEASE_BARRIER stringify_in_c(LWSYNC) "\n" -#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n" -#define PPC_ATOMIC_EXIT_BARRIER "\n" stringify_in_c(sync) "\n" +#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(LWSYNC) "\n" +#define PPC_ATOMIC_EXIT_BARRIER "\n" stringify_in_c(isync) "\n" #else #define PPC_ACQUIRE_BARRIER #define PPC_RELEASE_BARRIER -- 2.43.0
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, Apr 12, 2024, Marc Zyngier wrote:
> On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon wrote:
> > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > Also, if you're in the business of hacking the MMU notifier code, it
> > would be really great to change the .clear_flush_young() callback so
> > that the architecture could handle the TLB invalidation. At the moment,
> > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > being set by kvm_handle_hva_range(), whereas we could do a much
> > lighter-weight and targeted TLBI in the architecture page-table code
> > when we actually update the ptes for small ranges.
>
> Indeed, and I was looking at this earlier this week as it has a pretty
> devastating effect with NV (it blows the shadow S2 for that VMID, with
> costly consequences).
>
> In general, it feels like the TLB invalidation should stay with the
> code that deals with the page tables, as it has a pretty good idea of
> what needs to be invalidated and how -- specially on architectures
> that have a HW-broadcast facility like arm64.

Would this be roughly on par with an in-line flush on arm64? The
simpler, more straightforward solution would be to let architectures
override flush_on_ret, but I would prefer something like the below as
x86 can also utilize a range-based flush when running as a nested
hypervisor.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..b65116294efe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	struct kvm_gfn_range gfn_range;
 	struct kvm_memory_slot *slot;
 	struct kvm_memslots *slots;
+	bool need_flush = false;
 	int i, idx;
 
 	if (WARN_ON_ONCE(range->end <= range->start))
@@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 				break;
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
+
+			/*
+			 * Use a precise gfn-based TLB flush when possible, as
+			 * most mmu_notifier events affect a small-ish range.
+			 * Fall back to a full TLB flush if the gfn-based flush
+			 * fails, and don't bother trying the gfn-based flush
+			 * if a full flush is already pending.
+			 */
+			if (range->flush_on_ret && !need_flush && r.ret &&
+			    kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
+							     gfn_range.end - gfn_range.start + 1))
+				need_flush = true;
 		}
 	}
 
-	if (range->flush_on_ret && r.ret)
+	if (need_flush)
 		kvm_flush_remote_tlbs(kvm);
 
 	if (r.found_memslot)
Re: [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx
On Fri, Apr 12, 2024 at 02:08:03PM +0000, Christophe Leroy wrote:
>
>
> Le 11/04/2024 à 18:15, Peter Xu a écrit :
> > On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
> >> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
> >>> This series reimplements hugepages with hugepd on powerpc 8xx.
> >>>
> >>> Unlike most architectures, powerpc 8xx HW requires a two-level
> >>> pagetable topology for all page sizes. So a leaf PMD-contig approach
> >>> is not feasible as such.
> >>>
> >>> Possible sizes are 4k, 16k, 512k and 8M.
> >>>
> >>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
> >>> must point to a single entry level-2 page table. Until now that was
> >>> done using hugepd. This series changes it to use standard page tables
> >>> where the entry is replicated 1024 times on each of the two pagetables
> >>> referred by the two associated PMD entries for that 8M page.
> >>>
> >>> At the moment it has to look into each helper to know if the
> >>> hugepage ptep is a PTE or a PMD in order to know it is a 8M page or
> >>> a lower size. I hope this can be handled by core-mm in the future.
> >>>
> >>> There are probably several ways to implement stuff, so feedback is
> >>> very welcome.
> >>
> >> I thought it looks pretty good!
> >
> > I second it.
> >
> > I saw the discussions in patch 1. Christophe, I suppose you're exploring
> > the big hammer over hugepd, and perhaps went already with the 32bit pmd
> > solution for the nohash/32bit challenge you mentioned?
> >
> > I'm trying to position my next step; it seems like at least I should not
> > add any more hugepd code, then should I go with ARCH_HAS_HUGEPD checks,
> > or you're going to have an RFC soon then I can base on top?
>
> Depends on what you expect by "soon".
>
> I sure won't be able to send any RFC before end of April.
>
> Should be possible to have something during May.

That's good enough, thanks.  I'll see what is the best I can do.

Then do you think I can leave p4d/pgd leaves alone? Please check the
other email where I'm not sure whether pgd leaves ever existed for any
of PowerPC. That's so far what I plan to do, on teaching pgtable
walkers to recognize pud and lower for all leaves. Then if Power can
switch from hugepd to this it should just work.

Even if pgd exists (then something I overlooked..), I'm wondering
whether we can push that downwards to be either pud/pmd (and looks like
we all agree p4d is never used on Power). That may involve some pgtable
operations moving from pgd level to lower, e.g. my pure imagination
would look like starting with:

#define PTE_INDEX_SIZE	PTE_SHIFT
#define PMD_INDEX_SIZE	0
#define PUD_INDEX_SIZE	0
#define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)

To:

#define PTE_INDEX_SIZE	PTE_SHIFT
#define PMD_INDEX_SIZE	(32 - PMD_SHIFT)
#define PUD_INDEX_SIZE	0
#define PGD_INDEX_SIZE	0

And the rest will need care too. I hope moving downward is easier (e.g.
the walker should always exist for lower levels but not always for
higher levels), but I actually have little idea on whether there's any
other implications, so please bear with me on stupid mistakes.

I just hope pgd leaves don't exist already, then I think it'll be
simpler.

Thanks,

--
Peter Xu
Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2
Le 10/04/2024 à 21:58, Peter Xu a écrit :
>>
>> e500 has two modes: 32 bits and 64 bits.
>>
>> For 32 bits:
>>
>> 8xx is the only one handling it through HW-assisted pagetable walk hence
>> requiring a 2-level whatever the pagesize is.
>
> Hmm I think maybe finally I get it..
>
> I think the confusion came from when I saw there's always such level-2
> table described in Figure 8-5 of the manual:
>
> https://www.nxp.com/docs/en/reference-manual/MPC860UM.pdf

Yes indeed that figure is confusing. Table 8-1 gives a pretty good idea
of what is required. We only use MD_CTR[TWAM] = 1.

>
> So I suppose you meant for 8M, the PowerPC 8xx system hardware will be
> aware of such 8M pgtable (from level-1's entry, where it has bit 28-29 set
> 011b), then it won't ever read anything starting from "Level-2 Descriptor
> 1" (but only read the only entry "Level-2 Descriptor 0"), so fundamentally
> hugepd format must look like such for 8xx?
>
> But then perhaps it's still compatible with cont-pte because the rest
> entries (pte index 1+) will simply be ignored by the hardware?

Yes, still compatible with CONT-PTE, although things become tricky
because you need two page tables to get the full 8M, so that's a kind
of cont-PMD down to PTE level, as you can see in my RFC series.

>
>>
>> On e500 it is all software so pages 2M and larger should be cont-PGD (by
>> the way I'm a bit puzzled that on arches that have only 2 levels, ie PGD
>> and PTE, the PGD entries are populated by a function called pmd_populate()).
>
> Yeah.. I am also wondering whether pgd_populate() could also work there
> (perhaps with some trivial changes, or maybe not even needed..), as when
> p4d/pud/pmd levels are missing, linux should just do something like an
> enforced cast from pgd_t* -> pmd_t* in this case.
>
> I think currently they're already not pgd, as __find_linux_pte() already
> skipped pgd unconditionally:
>
>         pgdp = pgdir + pgd_index(ea);
>         p4dp = p4d_offset(pgdp, ea);
>

Yes that's what is confusing: some parts of the code consider we have
only a PGD and a PT while other parts consider we have only a PMD and
a PT.

>>
>> Current situation for 8xx is illustrated here:
>> https://github.com/linuxppc/wiki/wiki/Huge-pages#8xx
>>
>> I also tried to better illustrate e500/32 here:
>> https://github.com/linuxppc/wiki/wiki/Huge-pages#e500
>>
>> For 64 bits:
>> We have PTE/PMD/PUD/PGD, no P4D
>>
>> See arch/powerpc/include/asm/nohash/64/pgtable-4k.h
>
> We don't have anything that is above pud in this category, right? That's
> what I read from your wiki (and thanks for providing that in the first
> place; helps a lot for me to understand how it works on PowerPC).

Yes, thanks to Michael and Aneesh who initiated that Wiki page.

>
> I want to make sure if I can move on without caring on p4d/pgd leaves
> like what we do right now, even after we remove hugepd for good, in this
> case since p4d is always missing, then it's about whether "pud|pmd|pte_leaf()"
> can also cover the pgd ones when that day comes, iiuc.

I guess so, but I'd like Aneesh and/or Michael to confirm as I'm not an
expert on PPC64.

Christophe
Re: [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx
Le 11/04/2024 à 18:15, Peter Xu a écrit :
> On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
>> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
>>> This series reimplements hugepages with hugepd on powerpc 8xx.
>>>
>>> Unlike most architectures, powerpc 8xx HW requires a two-level
>>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>>> is not feasible as such.
>>>
>>> Possible sizes are 4k, 16k, 512k and 8M.
>>>
>>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
>>> must point to a single entry level-2 page table. Until now that was
>>> done using hugepd. This series changes it to use standard page tables
>>> where the entry is replicated 1024 times on each of the two pagetables
>>> referred by the two associated PMD entries for that 8M page.
>>>
>>> At the moment it has to look into each helper to know if the
>>> hugepage ptep is a PTE or a PMD in order to know it is a 8M page or
>>> a lower size. I hope this can be handled by core-mm in the future.
>>>
>>> There are probably several ways to implement stuff, so feedback is
>>> very welcome.
>>
>> I thought it looks pretty good!
>
> I second it.
>
> I saw the discussions in patch 1. Christophe, I suppose you're exploring
> the big hammer over hugepd, and perhaps went already with the 32bit pmd
> solution for the nohash/32bit challenge you mentioned?
>
> I'm trying to position my next step; it seems like at least I should not
> add any more hugepd code, then should I go with ARCH_HAS_HUGEPD checks,
> or you're going to have an RFC soon then I can base on top?

Depends on what you expect by "soon".

I sure won't be able to send any RFC before end of April.

Should be possible to have something during May.

Christophe
[PATCH v3 2/2] PCI: Create helper to print TLP Header and Prefix Log
Add pcie_print_tlp_log() helper to print TLP Header and Prefix Log.
Print End-End Prefixes only if they are non-zero. Consolidate the few
places which currently print TLP using custom formatting.

The first attempt used pr_cont() instead of building a string first,
but it turns out pr_cont() is not compatible with pci_err() and prints
on a separate line. When I asked about this, Andy Shevchenko suggested
pr_cont() should not be used in the first place (to eventually get rid
of it), so pr_cont() is now replaced with building the string first.

Signed-off-by: Ilpo Järvinen
---
 drivers/pci/pci.c      | 32
 drivers/pci/pcie/aer.c | 10 ++
 drivers/pci/pcie/dpc.c |  5 +
 include/linux/aer.h    |  2 ++
 4 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af230e6e5557..54d4872d14b8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -9,6 +9,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -1116,6 +1117,37 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
 }
 EXPORT_SYMBOL_GPL(pcie_read_tlp_log);
 
+/**
+ * pcie_print_tlp_log - Print TLP Header / Prefix Log contents
+ * @dev: PCIe device
+ * @tlp_log: TLP Log structure
+ * @pfx: Internal string prefix (for indentation)
+ *
+ * Prints TLP Header and Prefix Log information held by @tlp_log.
+ */
+void pcie_print_tlp_log(const struct pci_dev *dev,
+			const struct pcie_tlp_log *tlp_log, const char *pfx)
+{
+	char buf[(10 + 1) * (4 + ARRAY_SIZE(tlp_log->prefix)) + 14 + 1];
+	unsigned int i;
+	int len;
+
+	len = scnprintf(buf, sizeof(buf), "%#010x %#010x %#010x %#010x",
+			tlp_log->dw[0], tlp_log->dw[1], tlp_log->dw[2],
+			tlp_log->dw[3]);
+
+	if (tlp_log->prefix[0])
+		len += scnprintf(buf + len, sizeof(buf) - len, " E-E Prefixes:");
+	for (i = 0; i < ARRAY_SIZE(tlp_log->prefix); i++) {
+		if (!tlp_log->prefix[i])
+			break;
+		len += scnprintf(buf + len, sizeof(buf) - len,
+				 " %#010x", tlp_log->prefix[i]);
+	}
+
+	pci_err(dev, "%sTLP Header: %s\n", pfx, buf);
+}
+
 /**
  * pci_restore_bars - restore a device's BAR values (e.g. after wake-up)
  * @dev: PCI device to have its BARs restored
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ecc1dea5a208..efb9e728fe94 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -664,12 +664,6 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
 	}
 }
 
-static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
-{
-	pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
-		t->dw[0], t->dw[1], t->dw[2], t->dw[3]);
-}
-
 static void __aer_print_error(struct pci_dev *dev,
 			      struct aer_err_info *info)
 {
@@ -724,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	__aer_print_error(dev, info);
 
 	if (info->tlp_header_valid)
-		__print_tlp_header(dev, &info->tlp);
+		pcie_print_tlp_log(dev, &info->tlp, "  ");
 
 out:
 	if (info->id && info->error_dev_num > 1 && info->id == id)
@@ -796,7 +790,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 			aer->uncor_severity);
 
 	if (tlp_header_valid)
-		__print_tlp_header(dev, &aer->header_log);
+		pcie_print_tlp_log(dev, &aer->header_log, "  ");
 
 	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
 			aer_severity, tlp_header_valid, &aer->header_log);
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 80b1456f95fe..3f8e3b6c7948 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -229,10 +229,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
 	pcie_read_tlp_log(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG,
 			  cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG,
 			  dpc_tlp_log_len(pdev), &tlp_log);
-	pci_err(pdev, "TLP Header: %#010x %#010x %#010x %#010x\n",
-		tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]);
-	for (i = 0; i < pdev->dpc_rp_log_size - 5; i++)
-		pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, tlp_log.prefix[i]);
+	pcie_print_tlp_log(pdev, &tlp_log, "");
 
 	if (pdev->dpc_rp_log_size < 5)
 		goto clear_status;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 2484056feb8d..1e8c61deca65 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -41,6 +41,8 @@ struct aer_capability_regs {
 
 int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
 		      unsigned int tlp_len, struct pcie_tlp_log *log);
 unsigned int aer_tlp_log_len(struct pci_dev *
[PATCH v3 1/2] PCI: Add TLP Prefix reading into pcie_read_tlp_log()
pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle
the TLP Prefix Log. The layout of relevant registers in AER and DPC
Capability is not identical because the offsets of TLP Header Log and
TLP Prefix Log vary, so the callers must pass the offsets to
pcie_read_tlp_log().

Convert eetlp_prefix_path into an integer called eetlp_prefix_max and
make it available also when CONFIG_PCI_PASID is not configured, to be
able to determine the number of E-E Prefixes.

Signed-off-by: Ilpo Järvinen
---
 drivers/pci/ats.c             |  2 +-
 drivers/pci/pci.c             | 34 --
 drivers/pci/pcie/aer.c        |  4 +++-
 drivers/pci/pcie/dpc.c        | 22 +++---
 drivers/pci/probe.c           | 14 +-
 include/linux/aer.h           |  5 -
 include/linux/pci.h           |  2 +-
 include/uapi/linux/pci_regs.h |  2 ++
 8 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c570892b2090..e13433dcfc82 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -377,7 +377,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
 	if (WARN_ON(pdev->pasid_enabled))
 		return -EBUSY;
 
-	if (!pdev->eetlp_prefix_path && !pdev->pasid_no_tlp)
+	if (!pdev->eetlp_prefix_max && !pdev->pasid_no_tlp)
 		return -EINVAL;
 
 	if (!pasid)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e5f243dd4288..af230e6e5557 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1066,26 +1066,48 @@ static void pci_enable_acs(struct pci_dev *dev)
 	pci_disable_acs_redir(dev);
 }
 
+/**
+ * aer_tlp_log_len - Calculates TLP Header/Prefix Log length
+ * @dev: PCIe device
+ *
+ * Return: TLP Header/Prefix Log length
+ */
+unsigned int aer_tlp_log_len(struct pci_dev *dev)
+{
+	return 4 + dev->eetlp_prefix_max;
+}
+
 /**
  * pcie_read_tlp_log - read TLP Header Log
  * @dev: PCIe device
  * @where: PCI Config offset of TLP Header Log
+ * @where2: PCI Config offset of TLP Prefix Log
+ * @tlp_len: TLP Log length (in DWORDs)
  * @tlp_log: TLP Log structure to fill
 *
 * Fill @tlp_log from TLP Header Log registers, e.g., AER or DPC.
 *
 * Return: 0 on success and filled TLP Log structure, <0 on error.
 */
-int pcie_read_tlp_log(struct pci_dev *dev, int where,
-		      struct pcie_tlp_log *tlp_log)
+int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
+		      unsigned int tlp_len, struct pcie_tlp_log *tlp_log)
 {
-	int i, ret;
+	unsigned int i;
+	int off, ret;
+	u32 *to;
 
 	memset(tlp_log, 0, sizeof(*tlp_log));
 
-	for (i = 0; i < 4; i++) {
-		ret = pci_read_config_dword(dev, where + i * 4,
-					    &tlp_log->dw[i]);
+	for (i = 0; i < tlp_len; i++) {
+		if (i < 4) {
+			to = &tlp_log->dw[i];
+			off = where + i * 4;
+		} else {
+			to = &tlp_log->prefix[i - 4];
+			off = where2 + (i - 4) * 4;
+		}
+
+		ret = pci_read_config_dword(dev, off, to);
 		if (ret)
 			return pcibios_err_to_errno(ret);
 	}
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac6293c24976..ecc1dea5a208 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1245,7 +1245,9 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 
 		if (info->status & AER_LOG_TLP_MASKS) {
 			info->tlp_header_valid = 1;
-			pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG, &info->tlp);
+			pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG,
+					  aer + PCI_ERR_PREFIX_LOG,
+					  aer_tlp_log_len(dev), &info->tlp);
 		}
 	}
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a668820696dc..80b1456f95fe 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -187,10 +187,19 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	return ret;
 }
 
+static unsigned int dpc_tlp_log_len(struct pci_dev *pdev)
+{
+	/* Remove ImpSpec Log register from the count */
+	if (pdev->dpc_rp_log_size >= 5)
+		return pdev->dpc_rp_log_size - 1;
+
+	return pdev->dpc_rp_log_size;
+}
+
 static void dpc_process_rp_pio_error(struct pci_dev *pdev)
 {
 	u16 cap = pdev->dpc_cap, dpc_status, first_error;
-	u32 status, mask, sev, syserr, exc, log, prefix;
+	u32 status, mask, sev, syserr, exc, log;
 	struct pcie_tlp_log tlp_log;
 	int i;
 
@@ -217,20 +226,19 @@ static void dpc_process_rp_pio_error
[PATCH v3 0/2] PCI: Consolidate TLP Log reading and printing
This series has the remaining patches of the AER & DPC TLP Log handling
consolidation.

v3:
- Small rewording in a commit message

v2:
- Don't add EXPORT()s
- Don't include ixgbe changes
- Don't use pr_cont() as it's incompatible with pci_err() and according
  to Andy Shevchenko should not be used in the first place

Ilpo Järvinen (2):
  PCI: Add TLP Prefix reading into pcie_read_tlp_log()
  PCI: Create helper to print TLP Header and Prefix Log

 drivers/pci/ats.c             |  2 +-
 drivers/pci/pci.c             | 66 +++
 drivers/pci/pcie/aer.c        | 14 +++-
 drivers/pci/pcie/dpc.c        | 23 +++-
 drivers/pci/probe.c           | 14 +---
 include/linux/aer.h           |  7 +++-
 include/linux/pci.h           |  2 +-
 include/uapi/linux/pci_regs.h |  2 ++
 8 files changed, 98 insertions(+), 32 deletions(-)

--
2.39.2
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon wrote:
>
> On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index dc04bc767865..ff17849be9f4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  	return false;
> >  }
> >
> > -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > -{
> > -	kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> > -
> > -	if (!kvm->arch.mmu.pgt)
> > -		return false;
> > -
> > -	WARN_ON(range->end - range->start != 1);
> > -
> > -	/*
> > -	 * If the page isn't tagged, defer to user_mem_abort() for sanitising
> > -	 * the MTE tags. The S2 pte should have been unmapped by
> > -	 * mmu_notifier_invalidate_range_end().
> > -	 */
> > -	if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> > -		return false;
> > -
> > -	/*
> > -	 * We've moved a page around, probably through CoW, so let's treat
> > -	 * it just like a translation fault and the map handler will clean
> > -	 * the cache to the PoC.
> > -	 *
> > -	 * The MMU notifiers will have unmapped a huge PMD before calling
> > -	 * ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> > -	 * therefore we never need to clear out a huge PMD through this
> > -	 * calling path and a memcache is not required.
> > -	 */
> > -	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> > -			       PAGE_SIZE, __pfn_to_phys(pfn),
> > -			       KVM_PGTABLE_PROT_R, NULL, 0);
> > -
> > -	return false;
> > -}
> > -
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >  	u64 size = (range->end - range->start) << PAGE_SHIFT;
>
> Thanks. It's nice to see this code retire:
>
> Acked-by: Will Deacon
>
> Also, if you're in the business of hacking the MMU notifier code, it
> would be really great to change the .clear_flush_young() callback so
> that the architecture could handle the TLB invalidation. At the moment,
> the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> being set by kvm_handle_hva_range(), whereas we could do a much
> lighter-weight and targeted TLBI in the architecture page-table code
> when we actually update the ptes for small ranges.

Indeed, and I was looking at this earlier this week as it has a pretty
devastating effect with NV (it blows the shadow S2 for that VMID, with
costly consequences).

In general, it feels like the TLB invalidation should stay with the
code that deals with the page tables, as it has a pretty good idea of
what needs to be invalidated and how -- specially on architectures
that have a HW-broadcast facility like arm64.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.
Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()
On Fri, 05 Apr 2024 12:58:11 +0100, Paolo Bonzini wrote:
>
> The .change_pte() MMU notifier callback was intended as an optimization
> and for this reason it was initially called without a surrounding
> mmu_notifier_invalidate_range_{start,end}() pair. It was only ever
> implemented by KVM (which was also the original user of MMU notifiers)
> and the rules on when to call set_pte_at_notify() rather than set_pte_at()
> have always been pretty obscure.
>
> It may seem a miracle that it has never caused any hard to trigger
> bugs, but there's a good reason for that: KVM's implementation has
> been nonfunctional for a good part of its existence. Already in
> 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with
> invalidate_range_start and invalidate_range_end", 2012-10-09) changed the
> .change_pte() callback to occur within an invalidate_range_start/end()
> pair; and because KVM unmaps the sPTEs during .invalidate_range_start(),
> .change_pte() has no hope of finding a sPTE to change.
>
> Therefore, all the code for .change_pte() can be removed from both KVM
> and mm/, and set_pte_at_notify() can be replaced with just set_pte_at().
>
> Please review! Also feel free to take the KVM patches through the mm
> tree, as I don't expect any conflicts.
>
> Thanks,
>
> Paolo
>
> Paolo Bonzini (4):
>   KVM: delete .change_pte MMU notifier callback
>   KVM: remove unused argument of kvm_handle_hva_range()
>   mmu_notifier: remove the .change_pte() callback
>   mm: replace set_pte_at_notify() with just set_pte_at()
>
>  arch/arm64/kvm/mmu.c                  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c              | 32
>  arch/mips/kvm/mmu.c                   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h    |  1 -
>  arch/powerpc/kvm/book3s.c             |  5 ---
>  arch/powerpc/kvm/book3s.h             |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c          |  1 -
>  arch/powerpc/kvm/book3s_pr.c          |  7
>  arch/powerpc/kvm/e500_mmu_host.c      |  6 ---
>  arch/riscv/kvm/mmu.c                  | 20 --
>  arch/x86/kvm/mmu/mmu.c                | 54 +--
>  arch/x86/kvm/mmu/spte.c               | 16
>  arch/x86/kvm/mmu/spte.h               |  2 -
>  arch/x86/kvm/mmu/tdp_mmu.c            | 46 ---
>  arch/x86/kvm/mmu/tdp_mmu.h            |  1 -
>  include/linux/kvm_host.h              |  2 -
>  include/linux/mmu_notifier.h          | 44 --
>  include/trace/events/kvm.h            | 15
>  kernel/events/uprobes.c               |  5 +--
>  mm/ksm.c                              |  4 +-
>  mm/memory.c                           |  7 +---
>  mm/migrate_device.c                   |  8 +---
>  mm/mmu_notifier.c                     | 17 -
>  virt/kvm/kvm_main.c                   | 50 +
>  26 files changed, 10 insertions(+), 411 deletions(-)
>

Reviewed-by: Marc Zyngier

	M.

--
Without deviation from the norm, progress is not possible.
Re: [PATCH 1/1] Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig
Vignesh Balasubramanian writes:
> "ARCH_HAVE_EXTRA_ELF_NOTES" enables an extra note section in the
> core dump. Kconfig variable is preferred over ARCH_HAVE_* macro.
>
> Co-developed-by: Jini Susan George
> Signed-off-by: Jini Susan George
> Signed-off-by: Vignesh Balasubramanian
> ---
>  arch/Kconfig                   | 9 +
>  arch/powerpc/Kconfig           | 1 +
>  arch/powerpc/include/asm/elf.h | 2 --
>  include/linux/elf.h            | 2 +-
>  4 files changed, 11 insertions(+), 3 deletions(-)

Acked-by: Michael Ellerman (powerpc)

cheers

> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9f066785bb71..143f021c8a76 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -502,6 +502,15 @@ config MMU_LAZY_TLB_SHOOTDOWN
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	bool
>
> +config ARCH_HAVE_EXTRA_ELF_NOTES
> +	bool
> +	help
> +	  An architecture should select this in order to enable adding an
> +	  arch-specific ELF note section to core files. It must provide two
> +	  functions: elf_coredump_extra_notes_size() and
> +	  elf_coredump_extra_notes_write() which are invoked by the ELF core
> +	  dumper.
> +
>  config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>  	bool
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1c4be3373686..c45fa9d7fb76 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -156,6 +156,7 @@ config PPC
>  	select ARCH_HAS_UACCESS_FLUSHCACHE
>  	select ARCH_HAS_UBSAN
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
> +	select ARCH_HAVE_EXTRA_ELF_NOTES	if SPU_BASE
>  	select ARCH_KEEP_MEMBLOCK
>  	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE	if PPC_RADIX_MMU
>  	select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index 79f1c480b5eb..bb4b9d3e 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -127,8 +127,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
>  /* Notes used in ET_CORE. Note name is "SPU//". */
>  #define NT_SPU		1
>
> -#define ARCH_HAVE_EXTRA_ELF_NOTES
> -
>  #endif /* CONFIG_SPU_BASE */
>
>  #ifdef CONFIG_PPC64
> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index c9a46c4e183b..5c402788da19 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -65,7 +65,7 @@ extern Elf64_Dyn _DYNAMIC [];
>  struct file;
>  struct coredump_params;
>
> -#ifndef ARCH_HAVE_EXTRA_ELF_NOTES
> +#ifndef CONFIG_ARCH_HAVE_EXTRA_ELF_NOTES
>  static inline int elf_coredump_extra_notes_size(void) { return 0; }
>  static inline int elf_coredump_extra_notes_write(struct coredump_params *cprm) { return 0; }
>  #else
> --
> 2.34.1
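For reference, a hedged sketch (hypothetical arch-side code, not part of this patch) of the two hooks the new Kconfig help text says a selecting architecture must provide:

	/* Sketch: shapes of the hooks named in the ARCH_HAVE_EXTRA_ELF_NOTES
	 * help text; a real implementation (e.g. powerpc SPU) returns the
	 * size of its extra notes and writes them via the coredump params. */
	int elf_coredump_extra_notes_size(void)
	{
		return 0;		/* bytes of extra note data */
	}

	int elf_coredump_extra_notes_write(struct coredump_params *cprm)
	{
		return 0;		/* 0 on success */
	}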
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback
On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index dc04bc767865..ff17849be9f4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  	return false;
>  }
>
> -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> -{
> -	kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> -
> -	if (!kvm->arch.mmu.pgt)
> -		return false;
> -
> -	WARN_ON(range->end - range->start != 1);
> -
> -	/*
> -	 * If the page isn't tagged, defer to user_mem_abort() for sanitising
> -	 * the MTE tags. The S2 pte should have been unmapped by
> -	 * mmu_notifier_invalidate_range_end().
> -	 */
> -	if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> -		return false;
> -
> -	/*
> -	 * We've moved a page around, probably through CoW, so let's treat
> -	 * it just like a translation fault and the map handler will clean
> -	 * the cache to the PoC.
> -	 *
> -	 * The MMU notifiers will have unmapped a huge PMD before calling
> -	 * ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> -	 * therefore we never need to clear out a huge PMD through this
> -	 * calling path and a memcache is not required.
> -	 */
> -	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> -			       PAGE_SIZE, __pfn_to_phys(pfn),
> -			       KVM_PGTABLE_PROT_R, NULL, 0);
> -
> -	return false;
> -}
> -
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>  	u64 size = (range->end - range->start) << PAGE_SHIFT;

Thanks. It's nice to see this code retire:

Acked-by: Will Deacon

Also, if you're in the business of hacking the MMU notifier code, it
would be really great to change the .clear_flush_young() callback so
that the architecture could handle the TLB invalidation. At the moment,
the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
being set by kvm_handle_hva_range(), whereas we could do a much
lighter-weight and targeted TLBI in the architecture page-table code
when we actually update the ptes for small ranges.

Will
[PATCH v2 1/2] powerpc/pseries: Add pool idle time at LPAR boot
When no options are specified, lparstat is expected to give reports
since LPAR (Logical Partition) boot.

APP (Available Processor Pool) is an indicator of how many cores in the
shared pool are free to use in a Shared Processor LPAR (SPLPAR). APP is
derived using pool_idle_time, which is obtained using the H_PIC call.

The interval-based reports show a correct APP value, while the
since-boot report shows very high APP values. This happens because in
that case APP is obtained by dividing pool idle time by LPAR uptime.
Since pool idle time is reported by the PowerVM hypervisor since its
boot, it need not align with LPAR boot.

To fix that, export boot pool idle time in lparcfg; powerpc-utils will
use this info to derive APP as below for since-boot reports:

APP = (pool idle time - boot pool idle time) / (uptime * timebase)

Results:
Observe APP values.
==========================================================================
Shared LPAR

lparstat

System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600

So in this case app is expected to be close to 37-6=31.

====================== 6.9-rc1 and lparstat 1.3.10 =======================
%user  %sys  %wait  %idle  physc  %entc  lbusy  app       vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ---       ----  -----
47.48  0.01   0.00  52.51   0.00   0.00  47.49  69099.72  5415   4721

==== With this patch and powerpc-utils patch to do the above equation ===
%user  %sys  %wait  %idle  physc  %entc  lbusy  app       vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ---       ----  -----
47.48  0.01   0.00  52.51   5.73  47.75  47.49  31.21     5417   5321
==========================================================================

Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in the powerpc-utils tree.

Signed-off-by: Shrikanth Hegde
---
 arch/powerpc/platforms/pseries/lparcfg.c | 39 ++--
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index f73c4d1c26af..5c2a3e802a02 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -170,20 +170,24 @@ static void show_gpci_data(struct seq_file *m)
 	kfree(buf);
 }
 
-static unsigned h_pic(unsigned long *pool_idle_time,
-		      unsigned long *num_procs)
+static long h_pic(unsigned long *pool_idle_time,
+		  unsigned long *num_procs)
 {
-	unsigned long rc;
-	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+	long rc;
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = {0};
 
 	rc = plpar_hcall(H_PIC, retbuf);
 
-	*pool_idle_time = retbuf[0];
-	*num_procs = retbuf[1];
+	if (pool_idle_time)
+		*pool_idle_time = retbuf[0];
+	if (num_procs)
+		*num_procs = retbuf[1];
 
 	return rc;
 }
 
+unsigned long boot_pool_idle_time;
+
 /*
  * parse_ppp_data
  * Parse out the data returned from h_get_ppp and h_pic
@@ -215,9 +219,15 @@ static void parse_ppp_data(struct seq_file *m)
 		seq_printf(m, "pool_capacity=%d\n",
 			   ppp_data.active_procs_in_pool * 100);
 
-		h_pic(&pool_idle_time, &pool_procs);
-		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
-		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+		/* In case h_pic call is not successful, this would result in
+		 * APP values being wrong in tools like lparstat.
+		 */
+
+		if (h_pic(&pool_idle_time, &pool_procs) == H_SUCCESS) {
+			seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
+			seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+			seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
+		}
 	}
 
 	seq_printf(m, "unallocated_capacity_weight=%d\n",
@@ -792,6 +802,7 @@ static const struct proc_ops lparcfg_proc_ops = {
 static int __init lparcfg_init(void)
 {
 	umode_t mode = 0444;
+	long retval;
 
 	/* Allow writing if we have FW_FEATURE_SPLPAR */
 	if (firmware_has_feature(FW_FEATURE_SPLPAR))
@@ -801,6 +812,16 @@ static int __init lparcfg_init(void)
 		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
 		return -EIO;
 	}
+
+	/* If this call fails, it would result in APP values
+	 * being wrong for since boot reports of lparstat
+	 */
+	retval = h_pic(&boot_pool_idle_time, NULL);
+
+	if (retval != H_SUCCESS)
+		pr_debug("H_PIC failed during lparcfg init retval: %ld\n",
+			 retval);
+
 	return 0;
 }
 machine_device_initcall(pseries, lparcfg_init);
--
2.39.3
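For illustration, a small userspace sketch (hypothetical helper names; the real logic lives in powerpc-utils) of the since-boot derivation stated above:

	/* Sketch: APP since boot, per the equation in the commit message:
	 * APP = (pool idle time - boot pool idle time) / (uptime * timebase) */
	double app_since_boot(unsigned long long pool_idle_time,
			      unsigned long long boot_pool_idle_time,
			      double uptime_secs,
			      unsigned long long timebase)
	{
		return (double)(pool_idle_time - boot_pool_idle_time) /
		       (uptime_secs * (double)timebase);
	}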
[PATCH v2 2/2] powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp
A couple of minor fixes:
- hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
  parse_ppp_data.
- If the hcall fails, the values set should at least be zero, not
  uninitialized. Fix that for h_get_mpp and h_get_ppp.

Signed-off-by: Shrikanth Hegde
---
 arch/powerpc/include/asm/hvcall.h        | 2 +-
 arch/powerpc/platforms/pseries/lpar.c    | 6 +++---
 arch/powerpc/platforms/pseries/lparcfg.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index a41e542ba94d..3d642139b900 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -570,7 +570,7 @@ struct hvcall_mpp_data {
 	unsigned long backing_mem;
 };
 
-int h_get_mpp(struct hvcall_mpp_data *);
+long h_get_mpp(struct hvcall_mpp_data *mpp_data);
 
 struct hvcall_mpp_x_data {
 	unsigned long coalesced_bytes;
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4e9916bb03d7..c1d8bee8f701 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1886,10 +1886,10 @@ notrace void __trace_hcall_exit(long opcode, long retval, unsigned long *retbuf)
  * h_get_mpp
  * H_GET_MPP hcall returns info in 7 parms
  */
-int h_get_mpp(struct hvcall_mpp_data *mpp_data)
+long h_get_mpp(struct hvcall_mpp_data *mpp_data)
 {
-	int rc;
-	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+	long rc;
 
 	rc = plpar_hcall9(H_GET_MPP, retbuf);
 
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 5c2a3e802a02..ed2176d8a866 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -113,8 +113,8 @@ struct hvcall_ppp_data {
  */
 static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
 {
-	unsigned long rc;
-	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+	long rc;
 
 	rc = plpar_hcall9(H_GET_PPP, retbuf);
 
@@ -197,7 +197,7 @@ static void parse_ppp_data(struct seq_file *m)
 	struct hvcall_ppp_data ppp_data;
 	struct device_node *root;
 	const __be32 *perf_level;
-	int rc;
+	long rc;
 
 	rc = h_get_ppp(&ppp_data);
 	if (rc)
--
2.39.3
[PATCH v2 0/2] powerpc/pseries: Fixes for lparstat boot reports
Currently, lparstat reports that show values since LPAR boot are wrong
for some fields. There is a need for storing the PIC (Pool Idle Count)
at boot for accurate reporting. Patch 1 does that.

While there, it was noticed that the hcall return value is long and
both h_get_ppp and h_get_mpp could set uninitialized values if the
hcall fails. Patch 2 fixes that.

v1 -> v2:
- Nathan pointed out the issues surrounding the h_pic call. Addressed those.
- Added a pr_debug if h_pic fails during lparcfg_init.
- If h_pic fails while reading lparcfg, related files are not exported.
- Added failure checks for h_get_mpp, h_get_ppp calls as well.

v1: https://lore.kernel.org/all/20240405101340.149171-1-sshe...@linux.ibm.com/

Shrikanth Hegde (2):
  powerpc/pseries: Add pool idle time at LPAR boot
  powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp

 arch/powerpc/include/asm/hvcall.h        |  2 +-
 arch/powerpc/platforms/pseries/lpar.c    |  6 ++--
 arch/powerpc/platforms/pseries/lparcfg.c | 45 +---
 3 files changed, 37 insertions(+), 16 deletions(-)

--
2.39.3
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
* Mike Rapoport wrote:

> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> +	EXECMEM_DEFAULT,
> +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> +	EXECMEM_KPROBES,
> +	EXECMEM_FTRACE,
> +	EXECMEM_BPF,
> +	EXECMEM_TYPE_MAX,
> +};

s/explcitly/explicitly

Thanks,

	Ingo
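A hedged usage sketch (assuming the series exposes execmem_alloc(type, size) and execmem_free() entry points; those are not shown in the quoted hunk) of how a subsystem would pick its type:

	/* Sketch: a kprobes-style user asking for executable memory with the
	 * per-arch parameters registered for EXECMEM_KPROBES. */
	void *slot = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
	if (!slot)
		return -ENOMEM;
	/* ... install instructions, then later: */
	execmem_free(slot);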
Re: [RFC PATCH 5/7] x86/module: perpare module loading for ROX allocations of text
* Mike Rapoport wrote:

>  	for (s = start; s < end; s++) {
>  		void *addr = (void *)s + *s;
> +		void *wr_addr = addr + module_writable_offset(mod, addr);

So instead of repeating this pattern in a dozen of places, why not use
a simpler method:

		void *wr_addr = module_writable_address(mod, addr);

or so, since we have to pass 'addr' to the module code anyway.

The text patching code is pretty complex already.

Thanks,

	Ingo
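A minimal sketch (hypothetical, spelling out Ingo's suggestion rather than anything in the posted series) of what such a helper could look like:

	/* Sketch: fold the offset arithmetic into one helper so callers stop
	 * repeating the addr + module_writable_offset() pattern. */
	static inline void *module_writable_address(struct module *mod, void *addr)
	{
		return addr + module_writable_offset(mod, addr);
	}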
Re: Re: [PATCH] tty: hvc: wakeup hvc console immediately when needed
> On 12. 04. 24, 5:38, li.ha...@zte.com.cn wrote:
> > From: Li Hao
> >
> > Cancel the do_wakeup flag in hvc_struct, and change it to immediately
> > wake up the tty when hp->n_outbuf is 0 in hvc_push().
> >
> > When we receive a key input character, the interrupt handling function
> > hvc_handle_interrupt() will be executed, and the echo thread
> > flush_to_ldisc() will be added to the queue.
> >
> > If the user is currently using tcsetattr(), a hang may occur. tcsetattr()
> > enters the kernel and waits for hp->n_outbuf to become 0 via
> > tty_wait_until_sent(). If the echo thread finishes executing before
> > reaching tty_wait_until_sent() (for example, put_chars() takes too long),
> > then even though the wakeup condition is met (hp->do_wakeup = 1),
> > tty_wait_until_sent() cannot be woken up (it missed the tty_wakeup() of
> > this round's tty_poll). Unless the next key input character comes,
> > hvc_poll will be executed, and tty_wakeup() will be performed through
> > the do_wakeup flag.
> >
> > Signed-off-by: Li Hao
> > ---
> >  drivers/tty/hvc/hvc_console.c | 12 +---
> >  drivers/tty/hvc/hvc_console.h |  1 -
> >  2 files changed, 5 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> > index cd1f657f7..2fa90d938 100644
> > --- a/drivers/tty/hvc/hvc_console.c
> > +++ b/drivers/tty/hvc/hvc_console.c
> > @@ -476,11 +476,13 @@ static void hvc_hangup(struct tty_struct *tty)
> >  static int hvc_push(struct hvc_struct *hp)
> >  {
> >  	int n;
> > +	struct tty_struct *tty;
> >
> >  	n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
> > +	tty = tty_port_tty_get(&hp->port);
> >  	if (n <= 0) {
> >  		if (n == 0 || n == -EAGAIN) {
> > -			hp->do_wakeup = 1;
> > +			tty_wakeup(tty);
>
> What if tty is NULL? Did you intend to use tty_port_tty_wakeup() instead?
>
> thanks,
> --
> js
> suse labs

Thank you for your prompt reply.

tty_port_tty_wakeup() is better; with it, hvc_push() no longer needs to
check whether tty is NULL.

Li Hao
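A hedged sketch (an assumption, not a posted v2 of the patch) of hvc_push() following the reviewer's suggestion; tty_port_tty_wakeup() takes and drops the tty reference internally, so the NULL concern disappears from this path:

	static int hvc_push(struct hvc_struct *hp)
	{
		int n;

		n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
		if (n <= 0) {
			if (n == 0 || n == -EAGAIN)
				/* NULL-safe: no tty_port_tty_get() needed */
				tty_port_tty_wakeup(&hp->port);
			/* ... remainder of the error path as before ... */
		}
		/* ... remainder of hvc_push() as before ... */
		return n;
	}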