[PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
Livepatch re-uses the module loader function apply_relocate_add() to write relocations, instead of managing them through an arch-dependent klp_write_module_reloc() function. apply_relocate_add() doesn't understand livepatch symbols (marked with the SHN_LIVEPATCH symbol section index) and by default assumes them to be local symbols for the R_PPC64_REL24 relocation type. It fails with an error when trying to calculate the offset with local_entry_offset():

  module_64: kpatch_meminfo: REL24 -1152921504897399800 out of range!

Livepatch symbols are essentially SHN_UNDEF and should be called via the stub used for global calls. This issue can be fixed by teaching apply_relocate_add() to handle both SHN_UNDEF and SHN_LIVEPATCH symbols via the same stub. That is not a complete fix, however, as it still fails for local calls that become global.

Consider the livepatch sequence[1] below, where function A calls B, B is the livepatched function, and any call to B is redirected to the patched version P. The livepatched function P then calls function C in module M2, where C was local to function B but is global to function P.

  [ ASCII diagram from [1], garbled in transit: kernel function A calls B;
    B is redirected to the patched function P in a module; P calls C in
    M2, which B reached via its local entry point. ]

Handling such a call with the existing stub triggers another error:

  module_64: kpatch_meminfo: Expect noop after relocate, got 3d22

The reason is that the ppc64le ABI v2 expects a nop instruction after every branch to a global call, which gets overwritten by an instruction to restore the TOC with the r2 value of the callee. Given that function C was local to function B, it does not store/restore the TOC, as the TOC is not expected to be clobbered for functions called via the local entry point. The current stub could be extended to restore the TOC, giving a single stub for both SHN_UNDEF and SHN_LIVEPATCH symbols.
Constructing a single stub proves to be an overhead for non-livepatch calls, due to the additional instructions to restore the TOC. A new stub to call livepatch symbols, with an intermediate stack to store/restore the TOC/LR between the livepatched function and the global function, would work for most cases, but would fail when arguments are passed via the stack between functions. This issue has already been solved by the introduction of livepatch_handler, which runs in _mcount context with a livepatch stack growing upwards from the base of the regular stack, eliminating the need for an intermediate stack.

The current approach is to set up a klp_stub mimicking the functionality of entry_64.S::livepatch_handler to store/restore the TOC/LR values, in addition to the stub setup and branch. This patch also introduces a new ppc64le_klp_stub_entry[], along with helpers to find/allocate a livepatch stub.

[1] ASCII diagram adapted from:
http://mpe.github.io/posts/2016/05/23/kernel-live-patching-for-ppc64le/

Signed-off-by: Kamalesh Babulal
Reviewed-by: Balbir Singh
Cc: Naveen N. Rao
Cc: Josh Poimboeuf
Cc: Jessica Yu
Cc: Ananth N Mavinakayanahalli
Cc: Aravinda Prasad
Cc: Torsten Duwe
Cc: linuxppc-dev@lists.ozlabs.org
Cc: live-patch...@vger.kernel.org
---
v3:
 - Defined FUNC_DESC_OFFSET to calculate the func_desc offset from
   struct ppc64le_klp_stub_entry.
 - Replaced BUG_ON() with WARN_ON() in klp_stub_for_addr().
 - Major commit message re-write.

v2:
 - Changed klp_stub construction to re-use livepatch_handler and the
   additional patch code required for klp_stub, instead of duplicating it.
 - Minor comments and commit body edits.
 arch/powerpc/include/asm/module.h              |   4 +
 arch/powerpc/kernel/module_64.c                | 139 -
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  31 ++
 3 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index 6c0132c..de22c4c 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -44,6 +44,10 @@ struct mod_arch_specific {
 	unsigned long toc;
 	unsigned long tramp;
 #endif
+#ifdef CONFIG_LIVEPATCH
+	/* Count of kernel livepatch relocations */
+	unsigned long klp_relocs;
+#endif
 #else /* powerpc64 */
 	/* Indices of PLT sections within module. */
diff --git a/arch/po
Re: [PATCH] powerpc/watchdog: Convert timers to use timer_setup()
Kees Cook writes:
> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly.
>
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Kees Cook
> ---
>  arch/powerpc/kernel/watchdog.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 15e209a37c2d..50797528b5e1 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -263,9 +263,8 @@ static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
>  	add_timer_on(t, cpu);
>  }
>
> -static void wd_timer_fn(unsigned long data)
> +static void wd_timer_fn(struct timer_list *t)

At first glance this looks like it will break, changing the signature, but
the new timer_setup() force-casts to the old signature (for now), and we
can fit a pointer in an unsigned long, so it's OK.

>  {
> -	struct timer_list *t = this_cpu_ptr(&wd_timer);
>  	int cpu = smp_processor_id();
>
>  	watchdog_timer_interrupt(cpu);
> @@ -292,7 +291,7 @@ static void start_watchdog_timer_on(unsigned int cpu)
>
>  	per_cpu(wd_timer_tb, cpu) = get_tb();
>
> -	setup_pinned_timer(t, wd_timer_fn, 0);
> +	timer_setup(t, wd_timer_fn, TIMER_PINNED);
>  	wd_timer_reset(cpu, t);
>  }
>

Acked-by: Michael Ellerman

cheers
Re: [PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device
"Bryant G. Ly" writes: > On 10/12/17 1:29 PM, Bjorn Helgaas wrote: ... >> >> If that's the case, how to you ever bind a driver to these VFs? The >> changelog says you don't want VF drivers to load *immediately*, so I >> assume you do want them to load eventually. >> > The VF's that get dynamically created within the configure SR-IOV call, > on the Pseries Platform, wont be matched with a driver. - We do not > want it to match. > > The Power Hypervisor will load the VFs. The VF's will get assigned(by > the user) > via the HMC or Novalink in this environment which will > then trigger PHYP to load the VF device node to the device tree. What about the other "Power Hypervisor"? ie. KVM running on Power. We also use the pseries platform when running under KVM. cheers
[PATCH 2/2] mm/mmu_notifier: avoid call to invalidate_range() in range_end()
From: Jérôme Glisse

This is an optimization patch that only affects mmu_notifier users which
rely on the invalidate_range() callback. This patch avoids calling that
callback twice in a row from inside __mmu_notifier_invalidate_range_end().

Existing pattern (before this patch):
    mmu_notifier_invalidate_range_start()
        pte/pmd/pud_clear_flush_notify()
            mmu_notifier_invalidate_range()
    mmu_notifier_invalidate_range_end()
        mmu_notifier_invalidate_range()

New pattern (after this patch):
    mmu_notifier_invalidate_range_start()
        pte/pmd/pud_clear_flush_notify()
            mmu_notifier_invalidate_range()
    mmu_notifier_invalidate_range_only_end()

We call the invalidate_range callback after clearing the page table under
the page table lock, and we skip the call to invalidate_range inside the
__mmu_notifier_invalidate_range_end() function.

Idea from Andrea Arcangeli

Signed-off-by: Jérôme Glisse
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: David Woodhouse
Cc: Alistair Popple
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Stephen Rothwell
Cc: Andrew Donnellan
Cc: io...@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org
---
 include/linux/mmu_notifier.h | 17 ++--
 mm/huge_memory.c             | 46 ++++
 mm/memory.c                  |  6 +-
 mm/migrate.c                 | 15 ---
 mm/mmu_notifier.c            | 11 +--
 5 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 49c925c96b8a..6665c4624287 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -213,7 +213,8 @@ extern void __mmu_notifier_change_pte(struct mm_struct *mm,
 extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
-				  unsigned long start, unsigned long end);
+				  unsigned long start, unsigned long end,
+				  bool only_end);
 extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
@@
 -267,7 +268,14 @@ static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
 	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_range_end(mm, start, end);
+		__mmu_notifier_invalidate_range_end(mm, start, end, false);
+}
+
+static inline void mmu_notifier_invalidate_range_only_end(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_invalidate_range_end(mm, start, end, true);
 }
 
 static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
@@ -438,6 +446,11 @@ static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 {
 }
 
+static inline void mmu_notifier_invalidate_range_only_end(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+}
+
 static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
 				  unsigned long start, unsigned long end)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ff5bc647b51d..b2912305994f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1220,7 +1220,12 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
 	page_remove_rmap(page, true);
 	spin_unlock(vmf->ptl);
 
-	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
+	/*
+	 * No need to double call mmu_notifier->invalidate_range() callback as
+	 * the above pmdp_huge_clear_flush_notify() did already call it.
+	 */
+	mmu_notifier_invalidate_range_only_end(vma->vm_mm, mmun_start,
+					       mmun_end);
 
 	ret |= VM_FAULT_WRITE;
 	put_page(page);
@@ -1369,7 +1374,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	}
 	spin_unlock(vmf->ptl);
 out_mn:
-	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
+	/*
+	 * No need to double call mmu_notifier->invalidate_range() callback as
+	 * the above pmdp_huge_clear_flush_notify() did already call it.
+	 */
+	mmu_notifier_invalidate_range_only_end(vma->vm_mm, mmun_start,
+					       mmun_end);
 out:
 	return ret;
 out_unlock:
@@ -2021,7 +2031,12 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 out:
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE
[PATCH 1/2] mm/mmu_notifier: avoid double notification when it is useless v2
From: Jérôme Glisse

This patch only affects users of the mmu_notifier->invalidate_range
callback, which are device drivers related to ATS/PASID, CAPI, IOMMUv2,
SVM, etc., and it is an optimization for those users. Everyone else is
unaffected by it.

When clearing a pte/pmd we are given a choice to notify the event under
the page table lock (the notify versions of the *_clear_flush helpers do
call mmu_notifier_invalidate_range). But that notification is not
necessary in all cases.

This patch removes almost all cases where it is useless to have a call to
mmu_notifier_invalidate_range before mmu_notifier_invalidate_range_end.
It also adds documentation in all those cases explaining why.

Below is a more in-depth analysis of why this is fine to do:

For a secondary TLB (non-CPU TLB) like an IOMMU TLB or a device TLB
(when the device uses something like ATS/PASID to get the IOMMU to walk
the CPU page table to access a process virtual address space), there are
only 2 cases when you need to notify those secondary TLBs while holding
the page table lock when clearing a pte/pmd:

  A) the page backing the address is freed before
     mmu_notifier_invalidate_range_end
  B) a page table entry is updated to point to a new page (COW, write
     fault on zero page, __replace_page(), ...)

Case A is obvious: you do not want to take the risk of the device
writing to a page that might now be used by something completely
different.

Case B is more subtle. For correctness it requires the following
sequence to happen:
  - take page table lock
  - clear page table entry and notify
    (pmd/pte_huge_clear_flush_notify())
  - set page table entry to point to new page

If clearing the page table entry is not followed by a notify before
setting the new pte/pmd value, then you can break the memory model (e.g.
C11 or C++11) for the device.

Consider the following scenario (the device uses a feature similar to
ATS/PASID): two addresses addrA and addrB such that
|addrA - addrB| >= PAGE_SIZE; we assume they are write protected for COW
(the other cases of B apply too).
[Time N]
---
CPU-thread-0  {try to write to addrA}
CPU-thread-1  {try to write to addrB}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA and populate device TLB}
DEV-thread-2  {read addrB and populate device TLB}

[Time N+1]
---
CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}

[Time N+2]
---
CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}

[Time N+3]
---
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {write to addrA which is a write to new page}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}

[Time N+3]
---
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {}
CPU-thread-3  {write to addrB which is a write to new page}
DEV-thread-0  {}
DEV-thread-2  {}

[Time N+4]
---
CPU-thread-0  {preempted}
CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}

[Time N+5]
---
CPU-thread-0  {preempted}
CPU-thread-1  {}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA from old page}
DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the cleared page table entry was not paired
with a notification to invalidate the secondary TLB, the device sees the
new value for addrB before seeing the new value for addrA. This breaks
total memory ordering for the device.

When changing a pte to write protect, or to point to a new write
protected page with the same content (KSM), it is fine to delay the
invalidate_range callback to mmu_notifier_invalidate_range_end() outside
the page table lock.
This is true even if the thread doing the page table update is preempted
right after releasing the page table lock, before calling
mmu_notifier_invalidate_range_end().

Changed since v1:
 - typos (thanks to Andrea)
 - Avoid unnecessary precaution in try_to_unmap() (Andrea)
 - Be more conservative in try_to_unmap_one()

Signed-off-by: Jérôme Glisse
Cc: Andrea Arcangeli
Cc: Nadav Amit
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: David Woodhouse
Cc: Alistair Popple
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Stephen Rothwell
Cc: Andrew Donnellan
Cc: io...@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-n...@vger.kernel.org
--
[PATCH 0/2] Optimize mmu_notifier->invalidate_range callback
From: Jérôme Glisse

(Andrew: you already have v1 of patch 1 in your queue; patch 2 is new. I
think you can drop patch 1 v1 for v2 — v2 is a bit more conservative and
I fixed typos.)

All this only affects users of the invalidate_range callback (at this
time CAPI arch/powerpc/platforms/powernv/npu-dma.c, and IOMMU ATS/PASID
in drivers/iommu/amd_iommu_v2.c|intel-svm.c).

This patchset removes useless double calls to
mmu_notifier->invalidate_range wherever it is safe to do so. The first
patch just removes the useless calls and adds documentation explaining
why it is safe to do so. The second patch goes further by introducing
mmu_notifier_invalidate_range_only_end(), which skips the callback to
invalidate_range; this can be done when clearing a pte, pmd or pud with
notification, which calls invalidate_range right after clearing under
the page table lock.

It should improve performance, but I am lacking hardware and benchmarks
which might show an improvement. Maybe folks in Cc can help here.

Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Cc: David Woodhouse
Cc: Alistair Popple
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Stephen Rothwell
Cc: Andrew Donnellan
Cc: io...@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org

Jérôme Glisse (2):
  mm/mmu_notifier: avoid double notification when it is useless v2
  mm/mmu_notifier: avoid call to invalidate_range() in range_end()

 Documentation/vm/mmu_notifier.txt | 93 +++
 fs/dax.c                          |  9 +++-
 include/linux/mmu_notifier.h      | 20 +++--
 mm/huge_memory.c                  | 66 ---
 mm/hugetlb.c                      | 16 +--
 mm/ksm.c                          | 15 ++-
 mm/memory.c                       |  6 ++-
 mm/migrate.c                      | 15 +--
 mm/mmu_notifier.c                 | 11 -
 mm/rmap.c                         | 59 ++---
 10 files changed, 281 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/vm/mmu_notifier.txt

-- 
2.13.6
Re: [PATCH] powerpc/watchdog: Convert timers to use timer_setup()
On Mon, 16 Oct 2017 16:47:10 -0700
Kees Cook wrote:
> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly.
>
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Kees Cook

Looks fine to me. Is this intended to be merged via the powerpc tree in
the next merge window?

Acked-by: Nicholas Piggin

> ---
>  arch/powerpc/kernel/watchdog.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 15e209a37c2d..50797528b5e1 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -263,9 +263,8 @@ static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
>  	add_timer_on(t, cpu);
>  }
>
> -static void wd_timer_fn(unsigned long data)
> +static void wd_timer_fn(struct timer_list *t)
>  {
> -	struct timer_list *t = this_cpu_ptr(&wd_timer);
>  	int cpu = smp_processor_id();
>
>  	watchdog_timer_interrupt(cpu);
> @@ -292,7 +291,7 @@ static void start_watchdog_timer_on(unsigned int cpu)
>
>  	per_cpu(wd_timer_tb, cpu) = get_tb();
>
> -	setup_pinned_timer(t, wd_timer_fn, 0);
> +	timer_setup(t, wd_timer_fn, TIMER_PINNED);
>  	wd_timer_reset(cpu, t);
>  }
>
Re: BUG_ON() in irq_work_run_list
Chandan Rajendra writes:
> Executing fstests' generic/036 test in a loop on next-20171013 kernel causes
> BUG_ON()'s condition to evaluate to true,

Did it used to work? ie. the bug just started happening?

If so, is there a next tag which *doesn't* have the bug?

> run fstests generic/036 at 2017-10-14 09:23:29
> [ cut here ]
> kernel BUG at /root/repos/linux/kernel/irq_work.c:138!
> Oops: Exception in kernel mode, sig: 5 [#1]
> BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
> Modules linked in:
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: GW 4.14.0-rc4-next-20171013 #7
> task: c0063862c780 task.stack: c006387e4000
> NIP: c02476ac LR: c02477c8 CTR:
> REGS: c0063ffd3810 TRAP: 0700 Tainted: GW (4.14.0-rc4-next-20171013)
> MSR: 80029032 CR: 24002442 XER: 2000

  ^ Hard enabled, but that's expected.

> CFAR: c02477c4 SOFTE: 1

  ^ Also soft enabled, which is bad.

> GPR00: c01b70c4 c0063ffd3a90 c143bb00 c0063fee1a60
> GPR04: 002b c00635ad1b0c c0063383c9e8
> GPR08: 00063ecd 0001 0001
> GPR12: 28002482 cfd41080 c006387e7f90 0200
> GPR16: f0b048c0 c13a0920 c13a0920
> GPR20: 0003 0001 0002
> GPR24: 0010 c0063e22c498 c0063ffd3df0
> GPR28: 00063ecd c1211a60
> NIP [c02476ac] .irq_work_run_list+0xc/0x100
> LR [c02477c8] .irq_work_run+0x28/0x50
> Call Trace:
> [c0063ffd3a90] [c0787638] .__blk_mq_complete_request_remote+0x38/0x50 (unreliable)
> [c0063ffd3b10] [c01b70c4] .flush_smp_call_function_queue+0xd4/0x1e0
> [c0063ffd3ba0] [c0044a4c] .smp_ipi_demux_relaxed+0x9c/0x110
> [c0063ffd3c30] [c008dbdc] .icp_hv_ipi_action+0x5c/0xb0
> [c0063ffd3cb0] [c0174384] .__handle_irq_event_percpu+0x94/0x2d0
> [c0063ffd3d80] [c01745f4] .handle_irq_event_percpu+0x34/0x90
> [c0063ffd3e10] [c017ae20] .handle_percpu_irq+0x80/0xd0
> [c0063ffd3e90] [c0172ad0] .generic_handle_irq+0x50/0x80
> [c0063ffd3f10] [c0016cd0] .__do_irq+0x90/0x210
> [c0063ffd3f90] [c002a900] .call_do_irq+0x14/0x24
> [c006387e77a0] [c0016ee0] .do_IRQ+0x90/0x140
> [c006387e7840] [c0008c20] hardware_interrupt_common+0x150/0x160
> --- interrupt: 501 at .plpar_hcall_norets+0x14/0x20
> LR = .check_and_cede_processor+0x2c/0x40
> [c006387e7b30] [c0b3f028] .check_and_cede_processor+0x18/0x40 (unreliable)
> [c006387e7ba0] [c0b3f3c8] .shared_cede_loop+0x48/0x140
> [c006387e7c20] [c0b3c644] .cpuidle_enter_state+0xa4/0x410
> [c006387e7cd0] [c0159158] .call_cpuidle+0x68/0xd0
> [c006387e7d60] [c0159640] .do_idle+0x2b0/0x310
> [c006387e7e20] [c01598b0] .cpu_startup_entry+0x30/0x40
> [c006387e7ea0] [c0045e38] .start_secondary+0x4e8/0x530
> [c006387e7f90] [c000b06c] start_secondary_prolog+0x10/0x14
> Instruction dump:
> 3861 4e800020 6000 6000 6000 3860 4e800020 6000
> 6000 894d027a 312a 7d295110 <0b09> e923 2fa9 4d9e0020
> ---[ end trace 921006f210ad28ba ]---
>
> The corresponding code is,
>
> static void irq_work_run_list(struct llist_head *list)
> {
> 	unsigned long flags;
> 	struct irq_work *work;
> 	struct llist_node *llnode;
>
> 	BUG_ON(!irqs_disabled());

OK. Called from irq_work_run(), called from the end of
flush_smp_call_function_queue(). That's also before we start running any
of the irq_work functions, so it can't have been one of them.

So seemingly something that was smp_call_function'ed() turned interrupts
back on. The obvious candidate being __blk_mq_complete_request_remote(),
which is called via smp_call_function_single_async() in
__blk_mq_complete_request().

The problem is __blk_mq_complete_request_remote() actually just calls
another function via a pointer, and we don't have a good way of working
out what that was. It's no longer in CTR (although it's a bit odd that
CTR is 0x0), it could be in a GPR but we can't know which one without a
full disassembly.

I dug a bit more but couldn't see anything obvious.

Does this reproduce? Easily?

cheers
Re: [PATCH v2 2/2] vgaarb: Factor out EFI and fallback default device selection
Bjorn Helgaas writes:
> The default VGA device is normally set in vga_arbiter_add_pci_device() when
> we call it for the first enabled device that can be accessed with the
> legacy VGA resources ([mem 0xa0000-0xbffff], etc.)
>
> That default device can be overridden by an EFI device that owns the boot
> framebuffer. As a fallback, we can also select a VGA device that can't be
> accessed via legacy VGA resources, or a VGA device that isn't even enabled.
>
> Factor out this EFI and fallback selection from vga_arb_device_init() into
> a separate vga_arb_select_default_device() function. This doesn't change
> any behavior, but it untangles the "bridge control possible" checking and
> messages from the default device selection.
>
> Tested-by: Zhou Wang # D05 Hisi Hip07, Hip08
> Signed-off-by: Bjorn Helgaas
> ---
>  drivers/gpu/vga/vgaarb.c | 57 --
>  1 file changed, 35 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
> index 8035e38d5110..d35d6d271f3f 100644
> --- a/drivers/gpu/vga/vgaarb.c
> +++ b/drivers/gpu/vga/vgaarb.c
> @@ -1402,29 +1402,14 @@ static struct miscdevice vga_arb_device = {
>  	MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops
>  };
>
> -static int __init vga_arb_device_init(void)
> +static void __init vga_arb_select_default_device(void)
>  {
> -	int rc;
>  	struct pci_dev *pdev;
>  	struct vga_device *vgadev;
>
> -	rc = misc_register(&vga_arb_device);
> -	if (rc < 0)
> -		pr_err("error %d registering device\n", rc);
> -
> -	bus_register_notifier(&pci_bus_type, &pci_notifier);
> -
> -	/* We add all pci devices satisfying vga class in the arbiter by
> -	 * default */
> -	pdev = NULL;
> -	while ((pdev =
> -		pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> -			       PCI_ANY_ID, pdev)) != NULL)
> -		vga_arbiter_add_pci_device(pdev);
> -
> +#if defined(CONFIG_X86) || defined(CONFIG_IA64)
>  	list_for_each_entry(vgadev, &vga_list, list) {
>  		struct device *dev = &vgadev->pdev->dev;
> -#if defined(CONFIG_X86) || defined(CONFIG_IA64)
>  		/*
>  		 * Override vga_arbiter_add_pci_device()'s I/O based detection
>  		 * as it may take the wrong device (e.g. on Apple system under
> @@ -1461,12 +1446,8 @@ static int __init vga_arb_device_init(void)
>  			vgaarb_info(dev, "overriding boot device\n");
>  			vga_set_default_device(vgadev->pdev);
>  		}
> -#endif
> -		if (vgadev->bridge_has_one_vga)
> -			vgaarb_info(dev, "bridge control possible\n");
> -		else
> -			vgaarb_info(dev, "no bridge control possible\n");
>  	}
> +#endif
>
>  	if (!vga_default_device()) {
>  		list_for_each_entry(vgadev, &vga_list, list) {
> @@ -1492,6 +1473,38 @@ static int __init vga_arb_device_init(void)
>  				vga_set_default_device(vgadev->pdev);
>  		}
>  	}
> +}
> +
> +static int __init vga_arb_device_init(void)
> +{
> +	int rc;
> +	struct pci_dev *pdev;
> +	struct vga_device *vgadev;
> +
> +	rc = misc_register(&vga_arb_device);
> +	if (rc < 0)
> +		pr_err("error %d registering device\n", rc);
> +
> +	bus_register_notifier(&pci_bus_type, &pci_notifier);
> +
> +	/* We add all PCI devices satisfying VGA class in the arbiter by
> +	 * default */
> +	pdev = NULL;
> +	while ((pdev =
> +		pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> +			       PCI_ANY_ID, pdev)) != NULL)
> +		vga_arbiter_add_pci_device(pdev);
> +
> +	list_for_each_entry(vgadev, &vga_list, list) {
> +		struct device *dev = &vgadev->pdev->dev;
> +
> +		if (vgadev->bridge_has_one_vga)
> +			vgaarb_info(dev, "bridge control possible\n");
> +		else
> +			vgaarb_info(dev, "no bridge control possible\n");
> +	}

Initially I wondered if this info printk could be moved into
vga_arbiter_check_bridge_sharing(), but it's been separated out since
3448a19da479b ("vgaarb: use bridges to control VGA routing where
possible."), and upon closer examination, it seems you can't be sure a
device doesn't share a bridge until the end of the process, so this is
indeed correct.

Everything else also looks good to me.

Reviewed-by: Daniel Axtens

Regards,
Daniel

> +
> +	vga_arb_select_default_device();
>
>  	pr_info("loaded\n");
>  	return rc;
[PATCH 55/58] net: fs_enet: Remove unused timer
Removes unused timer and its old initialization call.

Cc: Pantelis Antoniou
Cc: Vitaly Bordug
Cc: linuxppc-dev@lists.ozlabs.org
Cc: net...@vger.kernel.org
Signed-off-by: Kees Cook
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 2 --
 drivers/net/ethernet/freescale/fs_enet/fs_enet.h      | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 753259091b22..7892f2f0c6b5 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -1023,8 +1023,6 @@ static int fs_enet_probe(struct platform_device *ofdev)
 
 	ndev->ethtool_ops = &fs_ethtool_ops;
 
-	init_timer(&fep->phy_timer_list);
-
 	netif_carrier_off(ndev);
 
 	ndev->features |= NETIF_F_SG;
diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet.h b/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
index 5ce516c8a62a..dd306deb7cf1 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet.h
@@ -137,7 +137,6 @@ struct fs_enet_private {
 	cbd_t __iomem *cur_rx;
 	cbd_t __iomem *cur_tx;
 	int tx_free;
-	struct timer_list phy_timer_list;
 	const struct phy_info *phy;
 	u32 msg_enable;
 	struct mii_if_info mii_if;
-- 
2.7.4
[PATCH] powerpc/watchdog: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook
---
 arch/powerpc/kernel/watchdog.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 15e209a37c2d..50797528b5e1 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -263,9 +263,8 @@ static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
 	add_timer_on(t, cpu);
 }
 
-static void wd_timer_fn(unsigned long data)
+static void wd_timer_fn(struct timer_list *t)
 {
-	struct timer_list *t = this_cpu_ptr(&wd_timer);
 	int cpu = smp_processor_id();
 
 	watchdog_timer_interrupt(cpu);
@@ -292,7 +291,7 @@ static void start_watchdog_timer_on(unsigned int cpu)
 
 	per_cpu(wd_timer_tb, cpu) = get_tb();
 
-	setup_pinned_timer(t, wd_timer_fn, 0);
+	timer_setup(t, wd_timer_fn, TIMER_PINNED);
 	wd_timer_reset(cpu, t);
 }
-- 
2.7.4

-- 
Kees Cook
Pixel Security
[PATCH] crypto: vmx - Use skcipher for ctr fallback
Signed-off-by: Paulo Flabiano Smorigo
---
 drivers/crypto/vmx/aes_ctr.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
index 17d8421..fc60d00 100644
--- a/drivers/crypto/vmx/aes_ctr.c
+++ b/drivers/crypto/vmx/aes_ctr.c
@@ -27,21 +27,23 @@
 #include
 #include
 #include
+#include
+
 #include "aesp8-ppc.h"
 
 struct p8_aes_ctr_ctx {
-	struct crypto_blkcipher *fallback;
+	struct crypto_skcipher *fallback;
 	struct aes_key enc_key;
 };
 
 static int p8_aes_ctr_init(struct crypto_tfm *tfm)
 {
 	const char *alg = crypto_tfm_alg_name(tfm);
-	struct crypto_blkcipher *fallback;
+	struct crypto_skcipher *fallback;
 	struct p8_aes_ctr_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	fallback =
-		crypto_alloc_blkcipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
+	fallback = crypto_alloc_skcipher(alg, 0,
+			CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
 	if (IS_ERR(fallback)) {
 		printk(KERN_ERR
 		       "Failed to allocate transformation for '%s': %ld\n",
@@ -49,11 +51,11 @@ static int p8_aes_ctr_init(struct crypto_tfm *tfm)
 		return PTR_ERR(fallback);
 	}
 	printk(KERN_INFO "Using '%s' as fallback implementation.\n",
-	       crypto_tfm_alg_driver_name((struct crypto_tfm *) fallback));
+	       crypto_skcipher_driver_name(fallback));
 
-	crypto_blkcipher_set_flags(
+	crypto_skcipher_set_flags(
 		fallback,
-		crypto_blkcipher_get_flags((struct crypto_blkcipher *)tfm));
+		crypto_skcipher_get_flags((struct crypto_skcipher *)tfm));
 	ctx->fallback = fallback;
 
 	return 0;
@@ -64,7 +66,7 @@ static void p8_aes_ctr_exit(struct crypto_tfm *tfm)
 	struct p8_aes_ctr_ctx *ctx = crypto_tfm_ctx(tfm);
 
 	if (ctx->fallback) {
-		crypto_free_blkcipher(ctx->fallback);
+		crypto_free_skcipher(ctx->fallback);
 		ctx->fallback = NULL;
 	}
 }
@@ -83,7 +85,7 @@ static int p8_aes_ctr_setkey(struct crypto_tfm *tfm, const u8 *key,
 	pagefault_enable();
 	preempt_enable();
 
-	ret += crypto_blkcipher_setkey(ctx->fallback, key, keylen);
+	ret += crypto_skcipher_setkey(ctx->fallback, key, keylen);
 	return ret;
 }
 
@@ -117,15 +119,14 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
 	struct blkcipher_walk walk;
 	struct p8_aes_ctr_ctx *ctx =
 		crypto_tfm_ctx(crypto_blkcipher_tfm(desc->tfm));
-	struct blkcipher_desc fallback_desc = {
-		.tfm = ctx->fallback,
-		.info = desc->info,
-		.flags = desc->flags
-	};
 
 	if (in_interrupt()) {
-		ret = crypto_blkcipher_encrypt(&fallback_desc, dst, src,
-					       nbytes);
+		SKCIPHER_REQUEST_ON_STACK(req, ctx->fallback);
+		skcipher_request_set_tfm(req, ctx->fallback);
+		skcipher_request_set_callback(req, desc->flags, NULL, NULL);
+		skcipher_request_set_crypt(req, src, dst, nbytes, desc->info);
+		ret = crypto_skcipher_encrypt(req);
+		skcipher_request_zero(req);
 	} else {
 		blkcipher_walk_init(&walk, dst, src, nbytes);
 		ret = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
-- 
2.9.4
Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations
On Mon, 2017-10-16 at 21:35 +0300, Jarkko Sakkinen wrote:
> A minor complaint: all commits are missing "Fixes:" tag.

None of these patches fix anything. All are trivial changes without much
of any impact.
Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations
> A minor complaint: all commits are missing "Fixes:" tag.

* Do you require it to be added to the commit messages?

* Would you like to get a finer patch granularity then?

* Do you find any more information missing?

Regards,
Markus
Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations
On Mon, Oct 16, 2017 at 09:31:39PM +0300, Jarkko Sakkinen wrote: > On Mon, Oct 16, 2017 at 07:30:13PM +0200, SF Markus Elfring wrote: > > From: Markus Elfring > > Date: Mon, 16 Oct 2017 19:12:34 +0200 > > > > A few update suggestions were taken into account > > from static source code analysis. > > > > Markus Elfring (4): > > Delete an error message for a failed memory allocation > > in tpm_ascii_bios_measurements_show() > > Delete an error message for a failed memory allocation in > > tpm_ibmvtpm_probe() > > Improve a size determination in nine functions > > Less checks in tpm_ibmvtpm_probe() after error detection > > > > drivers/char/tpm/st33zp24/i2c.c | 3 +-- > > drivers/char/tpm/st33zp24/spi.c | 3 +-- > > drivers/char/tpm/st33zp24/st33zp24.c | 3 +-- > > drivers/char/tpm/tpm1_eventlog.c | 5 + > > drivers/char/tpm/tpm_crb.c | 2 +- > > drivers/char/tpm/tpm_i2c_atmel.c | 2 +- > > drivers/char/tpm/tpm_i2c_nuvoton.c | 2 +- > > drivers/char/tpm/tpm_ibmvtpm.c | 23 +-- > > drivers/char/tpm/tpm_tis.c | 2 +- > > drivers/char/tpm/tpm_tis_spi.c | 3 +-- > > 10 files changed, 18 insertions(+), 30 deletions(-) > > > > -- > > 2.14.2 > > > > For some sparse errors I fixed a while ago I got review feedback that > one should explain what is wrong what the fix does and not tell tool > reported. And it really does make sense to me. > > Describing the tool that was used to find the issues fits to the cover > letter but not to the commits themselves. > > I think I recently accepted a small fix with a "tool generated commit > message" but I don't want to take it as a practice It was a minor > mistake from my side to accept such patch. A minor complaint: all commits are missing "Fixes:" tag. /Jarkko
Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations
On Mon, Oct 16, 2017 at 07:30:13PM +0200, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Mon, 16 Oct 2017 19:12:34 +0200
>
> A few update suggestions were taken into account
> from static source code analysis.
>
> Markus Elfring (4):
>   Delete an error message for a failed memory allocation
>     in tpm_ascii_bios_measurements_show()
>   Delete an error message for a failed memory allocation in
>     tpm_ibmvtpm_probe()
>   Improve a size determination in nine functions
>   Less checks in tpm_ibmvtpm_probe() after error detection
>
>  drivers/char/tpm/st33zp24/i2c.c      | 3 +--
>  drivers/char/tpm/st33zp24/spi.c      | 3 +--
>  drivers/char/tpm/st33zp24/st33zp24.c | 3 +--
>  drivers/char/tpm/tpm1_eventlog.c     | 5 +
>  drivers/char/tpm/tpm_crb.c           | 2 +-
>  drivers/char/tpm/tpm_i2c_atmel.c     | 2 +-
>  drivers/char/tpm/tpm_i2c_nuvoton.c   | 2 +-
>  drivers/char/tpm/tpm_ibmvtpm.c       | 23 +--
>  drivers/char/tpm/tpm_tis.c           | 2 +-
>  drivers/char/tpm/tpm_tis_spi.c       | 3 +--
>  10 files changed, 18 insertions(+), 30 deletions(-)
>
> --
> 2.14.2
>

For some sparse errors I fixed a while ago, I got review feedback that one
should explain what is wrong and what the fix does, not just name the tool
that reported it. And it really does make sense to me.

Describing the tool that was used to find the issues fits the cover letter,
but not the commits themselves.

I think I recently accepted a small fix with a "tool generated commit
message", but I don't want to take it as a practice. It was a minor mistake
from my side to accept such a patch.

/Jarkko
[PATCH 4/4] char/tpm: Less checks in tpm_ibmvtpm_probe() after error detection
From: Markus Elfring
Date: Mon, 16 Oct 2017 19:00:34 +0200

Two pointer checks could be repeated by the tpm_ibmvtpm_probe() function
during error handling, even though source code analysis shows that the
relevant properties of the involved variables can be determined beforehand.

* Return directly when the call of the function "kzalloc" fails at the
  beginning.

* Adjust jump targets so that extra checks can be omitted at the end.

Signed-off-by: Markus Elfring
---
 drivers/char/tpm/tpm_ibmvtpm.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index a4b462a77b99..b8dda7546f64 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -610,7 +610,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 
 	ibmvtpm = kzalloc(sizeof(*ibmvtpm), GFP_KERNEL);
 	if (!ibmvtpm)
-		goto cleanup;
+		return -ENOMEM;
 
 	ibmvtpm->dev = dev;
 	ibmvtpm->vdev = vio_dev;
@@ -619,7 +619,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 	crq_q->crq_addr = (struct ibmvtpm_crq *)get_zeroed_page(GFP_KERNEL);
 	if (!crq_q->crq_addr) {
 		dev_err(dev, "Unable to allocate memory for crq_addr\n");
-		goto cleanup;
+		goto free_tpm;
 	}
 
 	crq_q->num_entry = CRQ_RES_BUF_SIZE / sizeof(*crq_q->crq_addr);
@@ -629,7 +629,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 
 	if (dma_mapping_error(dev, ibmvtpm->crq_dma_handle)) {
 		dev_err(dev, "dma mapping failed\n");
-		goto cleanup;
+		goto free_page;
 	}
 
 	rc = plpar_hcall_norets(H_REG_CRQ, vio_dev->unit_address,
@@ -683,13 +683,10 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 reg_crq_cleanup:
 	dma_unmap_single(dev, ibmvtpm->crq_dma_handle, CRQ_RES_BUF_SIZE,
 			 DMA_BIDIRECTIONAL);
-cleanup:
-	if (ibmvtpm) {
-		if (crq_q->crq_addr)
-			free_page((unsigned long)crq_q->crq_addr);
-		kfree(ibmvtpm);
-	}
-
+free_page:
+	free_page((unsigned long)crq_q->crq_addr);
+free_tpm:
+	kfree(ibmvtpm);
 	return rc;
 }
-- 
2.14.2
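[Editor's note] The relabelled error path above follows the kernel's standard "goto ladder" idiom: each label undoes exactly one earlier allocation and falls through to the next, so no NULL re-checks are needed on the way out. A minimal userspace sketch of the same unwind pattern (the struct and names here are illustrative, not taken from the driver):

```c
#include <stdlib.h>

struct device_state {
	char *queue;
	char *buffer;
};

/* Allocate two resources; on failure, unwind only what already exists. */
struct device_state *device_state_create(void)
{
	struct device_state *st = malloc(sizeof(*st));

	if (!st)
		return NULL;		/* nothing to unwind yet */

	st->queue = malloc(64);
	if (!st->queue)
		goto free_state;	/* only 'st' exists so far */

	st->buffer = malloc(4096);
	if (!st->buffer)
		goto free_queue;	/* unwind queue, then st */

	return st;

free_queue:
	free(st->queue);
free_state:
	free(st);
	return NULL;
}
```

As in the patch, a failure of the very first allocation returns directly, and each later failure jumps to the label that frees only what was already set up.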
[PATCH 3/4] char/tpm: Improve a size determination in nine functions
From: Markus Elfring Date: Mon, 16 Oct 2017 18:28:17 +0200 Replace the specification of data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/char/tpm/st33zp24/i2c.c | 3 +-- drivers/char/tpm/st33zp24/spi.c | 3 +-- drivers/char/tpm/st33zp24/st33zp24.c | 3 +-- drivers/char/tpm/tpm_crb.c | 2 +- drivers/char/tpm/tpm_i2c_atmel.c | 2 +- drivers/char/tpm/tpm_i2c_nuvoton.c | 2 +- drivers/char/tpm/tpm_ibmvtpm.c | 2 +- drivers/char/tpm/tpm_tis.c | 2 +- drivers/char/tpm/tpm_tis_spi.c | 3 +-- 9 files changed, 9 insertions(+), 13 deletions(-) diff --git a/drivers/char/tpm/st33zp24/i2c.c b/drivers/char/tpm/st33zp24/i2c.c index be5d1abd3e8e..d0cb25688485 100644 --- a/drivers/char/tpm/st33zp24/i2c.c +++ b/drivers/char/tpm/st33zp24/i2c.c @@ -245,8 +245,7 @@ static int st33zp24_i2c_probe(struct i2c_client *client, return -ENODEV; } - phy = devm_kzalloc(&client->dev, sizeof(struct st33zp24_i2c_phy), - GFP_KERNEL); + phy = devm_kzalloc(&client->dev, sizeof(*phy), GFP_KERNEL); if (!phy) return -ENOMEM; diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c index 0fc4f20b5f83..c952df9244c8 100644 --- a/drivers/char/tpm/st33zp24/spi.c +++ b/drivers/char/tpm/st33zp24/spi.c @@ -358,8 +358,7 @@ static int st33zp24_spi_probe(struct spi_device *dev) return -ENODEV; } - phy = devm_kzalloc(&dev->dev, sizeof(struct st33zp24_spi_phy), - GFP_KERNEL); + phy = devm_kzalloc(&dev->dev, sizeof(*phy), GFP_KERNEL); if (!phy) return -ENOMEM; diff --git a/drivers/char/tpm/st33zp24/st33zp24.c b/drivers/char/tpm/st33zp24/st33zp24.c index 4d1dc8b46877..0686a129268c 100644 --- a/drivers/char/tpm/st33zp24/st33zp24.c +++ b/drivers/char/tpm/st33zp24/st33zp24.c @@ -533,8 +533,7 @@ int st33zp24_probe(void *phy_id, const struct st33zp24_phy_ops *ops, if 
(IS_ERR(chip)) return PTR_ERR(chip); - tpm_dev = devm_kzalloc(dev, sizeof(struct st33zp24_dev), - GFP_KERNEL); + tpm_dev = devm_kzalloc(dev, sizeof(*tpm_dev), GFP_KERNEL); if (!tpm_dev) return -ENOMEM; diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c index 7b3c2a8aa9de..343c46e8560f 100644 --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -557,7 +557,7 @@ static int crb_acpi_add(struct acpi_device *device) if (sm == ACPI_TPM2_MEMORY_MAPPED) return -ENODEV; - priv = devm_kzalloc(dev, sizeof(struct crb_priv), GFP_KERNEL); + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c index 95ce2e9ccdc6..2d0df930a76d 100644 --- a/drivers/char/tpm/tpm_i2c_atmel.c +++ b/drivers/char/tpm/tpm_i2c_atmel.c @@ -165,7 +165,7 @@ static int i2c_atmel_probe(struct i2c_client *client, if (IS_ERR(chip)) return PTR_ERR(chip); - priv = devm_kzalloc(dev, sizeof(struct priv_data), GFP_KERNEL); + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; diff --git a/drivers/char/tpm/tpm_i2c_nuvoton.c b/drivers/char/tpm/tpm_i2c_nuvoton.c index c6428771841f..5983d52eb6d9 100644 --- a/drivers/char/tpm/tpm_i2c_nuvoton.c +++ b/drivers/char/tpm/tpm_i2c_nuvoton.c @@ -551,7 +551,7 @@ static int i2c_nuvoton_probe(struct i2c_client *client, if (IS_ERR(chip)) return PTR_ERR(chip); - priv = devm_kzalloc(dev, sizeof(struct priv_data), GFP_KERNEL); + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c index b18148ef2612..a4b462a77b99 100644 --- a/drivers/char/tpm/tpm_ibmvtpm.c +++ b/drivers/char/tpm/tpm_ibmvtpm.c @@ -608,7 +608,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev, if (IS_ERR(chip)) return PTR_ERR(chip); - ibmvtpm = kzalloc(sizeof(struct ibmvtpm_dev), GFP_KERNEL); + ibmvtpm = kzalloc(sizeof(*ibmvtpm), GFP_KERNEL); if 
(!ibmvtpm) goto cleanup; diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index ebd0e75a3e4d..0a3af60bab2a 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -294,7 +294,7 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info) if (rc) return rc; - phy = devm_kzalloc(dev, sizeof(struct tpm_tis_tcg_phy), GFP_KERNEL); +
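[Editor's note] The change throughout the sizeof patch above is purely about maintainability: `sizeof(*ptr)` tracks the pointee type automatically if the pointer's declared type later changes, whereas `sizeof(struct foo)` would silently keep allocating the old size. A small illustrative sketch (the struct is made up for demonstration):

```c
#include <stdlib.h>

struct priv_data {
	int id;
	char name[32];
};

/* sizeof's operand is not evaluated, so computing the size through a
 * NULL pointer of the right type is perfectly well defined. */
static size_t preferred_alloc_size(void)
{
	struct priv_data *priv = NULL;

	return sizeof(*priv);	/* follows the pointer's type */
}

/* Allocate using the preferred spelling, as the patch does. */
static struct priv_data *priv_alloc(void)
{
	return calloc(1, sizeof(struct priv_data *) == sizeof(void *) ?
			 preferred_alloc_size() : preferred_alloc_size());
}
```

Both spellings are identical today; the advantage only shows up during later refactoring, which is why checkpatch and the coding-style document recommend the pointer-dereference form.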
[PATCH 2/4] char/tpm: Delete an error message for a failed memory allocation in tpm_ibmvtpm_probe()
From: Markus Elfring
Date: Mon, 16 Oct 2017 18:08:23 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/char/tpm/tpm_ibmvtpm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index 25f6e2665385..b18148ef2612 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -609,10 +609,8 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
 		return PTR_ERR(chip);
 
 	ibmvtpm = kzalloc(sizeof(struct ibmvtpm_dev), GFP_KERNEL);
-	if (!ibmvtpm) {
-		dev_err(dev, "kzalloc for ibmvtpm failed\n");
+	if (!ibmvtpm)
 		goto cleanup;
-	}
 
 	ibmvtpm->dev = dev;
 	ibmvtpm->vdev = vio_dev;
-- 
2.14.2
[PATCH 1/4] char/tpm: Delete an error message for a failed memory allocation in tpm_ascii_bios_measurements_show()
From: Markus Elfring
Date: Mon, 16 Oct 2017 17:43:55 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/char/tpm/tpm1_eventlog.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/char/tpm/tpm1_eventlog.c b/drivers/char/tpm/tpm1_eventlog.c
index 9a8605e500b5..9749469cb823 100644
--- a/drivers/char/tpm/tpm1_eventlog.c
+++ b/drivers/char/tpm/tpm1_eventlog.c
@@ -280,11 +280,8 @@ static int tpm_ascii_bios_measurements_show(struct seq_file *m, void *v)
 		(unsigned char *)(v + sizeof(struct tcpa_event));
 
 	eventname = kmalloc(MAX_TEXT_EVENT, GFP_KERNEL);
-	if (!eventname) {
-		printk(KERN_ERR "%s: ERROR - No Memory for event name\n ",
-		       __func__);
+	if (!eventname)
 		return -EFAULT;
-	}
 
 	/* 1st: PCR */
 	seq_printf(m, "%2d ", do_endian_conversion(event->pcr_index));
-- 
2.14.2
[PATCH 0/4] char-TPM: Adjustments for ten function implementations
From: Markus Elfring Date: Mon, 16 Oct 2017 19:12:34 +0200 A few update suggestions were taken into account from static source code analysis. Markus Elfring (4): Delete an error message for a failed memory allocation in tpm_ascii_bios_measurements_show() Delete an error message for a failed memory allocation in tpm_ibmvtpm_probe() Improve a size determination in nine functions Less checks in tpm_ibmvtpm_probe() after error detection drivers/char/tpm/st33zp24/i2c.c | 3 +-- drivers/char/tpm/st33zp24/spi.c | 3 +-- drivers/char/tpm/st33zp24/st33zp24.c | 3 +-- drivers/char/tpm/tpm1_eventlog.c | 5 + drivers/char/tpm/tpm_crb.c | 2 +- drivers/char/tpm/tpm_i2c_atmel.c | 2 +- drivers/char/tpm/tpm_i2c_nuvoton.c | 2 +- drivers/char/tpm/tpm_ibmvtpm.c | 23 +-- drivers/char/tpm/tpm_tis.c | 2 +- drivers/char/tpm/tpm_tis_spi.c | 3 +-- 10 files changed, 18 insertions(+), 30 deletions(-) -- 2.14.2
Re: [PATCH 2/2] powerpc/perf/imc: use NUMA_NO_NODE for alloc_pages_node
On Monday 16 October 2017 07:48 AM, Balbir Singh wrote:
> On Mon, 16 Oct 2017 00:13:42 +0530
> Madhavan Srinivasan wrote:
>> alloc_pages_node() when passed NUMA_NO_NODE for the node_id, could
>> get memory from closest node. Cleanup core imc and thread imc memory
>> init functions to use NUMA_NO_NODE.
>
> The changelog is not clear, alloc_pages_node() takes a preferred node
> id and creates a node zonelist from it. How is NUMA_NO_NODE better?

IIUC, with NUMA_NO_NODE we could remove the __GFP_NOWARN for
alloc_pages_node(). That said, one must be careful to make sure we don't
end up allocating memory from the boot cpu. And in case of In-Memory
Collection (IMC) counters, it is handled in the cpu online path.

Will send a v2 with the __GFP_NOWARN removal, which I missed in this one. :)

Maddy

> Balbir Singh.
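[Editor's note] For readers unfamiliar with the semantics being debated: alloc_pages_node() treats the node id as a *preference*, and NUMA_NO_NODE means "no preference — fall back to the node local to the calling CPU". A toy userspace model of just that fallback rule (the real kernel logic builds zonelists and is far more involved; this only illustrates the contract):

```c
#define NUMA_NO_NODE (-1)

/* Toy model: resolve the node an allocation should target.
 * 'preferred_node' is what the caller asked for; 'local_node' stands
 * in for the node nearest to the calling CPU. Illustrative only. */
static int resolve_alloc_node(int preferred_node, int local_node)
{
	if (preferred_node == NUMA_NO_NODE)
		return local_node;	/* no preference: closest node wins */

	return preferred_node;		/* honor the explicit preference */
}
```

The point under discussion is that passing NUMA_NO_NODE lets the allocator pick a sane node itself, so the caller no longer needs __GFP_NOWARN to paper over failures on a specific node.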
[Part1 PATCH v6 09/17] resource: Provide resource struct in resource walk callback
From: Tom Lendacky In prep for a new function that will need additional resource information during the resource walk, update the resource walk callback to pass the resource structure. Since the current callback start and end arguments are pulled from the resource structure, the callback functions can obtain them from the resource structure directly. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Borislav Petkov Cc: linux-ker...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt Reviewed-by: Kees Cook Reviewed-by: Borislav Petkov Signed-off-by: Tom Lendacky Signed-off-by: Brijesh Singh --- arch/powerpc/kernel/machine_kexec_file_64.c | 12 +--- arch/x86/kernel/crash.c | 18 +- arch/x86/kernel/pmem.c | 2 +- include/linux/ioport.h | 4 ++-- include/linux/kexec.h | 2 +- kernel/kexec_file.c | 5 +++-- kernel/resource.c | 9 + 7 files changed, 30 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c b/arch/powerpc/kernel/machine_kexec_file_64.c index 992c0d258e5d..e4395f937d63 100644 --- a/arch/powerpc/kernel/machine_kexec_file_64.c +++ b/arch/powerpc/kernel/machine_kexec_file_64.c @@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) * and that value will be returned. If all free regions are visited without * func returning non-zero, then zero will be returned. */ -int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) +int arch_kexec_walk_mem(struct kexec_buf *kbuf, + int (*func)(struct resource *, void *)) { int ret = 0; u64 i; phys_addr_t mstart, mend; + struct resource res = { }; if (kbuf->top_down) { for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0, @@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) * range while in kexec, end points to the last byte * in the range. 
*/ - ret = func(mstart, mend - 1, kbuf); + res.start = mstart; + res.end = mend - 1; + ret = func(&res, kbuf); if (ret) break; } @@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) * range while in kexec, end points to the last byte * in the range. */ - ret = func(mstart, mend - 1, kbuf); + res.start = mstart; + res.end = mend - 1; + ret = func(&res, kbuf); if (ret) break; } diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 44404e2307bb..815008c9ca18 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) } #ifdef CONFIG_KEXEC_FILE -static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg) +static int get_nr_ram_ranges_callback(struct resource *res, void *arg) { unsigned int *nr_ranges = arg; @@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data *ced, return ret; } -static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg) +static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg) { struct crash_elf_data *ced = arg; Elf64_Ehdr *ehdr; @@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg) ehdr = ced->ehdr; /* Exclude unwanted mem ranges */ - ret = elf_header_exclude_ranges(ced, start, end); + ret = elf_header_exclude_ranges(ced, res->start, res->end); if (ret) return ret; @@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, struct e820_entry *entry) return 0; } -static int memmap_entry_callback(u64 start, u64 end, void *arg) +static int memmap_entry_callback(struct resource *res, void *arg) { struct crash_memmap_data *cmd = arg; struct boot_params *params = cmd->params; struct e820_entry ei; - ei.addr = start; - ei.size = end - start + 1; + ei.addr = res->start; + ei.size = res->end - res->start + 1; ei.type = cmd->type; add_e820_entry(params, &ei); @@ -619,12 +619,12 @@ int 
crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) return ret; } -static int determine_backup_region(u64 start, u64 end, void *arg) +static int determine_backup_regi
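[Editor's note] The refactoring pattern in the patch above — handing the callback a pointer to the whole resource instead of unpacking its fields into separate arguments — keeps every callback signature stable when new fields are needed later, which is exactly the stated motivation. A self-contained sketch of the idea (types and names simplified; this is not the kernel's resource-walk API):

```c
#include <stddef.h>

struct resource {
	unsigned long start;
	unsigned long end;	/* inclusive, like kernel resources */
};

/* New-style walk: the callback receives the whole resource, so future
 * fields become available without touching any callback signature. */
static int walk_resources(struct resource *res, size_t n,
			  int (*func)(struct resource *, void *), void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&res[i], arg);

		if (ret)
			return ret;	/* first non-zero return stops the walk */
	}
	return 0;
}

/* Example callback: accumulate total bytes spanned by all resources. */
static int sum_span(struct resource *res, void *arg)
{
	unsigned long *total = arg;

	*total += res->end - res->start + 1;
	return 0;
}
```

Note the `end - start + 1` arithmetic mirrors the patch's `ei.size = res->end - res->start + 1`, since `end` points at the last byte of the range.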
Re: [PATCH0/8] Support for ibm, dynamic-memory-v2 device tree property
On 10/06/2017 04:58 PM, Michael Ellerman wrote:
> Nathan Fontenot writes:
>> This patch set provides a set of updates to de-couple the LMB information
>> provided in the ibm,dynamic-memory device tree property from the device
>> tree property format. The goal is to provide a data set of LMB information
>> so that consumers of this data do not need to understand and provide
>> multiple parsing routines for the supported device tree property formats.
>
> Couple of build problems in this series.
>
> The non-pseries build dies with:
>
>   arch/powerpc/kernel/prom.c:541:4: error: implicit declaration of function
>   'early_init_dt_scan_drmem_lmbs' [-Werror=implicit-function-declaration]
>
> And pseries dies with:
>
>   arch/powerpc/platforms/pseries/hotplug-memory.c:860:21: error:
>   'lmbs_available' may be used uninitialized in this function
>   [-Werror=maybe-uninitialized]
>
> cheers

Ugh! I was hoping I had tried all permutations. Obviously I missed some.

I'll send out an updated patch set after fixing the build issues.

-Nathan
Re: [PATCH v2] cxl: Dump PSL_FIR register on PSL9 error irq
On 11/10/2017 at 08:14, Vaibhav Jain wrote:
> For PSL9 currently we aren't dumping the PSL FIR register when a PSL
> error interrupt is triggered. Contents of this register are useful in
> debugging AFU issues.
>
> This patch fixes the issue by adding a new service_layer_ops callback
> cxl_native_err_irq_dump_regs_psl9() to dump the PSL_FIR registers on a
> PSL error interrupt, thereby bringing the behavior in line with PSL on
> POWER-8. Also the existing service_layer_ops callback for PSL8 has been
> renamed to cxl_native_err_irq_dump_regs_psl8().
>
> Signed-off-by: Vaibhav Jain
> ---
> Changelog:
> [v2] - Created a different function to dump the FIR register for PSL9 (Fred)
> ---

Thanks

Acked-by: Christophe Lombard
Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
Michael Bringmann writes: > powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU, > it may occur that the new resources are to be inserted into nodes > that were not used for memory resources at bootup. Many different > configurations of PowerPC resources may need to be supported depending > upon the environment. Give me some detail please?! > This patch fixes some problems encountered at What problems? > runtime with configurations that support memory-less nodes, but which > allow CPUs to be added at and after boot. How does it fix those problems? > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index b385cd0..e811dd1 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu, > return rc; > } > > +static int verify_node_preparation(int nid) > +{ I would not expect a function called "verify" ... > + if ((NODE_DATA(nid) == NULL) || > + (NODE_DATA(nid)->node_spanned_pages == 0)) { > + if (try_online_node(nid)) .. to do something like online a node. > + return first_online_node; > + } > + > + return nid; > +} > + > /* > * Update the CPU maps and sysfs entries for a single CPU when its NUMA > * characteristics change. This function doesn't perform any locking and is > @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked) > /* Use associativity from first thread for all siblings */ > vphn_get_associativity(cpu, associativity); > new_nid = associativity_to_nid(associativity); > - if (new_nid < 0 || !node_online(new_nid)) > + if (new_nid < 0 || !node_possible(new_nid)) > new_nid = first_online_node; > > + new_nid = verify_node_preparation(new_nid); You're being called part-way through CPU hotplug here, are we sure it's safe to go and do memory hotplug from there? What's the locking situation? cheers
Re: [PATCH] cxl: Rename register PSL9_FIR2 to PSL9_FIR_MASK
On 09/10/2017 at 19:56, Vaibhav Jain wrote:
> PSL9 doesn't have a FIR2 register as was the case with PSL8. However
> currently the register definitions in 'cxl.h' have a definition for
> PSL9_FIR2 that actually points to the PSL9_FIR_MASK register in the P1
> area at offset 0x308.
>
> So this patch renames the def PSL9_FIR2 to PSL9_FIR_MASK and updates
> the references in the code to point to the new identifier. It also
> removes the code to dump contents of FIR2 (FIR_MASK actually) in
> cxl_native_irq_dump_regs_psl9().
>
> Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
> Reported-by: Frederic Barrat
> Signed-off-by: Vaibhav Jain

Thanks

Acked-by: Christophe Lombard
Re: refactor dma_cache_sync V2
On Tue, Oct 03, 2017 at 12:49:51PM +0100, Robin Murphy wrote: > Reviewed-by: Robin Murphy Thanks Robin. I've heard very little from the arch maintainers, but if people remain silent I will apply the whole series to the dma-mapping tree in the next days.
Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations
Michael Bringmann writes: > powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU This is a powerpc-only patch, so saying "systems like PowerPC" is confusing. What you should be saying is "On pseries systems". > or memory resources, it may occur that the new resources are to be > inserted into nodes that were not used for these resources at bootup. > In the kernel, any node that is used must be defined and initialized > at boot. > > This patch extracts the value of the lowest domain level (number of > allocable resources) from the "rtas" device tree property > "ibm,current-associativity-domains" or the device tree property What is current associativity domains? I've not heard of it, where is it documented, and what does it mean. Why would use the "current" set vs the "max"? I thought the whole point was to discover the maximum possible set of nodes that could be hotplugged. > "ibm,max-associativity-domains" to use as the maximum number of nodes > to setup as possibly available in the system. This new setting will > override the instruction, > > nodes_and(node_possible_map, node_possible_map, node_online_map); > > presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). > > If the property is not present at boot, no operation will be performed > to define or enable additional nodes. > > Signed-off-by: Michael Bringmann > --- > arch/powerpc/mm/numa.c | 47 +++ > 1 file changed, 47 insertions(+) > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index ec098b3..b385cd0 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 > start_pfn, u64 end_pfn) > NODE_DATA(nid)->node_spanned_pages = spanned_pages; > } > > +static void __init node_associativity_setup(void) This should really be called "find_possible_nodes()" or something more descriptive. 
> +{
> +	struct device_node *rtas;
> +
> +	rtas = of_find_node_by_path("/rtas");
> +	if (rtas) {

If you just short-circuit that return, the whole function body can be
de-indented, making it significantly more readable. ie:

+	rtas = of_find_node_by_path("/rtas");
+	if (!rtas)
+		return;

> +		const __be32 *prop;
> +		u32 len, entries, numnodes, i;
> +
> +		prop = of_get_property(rtas,
> +			"ibm,current-associativity-domains", &len);

Please don't use of_get_property() in new code, we have much better
accessors these days, which do better error checking and handle the endian
conversions for you.

In this case you'd use eg:

	u32 entries;

	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

> +		if (!prop || len < sizeof(unsigned int)) {
> +			prop = of_get_property(rtas,
> +				"ibm,max-associativity-domains", &len);
> +			goto endit;
> +		}
> +
> +		entries = of_read_number(prop++, 1);
> +
> +		if (len < (entries * sizeof(unsigned int)))
> +			goto endit;
> +
> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
> +			entries = min_common_depth;
> +		else
> +			entries -= 1;

^ You can't just guess that will be the right entry.

If min_common_depth is < 0 the function should have just returned
immediately at the top.

If min_common_depth is outside the range of the property that's a buggy
device tree, you should print a warning and return.

> +		numnodes = of_read_number(&prop[entries], 1);

	u32 num_nodes;

	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);

> +
> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
> +			min_common_depth);
> +
> +		for (i = 0; i < numnodes; i++) {
> +			if (!node_possible(i)) {
> +				setup_node_data(i, 0, 0);

Do we actually need to setup the NODE_DATA() yet? Doing it now ensures it
will not be allocated node local, which sucks.

> +				node_set(i, node_possible_map);
> +			}
> +		}
> +	}
> +
> +endit:

"out" would be the normal name.
> +	if (rtas)
> +		of_node_put(rtas);
> +
>  void __init initmem_init(void)
>  {
>  	int nid, cpu;
> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>  	 */

You need to update the comment above here which is contradicted by the new
function you're adding.

>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> +	node_associativity_setup();
> +
>  	for_each_online_node(nid) {
>  		unsigned long start_pfn, end_
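[Editor's note] The review's point about the modern accessors is that device-tree property data is raw big-endian bytes: of_get_property() hands back the raw pointer, while of_property_read_u32()/of_property_read_u32_index() bundle the bounds checking and byte swapping the caller would otherwise open-code. A userspace model of roughly what those helpers do for you (simplified; the real kernel API differs in details):

```c
#include <stdint.h>

/* Device-tree cells are stored big-endian regardless of host order. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read element 'index' of a be32-array property spanning 'len' bytes.
 * Returns 0 on success, -1 if the property is missing or too short --
 * mirroring the error checking the modern accessors provide. */
static int property_read_u32_index(const uint8_t *prop, unsigned int len,
				   unsigned int index, uint32_t *out)
{
	if (!prop || (index + 1) * 4 > len)
		return -1;

	*out = read_be32(prop + index * 4);
	return 0;
}
```

This is why the review suggests of_property_read_u32_index() for fetching the node count at position min_common_depth: the length check and endian conversion come for free.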
Re: [PATCH v4 5/5] powerpc/mce: hookup memory_failure for UE errors
Balbir Singh writes: > On Mon, Oct 16, 2017 at 4:38 PM, Michael Ellerman wrote: >> Balbir Singh writes: >> >>> If we are in user space and hit a UE error, we now have the >>> basic infrastructure to walk the page tables and find out >>> the effective address that was accessed, since the DAR >>> is not valid. >>> >>> We use a work_queue content to hookup the bad pfn, any >>> other context causes problems, since memory_failure itself >>> can call into schedule() via lru_drain_ bits. >>> >>> We could probably poison the struct page to avoid a race >>> between detection and taking corrective action. >>> >>> Signed-off-by: Balbir Singh >>> Reviewed-by: Nicholas Piggin >>> --- >>> arch/powerpc/kernel/mce.c | 70 >>> +-- >>> 1 file changed, 67 insertions(+), 3 deletions(-) >> >> I'm not sure why this is in mce.c but the rest was in mce_power.c ? > > The way the code is organized is that save_mce_event is implemented here > and called from mce_power.c. save_mce_event() does the processing > of the event. save_mce_event() is only called from mce_power.c :/ I'd be happy if it was mce_real_mode.c and mce.c, that would be a good distinction to make. Though even then, most of the code is Book3s64 specific I think so maybe it shouldn't be in just mce.c. Anyway I'll merge this and we can improve things later. cheers
Re: [PATCH kernel] RFC: prom_init: Fetch flatten device tree from the system firmware
Alexey Kardashevskiy writes:
> At the moment, on 256CPU + 256 PCI devices guest, it takes the guest
> about 8.5sec to read the entire device tree. Some explanation can be
> found here: https://patchwork.ozlabs.org/patch/826124/ but mostly it is
> because the kernel traverses the tree twice and it calls "getprop" for
> each property, which is really slow in SLOF as it searches from the
> beginning of the linked list every time.
>
> Since SLOF has just learned to build FDT and this takes less than 0.5sec
> for such a big guest, this makes use of the proposed client interface
> method - "fdt-fetch".

It's a pity doing it the normal way is so slow, but this seems like a
reasonable idea anyway.

> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 02190e90c7ae..daa50a153737 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -2498,6 +2498,31 @@ static void __init flatten_device_tree(void)
> 	prom_panic("Can't allocate initial device-tree chunk\n");
> 	mem_end = mem_start + room;

I'd prefer you didn't munge it inside flatten_device_tree(), rather create
a wrapper that does ~=:

	void get_flat_devicetree(void)
	{
		if (!fetch_flat_devicetree())
			flatten_device_tree();

		printf(...)
} > + if (!call_prom_ret("fdt-fetch", 2, 1, NULL, mem_start, > +room - sizeof(mem_reserve_map))) { > + u32 size; > + > + hdr = (void *) mem_start; > + > + /* Fixup the boot cpuid */ > + hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu); > + > + /* Append the reserved map to the end of the blob */ > + hdr->off_mem_rsvmap = hdr->totalsize; > + size = be32_to_cpu(hdr->totalsize); > + rsvmap = (void *) hdr + size; > + hdr->totalsize = cpu_to_be32(size + sizeof(mem_reserve_map)); > + memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map)); > + > + /* Store the DT address */ > + dt_header_start = mem_start; > + > +#ifdef DEBUG_PROM > + prom_printf("Fetched DTB: %d bytes to @%x\n", size, mem_start); > +#endif I think that should actually not be under DEBUG_PROM. The origin of the FDT is fairly crucial information, so I think we can tolerate an extra line of output to know that. > + goto print_exit; This was the clue that it should be in a separate function :) cheers > + } > + > /* Get root of tree */ > root = call_prom("peer", 1, 1, (phandle)0); > if (root == (phandle)0) > @@ -2548,6 +2573,7 @@ static void __init flatten_device_tree(void) > /* Copy the reserve map in */ > memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map)); > > +print_exit: > #ifdef DEBUG_PROM > { > int i; > -- > 2.11.0
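[Editor's note] The hunk under review appends the memory reserve map after the fetched blob and patches two big-endian header fields: off_mem_rsvmap gets the old end offset, and totalsize grows by the reserve map's size. A userspace model of that arithmetic (the struct models only the two fields involved, not the full FDT header; the byte swap assumes a little-endian host):

```c
#include <stdint.h>
#include <string.h>

/* Only the two header fields the hunk touches; both big-endian on the wire. */
struct fdt_header_model {
	uint32_t totalsize;
	uint32_t off_mem_rsvmap;
};

/* Swap between host (little-endian assumed) and big-endian wire order. */
static uint32_t be32(uint32_t x)
{
	return ((x & 0xff) << 24) | ((x & 0xff00) << 8) |
	       ((x >> 8) & 0xff00) | (x >> 24);
}

/* Append 'rsvmap' (rsvlen bytes) after the current blob and fix up the
 * header, mirroring the logic in the quoted prom_init.c hunk. The caller
 * must guarantee the buffer has rsvlen bytes of room past the blob. */
static void append_rsvmap(uint8_t *blob, const void *rsvmap, uint32_t rsvlen)
{
	struct fdt_header_model *hdr = (void *)blob;
	uint32_t size = be32(hdr->totalsize);		/* wire -> host */

	hdr->off_mem_rsvmap = hdr->totalsize;		/* old end of blob */
	hdr->totalsize = be32(size + rsvlen);		/* host -> wire */
	memcpy(blob + size, rsvmap, rsvlen);
}
```

The subtlety worth noticing in the original hunk is the same one modeled here: off_mem_rsvmap is assigned the still-big-endian totalsize field directly, so no conversion is needed for that store.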
Re: [PATCH kernel] RFC: prom_init: Fetch flatten device tree from the system firmware
On 16/10/17 21:20, Segher Boessenkool wrote: > On Mon, Oct 16, 2017 at 06:07:06PM +1100, Alexey Kardashevskiy wrote: >> On 16/10/17 17:46, David Gibson wrote: > >> +/* Fixup the boot cpuid */ >> +hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu); > > If SLOF is generating a tree it really should get this header field > right as well. Ah, I did not realize it is just a phandle from /chosen/cpu. Will fix. >>> >>> It's not a phandle. It's just the "address" (i.e. reg value) of the >>> boot cpu. >> >> Well, it is "reg" of a CPU with phandle==/chosen/cpu so my fdt code needs >> to look there to pick the right "reg" rather than just plain 0. I'll fix >> this but in general can it possibly be not a zero in QEMU/SLOF? > > /chosen/cpu is an ihandle, not a phandle. Sure, that is what my proposed fdt-boot-cpu does already. > Most (if not all) references > in /chosen are. -- Alexey
[PATCH v3] KVM: PPC: Book3S PR: only install valid SLBs during KVM_SET_SREGS
Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS, some of
which are valid (ie, SLB_ESID_V is set) and the rest are likely all-zeroes
(with QEMU at least). Each of them is then passed to
kvmppc_mmu_book3s_64_slbmte(), which expects to find the SLB index in the
3 lower bits of its rb argument. When passed zeroed arguments, it happily
overwrites the 0th SLB entry with zeroes. This is exactly what happens
while doing live migration with QEMU when the destination pushes the
incoming SLB descriptors to KVM PR. When reloading the SLBs at the next
synchronization, QEMU first clears its SLB array and only restores valid
ones, but the 0th one is now gone and we cannot access the corresponding
memory anymore:

(qemu) x/x $pc
c00b742c: Cannot access memory

To avoid this, let's filter out non-valid SLB entries. While here, we also
force a full SLB flush before installing new entries. Since the SLB is for
64-bit only, we now build this path conditionally to avoid a build break
on 32-bit, which doesn't define SLB_ESID_V.

Signed-off-by: Greg Kurz
---
v3: - build SLB path for 64-bit only [*]
    - add blank line after declaration of rb and rs to silence checkpatch
v2: - flush SLB before installing new entries

[*] I'm wondering if other users of BOOK3S_HFLAG_SLB in this file
shouldn't be conditionally built as well, for consistency.
---
 arch/powerpc/kvm/book3s_pr.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3beb4ff469d1..68c1f8bc17c4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1326,12 +1326,22 @@ static int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
 	kvmppc_set_pvr_pr(vcpu, sregs->pvr);
 
 	vcpu3s->sdr1 = sregs->u.s.sdr1;
+#ifdef CONFIG_PPC_BOOK3S_64
 	if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
+		/* Flush all SLB entries */
+		vcpu->arch.mmu.slbmte(vcpu, 0, 0);
+		vcpu->arch.mmu.slbia(vcpu);
+
 		for (i = 0; i < 64; i++) {
-			vcpu->arch.mmu.slbmte(vcpu, sregs->u.s.ppc64.slb[i].slbv,
-					      sregs->u.s.ppc64.slb[i].slbe);
+			u64 rb = sregs->u.s.ppc64.slb[i].slbe;
+			u64 rs = sregs->u.s.ppc64.slb[i].slbv;
+
+			if (rb & SLB_ESID_V)
+				vcpu->arch.mmu.slbmte(vcpu, rs, rb);
 		}
-	} else {
+	} else
+#endif
+	{
 		for (i = 0; i < 16; i++) {
 			vcpu->arch.mmu.mtsrin(vcpu, i, sregs->u.s.ppc32.sr[i]);
 		}
Re: [PATCH kernel] RFC: prom_init: Fetch flatten device tree from the system firmware
On Mon, Oct 16, 2017 at 06:07:06PM +1100, Alexey Kardashevskiy wrote:
> On 16/10/17 17:46, David Gibson wrote:
> >>>>>> +		/* Fixup the boot cpuid */
> >>>>>> +		hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu);
> >>>
> >>> If SLOF is generating a tree it really should get this header field
> >>> right as well.
> >>
> >> Ah, I did not realize it is just a phandle from /chosen/cpu. Will
> >> fix.
> >
> > It's not a phandle. It's just the "address" (i.e. reg value) of the
> > boot cpu.
>
> Well, it is "reg" of a CPU with phandle==/chosen/cpu so my fdt code needs
> to look there to pick the right "reg" rather than just plain 0. I'll fix
> this but in general can it possibly be not a zero in QEMU/SLOF?

/chosen/cpu is an ihandle, not a phandle.  Most (if not all) references
in /chosen are.


Segher
Re: [PATCH] cxl: Rename register PSL9_FIR2 to PSL9_FIR_MASK
Le 09/10/2017 à 19:56, Vaibhav Jain a écrit :
> PSL9 doesn't have a FIR2 register as was the case with PSL8. However
> currently the register definitions in 'cxl.h' have a definition for
> PSL9_FIR2 that actually points to the PSL9_FIR_MASK register in the P1
> area at offset 0x308.
>
> So this patch renames the def PSL9_FIR2 to PSL9_FIR_MASK and updates
> the references in the code to point to the new identifier. It also
> removes the code to dump contents of FIR2 (FIR_MASK actually) in
> cxl_native_irq_dump_regs_psl9().
>
> Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
> Reported-by: Frederic Barrat
> Signed-off-by: Vaibhav Jain
> ---

Thanks

Acked-by: Christophe Lombard
Re: [PATCH kernel] RFC: prom_init: Fetch flatten device tree from the system firmware
On Mon, Oct 16, 2017 at 06:07:06PM +1100, Alexey Kardashevskiy wrote:
> On 16/10/17 17:46, David Gibson wrote:
> > On Mon, Oct 16, 2017 at 05:22:55PM +1100, Alexey Kardashevskiy wrote:
> >> On 16/10/17 17:11, David Gibson wrote:
> >>> On Mon, Oct 16, 2017 at 04:49:17PM +1100, Alexey Kardashevskiy wrote:
> >>>> At the moment, on a 256 CPU + 256 PCI devices guest, it takes the guest
> >>>> about 8.5sec to read the entire device tree. Some explanation can be
> >>>> found here: https://patchwork.ozlabs.org/patch/826124/ but mostly it is
> >>>> because the kernel traverses the tree twice and it calls "getprop" for
> >>>> each property which is really SLOF as it searches from the linked list
> >>>> beginning every time.
> >>>>
> >>>> Since SLOF has just learned to build FDT and this takes less than 0.5sec
> >>>> for such a big guest, this makes use of the proposed client interface
> >>>> method - "fdt-fetch".
> >>>>
> >>>> If "fdt-fetch" is not available, the old method is used.
> >>>>
> >>>> Signed-off-by: Alexey Kardashevskiy
> >>>
> >>> I like the concept, few details though..
> >>>
> >>>> ---
> >>>>  arch/powerpc/kernel/prom_init.c | 26 ++++++++++++++++++++++++++
> >>>>  1 file changed, 26 insertions(+)
> >>>>
> >>>> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> >>>> index 02190e90c7ae..daa50a153737 100644
> >>>> --- a/arch/powerpc/kernel/prom_init.c
> >>>> +++ b/arch/powerpc/kernel/prom_init.c
> >>>> @@ -2498,6 +2498,31 @@ static void __init flatten_device_tree(void)
> >>>> 		prom_panic("Can't allocate initial device-tree chunk\n");
> >>>> 	mem_end = mem_start + room;
> >>>>
> >>>> +	if (!call_prom_ret("fdt-fetch", 2, 1, NULL, mem_start,
> >>>> +			   room - sizeof(mem_reserve_map))) {
> >>>> +		u32 size;
> >>>> +
> >>>> +		hdr = (void *) mem_start;
> >>>> +
> >>>> +		/* Fixup the boot cpuid */
> >>>> +		hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu);
> >>>
> >>> If SLOF is generating a tree it really should get this header field
> >>> right as well.
> >>
> >> Ah, I did not realize it is just a phandle from /chosen/cpu. Will
> >> fix.
> >
> > It's not a phandle. It's just the "address" (i.e. reg value) of the
> > boot cpu.
>
> Well, it is "reg" of a CPU with phandle==/chosen/cpu so my fdt code needs
> to look there to pick the right "reg" rather than just plain 0.

Ah, right, I see what you mean.

> I'll fix
> this but in general can it possibly be not a zero in QEMU/SLOF?

Erm.. probably not, but I'm not totally certain what could happen if you
tried creating all your cpu cores explicitly with -device instead of just
using -smp. I think it's safer to look it up in SLOF, so that it won't
break if we change how cpu addresses are assigned in qemu.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature
Re: [PATCH v3 3/3] powerpc:selftest update memcmp_64 selftest for VMX implementation
Hi Cyril,

Thanks for the review.

On Mon, Oct 16, 2017 at 02:32:58PM +1100, Cyril Bur wrote:
> On Fri, 2017-10-13 at 12:30 +0800, wei.guo.si...@gmail.com wrote:
> > From: Simon Guo
> >
> > This patch adjusts the memcmp_64 selftest so that the memcmp selftest
> > can be compiled successfully.
>
> Do they not compile at the moment?

Since I renamed enter/exit_vmx_copy() to enter/exit_vmx_ops() in patch 2,
this patch is necessary to make the selftest compile on top of patch 2.

> > It also adds testcases for:
> > - memcmp over 4K bytes size.
> > - s1/s2 with different/random offset on 16 bytes boundary.
> > - enter/exit_vmx_ops pairing.
>
> This is a great idea, just a thought though - perhaps it might make
> more sense to have each condition be tested for in a separate binary
> rather than a single binary that tests everything.

I am not sure whether it is better to separate those test cases, since
they all exercise the same memcmp function. Currently the time to execute
this single test binary is within 35 secs, and keeping them in one
executable will make people test more comprehensively. So personally I
prefer to keep them in one at present.

Thanks,
- Simon
[PATCH] powerpc/mm/radix: Drop unneeded NULL check
From: Michael Ellerman

We call these functions with a non-NULL mm or vma. Hence we can skip the
NULL check in these functions. We also remove the now-unused function
__local_flush_hugetlb_page().

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/hugetlb.h |  6 ------
 arch/powerpc/mm/tlb-radix.c        | 18 ++++++++----------
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index b8a0fb442c64..795d825c2edd 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -40,12 +40,6 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
 	return radix__flush_hugetlb_page(vma, vmaddr);
 }
 
-static inline void __local_flush_hugetlb_page(struct vm_area_struct *vma,
-					      unsigned long vmaddr)
-{
-	if (radix_enabled())
-		return radix__local_flush_hugetlb_page(vma, vmaddr);
-}
 #else
 
 static inline pte_t *hugepd_page(hugepd_t hpd)
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 191cc5abc940..98fdf2243d35 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -164,7 +164,7 @@ void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmadd
 	unsigned long ap = mmu_get_ap(psize);
 
 	preempt_disable();
-	pid = mm ? mm->context.id : 0;
+	pid = mm->context.id;
 	if (pid != MMU_NO_CONTEXT)
 		_tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
 	preempt_enable();
@@ -175,10 +175,9 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
 #ifdef CONFIG_HUGETLB_PAGE
 	/* need the return fix for nohash.c */
 	if (vma && is_vm_hugetlb_page(vma))
-		return __local_flush_hugetlb_page(vma, vmaddr);
+		return radix__local_flush_hugetlb_page(vma, vmaddr);
 #endif
-	radix__local_flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
-					  mmu_virtual_psize);
+	radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
 }
 EXPORT_SYMBOL(radix__local_flush_tlb_page);
 
@@ -232,7 +231,7 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 	unsigned long ap = mmu_get_ap(psize);
 
 	preempt_disable();
-	pid = mm ? mm->context.id : 0;
+	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto bail;
 	if (!mm_is_thread_local(mm))
@@ -247,10 +246,9 @@ void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	if (vma && is_vm_hugetlb_page(vma))
-		return flush_hugetlb_page(vma, vmaddr);
+		return radix__flush_hugetlb_page(vma, vmaddr);
 #endif
-	radix__flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
-				    mmu_virtual_psize);
+	radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
 }
 EXPORT_SYMBOL(radix__flush_tlb_page);
 
@@ -330,7 +328,7 @@ void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 
 	preempt_disable();
 
-	pid = mm ? mm->context.id : 0;
+	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto err_out;
 
@@ -361,7 +359,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 	unsigned long pid, end;
 
-	pid = mm ? mm->context.id : 0;
+	pid = mm->context.id;
 	if (unlikely(pid == MMU_NO_CONTEXT))
 		goto no_context;
-- 
2.13.6
Re: [PATCH kernel] RFC: prom_init: Fetch flatten device tree from the system firmware
On 16/10/17 17:46, David Gibson wrote:
> On Mon, Oct 16, 2017 at 05:22:55PM +1100, Alexey Kardashevskiy wrote:
>> On 16/10/17 17:11, David Gibson wrote:
>>> On Mon, Oct 16, 2017 at 04:49:17PM +1100, Alexey Kardashevskiy wrote:
>>>> At the moment, on a 256 CPU + 256 PCI devices guest, it takes the guest
>>>> about 8.5sec to read the entire device tree. Some explanation can be
>>>> found here: https://patchwork.ozlabs.org/patch/826124/ but mostly it is
>>>> because the kernel traverses the tree twice and it calls "getprop" for
>>>> each property which is really SLOF as it searches from the linked list
>>>> beginning every time.
>>>>
>>>> Since SLOF has just learned to build FDT and this takes less than 0.5sec
>>>> for such a big guest, this makes use of the proposed client interface
>>>> method - "fdt-fetch".
>>>>
>>>> If "fdt-fetch" is not available, the old method is used.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>
>>> I like the concept, few details though..
>>>
>>>> ---
>>>>  arch/powerpc/kernel/prom_init.c | 26 ++++++++++++++++++++++++++
>>>>  1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
>>>> index 02190e90c7ae..daa50a153737 100644
>>>> --- a/arch/powerpc/kernel/prom_init.c
>>>> +++ b/arch/powerpc/kernel/prom_init.c
>>>> @@ -2498,6 +2498,31 @@ static void __init flatten_device_tree(void)
>>>> 		prom_panic("Can't allocate initial device-tree chunk\n");
>>>> 	mem_end = mem_start + room;
>>>>
>>>> +	if (!call_prom_ret("fdt-fetch", 2, 1, NULL, mem_start,
>>>> +			   room - sizeof(mem_reserve_map))) {
>>>> +		u32 size;
>>>> +
>>>> +		hdr = (void *) mem_start;
>>>> +
>>>> +		/* Fixup the boot cpuid */
>>>> +		hdr->boot_cpuid_phys = cpu_to_be32(prom.cpu);
>>>
>>> If SLOF is generating a tree it really should get this header field
>>> right as well.
>>
>> Ah, I did not realize it is just a phandle from /chosen/cpu. Will
>> fix.
>
> It's not a phandle. It's just the "address" (i.e. reg value) of the
> boot cpu.

Well, it is "reg" of a CPU with phandle==/chosen/cpu so my fdt code needs
to look there to pick the right "reg" rather than just plain 0. I'll fix
this but in general can it possibly be not a zero in QEMU/SLOF?

>>>> +		/* Append the reserved map to the end of the blob */
>>>> +		hdr->off_mem_rsvmap = hdr->totalsize;
>>>> +		size = be32_to_cpu(hdr->totalsize);
>>>> +		rsvmap = (void *) hdr + size;
>>>> +		hdr->totalsize = cpu_to_be32(size + sizeof(mem_reserve_map));
>>>> +		memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map));
>>>
>>> .. and the reserve map for that matter. I don't really understand
>>> what you're doing here.
>>
>> ? Get the blob, increase the FDT size by sizeof(mem_reserve_map), fix up
>> totalsize and off_mem_rsvmap, copy mem_reserve_map to the end of the blob
>> (the actual order is slightly different, may be a bit confusing).
>
> Right.. but where is mem_reserve_map coming from, if it hasn't come
> from an FDT?
>
>> Asking SLOF to reserve the space seems to be unnecessary complication of
>> the interface - SLOF does not provide any reserved memory records.
>
> Ah.. right, the reservations are coming from the pre-prom kernel, not
> from the firmware itself. Yeah, that makes sense. Ok, this makes
> sense then...

Right, the reservations are added via reserve_mem() in
arch/powerpc/kernel/prom_init.c

>>> Note also that the reserve map is required to
>>> be 8-byte aligned, which totalsize might not be.
>>
>> Ah, good point.
>
> ..at least with that fixed and maybe some comments to make what's
> going on clearer.

>>>> +		/* Store the DT address */
>>>> +		dt_header_start = mem_start;
>>>> +
>>>> +#ifdef DEBUG_PROM
>>>> +		prom_printf("Fetched DTB: %d bytes to @%x\n", size, mem_start);
>>>> +#endif
>>>> +		goto print_exit;
>>>> +	}
>>>> +
>>>> 	/* Get root of tree */
>>>> 	root = call_prom("peer", 1, 1, (phandle)0);
>>>> 	if (root == (phandle)0)
>>>> @@ -2548,6 +2573,7 @@ static void __init flatten_device_tree(void)
>>>> 	/* Copy the reserve map in */
>>>> 	memcpy(rsvmap, mem_reserve_map, sizeof(mem_reserve_map));
>>>>
>>>> +print_exit:
>>>> #ifdef DEBUG_PROM
>>>> 	{
>>>> 		int i;

-- 
Alexey

signature.asc
Description: OpenPGP digital signature
[PATCH] powerpc/mm/hash: Add pr_fmt() to hash_utils64.c
Make the printks look a bit nicer by adding a prefix. A radix config now
does:

  radix-mmu: Page sizes from device-tree:
  radix-mmu: Page size shift = 12 AP=0x0
  radix-mmu: Page size shift = 16 AP=0x5
  radix-mmu: Page size shift = 21 AP=0x1
  radix-mmu: Page size shift = 30 AP=0x2

This patch updates the hash config to produce similar dmesg output. With
the patch we have:

  hash-mmu: Page sizes from device-tree:
  hash-mmu: base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
  hash-mmu: base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
  hash-mmu: base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
  hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
  hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
  hash-mmu: base_shift=20: shift=20, sllp=0x0111, avpnm=0x, tlbiel=0, penc=2
  hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
  hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/mm/hash_utils_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 78955c2153ed..b28dafe7d404 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -21,6 +21,7 @@
 
 #undef DEBUG
 #undef DEBUG_LOW
+#define pr_fmt(fmt) "hash-mmu: " fmt
 #include
 #include
 #include
-- 
2.13.6