Re: [PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes
Le 18/10/2023 à 06:55, Aneesh Kumar K.V a écrit : > With commit 9fee28baa601 ("powerpc: implement the new page table range > API") we added set_ptes to powerpc architecture but the implementation > missed calling the pte filter for all the ptes we are setting in the > range. set_pte_filter can be used for filter pte values and on some > platforms which don't support coherent icache it clears the exec bit so > that we can flush the icache on exec fault > > The patch also removes the usage of arch_enter/leave_lazy_mmu() because > set_pte is not supposed to be used when updating a pte entry. Powerpc > architecture uses this rule to skip the expensive tlb invalidate which > is not needed when you are setting up the pte for the first time. See > commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating > _PAGE_NUMA bit") for more details > > Fixes: 9fee28baa601 ("powerpc: implement the new page table range API") > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/pgtable.c | 33 - > 1 file changed, 20 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > index 3ba9fe411604..95ab20cca2da 100644 > --- a/arch/powerpc/mm/pgtable.c > +++ b/arch/powerpc/mm/pgtable.c > @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, > pte_t *ptep, > pte_t pte, unsigned int nr) > { > /* > - * Make sure hardware valid bit is not set. We don't do > - * tlb flush for this update. > + * We don't need to call arch_enter/leave_lazy_mmu_mode() > + * because we expect set_ptes to be only be used on not present > + * and not hw_valid ptes. Hence there is not translation cache flush > + * involved that need to be batched. >*/ > - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); > + for (;;) { > > - /* Note: mm->context.id might not yet have been assigned as > - * this context might not have been activated yet when this > - * is called. > - */ > - pte = set_pte_filter(pte); > + /* > + * Make sure hardware valid bit is not set. We don't do > + * tlb flush for this update. > + */ > + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); > > - /* Perform the setting of the PTE */ > - arch_enter_lazy_mmu_mode(); > - for (;;) { > + /* Note: mm->context.id might not yet have been assigned as > + * this context might not have been activated yet when this > + * is called. > + */ > + pte = set_pte_filter(pte); Why do you need to call set_pte_filter() inside the loop ? The only difference between previous pte and next pte is the RPN, other flags remain untouched so I can't see why you need to call set_pte_filter() again. > + > + /* Perform the setting of the PTE */ > __set_pte_at(mm, addr, ptep, pte, 0); > if (--nr == 0) > break; > ptep++; > - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT)); > addr += PAGE_SIZE; > + /* increment the pfn */ > + pte = __pte(pte_val(pte) + PAGE_SIZE); PAGE_SIZE doesn't work on all platforms, see for instance e500. see comment at https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147 And then you see https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63 > + > } > - arch_leave_lazy_mmu_mode(); > } > > void unmap_kernel_page(unsigned long va) Christophe
Re: [Bisected] PowerMac G5 fails booting kernel 6.6-rc3 (BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe)
Erhard Furtner writes: > On Tue, 17 Oct 2023 14:40:49 +1100 > Michael Ellerman wrote: > >> I think I've reproduced the crash on my Quad G5 by using your config >> with some things tweaked, but I don't get any output on the screen :/ > > You could try PPC_EARLY_DEBUG=y with PPC_EARLY_DEBUG_BOOTX or > PPC_EARLY_DEBUG_G5. I have tried PPC_EARLY_DEBUG_BOOTX but it didn't help :/ >> Do you mind booting the commit above and taking a photo of the oops and >> attach it here. The oops you transcribed didn't entirely make sense, >> probably due to a typo here or there, so a photo would be best. >> >> cheers > > No problem. Just thought transcribing the photo would make more sense > for a mailing list. ;) But maybe some subtle errors slept crept in. > Attached are 2 photos from the issue on current v6.6-rc6. Thanks. Yeah text is generally better, it archives better and can be grepped etc. but in this case I was going a bit mad trying to make sense of the oops :) In hindsight the bug is an obvious boot time ordering problem, can you confirm this fixes it. That should apply on top of Linus' current master. cheers diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 2f1026fba00d..71f16fb32ceb 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -948,6 +948,7 @@ void __init setup_arch(char **cmdline_p) /* Parse memory topology */ mem_topology_setup(); + set_max_mapnr(max_pfn); /* * Release secondary cpus out of their spinloops at 0x60 now that diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 8b121df7b08f..07e8f4f1e07f 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -288,7 +288,6 @@ void __init mem_init(void) #endif high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); - set_max_mapnr(max_pfn); kasan_late_init();
[PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes
With commit 9fee28baa601 ("powerpc: implement the new page table range API") we added set_ptes to powerpc architecture but the implementation missed calling the pte filter for all the ptes we are setting in the range. set_pte_filter can be used for filter pte values and on some platforms which don't support coherent icache it clears the exec bit so that we can flush the icache on exec fault The patch also removes the usage of arch_enter/leave_lazy_mmu() because set_pte is not supposed to be used when updating a pte entry. Powerpc architecture uses this rule to skip the expensive tlb invalidate which is not needed when you are setting up the pte for the first time. See commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit") for more details Fixes: 9fee28baa601 ("powerpc: implement the new page table range API") Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/pgtable.c | 33 - 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 3ba9fe411604..95ab20cca2da 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr) { /* -* Make sure hardware valid bit is not set. We don't do -* tlb flush for this update. +* We don't need to call arch_enter/leave_lazy_mmu_mode() +* because we expect set_ptes to be only be used on not present +* and not hw_valid ptes. Hence there is not translation cache flush +* involved that need to be batched. */ - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); + for (;;) { - /* Note: mm->context.id might not yet have been assigned as -* this context might not have been activated yet when this -* is called. -*/ - pte = set_pte_filter(pte); + /* +* Make sure hardware valid bit is not set. We don't do +* tlb flush for this update. +*/ + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); - /* Perform the setting of the PTE */ - arch_enter_lazy_mmu_mode(); - for (;;) { + /* Note: mm->context.id might not yet have been assigned as +* this context might not have been activated yet when this +* is called. +*/ + pte = set_pte_filter(pte); + + /* Perform the setting of the PTE */ __set_pte_at(mm, addr, ptep, pte, 0); if (--nr == 0) break; ptep++; - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT)); addr += PAGE_SIZE; + /* increment the pfn */ + pte = __pte(pte_val(pte) + PAGE_SIZE); + } - arch_leave_lazy_mmu_mode(); } void unmap_kernel_page(unsigned long va) -- 2.41.0
[PATCH] powerpc/vas: Limit open window failure messages in log bufffer
The VAS open window call prints error message and returns -EBUSY after the migration suspend event initiated and until the resume event completed on the destination system. It can cause the log buffer filled with these error messages if the user space issues continuous open window calls. Similar case even for DLPAR CPU remove event when no credits are available until the credits are freed or with the other DLPAR CPU add event. So changes in the patch to use pr_err_ratelimited() instead of pr_err() to display open window failure and not-available credits error messages. Signed-off-by: Haren Myneni --- arch/powerpc/platforms/book3s/vas-api.c | 4 ++-- arch/powerpc/platforms/pseries/vas.c| 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c index 77ea9335fd04..203cfc2fb8ff 100644 --- a/arch/powerpc/platforms/book3s/vas-api.c +++ b/arch/powerpc/platforms/book3s/vas-api.c @@ -311,8 +311,8 @@ static int coproc_ioc_tx_win_open(struct file *fp, unsigned long arg) txwin = cp_inst->coproc->vops->open_win(uattr.vas_id, uattr.flags, cp_inst->coproc->cop_type); if (IS_ERR(txwin)) { - pr_err("%s() VAS window open failed, %ld\n", __func__, - PTR_ERR(txwin)); + pr_err_ratelimited("%s() VAS window open failed, %ld\n", + __func__, PTR_ERR(txwin)); return PTR_ERR(txwin); } diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index b86f0db08e98..7259e6676503 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -341,7 +341,7 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, if (atomic_inc_return(_feat_caps->nr_used_credits) > atomic_read(_feat_caps->nr_total_credits)) { - pr_err("Credits are not available to allocate window\n"); + pr_err_ratelimited("Credits are not available to allocate window\n"); rc = -EINVAL; goto out; } @@ -439,7 +439,7 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, put_vas_user_win_ref(>vas_win.task_ref); rc = -EBUSY; - pr_err("No credit is available to allocate window\n"); + pr_err_ratelimited("No credit is available to allocate window\n"); out_free: /* -- 2.26.3
[PATCH v2] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows
The hypervisor returns migration failure if all VAS windows are not closed. During pre-migration stage, vas_migration_handler() sets migration_in_progress flag and closes all windows from the list. The allocate VAS window routine checks the migration flag, setup the window and then add it to the list. So there is possibility of the migration handler missing the window that is still in the process of setup. t1: Allocate and open VAS t2: Migration event window lock vas_pseries_mutex If migration_in_progress set unlock vas_pseries_mutex return open window HCALL unlock vas_pseries_mutex Modify window HCALL lock vas_pseries_mutex setup windowmigration_in_progress=true Closes all windows from the list unlock vas_pseries_mutex lock vas_pseries_mutex return if nr_closed_windows == 0 // No DLPAR CPU or migration add to the list unlock vas_pseries_mutex return unlock vas_pseries_mutex Close VAS window // due to DLPAR CPU or migration return -EBUSY This patch resolves the issue with the following steps: - Define migration_in_progress as atomic so that the migration handler sets this flag without holding mutex. - Introduce nr_open_wins_progress counter in VAS capabilities struct - This counter tracks the number of open windows are still in progress - The allocate setup window thread closes windows if the migration is set and decrements nr_open_window_progress counter - The migration handler waits for no in-progress open windows. Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler") Signed-off-by: Haren Myneni --- Changes from v1: - Do not define the migration_in_progress flag as atomic as suggested by Nathan --- arch/powerpc/platforms/pseries/vas.c | 45 +++- arch/powerpc/platforms/pseries/vas.h | 2 ++ 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c index 15d958e38eca..b86f0db08e98 100644 --- a/arch/powerpc/platforms/pseries/vas.c +++ b/arch/powerpc/platforms/pseries/vas.c @@ -32,6 +32,7 @@ static struct hv_vas_cop_feat_caps hv_cop_caps; static struct vas_caps vascaps[VAS_MAX_FEAT_TYPE]; static DEFINE_MUTEX(vas_pseries_mutex); static bool migration_in_progress; +static DECLARE_WAIT_QUEUE_HEAD(open_win_progress_wq); static long hcall_return_busy_check(long rc) { @@ -384,11 +385,15 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, * same fault IRQ is not freed by the OS before. */ mutex_lock(_pseries_mutex); - if (migration_in_progress) + if (migration_in_progress) { rc = -EBUSY; - else + } else { rc = allocate_setup_window(txwin, (u64 *)[0], cop_feat_caps->win_type); + if (!rc) + atomic_inc(>nr_open_wins_progress); + } + mutex_unlock(_pseries_mutex); if (rc) goto out; @@ -403,8 +408,17 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, goto out_free; txwin->win_type = cop_feat_caps->win_type; - mutex_lock(_pseries_mutex); + /* +* The migration SUSPEND thread sets migration_in_progress and +* closes all open windows from the list. But the window is +* added to the list after open and modify HCALLs. So possible +* that migration_in_progress is set before modify HCALL which +* may cause some windows are still open when the hypervisor +* initiates the migration. +* So checks the migration_in_progress flag again and close all +* open windows. +* * Possible to lose the acquired credit with DLPAR core * removal after the window is opened. So if there are any * closed windows (means with lost credits), do not give new @@ -412,9 +426,11 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, * after the existing windows are reopened when credits are * available. */ - if (!caps->nr_close_wins) { + mutex_lock(_pseries_mutex); + if (!caps->nr_close_wins && !migration_in_progress) { list_add(>win_list, >list); caps->nr_open_windows++; + atomic_dec(>nr_open_wins_progress); mutex_unlock(_pseries_mutex); vas_user_win_add_mm_context(>vas_win.task_ref); return >vas_win; @@ -432,6 +448,8 @@ static struct vas_window *vas_allocate_window(int vas_id, u64 flags, */ free_irq_setup(txwin); h_deallocate_vas_window(txwin->vas_win.winid); + atomic_dec(>nr_open_wins_progress); + wake_up(_win_progress_wq); out: atomic_dec(_feat_caps->nr_used_credits);
Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support
On Tue, Oct 17, 2023 at 9:37 PM Hans Verkuil wrote: > > On 17/10/2023 15:11, Shengjiu Wang wrote: > > On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil wrote: > >> > >> Hi Shengjiu, > >> > >> On 13/10/2023 10:31, Shengjiu Wang wrote: > >>> Fixed point controls are used by the user to configure > >>> the audio sample rate to driver. > >>> > >>> Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE > >>> new IDs for ASRC rate control. > >>> > >>> Signed-off-by: Shengjiu Wang > >>> --- > >>> .../userspace-api/media/v4l/common.rst| 1 + > >>> .../media/v4l/ext-ctrls-fixed-point.rst | 36 +++ > >>> .../media/v4l/vidioc-g-ext-ctrls.rst | 4 +++ > >>> .../media/v4l/vidioc-queryctrl.rst| 7 > >>> .../media/videodev2.h.rst.exceptions | 1 + > >>> drivers/media/v4l2-core/v4l2-ctrls-core.c | 5 +++ > >>> drivers/media/v4l2-core/v4l2-ctrls-defs.c | 4 +++ > >>> include/media/v4l2-ctrls.h| 2 ++ > >>> include/uapi/linux/v4l2-controls.h| 13 +++ > >>> include/uapi/linux/videodev2.h| 3 ++ > >>> 10 files changed, 76 insertions(+) > >>> create mode 100644 > >>> Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > >>> > >>> diff --git a/Documentation/userspace-api/media/v4l/common.rst > >>> b/Documentation/userspace-api/media/v4l/common.rst > >>> index ea0435182e44..35707edffb13 100644 > >>> --- a/Documentation/userspace-api/media/v4l/common.rst > >>> +++ b/Documentation/userspace-api/media/v4l/common.rst > >>> @@ -52,6 +52,7 @@ applicable to all devices. > >>> ext-ctrls-fm-rx > >>> ext-ctrls-detect > >>> ext-ctrls-colorimetry > >>> +ext-ctrls-fixed-point > >> > >> Rename this to ext-ctrls-audio-m2m. > >> > >>> fourcc > >>> format > >>> planar-apis > >>> diff --git > >>> a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > >>> b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > >>> new file mode 100644 > >>> index ..2ef6e250580c > >>> --- /dev/null > >>> +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > >>> @@ -0,0 +1,36 @@ > >>> +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later > >>> + > >>> +.. _fixed-point-controls: > >>> + > >>> +*** > >>> +Fixed Point Control Reference > >> > >> This is for audio controls. "Fixed Point" is just the type, and it doesn't > >> make > >> sense to group fixed point controls. But it does make sense to group the > >> audio > >> controls. > >> > >> V4L2 controls can be grouped into classes. Basically it is a way to put > >> controls > >> into categories, and for each category there is also a control that gives a > >> description of the class (see 2.15.15 in > >> https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction) > >> > >> If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see > >> that > >> they are grouped based on what class of control they are. > >> > >> So I think it would be a good idea to create a new control class for M2M > >> audio controls, > >> instead of just adding them to the catch-all 'User Controls' class. > >> > >> Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS > >> to see how > >> it is done. > >> > >> M2M_AUDIO would probably be a good name for the class. > >> > >>> +*** > >>> + > >>> +These controls are intended to support an asynchronous sample > >>> +rate converter. > >> > >> Add ' (ASRC).' at the end to indicate the common abbreviation for > >> that. > >> > >>> + > >>> +.. _v4l2-audio-asrc: > >>> + > >>> +``V4L2_CID_ASRC_SOURCE_RATE`` > >>> +sets the resampler source rate. > >>> + > >>> +``V4L2_CID_ASRC_DEST_RATE`` > >>> +sets the resampler destination rate. > >> > >> Document the unit (Hz) for these two controls. > >> > >>> + > >>> +.. c:type:: v4l2_ctrl_fixed_point > >>> + > >>> +.. cssclass:: longtable > >>> + > >>> +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}| > >>> + > >>> +.. flat-table:: struct v4l2_ctrl_fixed_point > >>> +:header-rows: 0 > >>> +:stub-columns: 0 > >>> +:widths: 1 1 2 > >>> + > >>> +* - __u32 > >> > >> Hmm, shouldn't this be __s32? > >> > >>> + - ``integer`` > >>> + - integer part of fixed point value. > >>> +* - __s32 > >> > >> and this __u32? > >> > >> You want to be able to use this generic type as a signed value. > >> > >>> + - ``fractional`` > >>> + - fractional part of fixed point value, which is Q31. > >>> diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > >>> b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > >>> index f9f73530a6be..1811dabf5c74 100644 > >>> --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > >>> +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > >>> @@ -295,6 +295,10 @@ still cause this
[PATCH] hvc/xen: fix event channel handling for secondary consoles
From: David Woodhouse The xencons_connect_backend() function allocates a local interdomain event channel with xenbus_alloc_evtchn(), then calls bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the *remote* domain. That doesn't work very well: (qemu) device_add xen-console,id=con1,chardev=pty0 [ 44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1 [ 44.323995] xenconsole: probe of console-1 failed with error -2 Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing by just binding that *local* event channel to an irq. The backend will do the interdomain binding. This didn't affect the primary console because the setup for that is special — the toolstack allocates the guest event channel and the guest discovers it with HVMOP_get_param. Once that's fixed, there's also a warning on hot-unplug because xencons_disconnect_backend() unconditionally calls free_irq() via unbind_from_irqhandler(): (qemu) device_del con1 [ 32.050919] [ cut here ] [ 32.050942] Trying to free already-free IRQ 33 [ 32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330 Fix that by calling notifier_del_irq() first, which only calls free_irq() if the irq was requested in the first place. Then use evtchn_put() to release the irq and event channel. Avoid calling xenbus_free_evtchn() in the normal case, as evtchn_put() will do that too. The only time xenbus_free_evtchn() needs to be called is for the cleanup when bind_evtchn_to_irq_lateeoi() fails. Finally, fix the error path in xen_hvc_init() when there's no primary console. It should still register the frontend driver, as there may be secondary consoles. (Qemu can always add secondary consoles, but only the toolstack can add the primary because it's special.) Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms") Signed-off-by: David Woodhouse Cc: sta...@vger.kernel.org --- Untested on actual Xen. drivers/tty/hvc/hvc_xen.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 98764e740c07..f0376612b267 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -377,9 +377,13 @@ void xen_console_resume(void) #ifdef CONFIG_HVC_XEN_FRONTEND static void xencons_disconnect_backend(struct xencons_info *info) { - if (info->irq > 0) - unbind_from_irqhandler(info->irq, NULL); - info->irq = 0; + if (info->irq > 0) { + notifier_del_irq(info->hvc, info->irq); + evtchn_put(info->evtchn); + info->irq = 0; + info->evtchn = 0; + } + /* evtchn_put() will also close it so this is only an error path */ if (info->evtchn > 0) xenbus_free_evtchn(info->xbdev, info->evtchn); info->evtchn = 0; @@ -433,7 +437,7 @@ static int xencons_connect_backend(struct xenbus_device *dev, if (ret) return ret; info->evtchn = evtchn; - irq = bind_interdomain_evtchn_to_irq_lateeoi(dev, evtchn); + irq = bind_evtchn_to_irq_lateeoi(evtchn); if (irq < 0) return irq; info->irq = irq; @@ -597,7 +601,7 @@ static int __init xen_hvc_init(void) else r = xen_pv_console_init(); if (r < 0) - return r; + goto register_fe; info = vtermno_to_xencons(HVC_COOKIE); info->irq = bind_evtchn_to_irq_lateeoi(info->evtchn); @@ -616,12 +620,13 @@ static int __init xen_hvc_init(void) list_del(>list); spin_unlock_irqrestore(_lock, flags); if (info->irq) - unbind_from_irqhandler(info->irq, NULL); + evtchn_put(info->evtchn); kfree(info); return r; } r = 0; + register_fe: #ifdef CONFIG_HVC_XEN_FRONTEND r = xenbus_register_frontend(_driver); #endif -- 2.40.1 smime.p7s Description: S/MIME cryptographic signature
[PING][PATCH] uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries
CCing linux-kernel for more exposure. PING. I'm waiting on a reply from anyone on the kernel side of things to see whether they have an issue with reserving values for AT_HWCAP3 and AT_HWCAP4. I'll note reviews from the GLIBC camp did not have an issue with the below patch. Thanks. Peter On 9/26/23 5:02 PM, Peter Bergner wrote: > The powerpc toolchain keeps a copy of the HWCAP bit masks in our TCB for fast > access by our __builtin_cpu_supports built-in function. The TCB space for > the HWCAP entries - which are created in pairs - is an ABI extension, so > waiting to create the space for HWCAP3 and HWCAP4 until we need them is > problematical, given distro unwillingness to apply ABI modifying patches > to distro point releases. Define AT_HWCAP3 and AT_HWCAP4 in the generic > uapi header so they can be used in GLIBC to reserve space in the powerpc > TCB for their future use. > > I scanned both the Linux and GLIBC source codes looking for unused AT_* > values and 29 and 30 did not seem to be used, so they are what I went > with. If anyone sees a problem with using those specific values, I'm > amenable to using other values, just let me know what would be better. > > Peter > > > Signed-off-by: Peter Bergner > --- > include/uapi/linux/auxvec.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h > index 6991c4b8ab18..cc61cb9b3e9a 100644 > --- a/include/uapi/linux/auxvec.h > +++ b/include/uapi/linux/auxvec.h > @@ -32,6 +32,8 @@ > #define AT_HWCAP2 26 /* extension of AT_HWCAP */ > #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */ > #define AT_RSEQ_ALIGN28 /* rseq allocation alignment */ > +#define AT_HWCAP3 29 /* extension of AT_HWCAP */ > +#define AT_HWCAP4 30 /* extension of AT_HWCAP */ > > #define AT_EXECFN 31/* filename of program */ >
Re: [PATCH 0/2] Allow nesting of lazy MMU mode
On Tue, 17 Oct 2023 11:34:23 +0530 "Aneesh Kumar K.V" wrote: > ie, we can do something like below. The change also make sure we call > set_pte_filter on all the ptes we are setting via set_ptes(). I haven't > sent this as a proper patch because we still are not able to fix the > issue Erhard reported. > > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c > index 3ba9fe411604..95ab20cca2da 100644 > --- a/arch/powerpc/mm/pgtable.c > +++ b/arch/powerpc/mm/pgtable.c > @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, > pte_t *ptep, > pte_t pte, unsigned int nr) > { > /* > - * Make sure hardware valid bit is not set. We don't do > - * tlb flush for this update. > + * We don't need to call arch_enter/leave_lazy_mmu_mode() > + * because we expect set_ptes to be only be used on not present > + * and not hw_valid ptes. Hence there is not translation cache flush > + * involved that need to be batched. >*/ > - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); > + for (;;) { > > - /* Note: mm->context.id might not yet have been assigned as > - * this context might not have been activated yet when this > - * is called. > - */ > - pte = set_pte_filter(pte); > + /* > + * Make sure hardware valid bit is not set. We don't do > + * tlb flush for this update. > + */ > + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); > > - /* Perform the setting of the PTE */ > - arch_enter_lazy_mmu_mode(); > - for (;;) { > + /* Note: mm->context.id might not yet have been assigned as > + * this context might not have been activated yet when this > + * is called. > + */ > + pte = set_pte_filter(pte); > + > + /* Perform the setting of the PTE */ > __set_pte_at(mm, addr, ptep, pte, 0); > if (--nr == 0) > break; > ptep++; > - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT)); > addr += PAGE_SIZE; > + /* increment the pfn */ > + pte = __pte(pte_val(pte) + PAGE_SIZE); > + > } > - arch_leave_lazy_mmu_mode(); > } > > void unmap_kernel_page(unsigned long va) Was this a new version of the patch for me to test? Sorry for asking but this was a bit unclear to me. In any case I tried it on top of v6.6-rc6 and it did not help with the issue I reported. Regards, Erhard
Re: [PATCH 1/4] kbuild: remove ARCH_POSTLINK from module builds
On Tue, Oct 17, 2023 at 07:37:39PM +0900, Masahiro Yamada wrote: > The '%.ko' rule in arch/*/Makefile.postlink does nothing but call the > 'true' command. > > Remove the meaningless code. > > Signed-off-by: Masahiro Yamada > --- > > arch/mips/Makefile.postlink| 3 --- > arch/powerpc/Makefile.postlink | 3 --- > arch/riscv/Makefile.postlink | 3 --- > arch/x86/Makefile.postlink | 3 --- > scripts/Makefile.modfinal | 5 + > 5 files changed, 1 insertion(+), 16 deletions(-) Reviewed-by: Nicolas Schier
Re: [PATCH 1/4] add generic builtin command line
Hi Daniel We have a usecase which requires this patch necessarily. For android usecases, we have two different build variants differentiated by defconfigs - production and debug. However, we only have a single dts for both these variants. We want to enable certain features like page owner and slub debug which require cmdline params in addition to their respective configs to be enabled. Enabling page_owner and slub_debug options in dts file enables it for both production and debug variants. These features have significant memory overhead which are undesirable for our production environment. However, these are necessary for debug environment to enable internal testing and debug. Currently, android uses out-of-tree configs like CONFIG_CMDLINE_EXTEND to do so in gki_defconfig [1]. One option is to use CMDLINE_FORCE option which would enable these cmdline params but this disables the bootloader to add any additional cmdline params which may be necessary. For such a usecase, the CONFIG_CMDLINE_PREPEND seems to be quite useful as it would help to stitch bootloader and the desired build variant's configs together. Can you please help to merge this patch? [1] https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1-lts/arch/arm64/configs/gki_defconfig#62
[PATCH 2/6] usb: gadget: fsl-udc: Convert to use module_platform_driver()
module_platform_driver_probe() has the advantage that the .probe() and .remove() calls can live in .init.text and .exit.text respectively and so some memory is saved. The downside is that dynamic bind and unbind are impossible. As the driver doesn't benefit from the advantages (both .probe and .remove are defined in plain .text), stop suffering from the downsides and use module_platform_driver() instead of module_platform_driver_probe(). Signed-off-by: Uwe Kleine-König --- drivers/usb/gadget/udc/fsl_udc_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c index ee5705d336e3..2693a10eb0c7 100644 --- a/drivers/usb/gadget/udc/fsl_udc_core.c +++ b/drivers/usb/gadget/udc/fsl_udc_core.c @@ -2666,6 +2666,7 @@ static const struct platform_device_id fsl_udc_devtype[] = { }; MODULE_DEVICE_TABLE(platform, fsl_udc_devtype); static struct platform_driver udc_driver = { + .probe = fsl_udc_probe, .remove = fsl_udc_remove, .id_table = fsl_udc_devtype, /* these suspend and resume are not usb suspend and resume */ @@ -2679,7 +2680,7 @@ static struct platform_driver udc_driver = { }, }; -module_platform_driver_probe(udc_driver, fsl_udc_probe); +module_platform_driver(udc_driver); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_AUTHOR(DRIVER_AUTHOR); -- 2.42.0
[PATCH 0/6] usb: gadget: Convert to use module_platform_driver()
Hello, module_platform_driver_probe() is an alternative to module_platform_driver(). Comparing the two the former has the advantage that the probe and remove callbacks can live in .init.text and .exit.text respectively. The latter has the advantage that it's a bit easier to use correctly and you can bind/unbind via sysfs and/or hotplug. There are considerations about deprecating module_platform_driver_probe()[1] as very few drivers use it, still less make actually use of the advantages and saving a few bytes isn't as important any more as it was (say) 10 years ago. Given that the drivers below drivers/usb/gadget making use of module_platform_driver_probe() doesn't benefit from the advantages at all (probe and remove are all defined in .text), convert these drivers to module_platform_driver(). Best regards Uwe [1] https://lore.kernel.org/linux-kbuild/20231017132045.afswdgcv4axjf...@pengutronix.de Uwe Kleine-König (6): usb: gadget: at91-udc: Convert to use module_platform_driver() usb: gadget: fsl-udc: Convert to use module_platform_driver() usb: gadget: fusb300-udc: Convert to use module_platform_driver() usb: gadget: lpc32xx-udc: Convert to use module_platform_driver() usb: gadget: m66592-udc: Convert to use module_platform_driver() usb: gadget: r8a66597-udc: Convert to use module_platform_driver() drivers/usb/gadget/udc/at91_udc.c | 3 ++- drivers/usb/gadget/udc/fsl_udc_core.c | 3 ++- drivers/usb/gadget/udc/fusb300_udc.c | 7 --- drivers/usb/gadget/udc/lpc32xx_udc.c | 3 ++- drivers/usb/gadget/udc/m66592-udc.c | 3 ++- drivers/usb/gadget/udc/r8a66597-udc.c | 3 ++- 6 files changed, 14 insertions(+), 8 deletions(-) base-commit: 4d5ab2376ec576af173e5eac3887ed0b51bd8566 -- 2.42.0
Re: [PATCH] tools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix
On Mon, 9 Oct 2023 10:30:52 +0530, Athira Rajeev wrote: > Simple expression parser test fails in powerpc as below: > > 4: Simple expression parser > test child forked, pid 170385 > Using CPUID 004e2102 > division by zero > syntax error > syntax error > FAILED tests/expr.c:65 parse test failed > test child finished with -1 > Simple expression parser: FAILED! > > [...] Applied to perf-tools-next, thanks!
Re: [PATCH V2 0/3] Fix for shellcheck issues with latest scripts in tests/shell
On Fri, 13 Oct 2023 13:00:18 +0530, Athira Rajeev wrote: > shellcheck was run on perf tool shell scripts as a pre-requisite > to include a build option for shellcheck discussed here: > https://www.spinics.net/lists/linux-perf-users/msg25553.html > > And fixes were added for the coding/formatting issues in > two patchsets: > https://lore.kernel.org/linux-perf-users/20230613164145.50488-1-atraj...@linux.vnet.ibm.com/ > https://lore.kernel.org/linux-perf-users/20230709182800.53002-1-atraj...@linux.vnet.ibm.com/ > > [...] Applied to perf-tools-next, thanks!
[PATCH v3 4/4] PCI: layerscape: Add suspend/resume for ls1043a
ls1043a add suspend/resume support. Implement ls1043a_pcie_send_turnoff_msg() to send PME_Turn_Off message. Implement ls1043a_pcie_exit_from_l2() to exit from L2 state. Signed-off-by: Frank Li --- Notes: Change from v2 to v3 - Remove ls_pcie_lut_readl(writel) function Change from v1 to v2 - Update subject 'a' to 'A' drivers/pci/controller/dwc/pci-layerscape.c | 86 - 1 file changed, 85 insertions(+), 1 deletion(-) diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 4b663b20d8612..9656224960b0c 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -41,6 +41,15 @@ #define SCFG_PEXSFTRSTCR 0x190 #define PEXSR(idx) BIT(idx) +/* LS1043A PEX PME control register */ +#define SCFG_PEXPMECR 0x144 +#define PEXPME(idx)BIT(31 - (idx) * 4) + +/* LS1043A PEX LUT debug register */ +#define LS_PCIE_LDBG 0x7fc +#define LDBG_SRBIT(30) +#define LDBG_WEBIT(31) + #define PCIE_IATU_NUM 6 #define LS_PCIE_DRV_SCFG BIT(0) @@ -227,6 +236,68 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp) return 0; } +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) +{ + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); + struct ls_pcie *pcie = to_ls_pcie(pci); + u32 val; + + if (!pcie->scfg) { + dev_dbg(pcie->pci->dev, "SYSCFG is NULL\n"); + return; + } + + /* Send Turn_off message */ + regmap_read(pcie->scfg, SCFG_PEXPMECR, ); + val |= PEXPME(pcie->index); + regmap_write(pcie->scfg, SCFG_PEXPMECR, val); + + /* +* There is no specific register to check for PME_To_Ack from endpoint. +* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US. +*/ + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000); + + /* +* Layerscape hardware reference manual recommends clearing the PMXMTTURNOFF bit +* to complete the PME_Turn_Off handshake. +*/ + regmap_read(pcie->scfg, SCFG_PEXPMECR, ); + val &= ~PEXPME(pcie->index); + regmap_write(pcie->scfg, SCFG_PEXPMECR, val); +} + +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp) +{ + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); + struct ls_pcie *pcie = to_ls_pcie(pci); + u32 val; + + /* +* Only way let PEX module exit L2 is do a software reset. +* LDBG_WE: allows the user to have write access to the PEXDBG[SR] for both setting and +* clearing the soft reset on the PEX module. +* LDBG_SR: When SR is set to 1, the PEX module enters soft reset. +*/ + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG); + val |= LDBG_WE; + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val); + + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG); + val |= LDBG_SR; + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val); + + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG); + val &= ~LDBG_SR; + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val); + + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG); + val &= ~LDBG_WE; + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val); + + return 0; +} + static const struct dw_pcie_host_ops ls_pcie_host_ops = { .host_init = ls_pcie_host_init, .pme_turn_off = ls_pcie_send_turnoff_msg, @@ -244,6 +315,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = { .flags = LS_PCIE_DRV_SCFG, }; +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = { + .host_init = ls_pcie_host_init, + .pme_turn_off = ls1043a_pcie_send_turnoff_msg, +}; + +static const struct ls_pcie_drvdata ls1043a_drvdata = { + .pf_lut_off = 0x1, + .pm_support = true, + .ops = _pcie_host_ops, + .exit_from_l2 = ls1043a_pcie_exit_from_l2, + .flags = LS_PCIE_DRV_SCFG, +}; + static const struct ls_pcie_drvdata layerscape_drvdata = { .pf_lut_off = 0xc, .pm_support = true, @@ -254,7 +338,7 @@ static const struct of_device_id ls_pcie_of_match[] = { { .compatible = "fsl,ls1012a-pcie", .data = _drvdata }, { .compatible = "fsl,ls1021a-pcie", .data = _drvdata }, { .compatible = "fsl,ls1028a-pcie", .data = _drvdata }, - { .compatible = "fsl,ls1043a-pcie", .data = _drvdata }, + { .compatible = "fsl,ls1043a-pcie", .data = _drvdata }, { .compatible = "fsl,ls1046a-pcie", .data = _drvdata }, { .compatible = "fsl,ls2080a-pcie", .data = _drvdata }, { .compatible = "fsl,ls2085a-pcie", .data = _drvdata }, -- 2.34.1
[PATCH v3 3/4] PCI: layerscape: Rename pf_* as pf_lut_*
'pf' and 'lut' is just difference name in difference chips, but basic it is a MMIO base address plus an offset. Rename it to avoid duplicate pf_* and lut_* in driver. Signed-off-by: Frank Li --- Notes: change from v1 to v3 - new patch at v3 drivers/pci/controller/dwc/pci-layerscape.c | 34 ++--- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 6f47cfe146c44..4b663b20d8612 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -46,7 +46,7 @@ #define LS_PCIE_DRV_SCFG BIT(0) struct ls_pcie_drvdata { - const u32 pf_off; + const u32 pf_lut_off; const struct dw_pcie_host_ops *ops; int (*exit_from_l2)(struct dw_pcie_rp *pp); int flags; @@ -56,13 +56,13 @@ struct ls_pcie_drvdata { struct ls_pcie { struct dw_pcie *pci; const struct ls_pcie_drvdata *drvdata; - void __iomem *pf_base; + void __iomem *pf_lut_base; struct regmap *scfg; int index; bool big_endian; }; -#define ls_pcie_pf_readl_addr(addr)ls_pcie_pf_readl(pcie, addr) +#define ls_pcie_pf_lut_readl_addr(addr)ls_pcie_pf_lut_readl(pcie, addr) #define to_ls_pcie(x) dev_get_drvdata((x)->dev) static bool ls_pcie_is_bridge(struct ls_pcie *pcie) @@ -103,20 +103,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie *pcie) iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR); } -static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off) +static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off) { if (pcie->big_endian) - return ioread32be(pcie->pf_base + off); + return ioread32be(pcie->pf_lut_base + off); - return ioread32(pcie->pf_base + off); + return ioread32(pcie->pf_lut_base + off); } -static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val) +static void ls_pcie_pf_lut_writel(struct ls_pcie *pcie, u32 off, u32 val) { if (pcie->big_endian) - iowrite32be(val, pcie->pf_base + off); + iowrite32be(val, pcie->pf_lut_base + off); else - iowrite32(val, pcie->pf_base + off); + iowrite32(val, pcie->pf_lut_base + off); } static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) @@ -126,11 +126,11 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) u32 val; int ret; - val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR); + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR); val |= PF_MCR_PTOMR; - ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val); + ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val); - ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR, + ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR, val, !(val & PF_MCR_PTOMR), PCIE_PME_TO_L2_TIMEOUT_US/10, PCIE_PME_TO_L2_TIMEOUT_US); @@ -149,15 +149,15 @@ static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp) * Set PF_MCR_EXL2S bit in LS_PCIE_PF_MCR register for the link * to exit L2 state. */ - val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR); + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR); val |= PF_MCR_EXL2S; - ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val); + ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val); /* * L2 exit timeout of 10ms is not defined in the specifications, * it was chosen based on empirical observations. */ - ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR, + ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR, val, !(val & PF_MCR_EXL2S), 1000, 1); @@ -245,7 +245,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = { }; static const struct ls_pcie_drvdata layerscape_drvdata = { - .pf_off = 0xc, + .pf_lut_off = 0xc, .pm_support = true, .exit_from_l2 = ls_pcie_exit_from_l2, }; @@ -295,7 +295,7 @@ static int ls_pcie_probe(struct platform_device *pdev) pcie->big_endian = of_property_read_bool(dev->of_node, "big-endian"); - pcie->pf_base = pci->dbi_base + pcie->drvdata->pf_off; + pcie->pf_lut_base = pci->dbi_base + pcie->drvdata->pf_lut_off; if (pcie->drvdata->flags & LS_PCIE_DRV_SCFG) { -- 2.34.1
[PATCH v3 2/4] PCI: layerscape: Add suspend/resume for ls1021a
ls1021a add suspend/resume support. Implement callback ls1021a_pcie_send_turnoff_msg(), which write scfg's SCFG_PEXPMWRCR to issue PME_Turn_off message. Implement ls1021a_pcie_exit_from_l2() to let controller exit L2 state. Signed-off-by: Frank Li --- Notes: Change from v2 to v3 - update according to mani's feedback change from v1 to v2 - change subject 'a' to 'A' drivers/pci/controller/dwc/pci-layerscape.c | 86 - 1 file changed, 85 insertions(+), 1 deletion(-) diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index aea89926bcc4f..6f47cfe146c44 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -35,11 +35,21 @@ #define PF_MCR_PTOMR BIT(0) #define PF_MCR_EXL2S BIT(1) +/* LS1021A PEXn PM Write Control Register */ +#define SCFG_PEXPMWRCR(idx)(0x5c + (idx) * 0x64) +#define PMXMTTURNOFF BIT(31) +#define SCFG_PEXSFTRSTCR 0x190 +#define PEXSR(idx) BIT(idx) + #define PCIE_IATU_NUM 6 +#define LS_PCIE_DRV_SCFG BIT(0) + struct ls_pcie_drvdata { const u32 pf_off; + const struct dw_pcie_host_ops *ops; int (*exit_from_l2)(struct dw_pcie_rp *pp); + int flags; bool pm_support; }; @@ -47,6 +57,8 @@ struct ls_pcie { struct dw_pcie *pci; const struct ls_pcie_drvdata *drvdata; void __iomem *pf_base; + struct regmap *scfg; + int index; bool big_endian; }; @@ -171,13 +183,65 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp) return 0; } +static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) +{ + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); + struct ls_pcie *pcie = to_ls_pcie(pci); + u32 val; + + /* Send PME_Turn_Off message */ + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), ); + val |= PMXMTTURNOFF; + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val); + + /* +* There is no specific register to check for PME_To_Ack from endpoint. +* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US. +*/ + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000); + + /* +* Layerscape hardware reference manual recommends clearing the PMXMTTURNOFF bit +* to complete the PME_Turn_Off handshake. +*/ + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), ); + val &= ~PMXMTTURNOFF; + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val); +} + +static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp) +{ + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); + struct ls_pcie *pcie = to_ls_pcie(pci); + u32 val; + + /* Only way exit from l2 is that do software reset */ + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, ); + val |= PEXSR(pcie->index); + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val); + + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, ); + val &= ~PEXSR(pcie->index); + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val); + + return 0; +} + static const struct dw_pcie_host_ops ls_pcie_host_ops = { .host_init = ls_pcie_host_init, .pme_turn_off = ls_pcie_send_turnoff_msg, }; +static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = { + .host_init = ls_pcie_host_init, + .pme_turn_off = ls1021a_pcie_send_turnoff_msg, +}; + static const struct ls_pcie_drvdata ls1021a_drvdata = { - .pm_support = false, + .pm_support = true, + .ops = _pcie_host_ops, + .exit_from_l2 = ls1021a_pcie_exit_from_l2, + .flags = LS_PCIE_DRV_SCFG, }; static const struct ls_pcie_drvdata layerscape_drvdata = { @@ -205,6 +269,8 @@ static int ls_pcie_probe(struct platform_device *pdev) struct dw_pcie *pci; struct ls_pcie *pcie; struct resource *dbi_base; + u32 index[2]; + int ret; pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL); if (!pcie) @@ -220,6 +286,7 @@ static int ls_pcie_probe(struct platform_device *pdev) pci->pp.ops = _pcie_host_ops; pcie->pci = pci; + pci->pp.ops = pcie->drvdata->ops ? pcie->drvdata->ops : _pcie_host_ops; dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base); @@ -230,6 +297,23 @@ static int ls_pcie_probe(struct platform_device *pdev) pcie->pf_base = pci->dbi_base + pcie->drvdata->pf_off; + if (pcie->drvdata->flags & LS_PCIE_DRV_SCFG) { + + pcie->scfg = syscon_regmap_lookup_by_phandle(dev->of_node, "fsl,pcie-scfg"); + if (IS_ERR(pcie->scfg)) { + dev_err(dev, "No syscfg phandle specified\n"); + return PTR_ERR(pcie->scfg); + } + + ret =
[PATCH v3 1/4] PCI: layerscape: Add function pointer for exit_from_l2()
Since difference SoCs require different sequence for exiting L2, let's add a separate "exit_from_l2()" callback. This callback can be used to execute SoC specific sequence. Signed-off-by: Frank Li --- Notes: Change from v2 to v3 - fixed according to mani's feedback 1. update commit message 2. move dw_pcie_host_ops to next patch 3. check return value from exit_from_l2() Change from v1 to v2 - change subject 'a' to 'A' Change from v1 to v2 - change subject 'a' to 'A' drivers/pci/controller/dwc/pci-layerscape.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 37956e09c65bd..aea89926bcc4f 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -39,6 +39,7 @@ struct ls_pcie_drvdata { const u32 pf_off; + int (*exit_from_l2)(struct dw_pcie_rp *pp); bool pm_support; }; @@ -125,7 +126,7 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) dev_err(pcie->pci->dev, "PME_Turn_off timeout\n"); } -static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp) +static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp) { struct dw_pcie *pci = to_dw_pcie_from_pp(pp); struct ls_pcie *pcie = to_ls_pcie(pci); @@ -150,6 +151,8 @@ static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp) 1); if (ret) dev_err(pcie->pci->dev, "L2 exit timeout\n"); + + return ret; } static int ls_pcie_host_init(struct dw_pcie_rp *pp) @@ -180,6 +183,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = { static const struct ls_pcie_drvdata layerscape_drvdata = { .pf_off = 0xc, .pm_support = true, + .exit_from_l2 = ls_pcie_exit_from_l2, }; static const struct of_device_id ls_pcie_of_match[] = { @@ -247,11 +251,14 @@ static int ls_pcie_suspend_noirq(struct device *dev) static int ls_pcie_resume_noirq(struct device *dev) { struct ls_pcie *pcie = dev_get_drvdata(dev); + int ret; if (!pcie->drvdata->pm_support) return 0; - ls_pcie_exit_from_l2(>pci->pp); + ret = pcie->drvdata->exit_from_l2(>pci->pp); + if (ret) + return ret; return dw_pcie_resume_noirq(pcie->pci); } -- 2.34.1
[PATCH v3 0/4] PCI: layerscape: Add suspend/resume support for ls1043 and ls1021
Add suspend/resume support for ls1043 and ls1021. Change log see each patch Frank Li (4): PCI: layerscape: Add function pointer for exit_from_l2() PCI: layerscape: Add suspend/resume for ls1021a PCI: layerscape: Rename pf_* as pf_lut_* PCI: layerscape: Add suspend/resume for ls1043a drivers/pci/controller/dwc/pci-layerscape.c | 217 ++-- 1 file changed, 196 insertions(+), 21 deletions(-) -- 2.34.1
Re: [powerpc] kernel BUG fs/xfs/xfs_message.c:102! [4k block]
> On 13-Oct-2023, at 2:49 AM, Dave Chinner wrote: > > On Thu, Oct 12, 2023 at 03:51:04PM +0530, Sachin Sant wrote: >> While running xfstests on a IBM Power10 server having xfs file system with >> 4k block size following crash was seen >> >> [ 1609.771548] run fstests xfs/238 at 2023-10-11 17:00:40 >> [ 1609.971214] XFS (sdb1): Mounting V5 Filesystem >> 1105d60c-9514-4e7a-af6a-632d99bf06ea >> [ 1609.980693] XFS (sdb1): Ending clean mount >> [ 1609.982166] xfs filesystem being mounted at /mnt/test supports timestamps >> until 2038-01-19 (0x7fff) >> [ 1610.024793] XFS (sdb2): Mounting V5 Filesystem >> 60de8f0c-c80e-4a2a-b60a-f41a9cc0feca >> [ 1610.030295] XFS (sdb2): Ending clean mount >> [ 1610.031342] xfs filesystem being mounted at /mnt/scratch supports >> timestamps until 2038-01-19 (0x7fff) >> [ 1610.087139] XFS: Assertion failed: bp->b_flags & XBF_DONE, file: >> fs/xfs/xfs_trans_buf.c, line: 241 >> [ 1610.087162] [ cut here ] >> [ 1610.087165] kernel BUG at fs/xfs/xfs_message.c:102! >> [ 1610.087168] Oops: Exception in kernel mode, sig: 5 [#1] >> [ 1610.087171] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries >> [ 1610.087175] Modules linked in: overlay dm_zero dm_thin_pool >> dm_persistent_data dm_bio_prison dm_snapshot dm_bufio loop dm_flakey xfs >> dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet >> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat >> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill ip_set tls >> nf_tables libcrc32c nfnetlink pseries_rng vmx_crypto ext4 mbcache jbd2 >> sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth >> fuse [last unloaded: scsi_debug] >> [ 1610.087220] CPU: 11 PID: 225897 Comm: kworker/11:46 Not tainted >> 6.6.0-rc5-gb8b05bc6d83c #1 >> [ 1610.087224] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 >> of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries >> [ 1610.087227] Workqueue: xfs-inodegc/sdb2 xfs_inodegc_worker [xfs] >> [ 1610.087303] NIP: c00802857edc LR: c00802857ec8 CTR: >> 7fff >> [ 1610.087306] REGS: c0002441b810 TRAP: 0700 Not tainted >> (6.6.0-rc5-gb8b05bc6d83c) >> [ 1610.087309] MSR: 8282b033 CR: >> 28000424 XER: 0007 >> [ 1610.087318] CFAR: c00802857d1c IRQMASK: 0 >> [ 1610.087318] GPR00: c00802857ec8 c0002441bab0 c00802a68300 >> ffea >> [ 1610.087318] GPR04: 000a c0002441b9b0 >> c008016c6c40 >> [ 1610.087318] GPR08: ffc0 0001 >> c0080285a3a8 >> [ 1610.087318] GPR12: c08311d0 c0117fff1b00 c0197de8 >> c936bf40 >> [ 1610.087318] GPR16: >> c009d16d48b0 >> [ 1610.087318] GPR20: c00079e80205 c0001cb52f80 c0001cb52fc0 >> 8000 >> [ 1610.087318] GPR24: 0001 c0002441bbc8 c000e77a7000 >> c000209b7e00 >> [ 1610.087318] GPR28: c0080279ae48 c008016b25f0 c0002441bc10 >> c0002dabaee8 >> [ 1610.087354] NIP [c00802857edc] assfail+0x54/0x74 [xfs] >> [ 1610.087420] LR [c00802857ec8] assfail+0x40/0x74 [xfs] >> [ 1610.087485] Call Trace: >> [ 1610.087486] [c0002441bab0] [c00802857ec8] assfail+0x40/0x74 [xfs] >> (unreliable) >> [ 1610.087551] [c0002441bb10] [c0080281cebc] >> xfs_trans_read_buf_map+0x384/0x590 [xfs] >> [ 1610.087622] [c0002441bba0] [c0080279ae48] >> xfs_imap_to_bp+0x70/0xa8 [xfs] >> [ 1610.087691] [c0002441bbf0] [c0080280c3ec] >> xfs_inode_item_precommit+0x244/0x320 [xfs] >> [ 1610.087760] [c0002441bc60] [c008027f3034] >> xfs_trans_run_precommits+0xac/0x160 [xfs] >> [ 1610.087830] [c0002441bcb0] [c008027f45b0] >> __xfs_trans_commit+0x68/0x430 [xfs] >> [ 1610.087900] [c0002441bd20] [c008027dfc30] >> xfs_inactive_ifree+0x158/0x2a0 [xfs] >> [ 1610.087969] [c0002441bdb0] [c008027dff84] >> xfs_inactive+0x20c/0x420 [xfs] >> [ 1610.088037] [c0002441bdf0] [c008027ceb90] >> xfs_inodegc_worker+0x148/0x250 [xfs] >> [ 1610.088106] [c0002441be40] [c0188130] >> process_scheduled_works+0x230/0x4f0 >> [ 1610.088113] [c0002441bf10] [c018b164] >> worker_thread+0x1e4/0x500 >> [ 1610.088118] [c0002441bf90] [c0197f18] kthread+0x138/0x140 >> [ 1610.088122] [c0002441bfe0] [c000df98] >> start_kernel_thread+0x14/0x18 >> [ 1610.088127] Code: e8a5ca38 7c641b78 3c62 e863ca48 f8010010 f821ffa1 >> 4bfffd91 3d22 e929ca50 89290010 2f89 419e0008 <0fe0> 0fe0 >> 38210060 e8010010 >> [ 1610.088140] ---[ end trace ]--- >> [ 1610.093928] sh (1070303): drop_caches: 3 >> [ 1610.095600] pstore: backend (nvram) writing error (-1) >> [ 1610.095605] >> >> xfs/238 test was executed when the crash was encountered. >> Devices were formatted with 4k block size. > > Yeah, I've seen
Re: [PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
> On 17-Oct-2023, at 5:45 PM, Michael Ellerman wrote: > > Sachin reported a warning when running the inject-ra-err selftest: > > # selftests: powerpc/mce: inject-ra-err > Disabling lock debugging due to kernel taint > MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control > memory) [Not recovered] > MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [1e48] > MCE: CPU19: Initiator CPU > MCE: CPU19: Unknown > [ cut here ] > WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 > radix__tlb_flush+0x160/0x180 > CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G ME > 6.6.0-rc3-00055-g9ed22ae6be81 #4 > Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 > of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries > ... > NIP radix__tlb_flush+0x160/0x180 > LR radix__tlb_flush+0x104/0x180 > Call Trace: >radix__tlb_flush+0xf4/0x180 (unreliable) >tlb_finish_mmu+0x15c/0x1e0 >exit_mmap+0x1a0/0x510 >__mmput+0x60/0x1e0 >exit_mm+0xdc/0x170 >do_exit+0x2bc/0x5a0 >do_group_exit+0x4c/0xc0 >sys_exit_group+0x28/0x30 >system_call_exception+0x138/0x330 >system_call_vectored_common+0x15c/0x2ec > > And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine > final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning > in radix__tlb_flush() if mm->context.copros is still elevated. > > However it's possible for the copros count to be elevated if a process > exits without first closing file descriptors that are associated with a > copro, eg. VAS. > > If the process exits with a VAS file still open, the release callback > is queued up for exit_task_work() via: > exit_files() >put_files_struct() > close_files() >filp_close() > fput() > > And called via: > exit_task_work() >fput() > __fput() >file->f_op->release(inode, file) > coproc_release() >vas_user_win_ops->close_win() > vas_deallocate_window() >mm_context_remove_vas_window() > mm_context_remove_copro() > > But that is after exit_mm() has been called from do_exit() and triggered > the warning. > > Fix it by dropping the warning, and always calling __flush_all_mm(). > > In the normal case of no copros, that will result in a call to > _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code > does. > > If the copros count is elevated then it will cause a global flush, which > should flush translations from any copros. Note that the process table > entry was cleared in arch_exit_mmap(), so copros should not be able to > fetch any new translations. > > Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb > mm shootdown IPIs") > Reported-by: Sachin Sant > Closes: > https://lore.kernel.org/all/a8e52547-4bf1-47ce-8aea-bc5a9d7e3...@linux.ibm.com/ > Signed-off-by: Nicholas Piggin > Signed-off-by: Michael Ellerman > --- Thanks for the fix. This fixes the reported problem. Tested-by: Sachin Sant - Sachin
Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore
On Wed, 04 Oct 2023 15:19:54 -0500, gjo...@linux.vnet.ibm.com wrote: > This patchset has gone through numerous rounds of review and > all comments/suggetions have been addressed. The reviews have > covered all relevant areas including reviews by block and keyring > developers as well as the SED Opal maintainer. > > TCG SED Opal is a specification from The Trusted Computing Group > that allows self encrypting storage devices (SED) to be locked at > power on and require an authentication key to unlock the drive. > > [...] Applied, thanks! [1/3] block:sed-opal: SED Opal keystore commit: 96ff37ceb203426b1bcebbae42399686110b0130 [2/3] block: sed-opal: keystore access for SED Opal keys commit: 5dd339722f5f612f349b068e8da6d6710fd0e460 [3/3] powerpc/pseries: PLPKS SED Opal keystore support commit: ec8cf230ceccfcc2bd29990c2902be168a92dee4 Best regards, -- Jens Axboe
Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore
On 10/17/23 9:09 AM, Greg Joyce wrote: > > Hi Jens, > > I've addressed all the comments/issues on v7 of the patchset and > haven't received any feedback on v8. Is there anything else that you'd > like to see before this can be included? Let's give it another shot! -- Jens Axboe
Re: [PATCH v8 0/3] generic and PowerPC SED Opal keystore
Hi Jens, I've addressed all the comments/issues on v7 of the patchset and haven't received any feedback on v8. Is there anything else that you'd like to see before this can be included? Thanks, Greg On Wed, 2023-10-04 at 15:19 -0500, gjo...@linux.vnet.ibm.com wrote: > From: Greg Joyce > > This patchset has gone through numerous rounds of review and > all comments/suggetions have been addressed. The reviews have > covered all relevant areas including reviews by block and keyring > developers as well as the SED Opal maintainer. > > TCG SED Opal is a specification from The Trusted Computing Group > that allows self encrypting storage devices (SED) to be locked at > power on and require an authentication key to unlock the drive. > > PowerPC/pseries versions of key functions provide read/write access > to SED Opal keys in the PLPKS keystore. > > The SED block driver has been modified to read the SED Opal > keystore to populate a key in the SED Opal keyring. Changes to the > SED Opal key will be written to the SED Opal keystore. > > > Changelog > v8: - rebased to 6.6-rc4 > - fixed issues using clang (thanks Nathan Chancellor and Nick > Desaulniers) > - fixed crash if PLPKS is not present for pseries (thanks > Michael > Ellerman) > > v7: - rebased to for-6.5/block > > v6: - squashed two commits (suggested by Andrew Donnellan) > > v5: - updated to reflect changes in PLPKS API > > v4: > - scope reduced to cover just SED Opal keys > - base SED Opal keystore is now in SED block driver > - removed use of enum to indicate type > - refactored common code into common function that read and > write use > - removed cast to void > - added use of SED Opal keystore functions to SED block > driver > > v3: > - No code changes, but per reviewer requests, adding > additional > mailing lists(keyring, EFI) for wider review. > > v2: > - Include feedback from Gregory Joyce, Eric Richter and > Murilo Opsfelder Araujo. > - Include suggestions from Michael Ellerman. > - Moved a dependency from generic SED code to this patchset. > This patchset now builds of its own. > > > > > Greg Joyce (3): > block:sed-opal: SED Opal keystore > block: sed-opal: keystore access for SED Opal keys > powerpc/pseries: PLPKS SED Opal keystore support > > arch/powerpc/platforms/pseries/Kconfig| 6 + > arch/powerpc/platforms/pseries/Makefile | 1 + > .../powerpc/platforms/pseries/plpks_sed_ops.c | 131 > ++ > block/Kconfig | 1 + > block/sed-opal.c | 18 ++- > include/linux/sed-opal-key.h | 26 > 6 files changed, 181 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/platforms/pseries/plpks_sed_ops.c > create mode 100644 include/linux/sed-opal-key.h >
Re: [PATCH 1/4] add generic builtin command line
On Tue, Oct 17, 2023 at 04:10:42PM +0530, Pratyush Brahma wrote: > For such a usecase, the CONFIG_CMDLINE_PREPEND seems to be quite useful as > it would help to stitch bootloader > and the desired build variant's configs together. Can you please help to > merge this patch? Yes, your at least the second person that's asked for it, and it's been on my list for some time to release again. I'll try to release it as soon as possible. Daniel
Re: [powerpc] Kernel crash while running LTP (bisected)
> On 17-Oct-2023, at 4:35 PM, Lorenzo Stoakes wrote: > > On Tue, Oct 17, 2023 at 02:46:07PM +0530, Sachin Sant wrote: >> While running LTP tests (getpid02) on a Power10 server booted with >> 6.6.0-rc6-next-20231016 following crash was seen: >> >> [ 76.386628] Kernel attempted to read user page (d8) - exploit attempt? >> (uid: 0) >> [ 76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8 >> [ 76.386653] Faulting instruction address: 0xc04cda90 >> [ 76.386658] Oops: Kernel access of bad area, sig: 11 [#1] > [snip] >> >> Git bisect points to following patch >> >> commit 1db41d29b79ad271674081c752961edd064bbbac >>mm: perform the mapping_map_writable() check after call_mmap() >> >> Reverting the patch allows the test to complete. >> >> - Sachin > > Hi Sachin, > > Thanks for the report but this was triggered in another test previously and > has been fixed already (apologies for the inconvenience!) see [0]. Andrew > took the -fix patch and applied to mm-unstable, this should wend its way to > -next in the meantime. Ah, thank you. Yes the fix works for me. - Sachin
Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support
On 17/10/2023 15:11, Shengjiu Wang wrote: > On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil wrote: >> >> Hi Shengjiu, >> >> On 13/10/2023 10:31, Shengjiu Wang wrote: >>> Fixed point controls are used by the user to configure >>> the audio sample rate to driver. >>> >>> Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE >>> new IDs for ASRC rate control. >>> >>> Signed-off-by: Shengjiu Wang >>> --- >>> .../userspace-api/media/v4l/common.rst| 1 + >>> .../media/v4l/ext-ctrls-fixed-point.rst | 36 +++ >>> .../media/v4l/vidioc-g-ext-ctrls.rst | 4 +++ >>> .../media/v4l/vidioc-queryctrl.rst| 7 >>> .../media/videodev2.h.rst.exceptions | 1 + >>> drivers/media/v4l2-core/v4l2-ctrls-core.c | 5 +++ >>> drivers/media/v4l2-core/v4l2-ctrls-defs.c | 4 +++ >>> include/media/v4l2-ctrls.h| 2 ++ >>> include/uapi/linux/v4l2-controls.h| 13 +++ >>> include/uapi/linux/videodev2.h| 3 ++ >>> 10 files changed, 76 insertions(+) >>> create mode 100644 >>> Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst >>> >>> diff --git a/Documentation/userspace-api/media/v4l/common.rst >>> b/Documentation/userspace-api/media/v4l/common.rst >>> index ea0435182e44..35707edffb13 100644 >>> --- a/Documentation/userspace-api/media/v4l/common.rst >>> +++ b/Documentation/userspace-api/media/v4l/common.rst >>> @@ -52,6 +52,7 @@ applicable to all devices. >>> ext-ctrls-fm-rx >>> ext-ctrls-detect >>> ext-ctrls-colorimetry >>> +ext-ctrls-fixed-point >> >> Rename this to ext-ctrls-audio-m2m. >> >>> fourcc >>> format >>> planar-apis >>> diff --git >>> a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst >>> b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst >>> new file mode 100644 >>> index ..2ef6e250580c >>> --- /dev/null >>> +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst >>> @@ -0,0 +1,36 @@ >>> +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later >>> + >>> +.. _fixed-point-controls: >>> + >>> +*** >>> +Fixed Point Control Reference >> >> This is for audio controls. "Fixed Point" is just the type, and it doesn't >> make >> sense to group fixed point controls. But it does make sense to group the >> audio >> controls. >> >> V4L2 controls can be grouped into classes. Basically it is a way to put >> controls >> into categories, and for each category there is also a control that gives a >> description of the class (see 2.15.15 in >> https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction) >> >> If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see >> that >> they are grouped based on what class of control they are. >> >> So I think it would be a good idea to create a new control class for M2M >> audio controls, >> instead of just adding them to the catch-all 'User Controls' class. >> >> Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS >> to see how >> it is done. >> >> M2M_AUDIO would probably be a good name for the class. >> >>> +*** >>> + >>> +These controls are intended to support an asynchronous sample >>> +rate converter. >> >> Add ' (ASRC).' at the end to indicate the common abbreviation for >> that. >> >>> + >>> +.. _v4l2-audio-asrc: >>> + >>> +``V4L2_CID_ASRC_SOURCE_RATE`` >>> +sets the resampler source rate. >>> + >>> +``V4L2_CID_ASRC_DEST_RATE`` >>> +sets the resampler destination rate. >> >> Document the unit (Hz) for these two controls. >> >>> + >>> +.. c:type:: v4l2_ctrl_fixed_point >>> + >>> +.. cssclass:: longtable >>> + >>> +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}| >>> + >>> +.. flat-table:: struct v4l2_ctrl_fixed_point >>> +:header-rows: 0 >>> +:stub-columns: 0 >>> +:widths: 1 1 2 >>> + >>> +* - __u32 >> >> Hmm, shouldn't this be __s32? >> >>> + - ``integer`` >>> + - integer part of fixed point value. >>> +* - __s32 >> >> and this __u32? >> >> You want to be able to use this generic type as a signed value. >> >>> + - ``fractional`` >>> + - fractional part of fixed point value, which is Q31. >>> diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst >>> b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst >>> index f9f73530a6be..1811dabf5c74 100644 >>> --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst >>> +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst >>> @@ -295,6 +295,10 @@ still cause this situation. >>>- ``p_av1_film_grain`` >>>- A pointer to a struct :c:type:`v4l2_ctrl_av1_film_grain`. Valid if >>> this control is >>> of type ``V4L2_CTRL_TYPE_AV1_FILM_GRAIN``. >>> +* - struct :c:type:`v4l2_ctrl_fixed_point` * >>> + - ``p_fixed_point`` >>> + - A pointer to a struct
Re: [RFC PATCH v6 09/11] media: uapi: Add audio rate controls support
On Mon, Oct 16, 2023 at 9:16 PM Hans Verkuil wrote: > > Hi Shengjiu, > > On 13/10/2023 10:31, Shengjiu Wang wrote: > > Fixed point controls are used by the user to configure > > the audio sample rate to driver. > > > > Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE > > new IDs for ASRC rate control. > > > > Signed-off-by: Shengjiu Wang > > --- > > .../userspace-api/media/v4l/common.rst| 1 + > > .../media/v4l/ext-ctrls-fixed-point.rst | 36 +++ > > .../media/v4l/vidioc-g-ext-ctrls.rst | 4 +++ > > .../media/v4l/vidioc-queryctrl.rst| 7 > > .../media/videodev2.h.rst.exceptions | 1 + > > drivers/media/v4l2-core/v4l2-ctrls-core.c | 5 +++ > > drivers/media/v4l2-core/v4l2-ctrls-defs.c | 4 +++ > > include/media/v4l2-ctrls.h| 2 ++ > > include/uapi/linux/v4l2-controls.h| 13 +++ > > include/uapi/linux/videodev2.h| 3 ++ > > 10 files changed, 76 insertions(+) > > create mode 100644 > > Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > > > > diff --git a/Documentation/userspace-api/media/v4l/common.rst > > b/Documentation/userspace-api/media/v4l/common.rst > > index ea0435182e44..35707edffb13 100644 > > --- a/Documentation/userspace-api/media/v4l/common.rst > > +++ b/Documentation/userspace-api/media/v4l/common.rst > > @@ -52,6 +52,7 @@ applicable to all devices. > > ext-ctrls-fm-rx > > ext-ctrls-detect > > ext-ctrls-colorimetry > > +ext-ctrls-fixed-point > > Rename this to ext-ctrls-audio-m2m. > > > fourcc > > format > > planar-apis > > diff --git > > a/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > > b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > > new file mode 100644 > > index ..2ef6e250580c > > --- /dev/null > > +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-fixed-point.rst > > @@ -0,0 +1,36 @@ > > +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later > > + > > +.. _fixed-point-controls: > > + > > +*** > > +Fixed Point Control Reference > > This is for audio controls. "Fixed Point" is just the type, and it doesn't > make > sense to group fixed point controls. But it does make sense to group the audio > controls. > > V4L2 controls can be grouped into classes. Basically it is a way to put > controls > into categories, and for each category there is also a control that gives a > description of the class (see 2.15.15 in > https://linuxtv.org/downloads/v4l-dvb-apis-new/driver-api/v4l2-controls.html#introduction) > > If you use e.g. 'v4l2-ctl -l' to list all the controls, then you will see that > they are grouped based on what class of control they are. > > So I think it would be a good idea to create a new control class for M2M > audio controls, > instead of just adding them to the catch-all 'User Controls' class. > > Search e.g. for V4L2_CTRL_CLASS_COLORIMETRY and V4L2_CID_COLORIMETRY_CLASS to > see how > it is done. > > M2M_AUDIO would probably be a good name for the class. > > > +*** > > + > > +These controls are intended to support an asynchronous sample > > +rate converter. > > Add ' (ASRC).' at the end to indicate the common abbreviation for > that. > > > + > > +.. _v4l2-audio-asrc: > > + > > +``V4L2_CID_ASRC_SOURCE_RATE`` > > +sets the resampler source rate. > > + > > +``V4L2_CID_ASRC_DEST_RATE`` > > +sets the resampler destination rate. > > Document the unit (Hz) for these two controls. > > > + > > +.. c:type:: v4l2_ctrl_fixed_point > > + > > +.. cssclass:: longtable > > + > > +.. tabularcolumns:: |p{1.5cm}|p{5.8cm}|p{10.0cm}| > > + > > +.. flat-table:: struct v4l2_ctrl_fixed_point > > +:header-rows: 0 > > +:stub-columns: 0 > > +:widths: 1 1 2 > > + > > +* - __u32 > > Hmm, shouldn't this be __s32? > > > + - ``integer`` > > + - integer part of fixed point value. > > +* - __s32 > > and this __u32? > > You want to be able to use this generic type as a signed value. > > > + - ``fractional`` > > + - fractional part of fixed point value, which is Q31. > > diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > > b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > > index f9f73530a6be..1811dabf5c74 100644 > > --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > > +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst > > @@ -295,6 +295,10 @@ still cause this situation. > >- ``p_av1_film_grain`` > >- A pointer to a struct :c:type:`v4l2_ctrl_av1_film_grain`. Valid if > > this control is > > of type ``V4L2_CTRL_TYPE_AV1_FILM_GRAIN``. > > +* - struct :c:type:`v4l2_ctrl_fixed_point` * > > + - ``p_fixed_point`` > > + - A pointer to a struct :c:type:`v4l2_ctrl_fixed_point`. Valid if > > this control is > > +of type
Re: [PATCH] powerpc/paravirt: Improve vcpu_is_preempted
On Mon, 2023-10-09 at 10:47 +0530, Srikar Dronamraju wrote: Hi Srikar, Benchmarked this patch on baremetal POWER9 node by launching KVM to observe the improvements achieved in KVM with the patched kernel. Below, you can find the schbench latency result comparision. System was running on SMT4 mode with the below configuration: Setup: $ lscpu Architecture:ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 4 Core(s) per socket: 16 Socket(s): 2 NUMA node(s):8 Model: 2.3 (pvr 004e 1203) Model name: POWER9, altivec supported CPU max MHz: 3800. CPU min MHz: 2300. L1d cache: 32K L1i cache: 32K L2 cache:512K L3 cache:10240K NUMA node0 CPU(s): 0-63 NUMA node8 CPU(s): 64-127 NUMA node250 CPU(s): NUMA node251 CPU(s): NUMA node252 CPU(s): NUMA node253 CPU(s): NUMA node254 CPU(s): NUMA node255 CPU(s): - Baseline kernel : v6.6.0-rc5 - Patched kernel : v6.6.0-rc5 with patchset - schbench version : upto commit 2eef44 (schbench: record the execution time in the matrix multiplication mode) Results: These results shows the schbench latency on a patched kernel compared to a baseline kernel on KVM. The numbers in the "compare%" column represent the percentage difference between the latency measured on the baseline kernel and the patched kernel. A negative percentage means the patched kernel performs less optimially (higher latency) than the baseline, while a positive percentage means it performs better (lower latency). Scenarios: -- Case 1: No Noise Host: Idle KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs) KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark) schbench latency (niter: 20) percentile compare% (avg) (higher the better) 50.0th: -4.84 75.0th: -8.09 90.0th: -3.39 95.0th: +5.16 99.0th: +90.78 99.5th: +36.34 99.9th: +8.31 Case 2: With Noise: Over-commit case: Multiple KVM guests sharing the same set of CPUs Two KVM instances are launched, where one being benchmarked, and the other executing a workload to introduce noise. KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs) KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark) KVM 2 (Noise): Launched a KVM affined to 0-39 CPUs schbench latency (niter: 20) percentile compare% (avg) (higher the better) 50.0th: -1.47 75.0th: -5.72 90.0th: +7.88 95.0th: +10.71 99.0th: +512.08 99.5th: +380.61 99.9th: +90.39 Case 3: Overlap case: Multiple KVM guests sharing a subset of CPUs. Two KVM instances are launched, where one being benchmarked, and the other executing a workload to introduce noise. KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs) KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark) KVM 2 (Noise): Launched a KVM affined to 0-19 CPUs schbench latency (niter: 20) percentile compare% (avg) (higher the better) 50.0th: -1.63 75.0th: -2.78 90.0th: +57.62 95.0th: +87.90 99.0th: +343.66 99.5th: +178.01 99.9th: +36.07 The above results demonstrate the effectiveness of the proposed approach, which utilizes the idle-hint in lppaca to detect the preempted vCPU more efficiently. This approach is beneficial for improving schbench latency on KVM, particularly the tail latencies. Thanks, Aboorva > PowerVM Hypervisor dispatches on a whole core basis. In a shared > LPAR, a > CPU from a core that is preempted may have a larger latency. In > such a scenario, its preferable to choose a different CPU to run. > > If one of the CPUs in the core is active, i.e neither CEDED nor > preempted, then consider this CPU as not preempted. > > Also if any of the CPUs in the core has yielded but OS has not > requested > CEDE or CONFER, then consider this CPU to be preempted. > > Cc: Ajay Kaher > Cc: Alexey Makhalov > Cc: Christophe Leroy > Cc: Juergen Gross > Cc: linux-ker...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: virtualizat...@lists.linux-foundation.org > Signed-off-by: Srikar Dronamraju > --- > arch/powerpc/include/asm/paravirt.h | 33 ++- > -- > 1 file changed, 30 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/paravirt.h > b/arch/powerpc/include/asm/paravirt.h > index e08513d73119..a980756f58df 100644 > --- a/arch/powerpc/include/asm/paravirt.h > +++ b/arch/powerpc/include/asm/paravirt.h > @@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu) > if (!is_shared_processor()) > return
[PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
Sachin reported a warning when running the inject-ra-err selftest: # selftests: powerpc/mce: inject-ra-err Disabling lock debugging due to kernel taint MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [1e48] MCE: CPU19: Initiator CPU MCE: CPU19: Unknown [ cut here ] WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180 CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G ME 6.6.0-rc3-00055-g9ed22ae6be81 #4 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries ... NIP radix__tlb_flush+0x160/0x180 LR radix__tlb_flush+0x104/0x180 Call Trace: radix__tlb_flush+0xf4/0x180 (unreliable) tlb_finish_mmu+0x15c/0x1e0 exit_mmap+0x1a0/0x510 __mmput+0x60/0x1e0 exit_mm+0xdc/0x170 do_exit+0x2bc/0x5a0 do_group_exit+0x4c/0xc0 sys_exit_group+0x28/0x30 system_call_exception+0x138/0x330 system_call_vectored_common+0x15c/0x2ec And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning in radix__tlb_flush() if mm->context.copros is still elevated. However it's possible for the copros count to be elevated if a process exits without first closing file descriptors that are associated with a copro, eg. VAS. If the process exits with a VAS file still open, the release callback is queued up for exit_task_work() via: exit_files() put_files_struct() close_files() filp_close() fput() And called via: exit_task_work() fput() __fput() file->f_op->release(inode, file) coproc_release() vas_user_win_ops->close_win() vas_deallocate_window() mm_context_remove_vas_window() mm_context_remove_copro() But that is after exit_mm() has been called from do_exit() and triggered the warning. Fix it by dropping the warning, and always calling __flush_all_mm(). In the normal case of no copros, that will result in a call to _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code does. If the copros count is elevated then it will cause a global flush, which should flush translations from any copros. Note that the process table entry was cleared in arch_exit_mmap(), so copros should not be able to fetch any new translations. Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs") Reported-by: Sachin Sant Closes: https://lore.kernel.org/all/a8e52547-4bf1-47ce-8aea-bc5a9d7e3...@linux.ibm.com/ Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman --- arch/powerpc/mm/book3s64/radix_tlb.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c index 39acc2cbab4c..9e1f6558d026 100644 --- a/arch/powerpc/mm/book3s64/radix_tlb.c +++ b/arch/powerpc/mm/book3s64/radix_tlb.c @@ -1212,14 +1212,7 @@ void radix__tlb_flush(struct mmu_gather *tlb) smp_mb(); /* see radix__flush_tlb_mm */ exit_flush_lazy_tlbs(mm); - _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL); - - /* -* It should not be possible to have coprocessors still -* attached here. -*/ - if (WARN_ON_ONCE(atomic_read(>context.copros) > 0)) - __flush_all_mm(mm, true); + __flush_all_mm(mm, true); preempt_enable(); } else { -- 2.41.0
Re: [powerpc] Kernel crash while running LTP (bisected)
On Tue, Oct 17, 2023 at 02:46:07PM +0530, Sachin Sant wrote: > While running LTP tests (getpid02) on a Power10 server booted with > 6.6.0-rc6-next-20231016 following crash was seen: > > [ 76.386628] Kernel attempted to read user page (d8) - exploit attempt? > (uid: 0) > [ 76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8 > [ 76.386653] Faulting instruction address: 0xc04cda90 > [ 76.386658] Oops: Kernel access of bad area, sig: 11 [#1] [snip] > > Git bisect points to following patch > > commit 1db41d29b79ad271674081c752961edd064bbbac > mm: perform the mapping_map_writable() check after call_mmap() > > Reverting the patch allows the test to complete. > > - Sachin Hi Sachin, Thanks for the report but this was triggered in another test previously and has been fixed already (apologies for the inconvenience!) see [0]. Andrew took the -fix patch and applied to mm-unstable, this should wend its way to -next in the meantime. [0]:https://lore.kernel.org/all/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local/
Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
On 17/10/23 7:58 am, Pingfan Liu wrote: *** Idea *** For kexec -p, the boot cpu can be not the cpu0, this causes the problem of allocating memory for paca_ptrs[]. However, in theory, there is no requirement to assign cpu's logical id as its present sequence in the device tree. But there is something like cpu_first_thread_sibling(), which makes assumption on the mapping inside a core. Hence partially loosening the mapping, i.e. unbind the mapping of core while keep the mapping inside a core. *** Implement *** At this early stage, there are plenty of memory to utilize. Hence, this patch allocates interim memory to link the cpu info on a list, then reorder cpus by changing the list head. As a result, there is a rotate shift between the sequence number in dt and the cpu logical number. *** Result *** After this patch, a boot-cpu's logical id will always be mapped into the range [0,threads_per_core). Besides this, at this phase, all threads in the boot core are forced to be onlined. This restriction will be lifted in a later patch with extra effort. Signed-off-by: Pingfan Liu Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: Mahesh Salgaonkar Cc: Wen Xiong Cc: Baoquan He Cc: Ming Lei Cc: Sourabh Jain Cc: Hari Bathini Cc: ke...@lists.infradead.org To: linuxppc-dev@lists.ozlabs.org Thanks for working on this, Pingfan. Looks good to me. Acked-by: Hari Bathini --- arch/powerpc/kernel/prom.c | 25 + arch/powerpc/kernel/setup-common.c | 84 +++--- 2 files changed, 82 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index ec82f5bda908..7ed9034912ca 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -76,7 +76,9 @@ u64 ppc64_rma_size; unsigned int boot_cpu_node_count __ro_after_init; #endif static phys_addr_t first_memblock_size; +#ifdef CONFIG_SMP static int __initdata boot_cpu_count; +#endif static int __init early_parse_mem(char *p) { @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, const __be32 *intserv; int i, nthreads; int len; - int found = -1; - int found_thread = 0; + bool found = false; /* We are scanning "cpu" nodes only */ if (type == NULL || strcmp(type, "cpu") != 0) @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node, for (i = 0; i < nthreads; i++) { if (be32_to_cpu(intserv[i]) == fdt_boot_cpuid_phys(initial_boot_params)) { - found = boot_cpu_count; - found_thread = i; + /* +* always map the boot-cpu logical id into the +* range of [0, thread_per_core) +*/ + boot_cpuid = i; + found = true; + /* This forces all threads in a core to be online */ + if (nr_cpu_ids % nthreads != 0) + set_nr_cpu_ids(ALIGN(nr_cpu_ids, nthreads)); } #ifdef CONFIG_SMP /* logical cpu id is always 0 on UP kernels */ @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node, } /* Not the boot CPU */ - if (found < 0) + if (!found) return 0; - DBG("boot cpu: logical %d physical %d\n", found, - be32_to_cpu(intserv[found_thread])); - boot_cpuid = found; + DBG("boot cpu: logical %d physical %d\n", boot_cpuid, + be32_to_cpu(intserv[boot_cpuid])); - boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); + boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]); /* * PAPR defines "logical" PVR values for cpus that diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 707f0490639d..9802c7e5ee2f 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc) u32 *cpu_to_phys_id = NULL; +struct interrupt_server_node { + struct list_head node; + boolavail; + int len; + __be32 intserv[]; +}; + /** * setup_cpu_maps - initialize the following cpu maps: * cpu_possible_mask @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL; void __init smp_setup_cpu_maps(void) { struct device_node *dn; - int cpu = 0; - int nthreads = 1; + int shift = 0, cpu = 0; + int j, nthreads = 1; + int len; + struct interrupt_server_node *intserv_node, *n; + struct list_head *bt_node, head; + bool avail, found_boot_cpu = false; DBG("smp_setup_cpu_maps()\n"); +
[PATCH 1/4] kbuild: remove ARCH_POSTLINK from module builds
The '%.ko' rule in arch/*/Makefile.postlink does nothing but call the 'true' command. Remove the meaningless code. Signed-off-by: Masahiro Yamada --- arch/mips/Makefile.postlink| 3 --- arch/powerpc/Makefile.postlink | 3 --- arch/riscv/Makefile.postlink | 3 --- arch/x86/Makefile.postlink | 3 --- scripts/Makefile.modfinal | 5 + 5 files changed, 1 insertion(+), 16 deletions(-) diff --git a/arch/mips/Makefile.postlink b/arch/mips/Makefile.postlink index 34e3bd71f3b0..6cfdc149d3bc 100644 --- a/arch/mips/Makefile.postlink +++ b/arch/mips/Makefile.postlink @@ -31,9 +31,6 @@ ifeq ($(CONFIG_RELOCATABLE),y) $(call if_changed,relocs) endif -%.ko: FORCE - @true - clean: @true diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink index 1f860b3c9bec..ae5a4256b03d 100644 --- a/arch/powerpc/Makefile.postlink +++ b/arch/powerpc/Makefile.postlink @@ -35,9 +35,6 @@ ifdef CONFIG_RELOCATABLE $(call if_changed,relocs_check) endif -%.ko: FORCE - @true - clean: rm -f .tmp_symbols.txt diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink index a46fc578b30b..829b9abc91f6 100644 --- a/arch/riscv/Makefile.postlink +++ b/arch/riscv/Makefile.postlink @@ -36,9 +36,6 @@ ifdef CONFIG_RELOCATABLE $(call if_changed,relocs_strip) endif -%.ko: FORCE - @true - clean: @true diff --git a/arch/x86/Makefile.postlink b/arch/x86/Makefile.postlink index 936093d29160..fef2e977cc7d 100644 --- a/arch/x86/Makefile.postlink +++ b/arch/x86/Makefile.postlink @@ -34,9 +34,6 @@ ifeq ($(CONFIG_X86_NEED_RELOCS),y) $(call cmd,strip_relocs) endif -%.ko: FORCE - @true - clean: @rm -f $(OUT_RELOCS)/vmlinux.relocs diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal index b3a6aa8fbe8c..8568d256d6fb 100644 --- a/scripts/Makefile.modfinal +++ b/scripts/Makefile.modfinal @@ -28,14 +28,11 @@ quiet_cmd_cc_o_c = CC [M] $@ %.mod.o: %.mod.c FORCE $(call if_changed_dep,cc_o_c) -ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink) - quiet_cmd_ld_ko_o = LD [M] $@ cmd_ld_ko_o += \ $(LD) -r $(KBUILD_LDFLAGS) \ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ - -T scripts/module.lds -o $@ $(filter %.o, $^); \ - $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) + -T scripts/module.lds -o $@ $(filter %.o, $^) quiet_cmd_btf_ko = BTF [M] $@ cmd_btf_ko = \ -- 2.40.1
Re: [PATCHv9 1/2] powerpc/setup : Enable boot_cpu_hwid for PPC32
On 17/10/23 7:58 am, Pingfan Liu wrote: In order to identify the boot cpu, its intserv[] should be recorded and checked in smp_setup_cpu_maps(). smp_setup_cpu_maps() is shared between PPC64 and PPC32. Since PPC64 has already used boot_cpu_hwid to carry that information, enabling this variable on PPC32 so later it can also be used to carry that information for PPC32 in the coming patch. Signed-off-by: Pingfan Liu Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: Mahesh Salgaonkar Cc: Wen Xiong Cc: Baoquan He Cc: Ming Lei Cc: Sourabh Jain Cc: Hari Bathini Cc: ke...@lists.infradead.org To: linuxppc-dev@lists.ozlabs.org LGTM. Acked-by: Hari Bathini --- arch/powerpc/include/asm/smp.h | 2 +- arch/powerpc/kernel/prom.c | 3 +-- arch/powerpc/kernel/setup-common.c | 2 -- 3 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index 576d0e15..5db9178cc800 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -26,7 +26,7 @@ #include extern int boot_cpuid; -extern int boot_cpu_hwid; /* PPC64 only */ +extern int boot_cpu_hwid; extern int spinning_secondaries; extern u32 *cpu_to_phys_id; extern bool coregroup_enabled; diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 0b5878c3125b..ec82f5bda908 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, be32_to_cpu(intserv[found_thread])); boot_cpuid = found; - if (IS_ENABLED(CONFIG_PPC64)) - boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); + boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); /* * PAPR defines "logical" PVR values for cpus that diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 2f1026fba00d..707f0490639d 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id); int boot_cpuid = -1; EXPORT_SYMBOL_GPL(boot_cpuid); -#ifdef CONFIG_PPC64 int boot_cpu_hwid = -1; -#endif /* * These are used in binfmt_elf.c to put aux entries on the stack
[powerpc] Kernel crash while running LTP (bisected)
While running LTP tests (getpid02) on a Power10 server booted with 6.6.0-rc6-next-20231016 following crash was seen: [ 76.386628] Kernel attempted to read user page (d8) - exploit attempt? (uid: 0) [ 76.386649] BUG: Kernel NULL pointer dereference on read at 0x00d8 [ 76.386653] Faulting instruction address: 0xc04cda90 [ 76.386658] Oops: Kernel access of bad area, sig: 11 [#1] [ 76.386661] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries [ 76.386667] Modules linked in: rpadlpar_io rpaphp xsk_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc pseries_rng vmx_crypto aes_gcm_p10_crypto binfmt_misc xfs libcrc32c sd_mod t10_pi sr_mod cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp fuse [ 76.386709] CPU: 22 PID: 5763 Comm: getpid02 Kdump: loaded Not tainted 6.6.0-rc6-next-20231016 #3 [ 76.386713] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries [ 76.386718] NIP: c04cda90 LR: c04cd840 CTR: [ 76.386721] REGS: c001f491b840 TRAP: 0300 Not tainted (6.6.0-rc6-next-20231016) [ 76.386724] MSR: 80009033 CR: 48082804 XER: [ 76.386733] CFAR: c04cd848 DAR: 00d8 DSISR: 4000 IRQMASK: 0 [ 76.386733] GPR00: c04cd840 c001f491bae0 c1471a00 [ 76.386733] GPR04: 00fb 0001 [ 76.386733] GPR08: 01c4 c001fb8aa830 c001e5140d00 c001eccfac00 [ 76.386733] GPR12: 001f c00e87bf7300 [ 76.386733] GPR16: [ 76.386733] GPR20: 7fff9944 c001e86bdd60 c001e86be8e0 [ 76.386733] GPR24: 0001 0001 0001 [ 76.386733] GPR28: 00fb c001e5140d00 7fff9944 c001fb8aa830 [ 76.386773] NIP [c04cda90] mmap_region+0x8b0/0xb30 [ 76.386781] LR [c04cd840] mmap_region+0x660/0xb30 [ 76.386784] Call Trace: [ 76.386786] [c001f491bae0] [c04cd840] mmap_region+0x660/0xb30 (unreliable) [ 76.386791] [c001f491bc10] [c04ce0dc] do_mmap+0x3cc/0x5c0 [ 76.386794] [c001f491bca0] [c0486724] vm_mmap_pgoff+0x134/0x240 [ 76.386800] [c001f491bd80] [c04c98a8] ksys_mmap_pgoff+0x158/0x2b0 [ 76.386806] [c001f491bdf0] [c0011834] do_mmap2+0x54/0xc0 [ 76.386811] [c001f491be10] [c0036624] system_call_exception+0x134/0x330 [ 76.386817] [c001f491be50] [c000d6a0] system_call_common+0x160/0x2e4 [ 76.386822] --- interrupt: c00 at 0x7fff9932ff68 [ 76.386825] NIP: 7fff9932ff68 LR: 10005074 CTR: [ 76.386828] REGS: c001f491be80 TRAP: 0c00 Not tainted (6.6.0-rc6-next-20231016) [ 76.386831] MSR: 8280f033 CR: 24002204 XER: [ 76.386840] IRQMASK: 0 [ 76.386840] GPR00: 005a 7fffd709f9f0 7fff99407300 [ 76.386840] GPR04: 0004 0003 0021 [ 76.386840] GPR08: [ 76.386840] GPR12: 7fff994ea3d0 [ 76.386840] GPR16: 10034498 10034be8 100336a8 [ 76.386840] GPR20: 10034ba8 0001 1007c418 10033770 [ 76.386840] GPR24: 10034bd0 1007c438 [ 76.386840] GPR28: 10061c88 7fffd70afed5 1007c438 10033770 [ 76.386876] NIP [7fff9932ff68] 0x7fff9932ff68 [ 76.386879] LR [10005074] 0x10005074 [ 76.386881] --- interrupt: c00 [ 76.386883] Code: 73890008 4082012c e93f0020 3b00 fb7f0078 4bfffc74 6000 6000 e87f0088 3b00 4b20 6000 39490044 7d005028 3108 [ 76.386896] ---[ end trace ]--- [ 76.388667] pstore: backend (nvram) writing error (-1) Git bisect points to following patch commit 1db41d29b79ad271674081c752961edd064bbbac mm: perform the mapping_map_writable() check after call_mmap() Reverting the patch allows the test to complete. - Sachin
Re: [PATCH 2/3] PCI: layerscape: add suspend/resume for ls1021a
On Mon, Oct 16, 2023 at 04:18:36PM -0400, Frank Li wrote: > On Mon, Oct 16, 2023 at 10:28:24PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Sep 15, 2023 at 02:43:05PM -0400, Frank Li wrote: > > > ls1021a add suspend/resume support. > > > > > > > Please add what the driver is doing during suspend/resume. > > > > > Signed-off-by: Frank Li > > > --- > > > drivers/pci/controller/dwc/pci-layerscape.c | 88 - > > > 1 file changed, 87 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c > > > b/drivers/pci/controller/dwc/pci-layerscape.c > > > index 20c48c06e2248..bc5a8ff1a26ce 100644 > > > --- a/drivers/pci/controller/dwc/pci-layerscape.c > > > +++ b/drivers/pci/controller/dwc/pci-layerscape.c > > > @@ -35,6 +35,12 @@ > > > #define PF_MCR_PTOMR BIT(0) > > > #define PF_MCR_EXL2S BIT(1) > > > > > > +/* LS1021A PEXn PM Write Control Register */ > > > +#define SCFG_PEXPMWRCR(idx) (0x5c + (idx) * 0x64) > > > +#define PMXMTTURNOFF BIT(31) > > > +#define SCFG_PEXSFTRSTCR 0x190 > > > +#define PEXSR(idx) BIT(idx) > > > + > > > #define PCIE_IATU_NUM6 > > > > > > struct ls_pcie_drvdata { > > > @@ -48,6 +54,8 @@ struct ls_pcie { > > > struct dw_pcie *pci; > > > const struct ls_pcie_drvdata *drvdata; > > > void __iomem *pf_base; > > > + struct regmap *scfg; > > > + int index; > > > bool big_endian; > > > }; > > > > > > @@ -170,13 +178,91 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp) > > > return 0; > > > } > > > > > > +static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp) > > > +{ > > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > > > + struct ls_pcie *pcie = to_ls_pcie(pci); > > > + u32 val; > > > + > > > + if (!pcie->scfg) { > > > > Can this ever happen? > > > > > + dev_dbg(pcie->pci->dev, "SYSCFG is NULL\n"); > > > + return; > > > + } > > > + > > > + /* Send Turn_off message */ > > > > "Send PME_Turn_Off message" > > > > > + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), ); > > > + val |= PMXMTTURNOFF; > > > + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val); > > > + > > > + /* There are not register to check ACK, so wait > > > PCIE_PME_TO_L2_TIMEOUT_US */ > > > > "There is no specific register to check for PME_To_Ack from endpoint. So on > > the > > safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US." > > > > > + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000); > > > + > > > + /* Clear Turn_off message */ > > > > "PME_Turn_off". But I'm not sure if this is really required. Are you doing > > it > > because the layerspace hw implements the PME_Turn_Off bit as "level > > triggered"? > > I am not sure how hardware implement this. But reference manual said: > > PMXMTTURNOFF: > Generate PM turnoff message for power management of PCI Express controllers. > This bit should be cleared by software. > 0 Clear PM turnoff (default) > 1 Trigger PM turnoff > Hmm, okay. Atleast add the below comment to make it understandable in the future: "Layerscape hardware reference manual recommends clearing the PMXMTTURNOFF bit to complete the PME_Turn_Off handshake." - Mani > Frank > > > > > > + regmap_read(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), ); > > > + val &= ~PMXMTTURNOFF; > > > + regmap_write(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), val); > > > +} > > > + > > > +static void ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp) > > > +{ > > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > > > + struct ls_pcie *pcie = to_ls_pcie(pci); > > > + u32 val; > > > + > > > > A comment here would be good. > > > > > + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, ); > > > + val |= PEXSR(pcie->index); > > > + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val); > > > + > > > + regmap_read(pcie->scfg, SCFG_PEXSFTRSTCR, ); > > > + val &= ~PEXSR(pcie->index); > > > + regmap_write(pcie->scfg, SCFG_PEXSFTRSTCR, val); > > > +} > > > + > > > +static int ls1021a_pcie_host_init(struct dw_pcie_rp *pp) > > > +{ > > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > > > + struct ls_pcie *pcie = to_ls_pcie(pci); > > > + struct device *dev = pcie->pci->dev; > > > + u32 index[2]; > > > + int ret; > > > + > > > + ret = ls_pcie_host_init(pp); > > > + if (ret) > > > + return ret; > > > + > > > + pcie->scfg = syscon_regmap_lookup_by_phandle(dev->of_node, > > > "fsl,pcie-scfg"); > > > + if (IS_ERR(pcie->scfg)) { > > > + ret = PTR_ERR(pcie->scfg); > > > + dev_err(dev, "No syscfg phandle specified\n"); > > > + pcie->scfg = NULL; > > > + return ret; > > > + } > > > + > > > + ret = of_property_read_u32_array(dev->of_node, "fsl,pcie-scfg", index, > > > 2); > > > + if (ret) { > > > + pcie->scfg = NULL; > > > + return ret; > > > + } > > > + > > > + pcie->index = index[1]; > > > + > > > > The above syscon parsing could be done conditionally during probe itself. > > There > > is no need to do it
Re: [PATCH v2 0/3] Add generic data patching functions
On 17/10/23 5:39 pm, Christophe Leroy wrote: Le 16/10/2023 à 07:01, Benjamin Gray a écrit : Currently patch_instruction() bases the write length on the value being written. If the value looks like a prefixed instruction it writes 8 bytes, otherwise it writes 4 bytes. This makes it potentially buggy to use for writing arbitrary data, as if you want to write 4 bytes but it decides to write 8 bytes it may clobber the following memory or be unaligned and trigger an oops if it tries to cross a page boundary. To solve this, this series pulls out the size parameter to the 'top' of the text patching logic, and propagates it through the various functions. The two sizes supported are int and long; this allows for patching instructions and pointers on both ppc32 and ppc64. On ppc32 these are the same size, so care is taken to only use the size parameter on static functions, so the compiler can optimise it out entirely. Unfortunately GCC trips over its own feet here and won't optimise in a way that is optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX (pmac32_defconfig). In the first case, patch_memory() is very large and can only be inlined if a single function calls it. While the source only calls it in patch_instruction(), an earlier optimisation pass inlined patch_instruction() into patch_branch(), so now there are 'two' references to patch_memory() and it cannot be inlined into patch_instruction(). Instead patch_instruction() becomes a single branch directly to patch_memory(). We can fix this by marking patch_instruction() as noinline, but this prevents patch_memory() from being directly inlined into patch_branch() when RWX is disabled and patch_memory() is very small. It may be possible to avoid this by merging together patch_instruction() and patch_memory() on ppc32, but the only way I can think to do this without duplicating the implementation involves using the preprocessor to change if is_dword is a parameter or a local variable, which is gross. What about: diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 7c6056bb1706..af89ef450c93 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr, const u32 *addr, int create_cond_branch(ppc_inst_t *instr, const u32 *addr, unsigned long target, int flags); int patch_branch(u32 *addr, unsigned long target, int flags); -int patch_instruction(u32 *addr, ppc_inst_t instr); +int patch_memory(void *addr, unsigned long val, bool is_dword); int raw_patch_instruction(u32 *addr, ppc_inst_t instr); /* @@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr); #ifdef CONFIG_PPC64 -int patch_uint(void *addr, unsigned int val); -int patch_ulong(void *addr, unsigned long val); +int patch_instruction(u32 *addr, ppc_inst_t instr); #define patch_u64 patch_ulong #else -static inline int patch_uint(u32 *addr, unsigned int val) +static inline int patch_instruction(u32 *addr, ppc_inst_t instr) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, ppc_inst_val(instr), false); } +#endif + static inline int patch_ulong(void *addr, unsigned long val) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, val, true); } -#endif +static inline int patch_uint(void *addr, unsigned int val) +{ + return patch_memory(addr, val, false); +} #define patch_u32 patch_uint diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 60289332412f..77418b2a4aa4 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned long val, bool is_dword) return err; } -static int patch_memory(void *addr, unsigned long val, bool is_dword) +int patch_memory(void *addr, unsigned long val, bool is_dword) { int err; unsigned long flags; @@ -378,6 +378,7 @@ static int patch_memory(void *addr, unsigned long val, bool is_dword) return err; } +NOKPROBE_SYMBOL(patch_memory) #ifdef CONFIG_PPC64 @@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr) } NOKPROBE_SYMBOL(patch_instruction) -int patch_uint(void *addr, unsigned int val) -{ - return patch_memory(addr, val, false); -} -NOKPROBE_SYMBOL(patch_uint) - -int patch_ulong(void *addr, unsigned long val) -{ - return patch_memory(addr, val, true); -} -NOKPROBE_SYMBOL(patch_ulong) - -#else - -int patch_instruction(u32 *addr, ppc_inst_t instr) -{ - return patch_memory(addr, ppc_inst_val(instr), false); -} -NOKPROBE_SYMBOL(patch_instruction) - #endif int patch_branch(u32 *addr, unsigned long target, int flags) Wouldn't every caller need to initialise the is_dword parameter in that case? It can't tell it's unused across a
Re: [PATCH v2 0/3] Add generic data patching functions
Le 16/10/2023 à 07:01, Benjamin Gray a écrit : > Currently patch_instruction() bases the write length on the value being > written. If the value looks like a prefixed instruction it writes 8 bytes, > otherwise it writes 4 bytes. This makes it potentially buggy to use for > writing arbitrary data, as if you want to write 4 bytes but it decides to > write 8 bytes it may clobber the following memory or be unaligned and > trigger an oops if it tries to cross a page boundary. > > To solve this, this series pulls out the size parameter to the 'top' of > the text patching logic, and propagates it through the various functions. > > The two sizes supported are int and long; this allows for patching > instructions and pointers on both ppc32 and ppc64. On ppc32 these are the > same size, so care is taken to only use the size parameter on static > functions, so the compiler can optimise it out entirely. Unfortunately > GCC trips over its own feet here and won't optimise in a way that is > optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX > (pmac32_defconfig). > > In the first case, patch_memory() is very large and can only be inlined > if a single function calls it. While the source only calls it in > patch_instruction(), an earlier optimisation pass inlined > patch_instruction() into patch_branch(), so now there are 'two' references > to patch_memory() and it cannot be inlined into patch_instruction(). > Instead patch_instruction() becomes a single branch directly to > patch_memory(). > > We can fix this by marking patch_instruction() as noinline, but this > prevents patch_memory() from being directly inlined into patch_branch() > when RWX is disabled and patch_memory() is very small. > > It may be possible to avoid this by merging together patch_instruction() > and patch_memory() on ppc32, but the only way I can think to do this > without duplicating the implementation involves using the preprocessor > to change if is_dword is a parameter or a local variable, which is gross. What about: diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 7c6056bb1706..af89ef450c93 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr, const u32 *addr, int create_cond_branch(ppc_inst_t *instr, const u32 *addr, unsigned long target, int flags); int patch_branch(u32 *addr, unsigned long target, int flags); -int patch_instruction(u32 *addr, ppc_inst_t instr); +int patch_memory(void *addr, unsigned long val, bool is_dword); int raw_patch_instruction(u32 *addr, ppc_inst_t instr); /* @@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr); #ifdef CONFIG_PPC64 -int patch_uint(void *addr, unsigned int val); -int patch_ulong(void *addr, unsigned long val); +int patch_instruction(u32 *addr, ppc_inst_t instr); #define patch_u64 patch_ulong #else -static inline int patch_uint(u32 *addr, unsigned int val) +static inline int patch_instruction(u32 *addr, ppc_inst_t instr) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, ppc_inst_val(instr), false); } +#endif + static inline int patch_ulong(void *addr, unsigned long val) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, val, true); } -#endif +static inline int patch_uint(void *addr, unsigned int val) +{ + return patch_memory(addr, val, false); +} #define patch_u32 patch_uint diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 60289332412f..77418b2a4aa4 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned long val, bool is_dword) return err; } -static int patch_memory(void *addr, unsigned long val, bool is_dword) +int patch_memory(void *addr, unsigned long val, bool is_dword) { int err; unsigned long flags; @@ -378,6 +378,7 @@ static int patch_memory(void *addr, unsigned long val, bool is_dword) return err; } +NOKPROBE_SYMBOL(patch_memory) #ifdef CONFIG_PPC64 @@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr) } NOKPROBE_SYMBOL(patch_instruction) -int patch_uint(void *addr, unsigned int val) -{ - return patch_memory(addr, val, false); -} -NOKPROBE_SYMBOL(patch_uint) - -int patch_ulong(void *addr, unsigned long val) -{ - return patch_memory(addr, val, true); -} -NOKPROBE_SYMBOL(patch_ulong) - -#else - -int patch_instruction(u32 *addr, ppc_inst_t instr) -{ - return patch_memory(addr, ppc_inst_val(instr), false); -} -NOKPROBE_SYMBOL(patch_instruction) - #endif int patch_branch(u32 *addr, unsigned long target, int flags) > > For now I've removed the noinline, because at least the compiler might > get smarter in future and do
Re: [PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator
On 16/10/23 5:37 pm, Daniel Borkmann wrote: On 10/12/23 10:03 PM, Hari Bathini wrote: Most BPF programs are small, but they consume a page each. For systems with busy traffic and many BPF programs, this may also add significant pressure on instruction TLB. High iTLB pressure usually slows down the whole system causing visible performance degradation for production workloads. bpf_prog_pack, a customized allocator that packs multiple bpf programs into preallocated memory chunks, was proposed [1] to address it. This series extends this support on powerpc. Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions, needed for this support depend on instruction patching in text area. Currently, patch_instruction() supports patching only one instruction at a time. The first patch introduces patch_instructions() function to enable patching more than one instruction at a time. This helps in avoiding performance degradation while JITing bpf programs. Patches 2 & 3 implement the above mentioned arch specific functions using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing code. The last patch enables the use of BPF prog pack allocator on powerpc and also, ensures cleanup is handled gracefully. [1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/ Changes in v6: * No changes in patches 2-5/5 except addition of Acked-by tags from Song. * Skipped merging code path of patch_instruction() & patch_instructions() to avoid performance overhead observed on ppc32 with that. I presume this will be routed via Michael? Yes, Daniel. This can go via linuxppc tree. Thanks Hari
Re: [PATCH 0/2] Allow nesting of lazy MMU mode
Erhard Furtner writes: > On Thu, 12 Oct 2023 20:54:13 +0100 > "Matthew Wilcox (Oracle)" wrote: > >> Dave Woodhouse reported that we now nest calls to >> arch_enter_lazy_mmu_mode(). That was inadvertent, but in principle we >> should allow it. On further investigation, Juergen already fixed it >> for Xen, but didn't tell anyone. Fix it for Sparc & PowerPC too. >> This may or may not help fix the problem that Erhard reported. >> >> Matthew Wilcox (Oracle) (2): >> powerpc: Allow nesting of lazy MMU mode >> sparc: Allow nesting of lazy MMU mode >> >> arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 ++--- >> arch/sparc/mm/tlb.c| 5 ++--- >> 2 files changed, 4 insertions(+), 6 deletions(-) >> >> -- >> 2.40.1 > > Applied the patch on top of v6.6-rc5 but unfortunately it did not fix my > reported issue. > > Regards, > Erhard > With the problem reported I guess we are finding the page->compound_head wrong and hence folio->flags PG_dcache_clean check crashing. I still don't know why we find page->compound_head wrong. Michael noted we are using FLAT_MEM. That implies we are suppose to inialize struct page correctly via init_unavailable_range because we are hitting this on an ioremap address. We need to instrument the kernel to track the initialization of the struct page backing these pfns which we know is crashing. W.r.t arch_enter_lazy_mmu_mode() we can skip that completely on powerpc because we don't allow the usage of set_pte on a valid pte entries. pte updates are not done via set_pte interface and hence there is no TLB invalidate required while using set_pte(). ie, we can do something like below. The change also make sure we call set_pte_filter on all the ptes we are setting via set_ptes(). I haven't sent this as a proper patch because we still are not able to fix the issue Erhard reported. diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 3ba9fe411604..95ab20cca2da 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr) { /* -* Make sure hardware valid bit is not set. We don't do -* tlb flush for this update. +* We don't need to call arch_enter/leave_lazy_mmu_mode() +* because we expect set_ptes to be only be used on not present +* and not hw_valid ptes. Hence there is not translation cache flush +* involved that need to be batched. */ - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); + for (;;) { - /* Note: mm->context.id might not yet have been assigned as -* this context might not have been activated yet when this -* is called. -*/ - pte = set_pte_filter(pte); + /* +* Make sure hardware valid bit is not set. We don't do +* tlb flush for this update. +*/ + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep)); - /* Perform the setting of the PTE */ - arch_enter_lazy_mmu_mode(); - for (;;) { + /* Note: mm->context.id might not yet have been assigned as +* this context might not have been activated yet when this +* is called. +*/ + pte = set_pte_filter(pte); + + /* Perform the setting of the PTE */ __set_pte_at(mm, addr, ptep, pte, 0); if (--nr == 0) break; ptep++; - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT)); addr += PAGE_SIZE; + /* increment the pfn */ + pte = __pte(pte_val(pte) + PAGE_SIZE); + } - arch_leave_lazy_mmu_mode(); } void unmap_kernel_page(unsigned long va)