Re: [PATCH v3] mm: compaction: update the cc->nr_migratepages when allocating or freeing the freepages
On 2/20/24 07:16, Baolin Wang wrote:
> Currently we will use 'cc->nr_freepages >= cc->nr_migratepages' comparison
> to ensure that enough freepages are isolated in isolate_freepages(), however
> it just decreases the cc->nr_freepages without updating cc->nr_migratepages
> in compaction_alloc(), which will waste more CPU cycles and cause too many
> freepages to be isolated.
>
> So we should also update the cc->nr_migratepages when allocating or freeing
> the freepages to avoid isolating excess freepages. And I can see fewer free
> pages are scanned and isolated when running thpcompact on my Arm64 server:
>
>                                          k6.7      k6.7_patched
> Ops Compaction pages isolated    120692036.00      118160797.00
> Ops Compaction migrate scanned   131210329.00      154093268.00
> Ops Compaction free scanned     1090587971.00     1080632536.00
> Ops Compact scan efficiency             12.03             14.26
>
> Moreover, I did not see an obvious latency improvement, this is likely
> because isolating freepages is not the bottleneck in the thpcompact test
> case.
>
>                               k6.7              k6.7_patched
> Amean fault-both-1      1089.76 (  0.00%)   1080.16 *  0.88%*
> Amean fault-both-3      1616.48 (  0.00%)   1636.65 * -1.25%*
> Amean fault-both-5      2266.66 (  0.00%)   2219.20 *  2.09%*
> Amean fault-both-7      2909.84 (  0.00%)   2801.90 *  3.71%*
> Amean fault-both-12     4861.26 (  0.00%)   4733.25 *  2.63%*
> Amean fault-both-18     7351.11 (  0.00%)   6950.51 *  5.45%*
> Amean fault-both-24     9059.30 (  0.00%)   9159.99 * -1.11%*
> Amean fault-both-30    10685.68 (  0.00%)  11399.02 * -6.68%*
>
> Signed-off-by: Baolin Wang
> Acked-by: Mel Gorman

Reviewed-by: Vlastimil Babka

Thanks.

> ---
> Hi Andrew, please use this patch to replace the below 2 old patches. Thanks.
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19&id=cb30899d456d64fb83ee3e3d95cd9fbb18735d87
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19&id=65713d1c4fc498d35dc1f7c1234ef815aa128429
>
> Changes from v2:
> - Add acked tag from Mel.
> - Fix the mm_compaction_migratepages trace event.
>
> Changes from v1:
> - Rebased on the latest mm-unstable branch.
> ---
>  include/trace/events/compaction.h |  6 +++---
>  mm/compaction.c                   | 12 ++++++++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index 2b2a975efd20..d05759d18538 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -78,10 +78,10 @@ DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_fast_isolate_freepage
>  #ifdef CONFIG_COMPACTION
>  TRACE_EVENT(mm_compaction_migratepages,
>
> -	TP_PROTO(struct compact_control *cc,
> +	TP_PROTO(unsigned int nr_migratepages,
>  		unsigned int nr_succeeded),
>
> -	TP_ARGS(cc, nr_succeeded),
> +	TP_ARGS(nr_migratepages, nr_succeeded),
>
>  	TP_STRUCT__entry(
>  		__field(unsigned long, nr_migrated)
> @@ -90,7 +90,7 @@ TRACE_EVENT(mm_compaction_migratepages,
>
>  	TP_fast_assign(
>  		__entry->nr_migrated = nr_succeeded;
> -		__entry->nr_failed = cc->nr_migratepages - nr_succeeded;
> +		__entry->nr_failed = nr_migratepages - nr_succeeded;
>  	),
>
>  	TP_printk("nr_migrated=%lu nr_failed=%lu",
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4494b2914386..218089b29f13 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1798,6 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  	dst = list_entry(cc->freepages.next, struct folio, lru);
>  	list_del(&dst->lru);
>  	cc->nr_freepages--;
> +	cc->nr_migratepages--;
>
>  	return dst;
>  }
> @@ -1813,6 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
>
>  	list_add(&dst->lru, &cc->freepages);
>  	cc->nr_freepages++;
> +	cc->nr_migratepages++;
>  }
>
>  /* possible outcome of isolate_migratepages */
> @@ -2435,7 +2437,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	unsigned long last_migrated_pfn;
>  	const bool sync = cc->mode != MIGRATE_ASYNC;
>  	bool update_cached;
> -	unsigned int nr_succeeded = 0;
> +	unsigned int nr_succeeded = 0, nr_migratepages;
>
>  	/*
>  	 * These counters track activities during zone compaction. Initialize
> @@ -2553,11 +2555,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				pageblock_start_pfn(cc->migrate_pfn - 1));
>  		}
>
> +		/*
> +		 * Record the number of pages to migrate since the
> +		 *
[PATCH v3] mm: compaction: update the cc->nr_migratepages when allocating or freeing the freepages
Currently we will use 'cc->nr_freepages >= cc->nr_migratepages' comparison
to ensure that enough freepages are isolated in isolate_freepages(), however
it just decreases the cc->nr_freepages without updating cc->nr_migratepages
in compaction_alloc(), which will waste more CPU cycles and cause too many
freepages to be isolated.

So we should also update the cc->nr_migratepages when allocating or freeing
the freepages to avoid isolating excess freepages. And I can see fewer free
pages are scanned and isolated when running thpcompact on my Arm64 server:

                                         k6.7      k6.7_patched
Ops Compaction pages isolated    120692036.00      118160797.00
Ops Compaction migrate scanned   131210329.00      154093268.00
Ops Compaction free scanned     1090587971.00     1080632536.00
Ops Compact scan efficiency             12.03             14.26

Moreover, I did not see an obvious latency improvement, this is likely
because isolating freepages is not the bottleneck in the thpcompact test
case.

                              k6.7              k6.7_patched
Amean fault-both-1      1089.76 (  0.00%)   1080.16 *  0.88%*
Amean fault-both-3      1616.48 (  0.00%)   1636.65 * -1.25%*
Amean fault-both-5      2266.66 (  0.00%)   2219.20 *  2.09%*
Amean fault-both-7      2909.84 (  0.00%)   2801.90 *  3.71%*
Amean fault-both-12     4861.26 (  0.00%)   4733.25 *  2.63%*
Amean fault-both-18     7351.11 (  0.00%)   6950.51 *  5.45%*
Amean fault-both-24     9059.30 (  0.00%)   9159.99 * -1.11%*
Amean fault-both-30    10685.68 (  0.00%)  11399.02 * -6.68%*

Signed-off-by: Baolin Wang
Acked-by: Mel Gorman
---
Hi Andrew, please use this patch to replace the below 2 old patches. Thanks.
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19&id=cb30899d456d64fb83ee3e3d95cd9fbb18735d87
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19&id=65713d1c4fc498d35dc1f7c1234ef815aa128429

Changes from v2:
- Add acked tag from Mel.
- Fix the mm_compaction_migratepages trace event.

Changes from v1:
- Rebased on the latest mm-unstable branch.
---
 include/trace/events/compaction.h |  6 +++---
 mm/compaction.c                   | 12 ++++++++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 2b2a975efd20..d05759d18538 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -78,10 +78,10 @@ DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_fast_isolate_freepage
 #ifdef CONFIG_COMPACTION
 TRACE_EVENT(mm_compaction_migratepages,
 
-	TP_PROTO(struct compact_control *cc,
+	TP_PROTO(unsigned int nr_migratepages,
 		unsigned int nr_succeeded),
 
-	TP_ARGS(cc, nr_succeeded),
+	TP_ARGS(nr_migratepages, nr_succeeded),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, nr_migrated)
@@ -90,7 +90,7 @@ TRACE_EVENT(mm_compaction_migratepages,
 
 	TP_fast_assign(
 		__entry->nr_migrated = nr_succeeded;
-		__entry->nr_failed = cc->nr_migratepages - nr_succeeded;
+		__entry->nr_failed = nr_migratepages - nr_succeeded;
 	),
 
 	TP_printk("nr_migrated=%lu nr_failed=%lu",
diff --git a/mm/compaction.c b/mm/compaction.c
index 4494b2914386..218089b29f13 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1798,6 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 	dst = list_entry(cc->freepages.next, struct folio, lru);
 	list_del(&dst->lru);
 	cc->nr_freepages--;
+	cc->nr_migratepages--;
 
 	return dst;
 }
@@ -1813,6 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
 
 	list_add(&dst->lru, &cc->freepages);
 	cc->nr_freepages++;
+	cc->nr_migratepages++;
 }
 
 /* possible outcome of isolate_migratepages */
@@ -2435,7 +2437,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 	bool update_cached;
-	unsigned int nr_succeeded = 0;
+	unsigned int nr_succeeded = 0, nr_migratepages;
 
 	/*
 	 * These counters track activities during zone compaction.
Initialize
@@ -2553,11 +2555,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				pageblock_start_pfn(cc->migrate_pfn - 1));
 		}
 
+		/*
+		 * Record the number of pages to migrate since the
+		 * compaction_alloc/free() will update cc->nr_migratepages
+		 * properly.
+		 */
+		nr_migratepages = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
Re: WARNING in shmem_release_dquot
On Mon, 29 Jan 2024, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
> Recently, our team has discovered an issue in the Linux kernel
> 6.8.0-rc1-gecb1b8288dc7. Attached to the email was a POC file for the issue.
>
> Stack dump:
> [  246.195553][ T4096] ------------[ cut here ]------------
> [  246.196540][ T4096] quota id 16384 from dquot 888051bd3000, not in rb tree!
> [  246.198829][ T4096] WARNING: CPU: 1 PID: 4096 at mm/shmem_quota.c:290 shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.199955][ T4096] Modules linked in:
> [  246.200435][ T4096] CPU: 1 PID: 4096 Comm: kworker/u6:6 Not tainted 6.8.0-rc1-gecb1b8288dc7 #21
> [  246.201566][ T4096] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [  246.202667][ T4096] Workqueue: events_unbound quota_release_workfn
> [  246.203516][ T4096] RIP: 0010:shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.204276][ T4096] Code: e8 28 d9 18 00 e9 b3 f8 ff ff e8 6e e1 c2 ff c6 05 bf e8 1b 0d 01 90 48 c7 c7 80 f0 b8 8a 4c 89 ea 44 89 e6 e8 14 6d 89 ff 90 <0f> 0b 90 90 e9 18 fb ff ff e8 f5 d8 18 00 e9 a2 fa ff ff e8 0b d9
> All code
> ========
>    0: e8 28 d9 18 00        call   0x18d92d
>    5: e9 b3 f8 ff ff        jmp    0xf8bd
>    a: e8 6e e1 c2 ff        call   0xffc2e17d
>    f: c6 05 bf e8 1b 0d 01  movb   $0x1,0xd1be8bf(%rip)  # 0xd1be8d5
>   16: 90                    nop
>   17: 48 c7 c7 80 f0 b8 8a  mov    $0x8ab8f080,%rdi
>   1e: 4c 89 ea              mov    %r13,%rdx
>   21: 44 89 e6              mov    %r12d,%esi
>   24: e8 14 6d 89 ff        call   0xff896d3d
>   29: 90                    nop
>   2a:* 0f 0b                ud2    <-- trapping instruction
>   2c: 90                    nop
>   2d: 90                    nop
>   2e: e9 18 fb ff ff        jmp    0xfb4b
>   33: e8 f5 d8 18 00        call   0x18d92d
>   38: e9 a2 fa ff ff        jmp    0xfadf
>   3d: e8                    .byte 0xe8
>   3e: 0b d9                 or     %ecx,%ebx
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f 0b                 ud2
>    2: 90                    nop
>    3: 90                    nop
>    4: e9 18 fb ff ff        jmp    0xfb21
>    9: e8 f5 d8 18 00        call   0x18d903
>    e: e9 a2 fa ff ff        jmp    0xfab5
>   13: e8                    .byte 0xe8
>   14: 0b d9                 or     %ecx,%ebx
> [  246.206640][ T4096] RSP:
0018:c9000604fbc0 EFLAGS: 00010286
> [  246.207403][ T4096] RAX:  RBX:  RCX: 814c77da
> [  246.208514][ T4096] RDX: 888049a58000 RSI: 814c77e7 RDI: 0001
> [  246.209429][ T4096] RBP:  R08: 0001 R09:
> [  246.210362][ T4096] R10: 0001 R11: 0001 R12: 4000
> [  246.211367][ T4096] R13: 888051bd3000 R14: dc00 R15: 888051bd3040
> [  246.212327][ T4096] FS:  () GS:88807ec0() knlGS:
> [  246.213387][ T4096] CS:  0010 DS:  ES:  CR0: 80050033
> [  246.214232][ T4096] CR2: 7ffee748ec80 CR3: 0cb78000 CR4: 00750ef0
> [  246.215216][ T4096] DR0:  DR1:  DR2:
> [  246.216187][ T4096] DR3:  DR6: fffe0ff0 DR7: 0400
> [  246.217148][ T4096] PKRU: 5554
> [  246.217615][ T4096] Call Trace:
> [  246.218090][ T4096] <TASK>
> [  246.218467][ T4096] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [  246.218979][ T4096] ? __warn (kernel/panic.c:677)
> [  246.219505][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.220197][ T4096] ? report_bug (lib/bug.c:201 lib/bug.c:219)
> [  246.220775][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.221500][ T4096] ? handle_bug (arch/x86/kernel/traps.c:238)
> [  246.222081][ T4096] ? exc_invalid_op (arch/x86/kernel/traps.c:259 (discriminator 1))
> [  246.222687][ T4096] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [  246.223296][ T4096] ? __warn_printk (./include/linux/context_tracking.h:155 kernel/panic.c:726)
> [  246.223878][ T4096] ? __warn_printk (kernel/panic.c:717)
> [  246.224460][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.225125][ T4096] quota_release_workfn (fs/quota/dquot.c:839)
> [  246.225792][ T4096] ? dquot_release (fs/quota/dquot.c:810)
> [  246.226401][ T4096] process_one_work (kernel/workqueue.c:2638)
> [  246.227001][ T4096] ? lock_sync (kernel/locking/lockdep.c:5722)
> [  246.227509][ T4096] ? workqueue_congested
Re: [v2 PATCH 0/1] ALSA: virtio: add support for audio controls
Hi Marcin,

Thanks for the help!

On 20.02.2024 02:24, Marcin Radomski wrote:
> Thanks Anton for the reupload. I tested this series with a 6.1 kernel guest
> on a proprietary hypervisor. The controls exposed by the host
> (BOOLEAN/INTEGER ones, as that was all I could get) worked as expected when
> adjusted via ALSA APIs.
>
> Reviewed-by: Marcin Radomski
> Tested-by: Marcin Radomski

-- 
Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
Re: [PATCH 10/10] csky/vdso: Use generic union vdso_data_store
On Mon, Feb 19, 2024 at 11:40 PM Anna-Maria Behnsen wrote:
>
> There is already a generic union definition for vdso_data_store in the vdso
> datapage header.
>
> Use this definition to prevent code duplication.
>
> Signed-off-by: Anna-Maria Behnsen
> Cc: Guo Ren
> Cc: linux-c...@vger.kernel.org
> ---
>  arch/csky/kernel/vdso.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
> index e74a2504d331..2ca886e4a458 100644
> --- a/arch/csky/kernel/vdso.c
> +++ b/arch/csky/kernel/vdso.c
> @@ -15,14 +15,8 @@ extern char vdso_start[], vdso_end[];
>  static unsigned int vdso_pages;
>  static struct page **vdso_pagelist;
>
> -/*
> - * The vDSO data page.
> - */
> -static union {
> -	struct vdso_data	data;
> -	u8			page[PAGE_SIZE];
> -} vdso_data_store __page_aligned_data;
> -struct vdso_data *vdso_data = &vdso_data_store.data;
> +static union vdso_data_store vdso_data_store __page_aligned_data;
> +struct vdso_data *vdso_data = vdso_data_store.data;
>
>  static int __init vdso_init(void)
>  {
> --
> 2.39.2
>

Acked-by: Guo Ren

-- 
Best Regards
 Guo Ren
Re: [PATCH 03/10] csky/vdso: Remove superfluous ifdeffery
On Mon, Feb 19, 2024 at 11:40 PM Anna-Maria Behnsen wrote:
>
> CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent
> ifdeffery is superfluous. Clean it up.
>
> Signed-off-by: Anna-Maria Behnsen
> Cc: Guo Ren
> Cc: linux-c...@vger.kernel.org
> ---
>  arch/csky/include/asm/vdso.h | 5 -----
>  arch/csky/kernel/vdso.c      | 4 ----
>  2 files changed, 9 deletions(-)
>
> diff --git a/arch/csky/include/asm/vdso.h b/arch/csky/include/asm/vdso.h
> index bdce581b5fcb..181a15edafe8 100644
> --- a/arch/csky/include/asm/vdso.h
> +++ b/arch/csky/include/asm/vdso.h
> @@ -5,11 +5,6 @@
>
>  #include
>
> -#ifndef GENERIC_TIME_VSYSCALL
> -struct vdso_data {
> -};
> -#endif
> -
>  /*
>   * The VDSO symbols are mapped into Linux so we can just use regular symbol
>   * addressing to get their offsets in userspace. The symbols are mapped at an
> diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
> index 16c20d64d165..e74a2504d331 100644
> --- a/arch/csky/kernel/vdso.c
> +++ b/arch/csky/kernel/vdso.c
> @@ -8,11 +8,7 @@
>  #include
>
>  #include
> -#ifdef GENERIC_TIME_VSYSCALL
>  #include
> -#else
> -#include
> -#endif
>
>  extern char vdso_start[], vdso_end[];
>
> --
> 2.39.2
>

Acked-by: Guo Ren

-- 
Best Regards
 Guo Ren
Re: [syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok
On Mon, 19 Feb 2024 02:35:20 -0500 "Michael S. Tsirkin" wrote:

> On Sun, Feb 18, 2024 at 09:06:18PM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    d37e1e4c52bc Add linux-next specific files for 20240216
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com
> >
> > ...
> > usbcore: registered new interface driver port100
> > usbcore: registered new interface driver nfcmrvl
> > Loading iSCSI transport class v2.0-870.
> > virtio_scsi virtio0: 1/0/0 default/read/poll queues
> > ------------[ cut here ]------------
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
> > RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 c6 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb d9 e8 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7
> > RSP: :c9066e10 EFLAGS: 00010246
> > RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000
> > RDX:  RSI:  RDI:
> > RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc
> > R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0
> > R13: ea000502fdc8 R14: 1d4000a05fb9 R15:
> > FS:  () GS:8880b940() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 88823000 CR3: 0df32000 CR4: 003506f0
> > DR0:  DR1:  DR2:
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> > <TASK>
> > reset_page_owner include/linux/page_owner.h:24 [inline]
> > free_pages_prepare mm/page_alloc.c:1140 [inline]
> > __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269
> > make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847
> > vring_alloc_queue drivers/virtio/virtio_ring.c:319 [inline]

Wow this seems to be breakage deep in mm/ - all virtio does is
call alloc_pages_exact and that corrupts the refcounts?

Will the bot be performing a bisection? If not, can one please be performed?
[PATCH v3] ring-buffer: Simplify reservation with try_cmpxchg() loop
From: "Steven Rostedt (Google)"

Instead of using local_add_return() to reserve the ring buffer data,
Mathieu Desnoyers suggested using local_cmpxchg(). This would simplify the
reservation with the time keeping code.

Although it does not get rid of the double time stamps (before_stamp and
write_stamp), using cmpxchg() does get rid of the more complex case when an
interrupting event occurs between getting the timestamps and reserving the
data, as when that happens, it just tries again instead of dealing with it.

Before we had:

	w = local_read(&tail_page->write);

	/* get time stamps */

	write = local_add_return(length, &tail_page->write);
	if (write - length == w) {
		/* do simple case */
	} else {
		/* do complex case */
	}

By switching the local_add_return() to a local_try_cmpxchg() it can now be:

	w = local_read(&tail_page->write);
 again:
	/* get time stamps */

	if (!local_try_cmpxchg(&tail_page->write, &w, w + length))
		goto again;

	/* do simple case */

The benchmarks between the two showed no regressions when trying this:

Enable: CONFIG_TRACEPOINT_BENCHMARK

  # trace-cmd record -m 80 -e benchmark sleep 60

Before the patch:

  # trace-cmd report | tail
 event_benchmark-944   [003]  1998.910191: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910192: benchmark_event: last=149 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=149
 event_benchmark-944   [003]  1998.910193: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910193: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910194: benchmark_event: last=136 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=136
 event_benchmark-944   [003]  1998.910194: benchmark_event: last=138 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=138
 event_benchmark-944   [003]  1998.910195: benchmark_event: last=150 first=3488
max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910196: benchmark_event: last=151 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=151
 event_benchmark-944   [003]  1998.910196: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910197: benchmark_event: last=152 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=152

After the patch:

  # trace-cmd report | tail
 event_benchmark-848   [004]   171.414716: benchmark_event: last=143 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=143
 event_benchmark-848   [004]   171.414717: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414720: benchmark_event: last=140 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=140
 event_benchmark-848   [004]   171.414721: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414721: benchmark_event: last=145 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=145
 event_benchmark-848   [004]   171.414722: benchmark_event: last=144 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=144

It may have even improved!
Suggested-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt (Google)
---
Changes since v2: https://lore.kernel.org/linux-trace-kernel/20240219173003.08339...@gandalf.local.home

- Fixed info.field to be info->field

 kernel/trace/ring_buffer.c | 103 ++++++++++++-----------------------
 1 file changed, 34 insertions(+), 69 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..6809d085ae98 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3455,9 +3455,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	/* Don't let the compiler play games with cpu_buffer->tail_page */
 	tail_page = info->tail_page = READ_ONCE(cpu_buffer->tail_page);
 
- /*A*/	w = local_read(&tail_page->write) &
Re: [PATCH v2] ring-buffer: Simplify reservation with try_cmpxchg() loop
On Mon, 19 Feb 2024 17:30:03 -0500
Steven Rostedt wrote:

> -	/*C*/	write = local_add_return(info->length, &tail_page->write);
> +	/*C*/	if (!local_try_cmpxchg(&tail_page->write, &w, w + info->length)) {
> +			if (info.add_timestamp & (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_EXTEND))
> +				info.length -= RB_LEN_TIME_EXTEND;
> +			goto again;
> +		}

Bah, I had my ktest config testing a different tree and this wasn't what I
was testing :-p

The above needs to be:

	if (info->add_timestamp & (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_EXTEND))
		info->length -= RB_LEN_TIME_EXTEND;

Will send out a v3.

-- Steve
Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
On Mon Feb 19, 2024 at 10:25 PM UTC, Haitao Huang wrote:
> On Mon, 19 Feb 2024 14:42:29 -0600, Jarkko Sakkinen wrote:
>
> > On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:
> >> On 2/19/24 07:39, Haitao Huang wrote:
> >> > Remove all boolean parameters for 'reclaim' from the function
> >> > sgx_alloc_epc_page() and its callers by making two versions of each
> >> > function.
> >> >
> >> > Also opportunistically remove non-static declaration of
> >> > __sgx_alloc_epc_page() and a typo
> >> >
> >> > Signed-off-by: Haitao Huang
> >> > Suggested-by: Jarkko Sakkinen
> >> > ---
> >> >  arch/x86/kernel/cpu/sgx/encl.c  | 56 ++++++++++++++++++++++++-----
> >> >  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
> >> >  arch/x86/kernel/cpu/sgx/ioctl.c | 23 +++++++-----
> >> >  arch/x86/kernel/cpu/sgx/main.c  | 68 ++++++++++++++++++++++-------
> >> >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
> >> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >> >  6 files changed, 115 insertions(+), 44 deletions(-)
> >>
> >> Jarkko, did this turn out how you expected?
> >>
> >> I think passing around a function pointer to *only* communicate 1 bit of
> >> information is a _bit_ overkill here.
> >>
> >> Simply replacing the bool with:
> >>
> >> 	enum sgx_reclaim {
> >> 		SGX_NO_RECLAIM,
> >> 		SGX_DO_RECLAIM
> >> 	};
> >>
> >> would do the same thing. Right?
> >>
> >> Are you sure you want a function pointer for this?
> >
> > To look this in context I drafted quickly two branches representing an
> > imaginary next version of the patch set.
> >
> > I guess this would be a simpler and totally sufficient approach.
> >
> > With this approach I'd then change also:
> >
> > [PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality
> >
> > And add the enum parameter already in that patch with just the "no
> > reclaim" enum. I.e. then 10/15 will add only "do reclaim" and the new
> > functionality.
> >
> > BR, Jarkko
>
> Thanks. My understanding is:
>
> 1) For this patch, replace the boolean with the enum as Dave suggested. No
> two versions of the same functions.
> And this is a prerequisite for the cgroup series, positioned before
> [PATCH v9 04/15].
>
> 2) For [PATCH v9 04/15], pass a hard-coded SGX_NO_RECLAIM to
> sgx_epc_cg_try_charge() from sgx_alloc_epc_page().

Yup, this will make the whole patch set also a bit leaner as the API does
not change in the middle.

> 3) For [PATCH v9 10/15], remove the hard-coded value, pass the reclaim
> enum parameter value from sgx_alloc_epc_page() to sgx_epc_cg_try_charge()
> and add the reclaim logic.
>
> I'll send patches soon. But please let me know if I misunderstood.

BR, Jarkko
[PATCH v2] ring-buffer: Simplify reservation with try_cmpxchg() loop
From: "Steven Rostedt (Google)"

Instead of using local_add_return() to reserve the ring buffer data,
Mathieu Desnoyers suggested using local_cmpxchg(). This would simplify the
reservation with the time keeping code.

Although it does not get rid of the double time stamps (before_stamp and
write_stamp), using cmpxchg() does get rid of the more complex case when an
interrupting event occurs between getting the timestamps and reserving the
data, as when that happens, it just tries again instead of dealing with it.

Before we had:

	w = local_read(&tail_page->write);

	/* get time stamps */

	write = local_add_return(length, &tail_page->write);
	if (write - length == w) {
		/* do simple case */
	} else {
		/* do complex case */
	}

By switching the local_add_return() to a local_try_cmpxchg() it can now be:

	w = local_read(&tail_page->write);
 again:
	/* get time stamps */

	if (!local_try_cmpxchg(&tail_page->write, &w, w + length))
		goto again;

	/* do simple case */

The benchmarks between the two showed no regressions when trying this:

Enable: CONFIG_TRACEPOINT_BENCHMARK

  # trace-cmd record -m 80 -e benchmark sleep 60

Before the patch:

  # trace-cmd report | tail
 event_benchmark-944   [003]  1998.910191: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910192: benchmark_event: last=149 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=149
 event_benchmark-944   [003]  1998.910193: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910193: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910194: benchmark_event: last=136 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=136
 event_benchmark-944   [003]  1998.910194: benchmark_event: last=138 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=138
 event_benchmark-944   [003]  1998.910195: benchmark_event: last=150 first=3488
max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910196: benchmark_event: last=151 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=151
 event_benchmark-944   [003]  1998.910196: benchmark_event: last=150 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910197: benchmark_event: last=152 first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=152

After the patch:

  # trace-cmd report | tail
 event_benchmark-848   [004]   171.414716: benchmark_event: last=143 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=143
 event_benchmark-848   [004]   171.414717: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event: last=141 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414720: benchmark_event: last=140 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=140
 event_benchmark-848   [004]   171.414721: benchmark_event: last=142 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414721: benchmark_event: last=145 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=145
 event_benchmark-848   [004]   171.414722: benchmark_event: last=144 first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=144

It may have even improved!
Suggested-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt (Google)
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240118181206.4977d...@gandalf.local.home

- If the try_cmpxchg() fails when it added a timestamp, it is likely to add
  the length of that timestamp again. Subtract the length if it fails.

 kernel/trace/ring_buffer.c | 103 ++++++++++++-----------------------
 1 file changed, 34 insertions(+), 69 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..4ba0af82c33e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3455,9 +3455,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	/* Don't let the compiler play games with cpu_buffer->tail_page */
Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
On Mon, 19 Feb 2024 14:42:29 -0600, Jarkko Sakkinen wrote:
> On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:
> > On 2/19/24 07:39, Haitao Huang wrote:
> > > Remove all boolean parameters for 'reclaim' from the function
> > > sgx_alloc_epc_page() and its callers by making two versions of each
> > > function.
> > >
> > > Also opportunistically remove non-static declaration of
> > > __sgx_alloc_epc_page() and a typo
> > >
> > > Signed-off-by: Haitao Huang
> > > Suggested-by: Jarkko Sakkinen
> > > ---
> > >  arch/x86/kernel/cpu/sgx/encl.c  | 56 ++++++++++++++++++++++++-----
> > >  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
> > >  arch/x86/kernel/cpu/sgx/ioctl.c | 23 +++++++-----
> > >  arch/x86/kernel/cpu/sgx/main.c  | 68 ++++++++++++++++++++++-------
> > >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
> > >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> > >  6 files changed, 115 insertions(+), 44 deletions(-)
> >
> > Jarkko, did this turn out how you expected?
> >
> > I think passing around a function pointer to *only* communicate 1 bit of
> > information is a _bit_ overkill here.
> >
> > Simply replacing the bool with:
> >
> > 	enum sgx_reclaim {
> > 		SGX_NO_RECLAIM,
> > 		SGX_DO_RECLAIM
> > 	};
> >
> > would do the same thing. Right?
> >
> > Are you sure you want a function pointer for this?
>
> To look this in context I drafted quickly two branches representing an
> imaginary next version of the patch set.
>
> I guess this would be a simpler and totally sufficient approach.
>
> With this approach I'd then change also:
>
> [PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality
>
> And add the enum parameter already in that patch with just the "no
> reclaim" enum. I.e. then 10/15 will add only "do reclaim" and the new
> functionality.
>
> BR, Jarkko

Thanks. My understanding is:

1) For this patch, replace the boolean with the enum as Dave suggested. No
two versions of the same functions. And this is a prerequisite for the
cgroup series, positioned before [PATCH v9 04/15].

2) For [PATCH v9 04/15], pass a hard-coded SGX_NO_RECLAIM to
sgx_epc_cg_try_charge() from sgx_alloc_epc_page().
3) For [PATCH v9 10/15], remove the hard-coded value, pass the reclaim enum
   parameter value from sgx_alloc_epc_page() to sgx_epc_cg_try_charge(), and
   add the reclaim logic.

I'll send patches soon, but please let me know if I misunderstood.

Thanks,
Haitao
[PATCH v3 2/2] ARM: dts: qcom: Add support for Samsung Galaxy Tab 4 8.0 Wi-Fi
Add support for this tablet based on the MSM8226 SoC, codenamed "milletwifi". Acked-by: Linus Walleij Reviewed-by: Luca Weiss Signed-off-by: Bryant Mairs --- arch/arm/boot/dts/qcom/Makefile | 1 + .../qcom/qcom-apq8026-samsung-milletwifi.dts | 573 ++ 2 files changed, 574 insertions(+) create mode 100644 arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts diff --git a/arch/arm/boot/dts/qcom/Makefile b/arch/arm/boot/dts/qcom/Makefile index 9cc1e14e6cd0..730d98c2c715 100644 --- a/arch/arm/boot/dts/qcom/Makefile +++ b/arch/arm/boot/dts/qcom/Makefile @@ -5,6 +5,7 @@ dtb-$(CONFIG_ARCH_QCOM) += \ qcom-apq8026-huawei-sturgeon.dtb \ qcom-apq8026-lg-lenok.dtb \ qcom-apq8026-samsung-matisse-wifi.dtb \ + qcom-apq8026-samsung-milletwifi.dtb \ qcom-apq8060-dragonboard.dtb \ qcom-apq8064-cm-qs600.dtb \ qcom-apq8064-ifc6410.dtb \ diff --git a/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts b/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts new file mode 100644 index ..7d519156d91d --- /dev/null +++ b/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts @@ -0,0 +1,573 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* + * Copyright (c) 2022, Matti Lehtimäki + * Copyright (c) 2023, Bryant Mairs + */ + +/dts-v1/; + +#include +#include +#include "qcom-msm8226.dtsi" +#include "pm8226.dtsi" + +/delete-node/ _region; +/delete-node/ _region; + +/ { + model = "Samsung Galaxy Tab 4 8.0 Wi-Fi"; + compatible = "samsung,milletwifi", "qcom,apq8026"; + chassis-type = "tablet"; + + aliases { + display0 = + mmc0 = _1; /* SDC1 eMMC slot */ + mmc1 = _2; /* SDC2 SD card slot */ + }; + + chosen { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + stdout-path = "display0"; + + framebuffer0: framebuffer@320 { + compatible = "simple-framebuffer"; + reg = <0x0320 0x80>; + width = <800>; + height = <1280>; + stride = <(800 * 3)>; + format = "r8g8b8"; + }; + }; + + gpio-hall-sensor { + compatible = "gpio-keys"; + + event-hall-sensor { + label = "Cover"; + gpios = < 37 
GPIO_ACTIVE_LOW>; + linux,input-type = ; + linux,code = ; + debounce-interval = <15>; + linux,can-disable; + wakeup-source; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + autorepeat; + + key-home { + label = "Home"; + gpios = < 108 GPIO_ACTIVE_LOW>; + linux,code = ; + debounce-interval = <15>; + }; + + key-volume-down { + label = "Volume Down"; + gpios = < 107 GPIO_ACTIVE_LOW>; + linux,code = ; + debounce-interval = <15>; + }; + + key-volume-up { + label = "Volume Up"; + gpios = < 106 GPIO_ACTIVE_LOW>; + linux,code = ; + debounce-interval = <15>; + }; + }; + + i2c-backlight { + compatible = "i2c-gpio"; + sda-gpios = < 20 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>; + scl-gpios = < 21 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>; + + pinctrl-0 = <_i2c_default_state>; + pinctrl-names = "default"; + + i2c-gpio,delay-us = <4>; + + #address-cells = <1>; + #size-cells = <0>; + + backlight@2c { + compatible = "ti,lp8556"; + reg = <0x2c>; + enable-supply = <_backlight_vddio>; + + dev-ctrl = /bits/ 8 <0x80>; + init-brt = /bits/ 8 <0x3f>; + + /* +* Change transition duration: 200ms, Change +* transition strength: heavy, PWM hysteresis: +* 1-bit w/ 8-bit resolution +*/ + rom-a3h { + rom-addr = /bits/ 8 <0xa3>; + rom-val = /bits/ 8 <0x5e>; + }; + + /* +* PWM phase configuration: 3-phase/3 drivers +* (0, 120deg, 240deg, -, -, -), +* PWM frequency: 9616Hz (10-bit) +*/ +
[PATCH v3 1/2] dt-bindings: arm: qcom: Document samsung,milletwifi device
Add binding documentation for Samsung Galaxy Tab 4 8.0 Wi-Fi tablet which
is based on Snapdragon 400 (apq8026) SoC.

Acked-by: Linus Walleij
Acked-by: Conor Dooley
Signed-off-by: Bryant Mairs
---
 Documentation/devicetree/bindings/arm/qcom.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/arm/qcom.yaml b/Documentation/devicetree/bindings/arm/qcom.yaml
index 2b993b4c51dc..c11bb2a81643 100644
--- a/Documentation/devicetree/bindings/arm/qcom.yaml
+++ b/Documentation/devicetree/bindings/arm/qcom.yaml
@@ -104,6 +104,7 @@ properties:
           - huawei,sturgeon
           - lg,lenok
           - samsung,matisse-wifi
+          - samsung,milletwifi
       - const: qcom,apq8026

   - items:
-- 
2.43.0
[PATCH v3 0/2] Add samsung-milletwifi
This series adds support for samsung-milletwifi, the smaller cousin to
samsung-matisselte. I've used the manufacturer's naming convention for
consistency.

Hardware currently supported:
- Display
- Cover detection
- Physical buttons
- Touchscreen and touchkeys
- Accelerometer

Acked-by: Linus Walleij
Signed-off-by: Bryant Mairs
---
Changes in v3:
- Minor style changes to the DTS
- Corrected list of supported hardware peripherals
- Link to v2: https://lore.kernel.org/all/20240215172617.115307-1-bry...@mai.rs/
---
Changes in v2:
- Squashed DTS work into a single commit
- Add touchkeys to the DTS
- Link to v1: https://lore.kernel.org/all/20231105204759.37107-1-bry...@mai.rs/
---
Bryant Mairs (2):
  dt-bindings: arm: qcom: Document samsung,milletwifi device
  ARM: dts: qcom: Add support for Samsung Galaxy Tab 4 8.0 Wi-Fi

 .../devicetree/bindings/arm/qcom.yaml         |   1 +
 arch/arm/boot/dts/qcom/Makefile               |   1 +
 .../qcom/qcom-apq8026-samsung-milletwifi.dts  | 573 ++
 3 files changed, 575 insertions(+)
 create mode 100644 arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts

-- 
2.43.0
Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:

> On 2/19/24 07:39, Haitao Huang wrote:
> > Remove all boolean parameters for 'reclaim' from the function
> > sgx_alloc_epc_page() and its callers by making two versions of each
> > function.
> >
> > Also opportunistically remove non-static declaration of
> > __sgx_alloc_epc_page() and a typo
> >
> > Signed-off-by: Haitao Huang
> > Suggested-by: Jarkko Sakkinen
> > ---
> >  arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
> >  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
> >  arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
> >  arch/x86/kernel/cpu/sgx/main.c  | 68 ++---
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >  6 files changed, 115 insertions(+), 44 deletions(-)
>
> Jarkko, did this turn out how you expected?
>
> I think passing around a function pointer to *only* communicate 1 bit of
> information is a _bit_ overkill here.
>
> Simply replacing the bool with:
>
> 	enum sgx_reclaim {
> 		SGX_NO_RECLAIM,
> 		SGX_DO_RECLAIM
> 	};
>
> would do the same thing. Right?
>
> Are you sure you want a function pointer for this?

To look at this in context, I quickly drafted two branches representing an
imaginary next version of the patch set. I guess this would be a simpler and
totally sufficient approach.

With this approach I'd then change also:

[PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality

And add the enum parameter already in that patch with just the "no reclaim"
enum. I.e. then 10/15 will add only "do reclaim" and the new functionality.

BR, Jarkko
Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
On Mon Feb 19, 2024 at 3:39 PM UTC, Haitao Huang wrote:

> Remove all boolean parameters for 'reclaim' from the function
> sgx_alloc_epc_page() and its callers by making two versions of each
> function.
>
> Also opportunistically remove non-static declaration of
> __sgx_alloc_epc_page() and a typo
>
> Signed-off-by: Haitao Huang
> Suggested-by: Jarkko Sakkinen

I think this is for the better. My viewpoint for kernel patches overall is
that:

1. A feature should leave the subsystem in a cleaner state as far as the
   existing framework of doing things goes.
2. A bugfix can sometimes do the opposite if a corner case requires some
   weird dance to perform.

BR, Jarkko
Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()
On Mon Feb 19, 2024 at 3:12 PM UTC, Haitao Huang wrote: > On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen > wrote: > > > On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote: > >> Hi Jarkko > >> > >> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen > >> wrote: > >> > >> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote: > >> >> From: Kristen Carlson Accardi > >> >> > >> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to > >> >> reclaim pages used in the same cgroup to make room for new > >> allocations. > >> >> This is analogous to the behavior that the global reclaimer is > >> triggered > >> >> when the global usage is close to total available EPC. > >> >> > >> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate > >> >> whether synchronous reclaim is allowed or not. And trigger the > >> >> synchronous/asynchronous reclamation flow accordingly. > >> >> > >> >> Note at this point, all reclaimable EPC pages are still tracked in > >> the > >> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup > >> reclamation > >> >> is activated yet. > >> >> > >> >> Co-developed-by: Sean Christopherson > >> > >> >> Signed-off-by: Sean Christopherson > >> >> Signed-off-by: Kristen Carlson Accardi > >> >> Co-developed-by: Haitao Huang > >> >> Signed-off-by: Haitao Huang > >> >> --- > >> >> V7: > >> >> - Split this out from the big patch, #10 in V6. 
(Dave, Kai) > >> >> --- > >> >> arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 -- > >> >> arch/x86/kernel/cpu/sgx/epc_cgroup.h | 4 ++-- > >> >> arch/x86/kernel/cpu/sgx/main.c | 2 +- > >> >> 3 files changed, 27 insertions(+), 5 deletions(-) > >> >> > >> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c > >> >> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > >> >> index d399fda2b55e..abf74fdb12b4 100644 > >> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c > >> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > >> >> @@ -184,13 +184,35 @@ static void > >> >> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) > >> >> /** > >> >> * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single > >> EPC > >> >> page > >> >> * @epc_cg:The EPC cgroup to be charged for the page. > >> >> + * @reclaim: Whether or not synchronous reclaim is allowed > >> >> * Return: > >> >> * * %0 - If successfully charged. > >> >> * * -errno - for failures. > >> >> */ > >> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg) > >> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool > >> >> reclaim) > >> >> { > >> >> - return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, > >> PAGE_SIZE); > >> >> + for (;;) { > >> >> + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, > >> >> + PAGE_SIZE)) > >> >> + break; > >> >> + > >> >> + if (sgx_epc_cgroup_lru_empty(epc_cg->cg)) > >> >> + return -ENOMEM; > >> >> + + if (signal_pending(current)) > >> >> + return -ERESTARTSYS; > >> >> + > >> >> + if (!reclaim) { > >> >> + queue_work(sgx_epc_cg_wq, > >> >> _cg->reclaim_work); > >> >> + return -EBUSY; > >> >> + } > >> >> + > >> >> + if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false)) > >> >> + /* All pages were too young to reclaim, try > >> >> again a little later > >> */ > >> >> + schedule(); > >> > > >> > This will be total pain to backtrack after a while when something > >> > needs to be changed so there definitely should be inline comments > >> > addressing each 
branch condition. > >> > > >> > I'd rethink this as: > >> > > >> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing a single > >> > iteration with the new "reclaim" parameter. > >> > 2. Add a new sgx_epc_group_try_charge_reclaim() function. > >> > > >> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and > >> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the > >> > same loop calling internal __sgx_epc_cgroup_try_charge() with > >> > different parameters. That is totally acceptable. > >> > > >> > Please also add my suggested-by. > >> > > >> > BR, Jarkko > >> > > >> > BR, Jarkko > >> > > >> For #2: > >> The only caller of this function, sgx_alloc_epc_page(), has the same > >> boolean which is passed into this function. > > > > I know. This would be a good opportunity to fix that up. Large patch > > sets should try to make the space for its feature the best possible and > > thus also clean up the code base overall. > > > >> If we separate it into sgx_epc_cgroup_try_charge() and > >> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the > >> if/else branches. So separation here seems not to help? > > > > Of course it does. It makes the code in
Re: [PATCH v17 3/6] tracing: Add snapshot refcount
On Mon, 19 Feb 2024 13:17:54 -0500 Steven Rostedt wrote:

> On Tue, 13 Feb 2024 11:49:42 +
> Vincent Donnefort wrote:
>
> > @@ -9678,7 +9739,9 @@ trace_array_create_systems(const char *name, const char *systems)
> > 	raw_spin_lock_init(&tr->start_lock);
> >
> > 	tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> > -
> > +#ifdef CONFIG_TRCER_MAX_TRACE
>
> Oops!
>
> I'll fix this too.
>
> > +	spinlock_init(&tr->snapshot_trigger_lock);

And this too:

kernel/trace/trace.c:9245:9: error: implicit declaration of function ‘spinlock_init’; did you mean ‘spin_lock_init’? [-Werror=implicit-function-declaration]
 9245 |         spinlock_init(&tr->snapshot_trigger_lock);
      |         ^

-- Steve

> > +#endif
> > 	tr->current_trace = &nop_trace;
> >
> > 	INIT_LIST_HEAD(&tr->systems);
Re: [PATCH v17 3/6] tracing: Add snapshot refcount
On Tue, 13 Feb 2024 11:49:42 + Vincent Donnefort wrote:

> @@ -9678,7 +9739,9 @@ trace_array_create_systems(const char *name, const char *systems)
> 	raw_spin_lock_init(&tr->start_lock);
>
> 	tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> -
> +#ifdef CONFIG_TRCER_MAX_TRACE

Oops!

I'll fix this too.

-- Steve

> +	spinlock_init(&tr->snapshot_trigger_lock);
> +#endif
> 	tr->current_trace = &nop_trace;
>
> 	INIT_LIST_HEAD(&tr->systems);
Re: [PATCH v11 0/4] add zynqmp TCM bindings
Hello,

By mistake, the same set of patches was sent twice in the same git
send-email command. Either copy can be reviewed. Please let me know if I
need to take any action to fix it.

Thanks.

On 2/19/24 11:44 AM, Tanmay Shah wrote:
> Tightly-Coupled Memories (TCMs) are low-latency memory that provides
> predictable instruction execution and predictable data load/store
> timing. Each Cortex-R5F processor contains two exclusive 64 KB memory
> banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> In lockstep mode, both 128 KB memories are accessible to the cluster.
>
> As per the ZynqMP Ultrascale+ Technical Reference Manual UG1085, the
> following is the address space of TCM memory. The bindings in this patch
> series introduce properties to accommodate the following address space
> with address translation between Linux and Cortex-R5 views.
>
> | *Mode*             | *R5 View*    | *Linux view* | Notes            |
> | ---                | ---          | ---          | ---              |
> | *Split Mode*       | *start addr* | *start addr* |                  |
> | R5_0 ATCM (64 KB)  | 0x_          | 0xFFE0_      |                  |
> | R5_0 BTCM (64 KB)  | 0x0002_      | 0xFFE2_      |                  |
> | R5_1 ATCM (64 KB)  | 0x_          | 0xFFE9_      | alias of 0xFFE1_ |
> | R5_1 BTCM (64 KB)  | 0x0002_      | 0xFFEB_      | alias of 0xFFE3_ |
> | *Lockstep Mode*    |              |              |                  |
> | R5_0 ATCM (128 KB) | 0x_          | 0xFFE0_      |                  |
> | R5_0 BTCM (128 KB) | 0x0002_      | 0xFFE2_      |                  |
>
> References:
> UG1085 TCM address space:
> https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map
>
> Changes in v11:
> - Fix yamllint warning and reduce indentation as needed
> - Remove redundant initialization of the variable
> - Return correct error code if memory allocation failed
>
> Changes in v10:
> - Add new patch (1/4) to series that changes hardcoded TCM addresses in
>   lockstep mode and removes separate handling of TCM in lockstep and
>   split mode
> - modify number of "reg", "reg-names" and "power-domains" entries
>   based on cluster mode
> - Add extra optional atcm and btcm in "reg" property for lockstep mode
> - Add "reg-names" for extra optional atcm and btcm for lockstep mode
> - Drop previous Ack as bindings has new change
> - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
> - Add each tcm's power-domains in lockstep mode
> - Drop previous Ack as new change in dts patchset
> - Remove redundant changes in driver to handle TCM in lockstep mode
>
> Changes in v9:
> - Fix rproc lockstep dts
> - Introduce new API to request and release core1 TCM power-domains in
>   lockstep mode. This will be used during prepare -> add_tcm_banks
>   callback to enable TCM in lockstep mode.
> - Parse TCM from device-tree in lockstep mode and split mode in
>   uniform way.
> - Fix TCM representation in device-tree in lockstep mode.
> - Fix comments as suggested
>
> Changes in v8:
> - Remove use of pm_domains framework
> - Remove checking of pm_domain_id validation to power on/off tcm
> - Remove spurious change
> - parse power-domains property from device-tree and use EEMI calls
>   to power on/off TCM instead of using pm domains framework
>
> Changes in v7:
> - %s/pm_dev1/pm_dev_core0/r
> - %s/pm_dev_link1/pm_dev_core0_link/r
> - %s/pm_dev2/pm_dev_core1/r
> - %s/pm_dev_link2/pm_dev_core1_link/r
> - remove pm_domain_id check to move next patch
> - add comment about how 1st entry in pm domain list is used
> - fix loop when jump to fail_add_pm_domains loop
> - move checking of pm_domain_id from previous patch
> - fix mem_bank_data memory allocation
>
> Changes in v6:
> - Introduce new node entry for r5f cluster split mode dts and
>   keep it disabled by default.
> - Keep remoteproc lockstep mode enabled by default to maintain
>   back compatibility.
> - Enable split mode only for zcu102 board to demo split mode use
> - Remove spurious change
> - Handle errors in add_pm_domains function
> - Remove redundant code to handle errors from remove_pm_domains
> - Missing . at the end of the commit message
> - remove redundant initialization of variables
> - remove fail_tcm label and relevant code to free memory
>   acquired using devm_* API, as this will be freed when the device is
>   freed
> - add extra check to see if "reg" property is supported or not
>
> Changes in v5:
> - maintain Rob's Ack on bindings patch as no changes in bindings
> - split previous patch into multiple patches
> - Use pm domain framework to turn on/off TCM
> - Add support of parsing TCM information from device-tree
> - maintain backward compatibility with previous bindings without
>   TCM information available in device-tree
>
> This patch series continues
[PATCH v11 4/4] remoteproc: zynqmp: parse TCM from device tree
ZynqMP TCM information was fixed in driver. Now ZynqMP TCM information is available in device-tree. Parse TCM information in driver as per new bindings. Signed-off-by: Tanmay Shah --- Changes in v11: - Remove redundant initialization of the variable - return correct error code if memory allocation failed drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++-- 1 file changed, 107 insertions(+), 5 deletions(-) diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c index 42b0384d34f2..d4a22caebaad 100644 --- a/drivers/remoteproc/xlnx_r5_remoteproc.c +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c @@ -74,8 +74,8 @@ struct mbox_info { }; /* - * Hardcoded TCM bank values. This will be removed once TCM bindings are - * accepted for system-dt specifications and upstreamed in linux kernel + * Hardcoded TCM bank values. This will stay in driver to maintain backward + * compatibility with device-tree that does not have TCM information. */ static const struct mem_bank_data zynqmp_tcm_banks_split[] = { {0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */ @@ -757,6 +757,103 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev) return ERR_PTR(ret); } +static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster) +{ + int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count; + struct of_phandle_args out_args; + struct zynqmp_r5_core *r5_core; + struct platform_device *cpdev; + struct mem_bank_data *tcm; + struct device_node *np; + struct resource *res; + u64 abs_addr, size; + struct device *dev; + + for (i = 0; i < cluster->core_count; i++) { + r5_core = cluster->r5_cores[i]; + dev = r5_core->dev; + np = r5_core->np; + + pd_count = of_count_phandle_with_args(np, "power-domains", + "#power-domain-cells"); + + if (pd_count <= 0) { + dev_err(dev, "invalid power-domains property, %d\n", pd_count); + return -EINVAL; + } + + /* First entry in power-domains list is for r5 core, rest for TCM. 
*/ + tcm_bank_count = pd_count - 1; + + if (tcm_bank_count <= 0) { + dev_err(dev, "invalid TCM count %d\n", tcm_bank_count); + return -EINVAL; + } + + r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count, + sizeof(struct mem_bank_data *), + GFP_KERNEL); + if (!r5_core->tcm_banks) + return -ENOMEM; + + r5_core->tcm_bank_count = tcm_bank_count; + for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) { + tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data), + GFP_KERNEL); + if (!tcm) + return -ENOMEM; + + r5_core->tcm_banks[j] = tcm; + + /* Get power-domains id of TCM. */ + ret = of_parse_phandle_with_args(np, "power-domains", +"#power-domain-cells", +tcm_pd_idx, _args); + if (ret) { + dev_err(r5_core->dev, + "failed to get tcm %d pm domain, ret %d\n", + tcm_pd_idx, ret); + return ret; + } + tcm->pm_domain_id = out_args.args[0]; + of_node_put(out_args.np); + + /* Get TCM address without translation. */ + ret = of_property_read_reg(np, j, _addr, ); + if (ret) { + dev_err(dev, "failed to get reg property\n"); + return ret; + } + + /* +* Remote processor can address only 32 bits +* so convert 64-bits into 32-bits. This will discard +* any unwanted upper 32-bits. +*/ + tcm->da = (u32)abs_addr; + tcm->size = (u32)size; + + cpdev = to_platform_device(dev); + res = platform_get_resource(cpdev, IORESOURCE_MEM, j); + if (!res) { + dev_err(dev, "failed to get tcm resource\n"); + return -EINVAL; + } + +
[PATCH v11 3/4] dts: zynqmp: add properties for TCM in remoteproc
Add properties as per new bindings in zynqmp remoteproc node to represent TCM address and size. This patch also adds alternative remoteproc node to represent remoteproc cluster in split mode. By default lockstep mode is enabled and users should disable it before using split mode dts. Both device-tree nodes can't be used simultaneously one of them must be disabled. For zcu102-1.0 and zcu102-1.1 board remoteproc split mode dts node is enabled and lockstep mode dts is disabled. Signed-off-by: Tanmay Shah --- .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts | 8 +++ arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 65 +-- 2 files changed, 68 insertions(+), 5 deletions(-) diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts index c8f71a1aec89..495ca94b45db 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts +++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts @@ -14,6 +14,14 @@ / { compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", "xlnx,zynqmp"; }; +_split { + status = "okay"; +}; + +_lockstep { + status = "disabled"; +}; + { #address-cells = <1>; #size-cells = <1>; diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi index eaba466804bc..c8a7fd0f3a1e 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi @@ -248,19 +248,74 @@ fpga_full: fpga-full { ranges; }; - remoteproc { + rproc_lockstep: remoteproc@ffe0 { compatible = "xlnx,zynqmp-r5fss"; xlnx,cluster-mode = <1>; - r5f-0 { + #address-cells = <2>; + #size-cells = <2>; + + ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>, +<0x0 0x2 0x0 0xffe2 0x0 0x1>, +<0x0 0x1 0x0 0xffe1 0x0 0x1>, +<0x0 0x3 0x0 0xffe3 0x0 0x1>; + + r5f@0 { + compatible = "xlnx,zynqmp-r5f"; + reg = <0x0 0x0 0x0 0x1>, + <0x0 0x2 0x0 0x1>, + <0x0 0x1 0x0 0x1>, + <0x0 0x3 0x0 0x1>; + reg-names = "atcm0", "btcm0", "atcm1", "btcm1"; + power-domains = <_firmware PD_RPU_0>, + <_firmware PD_R5_0_ATCM>, + 
<_firmware PD_R5_0_BTCM>, + <_firmware PD_R5_1_ATCM>, + <_firmware PD_R5_1_BTCM>; + memory-region = <_0_fw_image>; + }; + + r5f@1 { + compatible = "xlnx,zynqmp-r5f"; + reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>; + reg-names = "atcm0", "btcm0"; + power-domains = <_firmware PD_RPU_1>, + <_firmware PD_R5_1_ATCM>, + <_firmware PD_R5_1_BTCM>; + memory-region = <_1_fw_image>; + }; + }; + + rproc_split: remoteproc-split@ffe0 { + status = "disabled"; + compatible = "xlnx,zynqmp-r5fss"; + xlnx,cluster-mode = <0>; + + #address-cells = <2>; + #size-cells = <2>; + + ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>, +<0x0 0x2 0x0 0xffe2 0x0 0x1>, +<0x1 0x0 0x0 0xffe9 0x0 0x1>, +<0x1 0x2 0x0 0xffeb 0x0 0x1>; + + r5f@0 { compatible = "xlnx,zynqmp-r5f"; - power-domains = <_firmware PD_RPU_0>; + reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>; + reg-names = "atcm0", "btcm0"; + power-domains = <_firmware PD_RPU_0>, + <_firmware PD_R5_0_ATCM>, + <_firmware PD_R5_0_BTCM>; memory-region = <_0_fw_image>; }; - r5f-1 { + r5f@1 { compatible = "xlnx,zynqmp-r5f"; - power-domains = <_firmware PD_RPU_1>; + reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>; + reg-names = "atcm0", "btcm0"; + power-domains = <_firmware PD_RPU_1>, + <_firmware PD_R5_1_ATCM>, + <_firmware PD_R5_1_BTCM>; memory-region = <_1_fw_image>; }; }; -- 2.25.1
[PATCH v11 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
From: Radhey Shyam Pandey Introduce bindings for TCM memory address space on AMD-xilinx Zynq UltraScale+ platform. It will help in defining TCM in device-tree and make it's access platform agnostic and data-driven. Tightly-coupled memories(TCMs) are low-latency memory that provides predictable instruction execution and predictable data load/store timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory banks on the ATCM and BTCM ports, for a total of 128 KB of memory. The TCM resources(reg, reg-names and power-domain) are documented for each TCM in the R5 node. The reg and reg-names are made as required properties as we don't want to hardcode TCM addresses for future platforms and for zu+ legacy implementation will ensure that the old dts w/o reg/reg-names works and stable ABI is maintained. It also extends the examples for TCM split and lockstep modes. Signed-off-by: Radhey Shyam Pandey Signed-off-by: Tanmay Shah --- Changes in v11: - Fix yamllint warning and reduce indentation as needed .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 -- 1 file changed, 170 insertions(+), 22 deletions(-) diff --git a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml index 78aac69f1060..77030edf41fa 100644 --- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml +++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml @@ -20,9 +20,21 @@ properties: compatible: const: xlnx,zynqmp-r5fss + "#address-cells": +const: 2 + + "#size-cells": +const: 2 + + ranges: +description: | + Standard ranges definition providing address translations for + local R5F TCM address spaces to bus addresses. 
+ xlnx,cluster-mode: $ref: /schemas/types.yaml#/definitions/uint32 enum: [0, 1, 2] +default: 1 description: | The RPU MPCore can operate in split mode (Dual-processor performance), Safety lock-step mode(Both RPU cores execute the same code in lock-step, @@ -37,7 +49,7 @@ properties: 2: single cpu mode patternProperties: - "^r5f-[a-f0-9]+$": + "^r5f@[0-9a-f]+$": type: object description: | The RPU is located in the Low Power Domain of the Processor Subsystem. @@ -54,9 +66,6 @@ patternProperties: compatible: const: xlnx,zynqmp-r5f - power-domains: -maxItems: 1 - mboxes: minItems: 1 items: @@ -101,35 +110,174 @@ patternProperties: required: - compatible - - power-domains -unevaluatedProperties: false +allOf: + - if: + properties: +xlnx,cluster-mode: + enum: +- 1 +then: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory +- description: extra ATCM memory in lockstep mode +- description: extra BTCM memory in lockstep mode + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 +- const: atcm1 +- const: btcm1 + +power-domains: + minItems: 2 + maxItems: 5 + + required: +- reg +- reg-names +- power-domains + +else: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 + +power-domains: + minItems: 2 + maxItems: 3 + + required: +- reg +- reg-names +- power-domains required: - compatible + - "#address-cells" + - "#size-cells" + - ranges additionalProperties: false examples: - | -remoteproc { -compatible = "xlnx,zynqmp-r5fss"; -xlnx,cluster-mode = <1>; - -r5f-0 { -compatible = "xlnx,zynqmp-r5f"; -power-domains = <_firmware 0x7>; -memory-region = <_0_fw_image>, <>, <>, <>; -mboxes = <_mailbox_rpu0 0>, <_mailbox_rpu0 1>; -mbox-names = "tx", "rx"; +#include + 
+// Split mode configuration +soc { +#address-cells = <2>; +#size-cells = <2>; + +remoteproc@ffe0 { +compatible = "xlnx,zynqmp-r5fss"; +xlnx,cluster-mode = <0>; + +#address-cells = <2>; +#size-cells = <2>; +ranges = <0x0 0x0 0x0
[PATCH v11 0/4] add zynqmp TCM bindings
Tightly-Coupled Memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store timing.
Each Cortex-R5F processor contains two exclusive 64 KB memory banks on the
ATCM and BTCM ports, for a total of 128 KB of memory. In lockstep mode,
both 128 KB memories are accessible to the cluster.

As per the ZynqMP Ultrascale+ Technical Reference Manual UG1085, the
following is the address space of TCM memory. The bindings in this patch
series introduce properties to accommodate the following address space with
address translation between Linux and Cortex-R5 views.

| *Mode*             | *R5 View*    | *Linux view* | Notes            |
| ---                | ---          | ---          | ---              |
| *Split Mode*       | *start addr* | *start addr* |                  |
| R5_0 ATCM (64 KB)  | 0x_          | 0xFFE0_      |                  |
| R5_0 BTCM (64 KB)  | 0x0002_      | 0xFFE2_      |                  |
| R5_1 ATCM (64 KB)  | 0x_          | 0xFFE9_      | alias of 0xFFE1_ |
| R5_1 BTCM (64 KB)  | 0x0002_      | 0xFFEB_      | alias of 0xFFE3_ |
| *Lockstep Mode*    |              |              |                  |
| R5_0 ATCM (128 KB) | 0x_          | 0xFFE0_      |                  |
| R5_0 BTCM (128 KB) | 0x0002_      | 0xFFE2_      |                  |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

Changes in v11:
- Fix yamllint warning and reduce indentation as needed
- Remove redundant initialization of the variable
- Return correct error code if memory allocation failed

Changes in v10:
- Add new patch (1/4) to series that changes hardcoded TCM addresses in
  lockstep mode and removes separate handling of TCM in lockstep and
  split mode
- modify number of "reg", "reg-names" and "power-domains" entries
  based on cluster mode
- Add extra optional atcm and btcm in "reg" property for lockstep mode
- Add "reg-names" for extra optional atcm and btcm for lockstep mode
- Drop previous Ack as bindings has new change
- Add individual tcm regions via "reg" and "reg-names" for lockstep mode
- Add each tcm's power-domains in lockstep mode
- Drop previous Ack as new change in dts patchset
- Remove redundant changes in driver to handle TCM in lockstep mode

Changes in v9:
- Fix rproc lockstep dts
- Introduce new API to request and release core1 TCM power-domains in
  lockstep mode. This will be used during prepare -> add_tcm_banks
  callback to enable TCM in lockstep mode.
- Parse TCM from device-tree in lockstep mode and split mode in
  uniform way.
- Fix TCM representation in device-tree in lockstep mode.
- Fix comments as suggested

Changes in v8:
- Remove use of pm_domains framework
- Remove checking of pm_domain_id validation to power on/off tcm
- Remove spurious change
- parse power-domains property from device-tree and use EEMI calls
  to power on/off TCM instead of using pm domains framework

Changes in v7:
- %s/pm_dev1/pm_dev_core0/r
- %s/pm_dev_link1/pm_dev_core0_link/r
- %s/pm_dev2/pm_dev_core1/r
- %s/pm_dev_link2/pm_dev_core1_link/r
- remove pm_domain_id check to move next patch
- add comment about how 1st entry in pm domain list is used
- fix loop when jump to fail_add_pm_domains loop
- move checking of pm_domain_id from previous patch
- fix mem_bank_data memory allocation

Changes in v6:
- Introduce new node entry for r5f cluster split mode dts and
  keep it disabled by default.
- Keep remoteproc lockstep mode enabled by default to maintain
  back compatibility.
- Enable split mode only for zcu102 board to demo split mode use
- Remove spurious change
- Handle errors in add_pm_domains function
- Remove redundant code to handle errors from remove_pm_domains
- Missing . at the end of the commit message
- remove redundant initialization of variables
- remove fail_tcm label and relevant code to free memory
  acquired using devm_* API, as this will be freed when the device is
  freed
- add extra check to see if "reg" property is supported or not

Changes in v5:
- maintain Rob's Ack on bindings patch as no changes in bindings
- split previous patch into multiple patches
- Use pm domain framework to turn on/off TCM
- Add support of parsing TCM information from device-tree
- maintain backward compatibility with previous bindings without
  TCM information available in device-tree

This patch series continues previous effort to upstream ZynqMP TCM
bindings:

Previous v4 version link:
https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/

Previous v3 version link:
https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/

Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

Radhey Shyam Pandey (1):
  dt-bindings:
[PATCH v11 1/4] remoteproc: zynqmp: fix lockstep mode memory region
In lockstep mode, r5 core0 uses the TCM of R5 core1. Following is the lockstep mode
memory region as per the hardware reference manual.

| *TCM*              | *R5 View*   | *Linux view* |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above, because R5 core0 TCM and core1
TCM have different power-domains mapped to them. Hence, the TCM address space in
lockstep mode should be modeled as 64 KB regions only, where each region has its
own power-domain, as follows:

| *TCM*              | *R5 View*   | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This makes driver maintenance easy and makes the design robust for future
platforms as well.

Signed-off-by: Tanmay Shah
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 145 ++--
 1 file changed, 12 insertions(+), 133 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..42b0384d34f2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
 	{0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-	{0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */
-	{0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"},
-	{0, 0, 0, PD_R5_1_ATCM, ""},
-	{0, 0, 0, PD_R5_1_BTCM, ""},
+	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
+	{0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"},
+	{0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"},
+	{0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -540,14 +540,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * 
add_tcm_carveout_split_mode() + * add_tcm_banks() * @rproc: single R5 core's corresponding rproc instance * - * allocate and add remoteproc carveout for TCM memory in split mode + * allocate and add remoteproc carveout for TCM memory * * return 0 on success, otherwise non-zero value on failure */ -static int add_tcm_carveout_split_mode(struct rproc *rproc) +static int add_tcm_banks(struct rproc *rproc) { struct rproc_mem_entry *rproc_mem; struct zynqmp_r5_core *r5_core; @@ -580,10 +580,10 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) ZYNQMP_PM_REQUEST_ACK_BLOCKING); if (ret < 0) { dev_err(dev, "failed to turn on TCM 0x%x", pm_domain_id); - goto release_tcm_split; + goto release_tcm; } - dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, size=0x%lx", + dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx", bank_name, bank_addr, da, bank_size); rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr, @@ -593,7 +593,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) if (!rproc_mem) { ret = -ENOMEM; zynqmp_pm_release_node(pm_domain_id); - goto release_tcm_split; + goto release_tcm; } rproc_add_carveout(rproc, rproc_mem); @@ -601,7 +601,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) return 0; -release_tcm_split: +release_tcm: /* If failed, Turn off all TCM banks turned on before */ for (i--; i >= 0; i--) { pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id; @@ -610,127 +610,6 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc) return ret; } -/* - * add_tcm_carveout_lockstep_mode() - * @rproc: single R5 core's corresponding rproc instance - * - * allocate and add remoteproc carveout for TCM memory in lockstep mode - * - * return 0 on success, otherwise non-zero value on failure - */ -static int add_tcm_carveout_lockstep_mode(struct rproc *rproc) -{ - struct rproc_mem_entry *rproc_mem; - struct zynqmp_r5_core *r5_core; - int i, num_banks, ret; - phys_addr_t bank_addr; - size_t bank_size = 0; - 
struct device *dev; - u32 pm_domain_id; - char *bank_name; - u32 da; - - r5_core = rproc->priv; - dev = r5_core->dev; - - /* Go through zynqmp banks for r5 node */ - num_banks =
[PATCH v11 4/4] remoteproc: zynqmp: parse TCM from device tree
ZynqMP TCM information was hardcoded in the driver. Now ZynqMP TCM information
is available in the device-tree. Parse TCM information in the driver as per the
new bindings.

Signed-off-by: Tanmay Shah
---

Changes in v11:
  - Remove redundant initialization of the variable
  - Return correct error code if memory allocation failed

 drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++--
 1 file changed, 107 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 42b0384d34f2..d4a22caebaad 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
 */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
 	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
@@ -757,6 +757,103 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
 	return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+	int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+	struct of_phandle_args out_args;
+	struct zynqmp_r5_core *r5_core;
+	struct platform_device *cpdev;
+	struct mem_bank_data *tcm;
+	struct device_node *np;
+	struct resource *res;
+	u64 abs_addr, size;
+	struct device *dev;
+
+	for (i = 0; i < cluster->core_count; i++) {
+		r5_core = cluster->r5_cores[i];
+		dev = r5_core->dev;
+		np = r5_core->np;
+
+		pd_count = of_count_phandle_with_args(np, "power-domains",
+						      "#power-domain-cells");
+
+		if (pd_count <= 0) {
+			dev_err(dev, "invalid power-domains property, %d\n", pd_count);
+			return -EINVAL;
+		}
+
+		/* First entry in power-domains list is for r5 core, rest for TCM.
 */
+		tcm_bank_count = pd_count - 1;
+
+		if (tcm_bank_count <= 0) {
+			dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+			return -EINVAL;
+		}
+
+		r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+						  sizeof(struct mem_bank_data *),
+						  GFP_KERNEL);
+		if (!r5_core->tcm_banks)
+			return -ENOMEM;
+
+		r5_core->tcm_bank_count = tcm_bank_count;
+		for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) {
+			tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+					   GFP_KERNEL);
+			if (!tcm)
+				return -ENOMEM;
+
+			r5_core->tcm_banks[j] = tcm;
+
+			/* Get power-domains id of TCM. */
+			ret = of_parse_phandle_with_args(np, "power-domains",
+							 "#power-domain-cells",
+							 tcm_pd_idx, &out_args);
+			if (ret) {
+				dev_err(r5_core->dev,
+					"failed to get tcm %d pm domain, ret %d\n",
+					tcm_pd_idx, ret);
+				return ret;
+			}
+			tcm->pm_domain_id = out_args.args[0];
+			of_node_put(out_args.np);
+
+			/* Get TCM address without translation. */
+			ret = of_property_read_reg(np, j, &abs_addr, &size);
+			if (ret) {
+				dev_err(dev, "failed to get reg property\n");
+				return ret;
+			}
+
+			/*
+			 * Remote processor can address only 32 bits
+			 * so convert 64-bits into 32-bits. This will discard
+			 * any unwanted upper 32-bits.
+			 */
+			tcm->da = (u32)abs_addr;
+			tcm->size = (u32)size;
+
+			cpdev = to_platform_device(dev);
+			res = platform_get_resource(cpdev, IORESOURCE_MEM, j);
+			if (!res) {
+				dev_err(dev, "failed to get tcm resource\n");
+				return -EINVAL;
+			}
+
[PATCH v11 3/4] dts: zynqmp: add properties for TCM in remoteproc
Add properties as per the new bindings in the zynqmp remoteproc node to
represent TCM address and size. This patch also adds an alternative remoteproc
node to represent the remoteproc cluster in split mode. By default lockstep
mode is enabled, and users should disable it before using the split mode dts.
Both device-tree nodes can't be used simultaneously; one of them must be
disabled. For the zcu102-1.0 and zcu102-1.1 boards, the remoteproc split mode
dts node is enabled and the lockstep mode dts is disabled.

Signed-off-by: Tanmay Shah
---
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi       | 65 +--
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
 	compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", "xlnx,zynqmp";
 };
 
+&rproc_split {
+	status = "okay";
+};
+
+&rproc_lockstep {
+	status = "disabled";
+};
+
 {
 	#address-cells = <1>;
 	#size-cells = <1>;

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index eaba466804bc..c8a7fd0f3a1e 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -248,19 +248,74 @@ fpga_full: fpga-full {
 		ranges;
 	};
 
-	remoteproc {
+	rproc_lockstep: remoteproc@ffe00000 {
 		compatible = "xlnx,zynqmp-r5fss";
 		xlnx,cluster-mode = <1>;
 
-		r5f-0 {
+		#address-cells = <2>;
+		#size-cells = <2>;
+
+		ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+			 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+			 <0x0 0x10000 0x0 0xffe10000 0x0 0x10000>,
+			 <0x0 0x30000 0x0 0xffe30000 0x0 0x10000>;
+
+		r5f@0 {
+			compatible = "xlnx,zynqmp-r5f";
+			reg = <0x0 0x0 0x0 0x10000>,
+			      <0x0 0x20000 0x0 0x10000>,
+			      <0x0 0x10000 0x0 0x10000>,
+			      <0x0 0x30000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <_0_fw_image>;
+		};
+
+		r5f@1 {
+			compatible = "xlnx,zynqmp-r5f";
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <_1_fw_image>;
+		};
+	};
+
+	rproc_split: remoteproc-split@ffe00000 {
+		status = "disabled";
+		compatible = "xlnx,zynqmp-r5fss";
+		xlnx,cluster-mode = <0>;
+
+		#address-cells = <2>;
+		#size-cells = <2>;
+
+		ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+			 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+			 <0x1 0x0 0x0 0xffe90000 0x0 0x10000>,
+			 <0x1 0x20000 0x0 0xffeb0000 0x0 0x10000>;
+
+		r5f@0 {
 			compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_0>;
+			reg = <0x0 0x0 0x0 0x10000>, <0x0 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>;
 			memory-region = <_0_fw_image>;
 		};
 
-		r5f-1 {
+		r5f@1 {
 			compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_1>;
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
 			memory-region = <_1_fw_image>;
 		};
 	};
-- 
2.25.1
[PATCH v11 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
From: Radhey Shyam Pandey

Introduce bindings for the TCM memory address space on the AMD-Xilinx Zynq
UltraScale+ platform. This will help in defining TCM in the device-tree and
make its access platform agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store timing.
Each Cortex-R5F processor contains two 64-bit wide 64 KB memory banks on the
ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domain) are documented for each
TCM in the R5 node. The reg and reg-names are made required properties, as we
don't want to hardcode TCM addresses for future platforms; for the ZynqMP
legacy implementation, hardcoded addresses ensure that an old dts without
reg/reg-names keeps working and a stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey
Signed-off-by: Tanmay Shah
---

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
 1 file changed, 170 insertions(+), 22 deletions(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..77030edf41fa 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -20,9 +20,21 @@ properties:
   compatible:
     const: xlnx,zynqmp-r5fss
 
+  "#address-cells":
+    const: 2
+
+  "#size-cells":
+    const: 2
+
+  ranges:
+    description: |
+      Standard ranges definition providing address translations for
+      local R5F TCM address spaces to bus addresses.
+ xlnx,cluster-mode: $ref: /schemas/types.yaml#/definitions/uint32 enum: [0, 1, 2] +default: 1 description: | The RPU MPCore can operate in split mode (Dual-processor performance), Safety lock-step mode(Both RPU cores execute the same code in lock-step, @@ -37,7 +49,7 @@ properties: 2: single cpu mode patternProperties: - "^r5f-[a-f0-9]+$": + "^r5f@[0-9a-f]+$": type: object description: | The RPU is located in the Low Power Domain of the Processor Subsystem. @@ -54,9 +66,6 @@ patternProperties: compatible: const: xlnx,zynqmp-r5f - power-domains: -maxItems: 1 - mboxes: minItems: 1 items: @@ -101,35 +110,174 @@ patternProperties: required: - compatible - - power-domains -unevaluatedProperties: false +allOf: + - if: + properties: +xlnx,cluster-mode: + enum: +- 1 +then: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory +- description: extra ATCM memory in lockstep mode +- description: extra BTCM memory in lockstep mode + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 +- const: atcm1 +- const: btcm1 + +power-domains: + minItems: 2 + maxItems: 5 + + required: +- reg +- reg-names +- power-domains + +else: + patternProperties: +"^r5f@[0-9a-f]+$": + type: object + + properties: +reg: + minItems: 1 + items: +- description: ATCM internal memory +- description: BTCM internal memory + +reg-names: + minItems: 1 + items: +- const: atcm0 +- const: btcm0 + +power-domains: + minItems: 2 + maxItems: 3 + + required: +- reg +- reg-names +- power-domains required: - compatible + - "#address-cells" + - "#size-cells" + - ranges additionalProperties: false examples: - | -remoteproc { -compatible = "xlnx,zynqmp-r5fss"; -xlnx,cluster-mode = <1>; - -r5f-0 { -compatible = "xlnx,zynqmp-r5f"; -power-domains = <_firmware 0x7>; -memory-region = <_0_fw_image>, <>, <>, <>; -mboxes = <_mailbox_rpu0 0>, <_mailbox_rpu0 1>; -mbox-names = "tx", "rx"; +#include + 
+// Split mode configuration +soc { +#address-cells = <2>; +#size-cells = <2>; + +remoteproc@ffe0 { +compatible = "xlnx,zynqmp-r5fss"; +xlnx,cluster-mode = <0>; + +#address-cells = <2>; +#size-cells = <2>; +ranges = <0x0 0x0 0x0
[PATCH v11 0/4] add zynqmp TCM bindings
Tightly-Coupled Memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store timing.
Each Cortex-R5F processor contains two exclusive 64 KB memory banks on the
ATCM and BTCM ports, for a total of 128 KB of memory. In lockstep mode, both
128 KB memories are accessible to the cluster.

As per the ZynqMP UltraScale+ Technical Reference Manual UG1085, the
following is the address space of TCM memory. The bindings in this patch
series introduce properties to accommodate this address space, with address
translation between the Linux and Cortex-R5 views.

| *Mode*             | *R5 View*    | *Linux view* | Notes                |
| *Split Mode*       | *start addr* | *start addr* |                      |
| R5_0 ATCM (64 KB)  | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (64 KB)  | 0x0002_0000  | 0xFFE2_0000  |                      |
| R5_1 ATCM (64 KB)  | 0x0000_0000  | 0xFFE9_0000  | alias of 0xFFE1_0000 |
| R5_1 BTCM (64 KB)  | 0x0002_0000  | 0xFFEB_0000  | alias of 0xFFE3_0000 |
| *Lockstep Mode*    |              |              |                      |
| R5_0 ATCM (128 KB) | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (128 KB) | 0x0002_0000  | 0xFFE2_0000  |                      |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed
  - Remove redundant initialization of the variable
  - Return correct error code if memory allocation failed

Changes in v10:
  - Add new patch (1/4) to the series that changes hardcoded TCM addresses in
    lockstep mode and removes separate handling of TCM in lockstep and split mode
  - Modify the number of "reg", "reg-names" and "power-domains" entries based
    on cluster mode
  - Add extra optional atcm and btcm in the "reg" property for lockstep mode
  - Add "reg-names" for the extra optional atcm and btcm for lockstep mode
  - Drop previous Ack as the bindings have new changes
  - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
  - Add each tcm's power-domains in lockstep mode
  - Drop previous Ack as there is a new change in the dts patchset
  - Remove redundant changes in
driver to handle TCM in lockstep mode

Changes in v9:
  - Fix rproc lockstep dts
  - Introduce new API to request and release core1 TCM power-domains in
    lockstep mode. This will be used during the prepare -> add_tcm_banks
    callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in a uniform way
  - Fix TCM representation in device-tree in lockstep mode
  - Fix comments as suggested

Changes in v8:
  - Remove use of pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - Parse power-domains property from device-tree and use EEMI calls to power
    on/off TCM instead of using pm domains framework

Changes in v7:
  - %s/pm_dev1/pm_dev_core0/r
  - %s/pm_dev_link1/pm_dev_core0_link/r
  - %s/pm_dev2/pm_dev_core1/r
  - %s/pm_dev_link2/pm_dev_core1_link/r
  - Remove pm_domain_id check to move next patch
  - Add comment about how 1st entry in pm domain list is used
  - Fix loop when jumping to the fail_add_pm_domains label
  - Move checking of pm_domain_id from previous patch
  - Fix mem_bank_data memory allocation

Changes in v6:
  - Introduce new node entry for r5f cluster split mode dts and keep it
    disabled by default
  - Keep remoteproc lockstep mode enabled by default to maintain backward
    compatibility
  - Enable split mode only for the zcu102 board to demo split mode use
  - Remove spurious change
  - Handle errors in the add_pm_domains function
  - Remove redundant code to handle errors from remove_pm_domains
  - Add missing . at the end of the commit message
  - Remove redundant initialization of variables
  - Remove fail_tcm label and relevant code to free memory acquired using
    devm_* API,
    as this will be freed when the device is freed
  - Add extra check to see if the "reg" property is supported or not

Changes in v5:
  - Maintain Rob's Ack on the bindings patch as there are no changes in bindings
  - Split previous patch into multiple patches
  - Use pm domain framework to turn TCM on/off
  - Add support for parsing TCM information from device-tree
  - Maintain backward compatibility with previous bindings without TCM
    information available in device-tree

This patch series continues the previous effort to upstream ZynqMP TCM bindings:

Previous v4 version link:
https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/
Previous v3 version link:
https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/

Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings
Re: [v2 PATCH 0/1] ALSA: virtio: add support for audio controls
Thanks Anton for the reupload.

I tested this series with a 6.1 kernel guest on a proprietary hypervisor. The
controls exposed by the host (BOOLEAN/INTEGER ones, as that was all I could
get) worked as expected when adjusted via ALSA APIs.

Reviewed-by: Marcin Radomski
Tested-by: Marcin Radomski
[PATCH] vduse: implement DMA sync callbacks
Since commit 295525e29a5b ("virtio_net: merge dma operations when filling
mergeable buffers"), VDUSE devices require support for the DMA
.sync_single_for_cpu() operation, as the memory is non-coherent between the
device and CPU because of the use of a bounce buffer.

This patch implements both the .sync_single_for_cpu() and
.sync_single_for_device() callbacks, and also skips bounce buffer copies
during DMA map and unmap operations if the DMA_ATTR_SKIP_CPU_SYNC attribute
is set, to avoid extra copies of the same buffer.

Signed-off-by: Maxime Coquelin
---
 drivers/vdpa/vdpa_user/iova_domain.c | 27 ---
 drivers/vdpa/vdpa_user/iova_domain.h |  8 
 drivers/vdpa/vdpa_user/vduse_dev.c   | 22 ++
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 5e4a77b9bae6..791d38d6284c 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -373,6 +373,26 @@ static void vduse_domain_free_iova(struct iova_domain *iovad,
 	free_iova_fast(iovad, iova >> shift, iova_len);
 }
 
+void vduse_domain_sync_single_for_device(struct vduse_iova_domain *domain,
+					 dma_addr_t dma_addr, size_t size,
+					 enum dma_data_direction dir)
+{
+	read_lock(&domain->bounce_lock);
+	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+		vduse_domain_bounce(domain, dma_addr, size, DMA_TO_DEVICE);
+	read_unlock(&domain->bounce_lock);
+}
+
+void vduse_domain_sync_single_for_cpu(struct vduse_iova_domain *domain,
+				      dma_addr_t dma_addr, size_t size,
+				      enum dma_data_direction dir)
+{
+	read_lock(&domain->bounce_lock);
+	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+		vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE);
+	read_unlock(&domain->bounce_lock);
+}
+
 dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 				 struct page *page, unsigned long offset,
 				 size_t size, enum dma_data_direction dir,
@@ -393,7 +413,8 @@ dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 	if 
(vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa))
 		goto err_unlock;
 
-	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE);
 
 	read_unlock(&domain->bounce_lock);
@@ -411,9 +432,9 @@ void vduse_domain_unmap_page(struct vduse_iova_domain *domain,
 			     enum dma_data_direction dir, unsigned long attrs)
 {
 	struct iova_domain *iovad = &domain->stream_iovad;
 
-	read_lock(&domain->bounce_lock);
-	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+	read_lock(&domain->bounce_lock);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
 		vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE);
 
 	vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size);

diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index 173e979b84a9..f92f22a7267d 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -44,6 +44,14 @@ int vduse_domain_set_map(struct vduse_iova_domain *domain,
 void vduse_domain_clear_map(struct vduse_iova_domain *domain,
 			    struct vhost_iotlb *iotlb);
 
+void vduse_domain_sync_single_for_device(struct vduse_iova_domain *domain,
+					 dma_addr_t dma_addr, size_t size,
+					 enum dma_data_direction dir);
+
+void vduse_domain_sync_single_for_cpu(struct vduse_iova_domain *domain,
+				      dma_addr_t dma_addr, size_t size,
+				      enum dma_data_direction dir);
+
 dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 				 struct page *page, unsigned long offset,
 				 size_t size, enum dma_data_direction dir,

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 1d24da79c399..75354ce186a1 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -798,6 +798,26 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
 	.free = vduse_vdpa_free,
 };
 
+static void 
vduse_dev_sync_single_for_device(struct device *dev, +dma_addr_t dma_addr, size_t size, +enum dma_data_direction dir) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); +
Re: [PATCH v7 22/36] function_graph: Add a new entry handler with parent_ip and ftrace_regs
On Wed, 7 Feb 2024 00:11:34 +0900 "Masami Hiramatsu (Google)" wrote: > From: Masami Hiramatsu (Google) > > Add a new entry handler to fgraph_ops as 'entryregfunc' which takes > parent_ip and ftrace_regs. Note that the 'entryfunc' and 'entryregfunc' > are mutual exclusive. You can set only one of them. > > Signed-off-by: Masami Hiramatsu (Google) > --- > Changes in v3: > - Update for new multiple fgraph. > --- > arch/arm64/kernel/ftrace.c |2 + > arch/loongarch/kernel/ftrace_dyn.c |2 + > arch/powerpc/kernel/trace/ftrace.c |2 + > arch/powerpc/kernel/trace/ftrace_64_pg.c | 10 --- > arch/x86/kernel/ftrace.c | 42 > -- > include/linux/ftrace.h | 19 +++--- > kernel/trace/fgraph.c| 30 + > 7 files changed, 72 insertions(+), 35 deletions(-) > > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c > index b96740829798..779b975f03f5 100644 > --- a/arch/arm64/kernel/ftrace.c > +++ b/arch/arm64/kernel/ftrace.c > @@ -497,7 +497,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long > parent_ip, > return; > > if (!function_graph_enter_ops(*parent, ip, frame_pointer, > - (void *)frame_pointer, gops)) > + (void *)frame_pointer, fregs, gops)) I would like to replace that second frame_pointer with fregs. > *parent = (unsigned long)_to_handler; > > ftrace_test_recursion_unlock(bit); > diff --git a/arch/loongarch/kernel/ftrace_dyn.c > b/arch/loongarch/kernel/ftrace_dyn.c > index 81d18b911cc1..45d26c6e6564 100644 > --- a/arch/loongarch/kernel/ftrace_dyn.c > +++ b/arch/loongarch/kernel/ftrace_dyn.c > @@ -250,7 +250,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long > parent_ip, > > old = *parent; > > - if (!function_graph_enter_ops(old, ip, 0, parent, gops)) > + if (!function_graph_enter_ops(old, ip, 0, parent, fregs, gops)) That is, to replace the parent with fregs, as the parent can be retrieved from fregs. 
We should add a fregs helper (something like):

	unsigned long *fregs_caller_addr(fregs)
	{
		return (unsigned long *)(kernel_stack_pointer(fregs->regs) + PT_R1);
	}

That returns the address that points to the parent caller on the stack. This
was on my todo list to do. That is, replace the passing of the parent on the
stack with fregs, as it is redundant information.

> *parent = return_hooker;
> }
> #else
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 4ef8bf480279..eeaaa798f4f9 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -423,7 +423,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
> 	if (bit < 0)
> 		goto out;
> 
> -	if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, gops))
> +	if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, fregs, gops))
> 		parent_ip = ppc_function_entry(return_to_handler);
> 
> 	ftrace_test_recursion_unlock(bit);
> diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
> index 7b85c3b460a3..43f6cfaaf7db 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
> +++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
> @@ -795,7 +795,8 @@ int ftrace_disable_ftrace_graph_caller(void)
>  * in current thread info. Return the address we want to divert to.
>  */
> static unsigned long
> -__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp)
> +__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp,
> +			struct ftrace_regs *fregs)

And sp shouldn't need to be passed in either, as that should be part of the
fregs. I really would like to consolidate the parameters and not just keep
adding to them. This all slows down the logic to load the parameters.

-- Steve

> {
> 	unsigned long return_hooker;
> 	int bit;
Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
On 2/19/24 07:39, Haitao Huang wrote: > Remove all boolean parameters for 'reclaim' from the function > sgx_alloc_epc_page() and its callers by making two versions of each > function. > > Also opportunistically remove non-static declaration of > __sgx_alloc_epc_page() and a typo > > Signed-off-by: Haitao Huang > Suggested-by: Jarkko Sakkinen > --- > arch/x86/kernel/cpu/sgx/encl.c | 56 +-- > arch/x86/kernel/cpu/sgx/encl.h | 6 ++- > arch/x86/kernel/cpu/sgx/ioctl.c | 23 --- > arch/x86/kernel/cpu/sgx/main.c | 68 ++--- > arch/x86/kernel/cpu/sgx/sgx.h | 4 +- > arch/x86/kernel/cpu/sgx/virt.c | 2 +- > 6 files changed, 115 insertions(+), 44 deletions(-) Jarkko, did this turn out how you expected? I think passing around a function pointer to *only* communicate 1 bit of information is a _bit_ overkill here. Simply replacing the bool with: enum sgx_reclaim { SGX_NO_RECLAIM, SGX_DO_RECLAIM }; would do the same thing. Right? Are you sure you want a function pointer for this?
[PATCH 10/10] csky/vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in vdso datapage header. Use this definition to prevent code duplication. Signed-off-by: Anna-Maria Behnsen Cc: Guo Ren Cc: linux-c...@vger.kernel.org --- arch/csky/kernel/vdso.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c index e74a2504d331..2ca886e4a458 100644 --- a/arch/csky/kernel/vdso.c +++ b/arch/csky/kernel/vdso.c @@ -15,14 +15,8 @@ extern char vdso_start[], vdso_end[]; static unsigned int vdso_pages; static struct page **vdso_pagelist; -/* - * The vDSO data page. - */ -static union { - struct vdso_datadata; - u8 page[PAGE_SIZE]; -} vdso_data_store __page_aligned_data; -struct vdso_data *vdso_data = _data_store.data; +static union vdso_data_store vdso_data_store __page_aligned_data; +struct vdso_data *vdso_data = vdso_data_store.data; static int __init vdso_init(void) { -- 2.39.2
[PATCH 03/10] csky/vdso: Remove superfluous ifdeffery
CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent ifdeffery is superfluous. Clean it up. Signed-off-by: Anna-Maria Behnsen Cc: Guo Ren Cc: linux-c...@vger.kernel.org --- arch/csky/include/asm/vdso.h | 5 - arch/csky/kernel/vdso.c | 4 2 files changed, 9 deletions(-) diff --git a/arch/csky/include/asm/vdso.h b/arch/csky/include/asm/vdso.h index bdce581b5fcb..181a15edafe8 100644 --- a/arch/csky/include/asm/vdso.h +++ b/arch/csky/include/asm/vdso.h @@ -5,11 +5,6 @@ #include -#ifndef GENERIC_TIME_VSYSCALL -struct vdso_data { -}; -#endif - /* * The VDSO symbols are mapped into Linux so we can just use regular symbol * addressing to get their offsets in userspace. The symbols are mapped at an diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c index 16c20d64d165..e74a2504d331 100644 --- a/arch/csky/kernel/vdso.c +++ b/arch/csky/kernel/vdso.c @@ -8,11 +8,7 @@ #include #include -#ifdef GENERIC_TIME_VSYSCALL #include -#else -#include -#endif extern char vdso_start[], vdso_end[]; -- 2.39.2
[RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters
Remove all boolean parameters for 'reclaim' from the function sgx_alloc_epc_page() and its callers by making two versions of each function. Also opportunistically remove non-static declaration of __sgx_alloc_epc_page() and a typo Signed-off-by: Haitao Huang Suggested-by: Jarkko Sakkinen --- arch/x86/kernel/cpu/sgx/encl.c | 56 +-- arch/x86/kernel/cpu/sgx/encl.h | 6 ++- arch/x86/kernel/cpu/sgx/ioctl.c | 23 --- arch/x86/kernel/cpu/sgx/main.c | 68 ++--- arch/x86/kernel/cpu/sgx/sgx.h | 4 +- arch/x86/kernel/cpu/sgx/virt.c | 2 +- 6 files changed, 115 insertions(+), 44 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 279148e72459..07f369ae855c 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -217,7 +217,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page, struct sgx_epc_page *epc_page; int ret; - epc_page = sgx_alloc_epc_page(encl_page, false); + epc_page = sgx_alloc_epc_page(encl_page); if (IS_ERR(epc_page)) return epc_page; @@ -359,14 +359,14 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, goto err_out_unlock; } - epc_page = sgx_alloc_epc_page(encl_page, false); + epc_page = sgx_alloc_epc_page(encl_page); if (IS_ERR(epc_page)) { if (PTR_ERR(epc_page) == -EBUSY) vmret = VM_FAULT_NOPAGE; goto err_out_unlock; } - va_page = sgx_encl_grow(encl, false); + va_page = sgx_encl_grow(encl); if (IS_ERR(va_page)) { if (PTR_ERR(va_page) == -EBUSY) vmret = VM_FAULT_NOPAGE; @@ -1230,23 +1230,23 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr) } while (unlikely(encl->mm_list_version != mm_list_version)); } -/** - * sgx_alloc_va_page() - Allocate a Version Array (VA) page - * @reclaim: Reclaim EPC pages directly if none available. Enclave - * mutex should not be held if this is set. - * - * Allocate a free EPC page and convert it to a Version Array (VA) page. 
+typedef struct sgx_epc_page *(*sgx_alloc_epc_page_fn_t)(void *owner); + +/* + * Allocate a Version Array (VA) page + * @alloc_fn: the EPC page allocation function. * * Return: * a VA page, * -errno otherwise */ -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim) +static struct sgx_epc_page *__sgx_alloc_va_page(sgx_alloc_epc_page_fn_t alloc_fn) { struct sgx_epc_page *epc_page; int ret; - epc_page = sgx_alloc_epc_page(NULL, reclaim); + epc_page = alloc_fn(NULL); + if (IS_ERR(epc_page)) return ERR_CAST(epc_page); @@ -1260,6 +1260,40 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim) return epc_page; } +/** + * sgx_alloc_va_page() - Allocate a Version Array (VA) page + * + * Allocate a free EPC page and convert it to a VA page. + * + * Do not reclaim EPC pages if none available. + * + * Return: + * a VA page, + * -errno otherwise + */ +struct sgx_epc_page *sgx_alloc_va_page(void) +{ +return __sgx_alloc_va_page(sgx_alloc_epc_page); +} + +/** + * sgx_alloc_va_page_reclaim() - Allocate a Version Array (VA) page + * + * Allocate a free EPC page and convert it to a VA page. + * + * Reclaim EPC pages directly if none available. Enclave mutex should not be + * held. 
+ * + * Return: + * a VA page, + * -errno otherwise + */ + +struct sgx_epc_page *sgx_alloc_va_page_reclaim(void) +{ +return __sgx_alloc_va_page(sgx_alloc_epc_page_reclaim); +} + /** * sgx_alloc_va_slot - allocate a VA slot * @va_page: a sgx_va_page instance diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index f94ff14c9486..3248ff72e573 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -116,14 +116,16 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, unsigned long offset, u64 secinfo_flags); void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr); -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim); +struct sgx_epc_page *sgx_alloc_va_page(void); +struct sgx_epc_page *sgx_alloc_va_page_reclaim(void); unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page); void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset); bool sgx_va_page_full(struct sgx_va_page *va_page); void sgx_encl_free_epc_page(struct sgx_epc_page *page); struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl, unsigned long addr); -struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim); +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl); +struct sgx_va_page *sgx_encl_grow_reclaim(struct sgx_encl
Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()
On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen wrote: On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote: Hi Jarkko On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen wrote: > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote: >> From: Kristen Carlson Accardi >> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to >> reclaim pages used in the same cgroup to make room for new allocations. >> This is analogous to the behavior that the global reclaimer is triggered >> when the global usage is close to total available EPC. >> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate >> whether synchronous reclaim is allowed or not. And trigger the >> synchronous/asynchronous reclamation flow accordingly. >> >> Note at this point, all reclaimable EPC pages are still tracked in the >> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation >> is activated yet. >> >> Co-developed-by: Sean Christopherson >> Signed-off-by: Sean Christopherson >> Signed-off-by: Kristen Carlson Accardi >> Co-developed-by: Haitao Huang >> Signed-off-by: Haitao Huang >> --- >> V7: >> - Split this out from the big patch, #10 in V6. (Dave, Kai) >> --- >> arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 -- >> arch/x86/kernel/cpu/sgx/epc_cgroup.h | 4 ++-- >> arch/x86/kernel/cpu/sgx/main.c | 2 +- >> 3 files changed, 27 insertions(+), 5 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c >> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c >> index d399fda2b55e..abf74fdb12b4 100644 >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c >> @@ -184,13 +184,35 @@ static void >> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) >> /** >> * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC >> page >> * @epc_cg: The EPC cgroup to be charged for the page. >> + * @reclaim: Whether or not synchronous reclaim is allowed >> * Return: >> * * %0 - If successfully charged. 
>> * * -errno - for failures. */ >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg) >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool >> reclaim) >> { >> - return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); >> + for (;;) { >> + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, >> + PAGE_SIZE)) >> + break; >> + >> + if (sgx_epc_cgroup_lru_empty(epc_cg->cg)) >> + return -ENOMEM; >> + >> + if (signal_pending(current)) >> + return -ERESTARTSYS; >> + >> + if (!reclaim) { >> + queue_work(sgx_epc_cg_wq, _cg->reclaim_work); >> + return -EBUSY; >> + } >> + >> + if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false)) >> + /* All pages were too young to reclaim, try again a little later */ >> + schedule(); > > This will be a total pain to backtrack after a while when something > needs to be changed, so there definitely should be inline comments > addressing each branch condition. > > I'd rethink this as: > > 1. Create static __sgx_epc_cgroup_try_charge() for addressing a single >iteration with the new "reclaim" parameter. > 2. Add a new sgx_epc_group_try_charge_reclaim() function. > > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and > sgx_epc_cgroup_try_charge_reclaim() because both have almost the > same loop calling internal __sgx_epc_cgroup_try_charge() with > different parameters. That is totally acceptable. > > Please also add my suggested-by. > > BR, Jarkko > For #2: The only caller of this function, sgx_alloc_epc_page(), has the same boolean which is passed into this function. I know. This would be a good opportunity to fix that up. Large patch sets should try to make the best possible space for their feature and thus also clean up the code base overall. If we separate it into sgx_epc_cgroup_try_charge() and sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the if/else branches. So separation here does not seem to help? Of course it does. 
It makes the code in that location self-documenting and easier to remember what it does. BR, Jarkko Please let me know if this aligns with your suggestion. static int ___sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg) { if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE)) return 0; if (sgx_epc_cgroup_lru_empty(epc_cg->cg)) return -ENOMEM; if (signal_pending(current)) return -ERESTARTSYS; return -EBUSY; } /** * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single page * @epc_cg: The EPC cgroup to be charged for the page. * * Try
[PATCH] arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
rrupts-extended = < 6 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < 6 IRQ_TYPE_EDGE_RISING>, <_adsp_in 0 IRQ_TYPE_EDGE_RISING>, <_adsp_in 1 IRQ_TYPE_EDGE_RISING>, <_adsp_in 2 IRQ_TYPE_EDGE_RISING>, @@ -1511,7 +1511,7 @@ cdsp: remoteproc@830 { compatible = "qcom,sm6350-cdsp-pas"; reg = <0 0x0830 0 0x1>; - interrupts-extended = < GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < GIC_SPI 578 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 0 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 1 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 2 IRQ_TYPE_EDGE_RISING>, diff --git a/arch/arm64/boot/dts/qcom/sm6375.dtsi b/arch/arm64/boot/dts/qcom/sm6375.dtsi index 4386f8a9c636..f40509d91bbd 100644 --- a/arch/arm64/boot/dts/qcom/sm6375.dtsi +++ b/arch/arm64/boot/dts/qcom/sm6375.dtsi @@ -1561,7 +1561,7 @@ remoteproc_adsp: remoteproc@a40 { compatible = "qcom,sm6375-adsp-pas"; reg = <0 0x0a40 0 0x100>; - interrupts-extended = < GIC_SPI 282 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < GIC_SPI 282 IRQ_TYPE_EDGE_RISING>, <_adsp_in 0 IRQ_TYPE_EDGE_RISING>, <_adsp_in 1 IRQ_TYPE_EDGE_RISING>, <_adsp_in 2 IRQ_TYPE_EDGE_RISING>, diff --git a/arch/arm64/boot/dts/qcom/sm8250.dtsi b/arch/arm64/boot/dts/qcom/sm8250.dtsi index f3c70b87efad..03c7dda1d542 100644 --- a/arch/arm64/boot/dts/qcom/sm8250.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8250.dtsi @@ -3062,7 +3062,7 @@ slpi: remoteproc@5c0 { compatible = "qcom,sm8250-slpi-pas"; reg = <0 0x05c0 0 0x4000>; - interrupts-extended = < 9 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < 9 IRQ_TYPE_EDGE_RISING>, <_slpi_in 0 IRQ_TYPE_EDGE_RISING>, <_slpi_in 1 IRQ_TYPE_EDGE_RISING>, <_slpi_in 2 IRQ_TYPE_EDGE_RISING>, @@ -3766,7 +3766,7 @@ cdsp: remoteproc@830 { compatible = "qcom,sm8250-cdsp-pas"; reg = <0 0x0830 0 0x1>; - interrupts-extended = < GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < GIC_SPI 578 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 0 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 1 IRQ_TYPE_EDGE_RISING>, <_cdsp_in 2 IRQ_TYPE_EDGE_RISING>, @@ -5928,7 +5928,7 @@ adsp: remoteproc@1730 { 
compatible = "qcom,sm8250-adsp-pas"; reg = <0 0x1730 0 0x100>; - interrupts-extended = < 6 IRQ_TYPE_LEVEL_HIGH>, + interrupts-extended = < 6 IRQ_TYPE_EDGE_RISING>, <_adsp_in 0 IRQ_TYPE_EDGE_RISING>, <_adsp_in 1 IRQ_TYPE_EDGE_RISING>, <_adsp_in 2 IRQ_TYPE_EDGE_RISING>, --- base-commit: 35a4fdde2466b9d90af297f249436a270ef9d30e change-id: 20240219-remoteproc-irqs-63f3293af260 Best regards, -- Luca Weiss
Re: [PATCH v7 23/36] function_graph: Add a new exit handler with parent_ip and ftrace_regs
Hi Steve, On Sun, 18 Feb 2024 11:53:28 +0900 Masami Hiramatsu (Google) wrote: > On Fri, 16 Feb 2024 17:51:08 +0900 > Masami Hiramatsu (Google) wrote: > > > > > @@ -798,10 +798,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret > > > > *trace, unsigned long *ret, > > > > > > > > *index += FGRAPH_RET_INDEX; > > > > *ret = ret_stack->ret; > > > > - trace->func = ret_stack->func; > > > > - trace->calltime = ret_stack->calltime; > > > > - trace->overrun = atomic_read(>trace_overrun); > > > > - trace->depth = current->curr_ret_depth; > > > > > > There's a lot of information stored in the trace structure. Why not pass > > > that to the new retregfunc? > > > > > > Then you don't need to separate this code out. > > > > Sorry, I couldn't catch what you meant, Would you mean to call > > ftrace_pop_return_trace() before calling retregfunc()?? because some of the > > information are found from ret_stack, which is poped from shadow stack. > > Ah, sorry I got what you said. I think this `trace` is not usable for the new > interface. Most of the information is only used for the function-graph tracer. > For example, trace->calltime and trace->overrun, trace->depth are used only > for the function-graph tracer, but not for the other tracers. > > But yeah, this idea is considerable. It also allows us to just update > entryfunc() and retfunc() to pass fgraph_regs and return address. The reason why I didn't use the those for *regfunc() is not only those have unused information, but those does not have some params. - ftrace_graph_ent only have current `func`, but entryregfunc() needs `parent_ip` (== return address) - ftrace_graph_ret only have current `func`, but retregfunc() needs `ret` (== return address) too. If I update the ftrace_graph_ent/ret to add 'retaddr', we can just pass ftrace_graph_ent/ret, ftrace_regs, and fgraph_ops to *regfunc(). Moreover, maybe we don't need *regfunc, but just update entryfunc/retfunc to pass ftrace_regs *, which will be NULL if it is not supported. 
Thank you, -- Masami Hiramatsu (Google)
Re: [PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality
[PATCH] dax: constify the struct device_type usage
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the dax_mapping_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere --- drivers/dax/bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1ff1ab5fa105..e265ba019785 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -763,7 +763,7 @@ static const struct attribute_group *dax_mapping_attribute_groups[] = { NULL, }; -static struct device_type dax_mapping_type = { +static const struct device_type dax_mapping_type = { .release = dax_mapping_release, .groups = dax_mapping_attribute_groups, }; --- base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d change-id: 20240219-device_cleanup-dax-d82fd0c67ffd Best regards, -- Ricardo B. Marliere
Re: [PATCH] tracing: Inform kmemleak of saved_cmdlines allocation
On Wed, Feb 14, 2024 at 10:26:14AM -0500, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > The allocation of the struct saved_cmdlines_buffer structure changed from: > > s = kmalloc(sizeof(*s), GFP_KERNEL); > s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL); > > to: > > orig_size = sizeof(*s) + val * TASK_COMM_LEN; > order = get_order(orig_size); > size = 1 << (order + PAGE_SHIFT); > page = alloc_pages(GFP_KERNEL, order); > if (!page) > return NULL; > > s = page_address(page); > memset(s, 0, sizeof(*s)); > > s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL); > > Where that s->saved_cmdlines allocation looks to be a dangling allocation > to kmemleak. That's because kmemleak only keeps track of kmalloc() > allocations. For allocations that use page_alloc() directly, the kmemleak > needs to be explicitly informed about it. > > Add kmemleak_alloc() and kmemleak_free() around the page allocation so > that it doesn't give the following false positive: > > unreferenced object 0x8881010c8000 (size 32760): > comm "swapper", pid 0, jiffies 4294667296 > hex dump (first 32 bytes): > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > backtrace (crc ae6ec1b9): > [] kmemleak_alloc+0x45/0x80 > [] __kmalloc_large_node+0x10d/0x190 > [] __kmalloc+0x3b1/0x4c0 > [] allocate_cmdlines_buffer+0x113/0x230 > [] tracer_alloc_buffers.isra.0+0x124/0x460 > [] early_trace_init+0x14/0xa0 > [] start_kernel+0x12e/0x3c0 > [] x86_64_start_reservations+0x18/0x30 > [] x86_64_start_kernel+0x7b/0x80 > [] secondary_startup_64_no_verify+0x15e/0x16b > > Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r@kernel.org/ > > Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic") > Reported-by: Kalle Valo > Signed-off-by: Steven Rostedt (Google) Reviewed-by: Catalin Marinas
Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
On 2024/2/19 下午5:02, Huacai Chen wrote: On Mon, Feb 19, 2024 at 3:37 PM maobibo wrote: On 2024/2/19 下午3:16, Huacai Chen wrote: On Mon, Feb 19, 2024 at 12:18 PM maobibo wrote: On 2024/2/19 上午10:45, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote: On LoongArch systems, ipi hw uses iocsr registers; there is one iocsr register access on ipi sending, and two iocsr accesses on ipi receiving, i.e. in the ipi interrupt handler. In VM mode, all iocsr register accesses cause the VM to trap into the hypervisor. So one hw ipi notification costs three traps. This patch adds pv ipi support for VMs: a hypercall instruction is used by the ipi sender, and the hypervisor injects an SWI into the VM. In the SWI interrupt handler, only the estat CSR register is written to clear the irq, and estat CSR access does not trap into the hypervisor. So with pv ipi supported, the pv ipi sender traps into the hypervisor once and the pv ipi receiver does not trap at all; there is only one trap in total. Also this patch adds ipi multicast support; the method is similar to x86. With ipi multicast support, an ipi notification can be sent to at most 128 vcpus at one time. It greatly reduces traps into the hypervisor. 
Signed-off-by: Bibo Mao --- arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/kvm_host.h | 1 + arch/loongarch/include/asm/kvm_para.h | 124 + arch/loongarch/include/asm/loongarch.h | 1 + arch/loongarch/kernel/irq.c| 2 +- arch/loongarch/kernel/paravirt.c | 113 ++ arch/loongarch/kernel/smp.c| 2 +- arch/loongarch/kvm/exit.c | 73 ++- arch/loongarch/kvm/vcpu.c | 1 + 9 files changed, 314 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..8a611843c1f0 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t messages cacheline_aligned_in_smp; Do we really need atomic_t? A plain "unsigned int" can reduce cost significantly. For IPI, there are multiple senders and one receiver; the sender uses atomic_fetch_or(action, >messages) and the receiver uses atomic_xchg(>messages, 0) to clear the message. There needs to be a sync mechanism between senders and receiver, and atomics are the simplest method. At least from the receiver side, the native IPI doesn't need atomic for read and clear: static u32 ipi_read_clear(int cpu) { u32 action; /* Load the ipi register to figure out what we're supposed to do */ action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS); /* Clear the ipi register to clear the interrupt */ iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR); wbflush(); That is because on physical hardware these are two IOCSR registers, and there is no way to do an atomic read-and-clear on IOCSR registers. However, if the ipi message is stored in DDR memory, atomic read/clear can be used. You can compare the price of one iocsr_read32 + one iocsr_write32 + wbflush with one atomic_xchg(>messages, 0). Atomic ops are of course cheaper than iocsr, but unsigned int is even cheaper. 
Can you give an example about how to use unsigned int for the ipi sender and ipi handler? When the ipi message is cleared, other ipi senders need to know immediately, else ipi senders will call the hypercall notification unconditionally. Regards Bibo Mao return action; } } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h index 57399d7cf8b7..1bf927e2bfac 100644 --- a/arch/loongarch/include/asm/kvm_host.h +++ b/arch/loongarch/include/asm/kvm_host.h @@ -43,6 +43,7 @@ struct kvm_vcpu_stat { u64 idle_exits; u64 cpucfg_exits; u64 signal_exits; + u64 hvcl_exits; hypercall_exits is better. yeap, hypercall_exits is better, will fix in next version. }; #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 41200e922a82..a25a84e372b9 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -9,6 +9,10 @@ #define HYPERVISOR_VENDOR_SHIFT8 #define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) +#define KVM_HC_CODE_SERVICE0 +#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE) +#define KVM_HC_FUNC_IPI 1 Change HC to HCALL is better. will modify in next version. + /* *
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 下午5:38, Huacai Chen wrote: On Mon, Feb 19, 2024 at 5:21 PM maobibo wrote: On 2024/2/19 下午4:48, Huacai Chen wrote: On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: On 2024/2/19 上午10:42, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: The patch adds paravirt interface for guest kernel, function pv_guest_initi() firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect whether current VMM is KVM hypervisor. And the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. This patch only adds paravirt interface for guest kernel, however there is not effective pv functions added here. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} The steal time code can be removed in this patch, I think. Originally I want to remove this piece of code, but it fails to compile if CONFIG_PARAVIRT is selected. Here is reference code, function paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected. static __always_inline u64 steal_account_process_time(u64 maxtime) { #ifdef CONFIG_PARAVIRT if (static_key_false(_steal_enabled)) { u64 steal; steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; steal = min(steal, maxtime); account_steal_time(steal); this_rq()->prev_steal_time += steal; return steal; } #endif return 0; } OK, then keep it. 
+ +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On Mon, Feb 19, 2024 at 5:21 PM maobibo wrote: > > > > On 2024/2/19 下午4:48, Huacai Chen wrote: > > On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: > >> > >> > >> > >> On 2024/2/19 上午10:42, Huacai Chen wrote: > >>> Hi, Bibo, > >>> > >>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > > The patch adds paravirt interface for guest kernel, function > pv_guest_initi() firstly checks whether system runs on VM mode. If kernel > runs on VM mode, it will call function kvm_para_available() to detect > whether current VMM is KVM hypervisor. And the paravirt function can work > only if current VMM is KVM hypervisor, since there is only KVM hypervisor > supported on LoongArch now. > > This patch only adds paravirt interface for guest kernel, however there > is not effective pv functions added here. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 > arch/loongarch/include/asm/kvm_para.h | 7 > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/paravirt.c | 41 +++ > arch/loongarch/kernel/setup.c | 2 + > 7 files changed, 88 insertions(+) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 10959e6c3583..817a56dff80f 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance > significantly > + over full virtualization. However, when run without a > hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index 9425d3b7e486..41200e922a82 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -2,6 +2,13 @@ > #ifndef _ASM_LOONGARCH_KVM_PARA_H > #define _ASM_LOONGARCH_KVM_PARA_H > > +/* > + * Hypcall code field > + */ > +#define HYPERVISOR_KVM 1 > +#define HYPERVISOR_VENDOR_SHIFT8 > +#define HYPERCALL_CODE(vendor, code) ((vendor << > HYPERVISOR_VENDOR_SHIFT) + code) > + > /* > * LoongArch hypcall return code > */ > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..b64813592ba0 > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} > >>> The steal time code can be removed in this patch, I think. > >>> > >> Originally I want to remove this piece of code, but it fails to compile > >> if CONFIG_PARAVIRT is selected. Here is reference code, function > >> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected. 
> >> static __always_inline u64 steal_account_process_time(u64 maxtime)
> >> {
> >> #ifdef CONFIG_PARAVIRT
> >> 	if (static_key_false(&paravirt_steal_enabled)) {
> >> 		u64 steal;
> >>
> >> 		steal = paravirt_steal_clock(smp_processor_id());
> >> 		steal -= this_rq()->prev_steal_time;
> >> 		steal = min(steal, maxtime);
> >> 		account_steal_time(steal);
> >> 		this_rq()->prev_steal_time += steal;
> >>
> >> 		return steal;
> >> 	}
> >> #endif
> >> 	return 0;
> >> }
> > OK, then keep it.
Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
>> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
>> new file mode 100644
>> index 000000000000..5a251a6a782e
>> --- /dev/null
>> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
>> @@ -0,0 +1,888 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include
>> +#include
>> +
>
> Let's keep the header inclusions in alphabetical order.
>
> With that addressed,
>
> Reviewed-by: Zhi Wang

Yes, will adjust that. Thanks!

>> +/*
>> + * The device memory usable to the workloads running in the VM is cached
>> + * and showcased as a 64b device BAR (comprising of BAR4 and BAR5 region)
>> + * to the VM and is represented as usemem.
>> + * Moreover, the VM GPU device driver needs a non-cacheable region to
>> + * support the MIG feature. This region is also exposed as a 64b BAR
>> + * (comprising of BAR2 and BAR3 region) and represented as resmem.
>> + */
>> +#define RESMEM_REGION_INDEX VFIO_PCI_BAR2_REGION_INDEX
>> +#define USEMEM_REGION_INDEX VFIO_PCI_BAR4_REGION_INDEX
Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
On Fri, 16 Feb 2024 08:31:28 +0530 wrote:
> From: Ankit Agrawal
>
> NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
> for the on-chip GPU that is the logical OS representation of the
> internal proprietary chip-to-chip cache coherent interconnect.
>
> The device is peculiar compared to a real PCI device in that whilst
> there is a real 64b PCI BAR1 (comprising region 2 & region 3) on the
> device, it is not used to access device memory once the faster
> chip-to-chip interconnect is initialized (this occurs at host system
> boot time). The device memory is instead accessed using the
> chip-to-chip interconnect, which is exposed as a contiguous physically
> addressable region on the host. This device memory aperture can be
> obtained from the host ACPI table using device_property_read_u64(),
> according to the FW specification. Since the device memory is cache
> coherent with the CPU, it can be mmapped into the user VMA with a
> cacheable mapping using remap_pfn_range() and used like regular RAM.
> The device memory is not added to the host kernel, but mapped
> directly, as this reduces memory wastage due to struct pages.
>
> There is also a requirement for a minimum reserved 1G uncached region
> (termed resmem) to support the Multi-Instance GPU (MIG) feature [1].
> This is to work around a HW defect. Based on [2], the requisite
> properties (uncached, unaligned access) can be achieved through a VM
> mapping (S1) of NORMAL_NC and a host (S2) mapping with
> MemAttr[2:0]=0b101. To provide a different non-cached property to the
> reserved 1G region, it needs to be carved out from the device memory
> and mapped as a separate region in the Qemu VMA with
> pgprot_writecombine(). pgprot_writecombine() sets the Qemu VMA page
> properties (pgprot) as NORMAL_NC.
>
> Provide a VFIO PCI variant driver that adapts the unique device memory
> representation into a more standard PCI representation facing
> userspace.
>
> The variant driver exposes these two regions - the non-cached reserved
> (resmem) and the cached rest of the device memory (termed usemem) -
> as separate VFIO 64b BAR regions. This is divergent from the baremetal
> approach, where the device memory is exposed as a device memory
> region. The decision for a different approach was taken in view of
> the fact that it would necessitate additional code in Qemu to discover
> and insert those regions in the VM IPA, along with the additional VM
> ACPI DSDT changes to communicate the device memory region IPA to the
> VM workloads. Moreover, this behavior would have to be added to a
> variety of emulators (beyond top of tree Qemu) out there desiring
> Grace Hopper support.
>
> Since the device implements 64-bit BAR0, the VFIO PCI variant driver
> maps the uncached carved out region to the next available PCI BAR
> (i.e. comprising of region 2 and 3). The cached device memory
> aperture is assigned BAR region 4 and 5. Qemu will then naturally
> generate a PCI device in the VM with the uncached aperture reported
> as the BAR2 region, the cacheable as BAR4. The variant driver provides
> emulation for these fake BARs' PCI config space offset registers.
>
> The hardware ensures that the system does not crash when the memory
> is accessed with the memory enable turned off. It synthesizes ~0 on
> reads and drops writes on such access. So there is no need to support
> disablement/enablement of the BAR through the PCI_COMMAND config
> space register.
>
> The memory layout on the host looks like the following:
>
>                  devmem (memlength)
>  |------------------------------------------------|
>  |-----cached--------------------------|----NC----|
>  |                                     |
>  usemem.memphys                        resmem.memphys
>
> PCI BARs need to be aligned to the power-of-2, but the actual memory
> on the device may not. A read or write access to the physical address
> from the last device PFN up to the next power-of-2 aligned physical
> address results in reading ~0 and dropped writes.
> Note that the GPU
> device driver [6] is capable of knowing the exact device memory size
> through separate means. The device memory size is primarily kept in
> the system ACPI tables for use by the VFIO PCI variant module.
>
> Note that the usemem memory is added by the VM Nvidia device driver
> [5] to the VM kernel as memblocks. Hence make the usable memory size
> memblock (MEMBLK_SIZE) aligned. This is a hardwired ABI value between
> the GPU FW and VFIO driver. The VM device driver makes use of the
> same value for its calculation to determine the USEMEM size.
>
> Currently there is no provision in KVM for an S2 mapping with
> MemAttr[2:0]=0b101, but there is an ongoing effort to provide the
> same [3]. As previously mentioned, resmem is mapped with
> pgprot_writecombine(), which sets the Qemu VMA page properties
> (pgprot) as NORMAL_NC. Using the proposed changes in [3] and [4], KVM
> marks the region with MemAttr[2:0]=0b101 in S2.
>
> If the device
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 下午4:48, Huacai Chen wrote: On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: On 2024/2/19 上午10:42, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: The patch adds paravirt interface for guest kernel, function pv_guest_initi() firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect whether current VMM is KVM hypervisor. And the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. This patch only adds paravirt interface for guest kernel, however there is not effective pv functions added here. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+
 config ARCH_SUPPORTS_KEXEC
 	def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM			1
+#define HYPERVISOR_VENDOR_SHIFT		8
+#define HYPERCALL_CODE(vendor, code)	((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index 000000000000..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include <linux/static_call_types.h>
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.

Originally I wanted to remove this piece of code, but it fails to compile if CONFIG_PARAVIRT is selected. Here is the reference code; the function paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
	if (static_key_false(&paravirt_steal_enabled)) {
		u64 steal;

		steal = paravirt_steal_clock(smp_processor_id());
		steal -= this_rq()->prev_steal_time;
		steal = min(steal, maxtime);
		account_steal_time(steal);
		this_rq()->prev_steal_time += steal;

		return steal;
	}
#endif
	return 0;
}

OK, then keep it.
+ +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) +=
Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held
On Fri, Feb 16, 2024 at 05:17:20PM -0800, Dan Williams wrote:
> > commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> > Author: Ira Weiny
> > Date: Wed Feb 14 15:25:24 2024 -0800
> >
> >     Revert "acpi/ghes: Process CXL Component Events"
> >
> >     This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.
>
> Even reverts need changelogs; I can add one. I got conflicts trying to
> apply this to the current fixes branch. I think I am going to just
> surgically back out the drivers/acpi/apei/ghes.c changes.

The `git revert` command comes from ancient times and it just sets people up for failure... :/

regards,
dan carpenter
Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system
On Mon, Feb 19, 2024 at 3:37 PM maobibo wrote:
>
> On 2024/2/19 at 3:16 PM, Huacai Chen wrote:
> > On Mon, Feb 19, 2024 at 12:18 PM maobibo wrote:
> >>
> >> On 2024/2/19 at 10:45 AM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao wrote:
> > On LoongArch systems, the ipi hardware uses iocsr registers: there is
> > one iocsr register access when sending an ipi, and two iocsr accesses
> > when receiving one in the ipi interrupt handler. In VM mode, all iocsr
> > register accesses cause the VM to trap into the hypervisor. So with hw
> > ipi notification there will be three traps.
> >
> > This patch adds pv ipi support for the VM: a hypercall instruction is
> > used on the ipi sender side, and the hypervisor injects an SWI into
> > the VM. In the SWI interrupt handler, only the estat CSR register is
> > written to clear the irq, and estat CSR register access does not trap
> > into the hypervisor. So with pv ipi supported, the pv ipi sender traps
> > into the hypervisor once, and the pv ipi receiver does not trap at
> > all; there is only one trap in total.
> >
> > This patch also adds ipi multicast support, using a method similar to
> > x86. With ipi multicast support, an ipi notification can be sent to at
> > most 128 vcpus at one time. It reduces the number of traps into the
> > hypervisor greatly.
> > Signed-off-by: Bibo Mao > --- > arch/loongarch/include/asm/hardirq.h | 1 + > arch/loongarch/include/asm/kvm_host.h | 1 + > arch/loongarch/include/asm/kvm_para.h | 124 + > arch/loongarch/include/asm/loongarch.h | 1 + > arch/loongarch/kernel/irq.c| 2 +- > arch/loongarch/kernel/paravirt.c | 113 ++ > arch/loongarch/kernel/smp.c| 2 +- > arch/loongarch/kvm/exit.c | 73 ++- > arch/loongarch/kvm/vcpu.c | 1 + > 9 files changed, 314 insertions(+), 4 deletions(-) > > diff --git a/arch/loongarch/include/asm/hardirq.h > b/arch/loongarch/include/asm/hardirq.h > index 9f0038e19c7f..8a611843c1f0 100644 > --- a/arch/loongarch/include/asm/hardirq.h > +++ b/arch/loongarch/include/asm/hardirq.h > @@ -21,6 +21,7 @@ enum ipi_msg_type { > typedef struct { > unsigned int ipi_irqs[NR_IPI]; > unsigned int __softirq_pending; > + atomic_t messages cacheline_aligned_in_smp; > >>> Do we really need atomic_t? A plain "unsigned int" can reduce cost > >>> significantly. > >> For IPI, there are multiple senders and one receiver, the sender uses > >> atomic_fetch_or(action, >messages) and the receiver uses > >> atomic_xchg(>messages, 0) to clear message. > >> > >> There needs sync mechanism between senders and receiver, atomic is the > >> most simple method. > > At least from receiver side, the native IPI doesn't need atomic for > > read and clear: > > static u32 ipi_read_clear(int cpu) > > { > > u32 action; > > > > /* Load the ipi register to figure out what we're supposed to do */ > > action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS); > > /* Clear the ipi register to clear the interrupt */ > > iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR); > > wbflush(); > It is because on physical hardware it is two IOCSR registers and also > there is no method to use atomic read and clear method for IOCSR registers. > > However if ipi message is stored on ddr memory, atomic read/clear can > used. 
Your can compare price of one iocsr_read32 + one iocsr_write32 + > wbflush with one atomic_xchg(>messages, 0) atomic ops is of course cheaper than iocsr, but unsigned int is even cheaper. > > Regards > Bibo Mao > > > > return action; > > } > > > >>> > } cacheline_aligned irq_cpustat_t; > > DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > diff --git a/arch/loongarch/include/asm/kvm_host.h > b/arch/loongarch/include/asm/kvm_host.h > index 57399d7cf8b7..1bf927e2bfac 100644 > --- a/arch/loongarch/include/asm/kvm_host.h > +++ b/arch/loongarch/include/asm/kvm_host.h > @@ -43,6 +43,7 @@ struct kvm_vcpu_stat { > u64 idle_exits; > u64 cpucfg_exits; > u64 signal_exits; > + u64 hvcl_exits; > >>> hypercall_exits is better. > >> yeap, hypercall_exits is better, will fix in next version. > >>> > }; > > #define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0) > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index 41200e922a82..a25a84e372b9 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -9,6 +9,10 @@ > #define HYPERVISOR_VENDOR_SHIFT8 > #define HYPERCALL_CODE(vendor, code) ((vendor << >
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: > > > > On 2024/2/19 上午10:42, Huacai Chen wrote: > > Hi, Bibo, > > > > On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > >> > >> The patch adds paravirt interface for guest kernel, function > >> pv_guest_initi() firstly checks whether system runs on VM mode. If kernel > >> runs on VM mode, it will call function kvm_para_available() to detect > >> whether current VMM is KVM hypervisor. And the paravirt function can work > >> only if current VMM is KVM hypervisor, since there is only KVM hypervisor > >> supported on LoongArch now. > >> > >> This patch only adds paravirt interface for guest kernel, however there > >> is not effective pv functions added here. > >> > >> Signed-off-by: Bibo Mao > >> --- > >> arch/loongarch/Kconfig| 9 > >> arch/loongarch/include/asm/kvm_para.h | 7 > >> arch/loongarch/include/asm/paravirt.h | 27 > >> .../include/asm/paravirt_api_clock.h | 1 + > >> arch/loongarch/kernel/Makefile| 1 + > >> arch/loongarch/kernel/paravirt.c | 41 +++ > >> arch/loongarch/kernel/setup.c | 2 + > >> 7 files changed, 88 insertions(+) > >> create mode 100644 arch/loongarch/include/asm/paravirt.h > >> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > >> create mode 100644 arch/loongarch/kernel/paravirt.c > >> > >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > >> index 10959e6c3583..817a56dff80f 100644 > >> --- a/arch/loongarch/Kconfig > >> +++ b/arch/loongarch/Kconfig > >> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH > >> bool > >> default y > >> > >> +config PARAVIRT > >> + bool "Enable paravirtualization code" > >> + depends on AS_HAS_LVZ_EXTENSION > >> + help > >> + This changes the kernel so it can modify itself when it is run > >> + under a hypervisor, potentially improving performance > >> significantly > >> + over full virtualization. However, when run without a hypervisor > >> + the kernel is theoretically slower and slightly larger. 
> >> + > >> config ARCH_SUPPORTS_KEXEC > >> def_bool y > >> > >> diff --git a/arch/loongarch/include/asm/kvm_para.h > >> b/arch/loongarch/include/asm/kvm_para.h > >> index 9425d3b7e486..41200e922a82 100644 > >> --- a/arch/loongarch/include/asm/kvm_para.h > >> +++ b/arch/loongarch/include/asm/kvm_para.h > >> @@ -2,6 +2,13 @@ > >> #ifndef _ASM_LOONGARCH_KVM_PARA_H > >> #define _ASM_LOONGARCH_KVM_PARA_H > >> > >> +/* > >> + * Hypcall code field > >> + */ > >> +#define HYPERVISOR_KVM 1 > >> +#define HYPERVISOR_VENDOR_SHIFT8 > >> +#define HYPERCALL_CODE(vendor, code) ((vendor << > >> HYPERVISOR_VENDOR_SHIFT) + code) > >> + > >> /* > >>* LoongArch hypcall return code > >>*/ > >> diff --git a/arch/loongarch/include/asm/paravirt.h > >> b/arch/loongarch/include/asm/paravirt.h > >> new file mode 100644 > >> index ..b64813592ba0 > >> --- /dev/null > >> +++ b/arch/loongarch/include/asm/paravirt.h > >> @@ -0,0 +1,27 @@ > >> +/* SPDX-License-Identifier: GPL-2.0 */ > >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H > >> +#define _ASM_LOONGARCH_PARAVIRT_H > >> + > >> +#ifdef CONFIG_PARAVIRT > >> +#include > >> +struct static_key; > >> +extern struct static_key paravirt_steal_enabled; > >> +extern struct static_key paravirt_steal_rq_enabled; > >> + > >> +u64 dummy_steal_clock(int cpu); > >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > >> + > >> +static inline u64 paravirt_steal_clock(int cpu) > >> +{ > >> + return static_call(pv_steal_clock)(cpu); > >> +} > > The steal time code can be removed in this patch, I think. > > > Originally I want to remove this piece of code, but it fails to compile > if CONFIG_PARAVIRT is selected. Here is reference code, function > paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected. 
>
> static __always_inline u64 steal_account_process_time(u64 maxtime)
> {
> #ifdef CONFIG_PARAVIRT
> 	if (static_key_false(&paravirt_steal_enabled)) {
> 		u64 steal;
>
> 		steal = paravirt_steal_clock(smp_processor_id());
> 		steal -= this_rq()->prev_steal_time;
> 		steal = min(steal, maxtime);
> 		account_steal_time(steal);
> 		this_rq()->prev_steal_time += steal;
>
> 		return steal;
> 	}
> #endif
> 	return 0;
> }

OK, then keep it.

> >> +
> >> +int pv_guest_init(void);
> >> +#else
> >> +static inline int pv_guest_init(void)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +#endif // CONFIG_PARAVIRT
> >> +#endif
> >> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
> >> new file mode 100644
> >> index 000000000000..65ac7cee0dad
> >> --- /dev/null
> >> +++