Re: [PATCH v3] mm: compaction: update the cc->nr_migratepages when allocating or freeing the freepages

2024-02-19 Thread Vlastimil Babka
On 2/20/24 07:16, Baolin Wang wrote:
> Currently we use the comparison 'cc->nr_freepages >= cc->nr_migratepages'
> to ensure that enough freepages are isolated in isolate_freepages();
> however, compaction_alloc() only decreases cc->nr_freepages without
> updating cc->nr_migratepages, which wastes CPU cycles and causes too
> many freepages to be isolated.
> 
> So we should also update cc->nr_migratepages when allocating or freeing
> freepages, to avoid isolating excess freepages. With this change I see
> fewer free pages scanned and isolated when running thpcompact on my
> Arm64 server:
>                                      k6.7           k6.7_patched
> Ops Compaction pages isolated  120692036.00   118160797.00
> Ops Compaction migrate scanned 131210329.00   154093268.00
> Ops Compaction free scanned   1090587971.00  1080632536.00
> Ops Compact scan efficiency   12.03  14.26
> 
> Moreover, I did not see an obvious latency improvement; this is likely
> because isolating freepages is not the bottleneck in the thpcompact test case.
>   k6.7  k6.7_patched
> Amean fault-both-1  1089.76 (   0.00%) 1080.16 *   0.88%*
> Amean fault-both-3  1616.48 (   0.00%) 1636.65 *  -1.25%*
> Amean fault-both-5  2266.66 (   0.00%) 2219.20 *   2.09%*
> Amean fault-both-7  2909.84 (   0.00%) 2801.90 *   3.71%*
> Amean fault-both-12 4861.26 (   0.00%) 4733.25 *   2.63%*
> Amean fault-both-18 7351.11 (   0.00%) 6950.51 *   5.45%*
> Amean fault-both-24 9059.30 (   0.00%) 9159.99 *  -1.11%*
> Amean fault-both-30   10685.68 (   0.00%)   11399.02 *  -6.68%*
> 
> Signed-off-by: Baolin Wang 
> Acked-by: Mel Gorman 

Reviewed-by: Vlastimil Babka 

Thanks.

> ---
> Hi Andrew, please use this patch to replace below 2 old patches. Thanks.
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19=cb30899d456d64fb83ee3e3d95cd9fbb18735d87
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19=65713d1c4fc498d35dc1f7c1234ef815aa128429
> 
> Changes from v2:
>  - Add acked tag from Mel.
>  - Fix the mm_compaction_migratepages trace event.
> Changes from v1:
>  - Rebased on the latest mm-unstable branch.
> ---
>  include/trace/events/compaction.h |  6 +++---
>  mm/compaction.c   | 12 ++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/include/trace/events/compaction.h 
> b/include/trace/events/compaction.h
> index 2b2a975efd20..d05759d18538 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -78,10 +78,10 @@ DEFINE_EVENT(mm_compaction_isolate_template, 
> mm_compaction_fast_isolate_freepage
>  #ifdef CONFIG_COMPACTION
>  TRACE_EVENT(mm_compaction_migratepages,
>  
> - TP_PROTO(struct compact_control *cc,
> + TP_PROTO(unsigned int nr_migratepages,
>   unsigned int nr_succeeded),
>  
> - TP_ARGS(cc, nr_succeeded),
> + TP_ARGS(nr_migratepages, nr_succeeded),
>  
>   TP_STRUCT__entry(
>   __field(unsigned long, nr_migrated)
> @@ -90,7 +90,7 @@ TRACE_EVENT(mm_compaction_migratepages,
>  
>   TP_fast_assign(
>   __entry->nr_migrated = nr_succeeded;
> - __entry->nr_failed = cc->nr_migratepages - nr_succeeded;
> + __entry->nr_failed = nr_migratepages - nr_succeeded;
>   ),
>  
>   TP_printk("nr_migrated=%lu nr_failed=%lu",
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4494b2914386..218089b29f13 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1798,6 +1798,7 @@ static struct folio *compaction_alloc(struct folio 
> *src, unsigned long data)
>   dst = list_entry(cc->freepages.next, struct folio, lru);
> 	list_del(&dst->lru);
>   cc->nr_freepages--;
> + cc->nr_migratepages--;
>  
>   return dst;
>  }
> @@ -1813,6 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned 
> long data)
>  
> 	list_add(&dst->lru, &cc->freepages);
>   cc->nr_freepages++;
> + cc->nr_migratepages++;
>  }
>  
>  /* possible outcome of isolate_migratepages */
> @@ -2435,7 +2437,7 @@ compact_zone(struct compact_control *cc, struct 
> capture_control *capc)
>   unsigned long last_migrated_pfn;
>   const bool sync = cc->mode != MIGRATE_ASYNC;
>   bool update_cached;
> - unsigned int nr_succeeded = 0;
> + unsigned int nr_succeeded = 0, nr_migratepages;
>  
>   /*
>* These counters track activities during zone compaction.  Initialize
> @@ -2553,11 +2555,17 @@ compact_zone(struct compact_control *cc, struct 
> capture_control *capc)
>   pageblock_start_pfn(cc->migrate_pfn - 1));
>   }
>  
> + /*
> +  * Record the number of pages to migrate since the
> +	 * compaction_alloc/free() will update cc->nr_migratepages
> +	 * properly.
> +	 */
> +	nr_migratepages = cc->nr_migratepages;

[PATCH v3] mm: compaction: update the cc->nr_migratepages when allocating or freeing the freepages

2024-02-19 Thread Baolin Wang
Currently we use the comparison 'cc->nr_freepages >= cc->nr_migratepages'
to ensure that enough freepages are isolated in isolate_freepages();
however, compaction_alloc() only decreases cc->nr_freepages without
updating cc->nr_migratepages, which wastes CPU cycles and causes too
many freepages to be isolated.

So we should also update cc->nr_migratepages when allocating or freeing
freepages, to avoid isolating excess freepages. With this change I see
fewer free pages scanned and isolated when running thpcompact on my
Arm64 server:
   k6.7 k6.7_patched
Ops Compaction pages isolated  120692036.00   118160797.00
Ops Compaction migrate scanned 131210329.00   154093268.00
Ops Compaction free scanned   1090587971.00  1080632536.00
Ops Compact scan efficiency   12.03  14.26
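The effect of the extra decrement can be seen in a toy model (the struct
and helper names below are hypothetical stand-ins, not the kernel's
compact_control API): once a free page is handed out as a migration
target, the remaining free pages still cover the remaining migration
candidates, so isolate_freepages() is not re-entered needlessly.

```c
#include <assert.h>

/* Simplified stand-in for struct compact_control. */
struct cc_model {
	int nr_freepages;     /* isolated free pages available as targets */
	int nr_migratepages;  /* isolated pages still waiting to migrate */
};

/* Hand one isolated free page to the migration code.  Decrementing
 * nr_migratepages as well (the fix) records that this target is now
 * paired with one of the pending migration pages. */
static void model_compaction_alloc(struct cc_model *cc)
{
	cc->nr_freepages--;
	cc->nr_migratepages--;
}

/* isolate_freepages() only needs to run while fewer free pages are
 * isolated than there are pages left to migrate. */
static int model_needs_more_freepages(const struct cc_model *cc)
{
	return cc->nr_freepages < cc->nr_migratepages;
}
```

Without the nr_migratepages decrement, the model's comparison would keep
reporting a shortfall after every allocation, which is the excess
isolation the patch removes.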

Moreover, I did not see an obvious latency improvement; this is likely because
isolating freepages is not the bottleneck in the thpcompact test case.
  k6.7  k6.7_patched
Amean fault-both-1  1089.76 (   0.00%) 1080.16 *   0.88%*
Amean fault-both-3  1616.48 (   0.00%) 1636.65 *  -1.25%*
Amean fault-both-5  2266.66 (   0.00%) 2219.20 *   2.09%*
Amean fault-both-7  2909.84 (   0.00%) 2801.90 *   3.71%*
Amean fault-both-12 4861.26 (   0.00%) 4733.25 *   2.63%*
Amean fault-both-18 7351.11 (   0.00%) 6950.51 *   5.45%*
Amean fault-both-24 9059.30 (   0.00%) 9159.99 *  -1.11%*
Amean fault-both-30   10685.68 (   0.00%)   11399.02 *  -6.68%*

Signed-off-by: Baolin Wang 
Acked-by: Mel Gorman 
---
Hi Andrew, please use this patch to replace below 2 old patches. Thanks.
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19=cb30899d456d64fb83ee3e3d95cd9fbb18735d87
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-20-04-19=65713d1c4fc498d35dc1f7c1234ef815aa128429

Changes from v2:
 - Add acked tag from Mel.
 - Fix the mm_compaction_migratepages trace event.
Changes from v1:
 - Rebased on the latest mm-unstable branch.
---
 include/trace/events/compaction.h |  6 +++---
 mm/compaction.c   | 12 ++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/compaction.h 
b/include/trace/events/compaction.h
index 2b2a975efd20..d05759d18538 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -78,10 +78,10 @@ DEFINE_EVENT(mm_compaction_isolate_template, 
mm_compaction_fast_isolate_freepage
 #ifdef CONFIG_COMPACTION
 TRACE_EVENT(mm_compaction_migratepages,
 
-   TP_PROTO(struct compact_control *cc,
+   TP_PROTO(unsigned int nr_migratepages,
unsigned int nr_succeeded),
 
-   TP_ARGS(cc, nr_succeeded),
+   TP_ARGS(nr_migratepages, nr_succeeded),
 
TP_STRUCT__entry(
__field(unsigned long, nr_migrated)
@@ -90,7 +90,7 @@ TRACE_EVENT(mm_compaction_migratepages,
 
TP_fast_assign(
__entry->nr_migrated = nr_succeeded;
-   __entry->nr_failed = cc->nr_migratepages - nr_succeeded;
+   __entry->nr_failed = nr_migratepages - nr_succeeded;
),
 
TP_printk("nr_migrated=%lu nr_failed=%lu",
diff --git a/mm/compaction.c b/mm/compaction.c
index 4494b2914386..218089b29f13 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1798,6 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, 
unsigned long data)
dst = list_entry(cc->freepages.next, struct folio, lru);
	list_del(&dst->lru);
cc->nr_freepages--;
+   cc->nr_migratepages--;
 
return dst;
 }
@@ -1813,6 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned 
long data)
 
	list_add(&dst->lru, &cc->freepages);
cc->nr_freepages++;
+   cc->nr_migratepages++;
 }
 
 /* possible outcome of isolate_migratepages */
@@ -2435,7 +2437,7 @@ compact_zone(struct compact_control *cc, struct 
capture_control *capc)
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
bool update_cached;
-   unsigned int nr_succeeded = 0;
+   unsigned int nr_succeeded = 0, nr_migratepages;
 
/*
 * These counters track activities during zone compaction.  Initialize
@@ -2553,11 +2555,17 @@ compact_zone(struct compact_control *cc, struct 
capture_control *capc)
pageblock_start_pfn(cc->migrate_pfn - 1));
}
 
+   /*
+* Record the number of pages to migrate since the
+* compaction_alloc/free() will update cc->nr_migratepages
+* properly.
+*/
+   nr_migratepages = cc->nr_migratepages;
	err = migrate_pages(&cc->migratepages, compaction_alloc,
   

Re: WARNING in shmem_release_dquot

2024-02-19 Thread Hugh Dickins
On Mon, 29 Jan 2024, Ubisectech Sirius wrote:

> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
> Recently, our team discovered an issue in Linux kernel
> 6.8.0-rc1-gecb1b8288dc7. Attached to the email is a PoC file for the issue.
> 
> Stack dump:
> [  246.195553][ T4096] [ cut here ]
> [  246.196540][ T4096] quota id 16384 from dquot 888051bd3000, not in rb 
> tree!
> [ 246.198829][ T4096] WARNING: CPU: 1 PID: 4096 at mm/shmem_quota.c:290 
> shmem_release_dquot (mm/shmem_quota.c:290 (discriminator 3))
> [  246.199955][ T4096] Modules linked in:
> [  246.200435][ T4096] CPU: 1 PID: 4096 Comm: kworker/u6:6 Not tainted 
> 6.8.0-rc1-gecb1b8288dc7 #21
> [  246.201566][ T4096] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS 1.15.0-1 04/01/2014
> [  246.202667][ T4096] Workqueue: events_unbound quota_release_workfn
> [ 246.203516][ T4096] RIP: 0010:shmem_release_dquot (mm/shmem_quota.c:290 
> (discriminator 3))
> [ 246.204276][ T4096] Code: e8 28 d9 18 00 e9 b3 f8 ff ff e8 6e e1 c2 ff c6 
> 05 bf e8 1b 0d 01 90 48 c7 c7 80 f0 b8 8a 4c 89 ea 44 89 e6 e8 14 6d 89 ff 90 
> <0f> 0b 90 90 e9 18 fb ff ff e8 f5 d8 18 00 e9 a2 fa ff ff e8 0b d9
> All code
> 
>0:   e8 28 d9 18 00  call   0x18d92d
>5:   e9 b3 f8 ff ff  jmp0xf8bd
>a:   e8 6e e1 c2 ff  call   0xffc2e17d
>f:   c6 05 bf e8 1b 0d 01movb   $0x1,0xd1be8bf(%rip)# 0xd1be8d5
>   16:   90  nop
>   17:   48 c7 c7 80 f0 b8 8amov$0x8ab8f080,%rdi
>   1e:   4c 89 eamov%r13,%rdx
>   21:   44 89 e6mov%r12d,%esi
>   24:   e8 14 6d 89 ff  call   0xff896d3d
>   29:   90  nop
>   2a:*  0f 0b   ud2 <-- trapping instruction
>   2c:   90  nop
>   2d:   90  nop
>   2e:   e9 18 fb ff ff  jmp0xfb4b
>   33:   e8 f5 d8 18 00  call   0x18d92d
>   38:   e9 a2 fa ff ff  jmp0xfadf
>   3d:   e8  .byte 0xe8
>   3e:   0b d9   or %ecx,%ebx
> 
> Code starting with the faulting instruction
> ===
>0:   0f 0b   ud2
>2:   90  nop
>3:   90  nop
>4:   e9 18 fb ff ff  jmp0xfb21
>9:   e8 f5 d8 18 00  call   0x18d903
>e:   e9 a2 fa ff ff  jmp0xfab5
>   13:   e8  .byte 0xe8
>   14:   0b d9   or %ecx,%ebx
> [  246.206640][ T4096] RSP: 0018:c9000604fbc0 EFLAGS: 00010286
> [  246.207403][ T4096] RAX:  RBX:  RCX: 
> 814c77da
> [  246.208514][ T4096] RDX: 888049a58000 RSI: 814c77e7 RDI: 
> 0001
> [  246.209429][ T4096] RBP:  R08: 0001 R09: 
> 
> [  246.210362][ T4096] R10: 0001 R11: 0001 R12: 
> 4000
> [  246.211367][ T4096] R13: 888051bd3000 R14: dc00 R15: 
> 888051bd3040
> [  246.212327][ T4096] FS:  () GS:88807ec0() 
> knlGS:
> [  246.213387][ T4096] CS:  0010 DS:  ES:  CR0: 80050033
> [  246.214232][ T4096] CR2: 7ffee748ec80 CR3: 0cb78000 CR4: 
> 00750ef0
> [  246.215216][ T4096] DR0:  DR1:  DR2: 
> 
> [  246.216187][ T4096] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  246.217148][ T4096] PKRU: 5554
> [  246.217615][ T4096] Call Trace:
> [  246.218090][ T4096]  
> [ 246.218467][ T4096] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 246.218979][ T4096] ? __warn (kernel/panic.c:677)
> [ 246.219505][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 
> (discriminator 3))
> [ 246.220197][ T4096] ? report_bug (lib/bug.c:201 lib/bug.c:219)
> [ 246.220775][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 
> (discriminator 3))
> [ 246.221500][ T4096] ? handle_bug (arch/x86/kernel/traps.c:238)
> [ 246.222081][ T4096] ? exc_invalid_op (arch/x86/kernel/traps.c:259 
> (discriminator 1))
> [ 246.222687][ T4096] ? asm_exc_invalid_op 
> (./arch/x86/include/asm/idtentry.h:568)
> [ 246.223296][ T4096] ? __warn_printk (./include/linux/context_tracking.h:155 
> kernel/panic.c:726)
> [ 246.223878][ T4096] ? __warn_printk (kernel/panic.c:717)
> [ 246.224460][ T4096] ? shmem_release_dquot (mm/shmem_quota.c:290 
> (discriminator 3))
> [ 246.225125][ T4096] quota_release_workfn (fs/quota/dquot.c:839)
> [ 246.225792][ T4096] ? dquot_release (fs/quota/dquot.c:810)
> [ 246.226401][ T4096] process_one_work (kernel/workqueue.c:2638)
> [ 246.227001][ T4096] ? lock_sync (kernel/locking/lockdep.c:5722)
> [ 246.227509][ T4096] ? workqueue_congested 

Re: [v2 PATCH 0/1] ALSA: virtio: add support for audio controls

2024-02-19 Thread Anton Yakovlev

Hi Marcin,


Thanks for the help!



On 20.02.2024 02:24, Marcin Radomski wrote:

Thanks Anton for the reupload.

I tested this series with a 6.1 kernel guest on a proprietary hypervisor. The
controls exposed by the host (BOOLEAN/INTEGER ones, as that was all I could
get) worked as expected when adjusted via ALSA APIs.

Reviewed-by: Marcin Radomski 
Tested-By: Marcin Radomski 


--
Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin



Re: [PATCH 10/10] csky/vdso: Use generic union vdso_data_store

2024-02-19 Thread Guo Ren
On Mon, Feb 19, 2024 at 11:40 PM Anna-Maria Behnsen
 wrote:
>
> There is already a generic union definition for vdso_data_store in vdso
> datapage header.
>
> Use this definition to prevent code duplication.
>
> Signed-off-by: Anna-Maria Behnsen 
> Cc: Guo Ren 
> Cc: linux-c...@vger.kernel.org
> ---
>  arch/csky/kernel/vdso.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
> index e74a2504d331..2ca886e4a458 100644
> --- a/arch/csky/kernel/vdso.c
> +++ b/arch/csky/kernel/vdso.c
> @@ -15,14 +15,8 @@ extern char vdso_start[], vdso_end[];
>  static unsigned int vdso_pages;
>  static struct page **vdso_pagelist;
>
> -/*
> - * The vDSO data page.
> - */
> -static union {
> -   struct vdso_data	data;
> -   u8  page[PAGE_SIZE];
> -} vdso_data_store __page_aligned_data;
> -struct vdso_data *vdso_data = &vdso_data_store.data;
> +static union vdso_data_store vdso_data_store __page_aligned_data;
> +struct vdso_data *vdso_data = vdso_data_store.data;
>
>  static int __init vdso_init(void)
>  {
> --
> 2.39.2
>

Acked-by: Guo Ren 

-- 
Best Regards
 Guo Ren
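The generic union being adopted pairs the vdso data with page-sized
backing storage. A minimal userspace sketch of the pattern (the
PAGE_SIZE value and struct fields here are illustrative, not the
kernel's definitions):

```c
#include <assert.h>

#define PAGE_SIZE 4096  /* illustrative; the kernel value is per-arch */

/* Stand-in for struct vdso_data; the real fields differ. */
struct vdso_data_model {
	unsigned int seq;
	long long    tv_sec;
};

/* The union guarantees page-sized storage while its first member
 * gives typed access to the data itself. */
union vdso_data_store_model {
	struct vdso_data_model data;
	unsigned char          page[PAGE_SIZE];
};

/* The typed data must fit in, and the union must span, one page. */
_Static_assert(sizeof(union vdso_data_store_model) == PAGE_SIZE,
	       "vdso data store must span exactly one page");
```

Because every architecture needs exactly this shape, hoisting it into
the generic vdso datapage header removes the per-arch copies, as the
patch does for csky.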



Re: [PATCH 03/10] csky/vdso: Remove superfluous ifdeffery

2024-02-19 Thread Guo Ren
On Mon, Feb 19, 2024 at 11:40 PM Anna-Maria Behnsen
 wrote:
>
> CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent
> ifdeffery is superfluous. Clean it up.
>
> Signed-off-by: Anna-Maria Behnsen 
> Cc: Guo Ren 
> Cc: linux-c...@vger.kernel.org
> ---
>  arch/csky/include/asm/vdso.h | 5 -
>  arch/csky/kernel/vdso.c  | 4 
>  2 files changed, 9 deletions(-)
>
> diff --git a/arch/csky/include/asm/vdso.h b/arch/csky/include/asm/vdso.h
> index bdce581b5fcb..181a15edafe8 100644
> --- a/arch/csky/include/asm/vdso.h
> +++ b/arch/csky/include/asm/vdso.h
> @@ -5,11 +5,6 @@
>
>  #include 
>
> -#ifndef GENERIC_TIME_VSYSCALL
> -struct vdso_data {
> -};
> -#endif
> -
>  /*
>   * The VDSO symbols are mapped into Linux so we can just use regular symbol
>   * addressing to get their offsets in userspace.  The symbols are mapped at 
> an
> diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
> index 16c20d64d165..e74a2504d331 100644
> --- a/arch/csky/kernel/vdso.c
> +++ b/arch/csky/kernel/vdso.c
> @@ -8,11 +8,7 @@
>  #include 
>
>  #include 
> -#ifdef GENERIC_TIME_VSYSCALL
>  #include 
> -#else
> -#include 
> -#endif
>
>  extern char vdso_start[], vdso_end[];
>
> --
> 2.39.2
>
Acked-by: Guo Ren 

-- 
Best Regards
 Guo Ren



Re: [syzbot] [virtualization?] linux-next boot error: WARNING: refcount bug in __free_pages_ok

2024-02-19 Thread Andrew Morton
On Mon, 19 Feb 2024 02:35:20 -0500 "Michael S. Tsirkin"  wrote:

> On Sun, Feb 18, 2024 at 09:06:18PM -0800, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:d37e1e4c52bc Add linux-next specific files for 20240216
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=171ca65218
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4bc446d42a7d56c0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6f3c38e8a6a0297caa5a
> > compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for 
> > Debian) 2.40
> > 
> > Downloadable assets:
> > disk image: 
> > https://storage.googleapis.com/syzbot-assets/14d0894504b9/disk-d37e1e4c.raw.xz
> > vmlinux: 
> > https://storage.googleapis.com/syzbot-assets/6cda61e084ee/vmlinux-d37e1e4c.xz
> > kernel image: 
> > https://storage.googleapis.com/syzbot-assets/720c85283c05/bzImage-d37e1e4c.xz
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+6f3c38e8a6a0297ca...@syzkaller.appspotmail.com
> > 
>
> ...
>
> > usbcore: registered new interface driver port100
> > usbcore: registered new interface driver nfcmrvl
> > Loading iSCSI transport class v2.0-870.
> > virtio_scsi virtio0: 1/0/0 default/read/poll queues
> > [ cut here ]
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 
> > refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4-next-20240216-syzkaller 
> > #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > Google 01/25/2024
> > RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> > Code: b2 00 00 00 e8 b7 94 f0 fc 5b 5d c3 cc cc cc cc e8 ab 94 f0 fc c6 05 
> > c6 16 ce 0a 01 90 48 c7 c7 a0 5a fe 8b e8 67 69 b4 fc 90 <0f> 0b 90 90 eb 
> > d9 e8 8b 94 f0 fc c6 05 a3 16 ce 0a 01 90 48 c7 c7
> > RSP: :c9066e10 EFLAGS: 00010246
> > RAX: 15c2c224c9b50400 RBX: 888020827d2c RCX: 8880162d8000
> > RDX:  RSI:  RDI: 
> > RBP: 0004 R08: 8157b942 R09: fbfff1bf95cc
> > R10: dc00 R11: fbfff1bf95cc R12: ea000502fdc0
> > R13: ea000502fdc8 R14: 1d4000a05fb9 R15: 
> > FS:  () GS:8880b940() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 88823000 CR3: 0df32000 CR4: 003506f0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >  
> >  reset_page_owner include/linux/page_owner.h:24 [inline]
> >  free_pages_prepare mm/page_alloc.c:1140 [inline]
> >  __free_pages_ok+0xc42/0xd70 mm/page_alloc.c:1269
> >  make_alloc_exact+0xc4/0x140 mm/page_alloc.c:4847
> >  vring_alloc_queue drivers/virtio/virtio_ring.c:319 [inline]
> 
> Wow this seems to be breakage deep in mm/ - all virtio does is
> call alloc_pages_exact and that corrupts the refcounts?
> 

Will the bot be performing a bisection?  If not, can one please be
performed?



[PATCH v3] ring-buffer: Simplify reservation with try_cmpxchg() loop

2024-02-19 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Instead of using local_add_return() to reserve the ring buffer data,
Mathieu Desnoyers suggested using local_cmpxchg(). This would simplify the
reservation with the time keeping code.

Although, it does not get rid of the double time stamps (before_stamp and
write_stamp), using cmpxchg() does get rid of the more complex case when
an interrupting event occurs between getting the timestamps and reserving
the data, as when that happens, it just tries again instead of dealing
with it.

Before we had:

	w = local_read(&tail_page->write);
/* get time stamps */
	write = local_add_return(length, &tail_page->write);
if (write - length == w) {
/* do simple case */
} else {
/* do complex case */
}

By switching the local_add_return() to a local_try_cmpxchg() it can now be:

 w = local_read(&tail_page->write);
 again:
/* get time stamps */
	if (!local_try_cmpxchg(&tail_page->write, &w, w + length))
goto again;

 /* do simple case */
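A single-file analog of that reserve loop, using C11 atomics in place of
the kernel's local_try_cmpxchg() (a sketch of the technique, not the
ring-buffer code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Reserve `length` bytes at the tail of a buffer shared between
 * writers.  As in the pseudocode above, anything computed inside the
 * loop (the kernel recomputes time stamps there) is redone whenever a
 * concurrent writer moved the tail first. */
static uint32_t reserve(_Atomic uint32_t *write, uint32_t length)
{
	uint32_t w = atomic_load(write);

	for (;;) {
		/* ... take time stamps here ... */
		if (atomic_compare_exchange_weak(write, &w, w + length))
			return w;	/* we own bytes [w, w + length) */
		/* the failed CAS reloaded `w`; just try again */
	}
}
```

The retry replaces the "complex case": a writer that loses the race
never has to reason about the interrupting event's timestamps, it simply
recomputes and tries again.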

The benchmarks between the two showed no regressions when trying this:

 Enable: CONFIG_TRACEPOINT_BENCHMARK

 # trace-cmd record -m 80 -e benchmark sleep 60

Before the patch:

 # trace-cmd report | tail
 event_benchmark-944   [003]  1998.910191: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910192: benchmark_event:  last=149 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=149
 event_benchmark-944   [003]  1998.910193: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910193: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910194: benchmark_event:  last=136 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=136
 event_benchmark-944   [003]  1998.910194: benchmark_event:  last=138 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=138
 event_benchmark-944   [003]  1998.910195: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910196: benchmark_event:  last=151 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=151
 event_benchmark-944   [003]  1998.910196: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910197: benchmark_event:  last=152 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=152

After the patch:

 # trace-cmd report | tail
 event_benchmark-848   [004]   171.414716: benchmark_event:  last=143 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=143
 event_benchmark-848   [004]   171.414717: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414720: benchmark_event:  last=140 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=140
 event_benchmark-848   [004]   171.414721: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414721: benchmark_event:  last=145 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=145
 event_benchmark-848   [004]   171.414722: benchmark_event:  last=144 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=144

It may have even improved!

Suggested-by: Mathieu Desnoyers 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v2: 
https://lore.kernel.org/linux-trace-kernel/20240219173003.08339...@gandalf.local.home

- Fixed info.field to be info->field

 kernel/trace/ring_buffer.c | 103 -
 1 file changed, 34 insertions(+), 69 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..6809d085ae98 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3455,9 +3455,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
/* Don't let the compiler play games with cpu_buffer->tail_page */
tail_page = info->tail_page = READ_ONCE(cpu_buffer->tail_page);
 
> -	/*A*/	w = local_read(&tail_page->write) & RB_WRITE_MASK;

Re: [PATCH v2] ring-buffer: Simplify reservation with try_cmpxchg() loop

2024-02-19 Thread Steven Rostedt
On Mon, 19 Feb 2024 17:30:03 -0500
Steven Rostedt  wrote:

> -	/*C*/	write = local_add_return(info->length, &tail_page->write);
> +	/*C*/	if (!local_try_cmpxchg(&tail_page->write, &w, w + info->length)) {
> +			if (info.add_timestamp & (RB_ADD_STAMP_FORCE | RB_ADD_STAMP_EXTEND))
> +				info.length -= RB_LEN_TIME_EXTEND;
> +			goto again;
> +		}

Bah, I had my ktest config testing a different tree and this wasn't what I
was testing :-p

The above needs to be:

if (info->add_timestamp & (RB_ADD_STAMP_FORCE | 
RB_ADD_STAMP_EXTEND))
info->length -= RB_LEN_TIME_EXTEND;

Will send out a v3.

-- Steve



Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Jarkko Sakkinen
On Mon Feb 19, 2024 at 10:25 PM UTC, Haitao Huang wrote:
> On Mon, 19 Feb 2024 14:42:29 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:
> >> On 2/19/24 07:39, Haitao Huang wrote:
> >> > Remove all boolean parameters for 'reclaim' from the function
> >> > sgx_alloc_epc_page() and its callers by making two versions of each
> >> > function.
> >> >
> >> > Also opportunistically remove non-static declaration of
> >> > __sgx_alloc_epc_page() and a typo
> >> >
> >> > Signed-off-by: Haitao Huang 
> >> > Suggested-by: Jarkko Sakkinen 
> >> > ---
> >> >  arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
> >> >  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
> >> >  arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
> >> >  arch/x86/kernel/cpu/sgx/main.c  | 68  
> >> ++---
> >> >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
> >> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >> >  6 files changed, 115 insertions(+), 44 deletions(-)
> >>
> >> Jarkko, did this turn out how you expected?
> >>
> >> I think passing around a function pointer to *only* communicate 1 bit of
> >> information is a _bit_ overkill here.
> >>
> >> Simply replacing the bool with:
> >>
> >> enum sgx_reclaim {
> >>SGX_NO_RECLAIM,
> >>SGX_DO_RECLAIM
> >> };
> >>
> >> would do the same thing.  Right?
> >>
> >> Are you sure you want a function pointer for this?
> >
> > To look at this in context I quickly drafted two branches representing
> > an imaginary next version of the patch set.
> >
> > I guess this would be a simpler and totally sufficient approach.
> >
> > With this approach I'd then change also:
> >
> > [PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality
> >
> > And add the enum-parameter already in that patch with just "no reclaim"
> > enum. I.e. then 10/15 will add only "do reclaim" and the new
> > functionality.
> >
> > BR, Jarkko
> >
>
> Thanks. My understanding is:
>
> 1) For this patch, replace the boolean with the enum as Dave suggested. No  
> two versions of the same functions. And this is a prerequisite for the  
> cgroup series, positioned before [PATCH v9 04/15]
>
> 2) For [PATCH v9 04/15], pass a hard coded SGX_NO_RECLAIM to  
> sgx_epc_cg_try_charge() from sgx_alloc_epc_page().

Yup, this will make the whole patch set also a bit leaner as the API
does not change in the middle.

>
> 3) For [PATCH v9 10/15], remove the hard coded value, pass the reclaim  
> enum parameter value from sgx_alloc_epc_page() to  sgx_epc_cg_try_charge()  
> and add the reclaim logic.
>
> I'll send patches soon. But please let me know if I misunderstood.


BR, Jarkko
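Dave's bool→enum replacement can be sketched in isolation (the enum
follows his suggestion verbatim; the allocator below is a hypothetical
stand-in, not the real sgx_alloc_epc_page()):

```c
#include <assert.h>

enum sgx_reclaim {
	SGX_NO_RECLAIM,
	SGX_DO_RECLAIM
};

/* Hypothetical stand-in for sgx_alloc_epc_page(): all the parameter
 * communicates is whether the caller permits EPC reclaim on pressure. */
static int model_alloc_epc_page(enum sgx_reclaim reclaim)
{
	return reclaim == SGX_DO_RECLAIM;
}
```

A call site then reads `model_alloc_epc_page(SGX_NO_RECLAIM)` instead of
an opaque `false`, which captures the readability gain of the enum
without the indirection of a function pointer.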



[PATCH v2] ring-buffer: Simplify reservation with try_cmpxchg() loop

2024-02-19 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Instead of using local_add_return() to reserve the ring buffer data,
Mathieu Desnoyers suggested using local_cmpxchg(). This would simplify the
reservation with the time keeping code.

Although, it does not get rid of the double time stamps (before_stamp and
write_stamp), using cmpxchg() does get rid of the more complex case when
an interrupting event occurs between getting the timestamps and reserving
the data, as when that happens, it just tries again instead of dealing
with it.

Before we had:

	w = local_read(&tail_page->write);
/* get time stamps */
	write = local_add_return(length, &tail_page->write);
if (write - length == w) {
/* do simple case */
} else {
/* do complex case */
}

By switching the local_add_return() to a local_try_cmpxchg() it can now be:

 w = local_read(&tail_page->write);
 again:
/* get time stamps */
	if (!local_try_cmpxchg(&tail_page->write, &w, w + length))
goto again;

 /* do simple case */

The benchmarks between the two showed no regressions when trying this:

 Enable: CONFIG_TRACEPOINT_BENCHMARK

 # trace-cmd record -m 80 -e benchmark sleep 60

Before the patch:

 # trace-cmd report | tail
 event_benchmark-944   [003]  1998.910191: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910192: benchmark_event:  last=149 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=149
 event_benchmark-944   [003]  1998.910193: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910193: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910194: benchmark_event:  last=136 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=136
 event_benchmark-944   [003]  1998.910194: benchmark_event:  last=138 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=138
 event_benchmark-944   [003]  1998.910195: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910196: benchmark_event:  last=151 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=151
 event_benchmark-944   [003]  1998.910196: benchmark_event:  last=150 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=150
 event_benchmark-944   [003]  1998.910197: benchmark_event:  last=152 
first=3488 max=1199686 min=124 avg=208 std=39 std^2=1579 delta=152

After the patch:

 # trace-cmd report | tail
 event_benchmark-848   [004]   171.414716: benchmark_event:  last=143 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=143
 event_benchmark-848   [004]   171.414717: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414718: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414719: benchmark_event:  last=141 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=141
 event_benchmark-848   [004]   171.414720: benchmark_event:  last=140 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=140
 event_benchmark-848   [004]   171.414721: benchmark_event:  last=142 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=142
 event_benchmark-848   [004]   171.414721: benchmark_event:  last=145 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=145
 event_benchmark-848   [004]   171.414722: benchmark_event:  last=144 
first=14483 max=1155491 min=125 avg=189 std=16 std^2=264 delta=144

It may have even improved!

Suggested-by: Mathieu Desnoyers 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240118181206.4977d...@gandalf.local.home

- If the try_cmpxchg() fails after a timestamp was added, the retry is likely
  to add the length of that timestamp again. Subtract the length on failure.
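The bullet above can be sketched as a minimal userspace model. All names here (`rb_info`, `TS_LEN`, `try_reserve()`) are invented for illustration; this is not the actual ring-buffer code, just the shape of the fix: account the timestamp length before the cmpxchg, and subtract it again if the cmpxchg loses the race, so the retry does not count it twice.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TS_LEN 8UL                      /* hypothetical on-buffer timestamp size */

static _Atomic unsigned long tail;      /* models the buffer's tail position */

struct rb_info {
	unsigned long length;           /* bytes accounted for this event */
	bool add_ts;                    /* does this attempt add a timestamp? */
};

/*
 * One reservation attempt. The timestamp length is accounted before the
 * cmpxchg; if the cmpxchg fails, subtract it again so that the retry does
 * not account the same timestamp twice.
 */
static bool try_reserve(struct rb_info *info, unsigned long expected,
			unsigned long *pos)
{
	if (info->add_ts)
		info->length += TS_LEN;

	if (!atomic_compare_exchange_strong(&tail, &expected,
					    expected + info->length)) {
		if (info->add_ts)
			info->length -= TS_LEN; /* undo the accounting on failure */
		return false;
	}

	*pos = expected;
	return true;
}
```

A failed attempt leaves `info->length` exactly where it started, so a retry begins from a clean slate.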

 kernel/trace/ring_buffer.c | 103 -
 1 file changed, 34 insertions(+), 69 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3ecf01..4ba0af82c33e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3455,9 +3455,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
/* Don't let the compiler play games with cpu_buffer->tail_page */

Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Haitao Huang
On Mon, 19 Feb 2024 14:42:29 -0600, Jarkko Sakkinen   
wrote:



On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:

On 2/19/24 07:39, Haitao Huang wrote:
> Remove all boolean parameters for 'reclaim' from the function
> sgx_alloc_epc_page() and its callers by making two versions of each
> function.
>
> Also opportunistically remove non-static declaration of
> __sgx_alloc_epc_page() and a typo
>
> Signed-off-by: Haitao Huang 
> Suggested-by: Jarkko Sakkinen 
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
>  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
>  arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
>  arch/x86/kernel/cpu/sgx/main.c  | 68  
++---

>  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
>  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
>  6 files changed, 115 insertions(+), 44 deletions(-)

Jarkko, did this turn out how you expected?

I think passing around a function pointer to *only* communicate 1 bit of
information is a _bit_ overkill here.

Simply replacing the bool with:

enum sgx_reclaim {
SGX_NO_RECLAIM,
SGX_DO_RECLAIM
};

would do the same thing.  Right?

Are you sure you want a function pointer for this?


To look at this in context, I quickly drafted two branches representing an
imaginary next version of the patch set.

I guess this would be a simpler and totally sufficient approach.

With this approach I'd then change also:

[PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality

And add the enum-parameter already in that patch with just "no reclaim"
enum. I.e. then 10/15 will add only "do reclaim" and the new
functionality.

BR, Jarkko



Thanks. My understanding is:

1) For this patch, replace the boolean with the enum as Dave suggested. No
two versions of the same function. And this is a prerequisite for the
cgroup series, positioned before [PATCH v9 04/15].


2) For [PATCH v9 04/15], pass a hard coded SGX_NO_RECLAIM to  
sgx_epc_cg_try_charge() from sgx_alloc_epc_page().


3) For [PATCH v9 10/15], remove the hard coded value, pass the reclaim  
enum parameter value from sgx_alloc_epc_page() to  sgx_epc_cg_try_charge()  
and add the reclaim logic.
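The three steps above might shape up roughly like this (a hypothetical userspace model, not the real SGX code; the stub body, counters, and return values are invented for illustration — only the enum itself comes from the thread):

```c
#include <errno.h>

/* Step 1: a named enum instead of a bare bool, as Dave suggested. */
enum sgx_reclaim {
	SGX_NO_RECLAIM,
	SGX_DO_RECLAIM,
};

/*
 * Toy stand-in for the EPC cgroup charge path. In step 2 the caller
 * passes a hard-coded SGX_NO_RECLAIM; in step 3 the real reclaim logic
 * replaces this stub and the caller's value is threaded through.
 */
static int sgx_epc_cg_try_charge(enum sgx_reclaim reclaim)
{
	static int free_pages = 1;      /* pretend one page is left */

	if (free_pages > 0) {
		free_pages--;
		return 0;
	}
	/* Over the limit: only the reclaiming flavour may make progress. */
	return reclaim == SGX_DO_RECLAIM ? 0 : -EBUSY;
}

static int sgx_alloc_epc_page(enum sgx_reclaim reclaim)
{
	return sgx_epc_cg_try_charge(reclaim);
}
```

Call sites then read `sgx_alloc_epc_page(..., SGX_NO_RECLAIM)` instead of an opaque `false`, which is the whole point of the enum.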


I'll send patches soon. But please let me know if I misunderstood.

Thanks
Haitao



[PATCH v3 2/2] ARM: dts: qcom: Add support for Samsung Galaxy Tab 4 8.0 Wi-Fi

2024-02-19 Thread Bryant Mairs
Add support for this tablet based on the MSM8226 SoC, codenamed
"milletwifi".

Acked-by: Linus Walleij 
Reviewed-by: Luca Weiss 
Signed-off-by: Bryant Mairs 
---
 arch/arm/boot/dts/qcom/Makefile   |   1 +
 .../qcom/qcom-apq8026-samsung-milletwifi.dts  | 573 ++
 2 files changed, 574 insertions(+)
 create mode 100644 arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts

diff --git a/arch/arm/boot/dts/qcom/Makefile b/arch/arm/boot/dts/qcom/Makefile
index 9cc1e14e6cd0..730d98c2c715 100644
--- a/arch/arm/boot/dts/qcom/Makefile
+++ b/arch/arm/boot/dts/qcom/Makefile
@@ -5,6 +5,7 @@ dtb-$(CONFIG_ARCH_QCOM) += \
qcom-apq8026-huawei-sturgeon.dtb \
qcom-apq8026-lg-lenok.dtb \
qcom-apq8026-samsung-matisse-wifi.dtb \
+   qcom-apq8026-samsung-milletwifi.dtb \
qcom-apq8060-dragonboard.dtb \
qcom-apq8064-cm-qs600.dtb \
qcom-apq8064-ifc6410.dtb \
diff --git a/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts b/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts
new file mode 100644
index ..7d519156d91d
--- /dev/null
+++ b/arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts
@@ -0,0 +1,573 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2022, Matti Lehtimäki 
+ * Copyright (c) 2023, Bryant Mairs 
+ */
+
+/dts-v1/;
+
+#include 
+#include 
+#include "qcom-msm8226.dtsi"
+#include "pm8226.dtsi"
+
+/delete-node/ _region;
+/delete-node/ _region;
+
+/ {
+   model = "Samsung Galaxy Tab 4 8.0 Wi-Fi";
+   compatible = "samsung,milletwifi", "qcom,apq8026";
+   chassis-type = "tablet";
+
+   aliases {
+   display0 = 
+   mmc0 = _1; /* SDC1 eMMC slot */
+   mmc1 = _2; /* SDC2 SD card slot */
+   };
+
+   chosen {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   stdout-path = "display0";
+
+   framebuffer0: framebuffer@320 {
+   compatible = "simple-framebuffer";
+   reg = <0x0320 0x80>;
+   width = <800>;
+   height = <1280>;
+   stride = <(800 * 3)>;
+   format = "r8g8b8";
+   };
+   };
+
+   gpio-hall-sensor {
+   compatible = "gpio-keys";
+
+   event-hall-sensor {
+   label = "Cover";
+   gpios = < 37 GPIO_ACTIVE_LOW>;
+   linux,input-type = ;
+   linux,code = ;
+   debounce-interval = <15>;
+   linux,can-disable;
+   wakeup-source;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+   autorepeat;
+
+   key-home {
+   label = "Home";
+   gpios = < 108 GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   debounce-interval = <15>;
+   };
+
+   key-volume-down {
+   label = "Volume Down";
+   gpios = < 107 GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   debounce-interval = <15>;
+   };
+
+   key-volume-up {
+   label = "Volume Up";
+   gpios = < 106 GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   debounce-interval = <15>;
+   };
+   };
+
+   i2c-backlight {
+   compatible = "i2c-gpio";
+   sda-gpios = < 20 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+   scl-gpios = < 21 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+
+   pinctrl-0 = <_i2c_default_state>;
+   pinctrl-names = "default";
+
+   i2c-gpio,delay-us = <4>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   backlight@2c {
+   compatible = "ti,lp8556";
+   reg = <0x2c>;
+   enable-supply = <_backlight_vddio>;
+
+   dev-ctrl = /bits/ 8 <0x80>;
+   init-brt = /bits/ 8 <0x3f>;
+
+   /*
+* Change transition duration: 200ms, Change
+* transition strength: heavy, PWM hysteresis:
+* 1-bit w/ 8-bit resolution
+*/
+   rom-a3h {
+   rom-addr = /bits/ 8 <0xa3>;
+   rom-val = /bits/ 8 <0x5e>;
+   };
+
+   /*
+* PWM phase configuration: 3-phase/3 drivers
+* (0, 120deg, 240deg, -, -, -),
+* PWM frequency: 9616Hz (10-bit)
+*/
+   

[PATCH v3 1/2] dt-bindings: arm: qcom: Document samsung,milletwifi device

2024-02-19 Thread Bryant Mairs
Add binding documentation for Samsung Galaxy Tab 4 8.0 Wi-Fi
tablet which is based on Snapdragon 400 (apq8026) SoC.

Acked-by: Linus Walleij 
Acked-by: Conor Dooley 
Signed-off-by: Bryant Mairs 
---
 Documentation/devicetree/bindings/arm/qcom.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/arm/qcom.yaml b/Documentation/devicetree/bindings/arm/qcom.yaml
index 2b993b4c51dc..c11bb2a81643 100644
--- a/Documentation/devicetree/bindings/arm/qcom.yaml
+++ b/Documentation/devicetree/bindings/arm/qcom.yaml
@@ -104,6 +104,7 @@ properties:
   - huawei,sturgeon
   - lg,lenok
   - samsung,matisse-wifi
+  - samsung,milletwifi
   - const: qcom,apq8026
 
   - items:
-- 
2.43.0




[PATCH v3 0/2] Add samsung-milletwifi

2024-02-19 Thread Bryant Mairs
This series adds support for samsung-milletwifi, the smaller cousin
to samsung-matisselte. I've used the manufacturer's naming convention
for consistency.

Hardware currently supported:
- Display
- Cover detection
- Physical buttons
- Touchscreen and touchkeys
- Accelerometer

Acked-by: Linus Walleij 
Signed-off-by: Bryant Mairs 
---
Changes in v3:
- Minor style changes to the DTS
- Corrected list of supported hardware peripherals
- Link to v2: https://lore.kernel.org/all/20240215172617.115307-1-bry...@mai.rs/
---
Changes in v2:
- Squashed DTS work into a single commit
- Add touchkeys to the DTS
- Link to v1: https://lore.kernel.org/all/20231105204759.37107-1-bry...@mai.rs/
---

Bryant Mairs (2):
  dt-bindings: arm: qcom: Document samsung,milletwifi device
  ARM: dts: qcom: Add support for Samsung Galaxy Tab 4 8.0 Wi-Fi

 .../devicetree/bindings/arm/qcom.yaml |   1 +
 arch/arm/boot/dts/qcom/Makefile   |   1 +
 .../qcom/qcom-apq8026-samsung-milletwifi.dts  | 573 ++
 3 files changed, 575 insertions(+)
 create mode 100644 arch/arm/boot/dts/qcom/qcom-apq8026-samsung-milletwifi.dts

-- 
2.43.0




Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Jarkko Sakkinen
On Mon Feb 19, 2024 at 3:56 PM UTC, Dave Hansen wrote:
> On 2/19/24 07:39, Haitao Huang wrote:
> > Remove all boolean parameters for 'reclaim' from the function
> > sgx_alloc_epc_page() and its callers by making two versions of each
> > function.
> > 
> > Also opportunistically remove non-static declaration of
> > __sgx_alloc_epc_page() and a typo
> > 
> > Signed-off-by: Haitao Huang 
> > Suggested-by: Jarkko Sakkinen 
> > ---
> >  arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
> >  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
> >  arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
> >  arch/x86/kernel/cpu/sgx/main.c  | 68 ++---
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >  6 files changed, 115 insertions(+), 44 deletions(-)
>
> Jarkko, did this turn out how you expected?
>
> I think passing around a function pointer to *only* communicate 1 bit of
> information is a _bit_ overkill here.
>
> Simply replacing the bool with:
>
> enum sgx_reclaim {
>   SGX_NO_RECLAIM,
>   SGX_DO_RECLAIM
> };
>
> would do the same thing.  Right?
>
> Are you sure you want a function pointer for this?

To look at this in context, I quickly drafted two branches representing an
imaginary next version of the patch set.

I guess this would be a simpler and totally sufficient approach.

With this approach I'd then change also:

[PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality

And add the enum-parameter already in that patch with just "no reclaim"
enum. I.e. then 10/15 will add only "do reclaim" and the new
functionality.

BR, Jarkko



Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Jarkko Sakkinen
On Mon Feb 19, 2024 at 3:39 PM UTC, Haitao Huang wrote:
> Remove all boolean parameters for 'reclaim' from the function
> sgx_alloc_epc_page() and its callers by making two versions of each
> function.
>
> Also opportunistically remove non-static declaration of
> __sgx_alloc_epc_page() and a typo
>
> Signed-off-by: Haitao Huang 
> Suggested-by: Jarkko Sakkinen 

I think this is for the better.

My viewpoint for kernel patches overall is that:

1. A feature should leave the subsystem in a cleaner state as
   far as the existing framework of doing things goes.
2. A bugfix can sometimes do the opposite if a corner case
   requires some weird dance to perform.


BR, Jarkko



Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-19 Thread Jarkko Sakkinen
On Mon Feb 19, 2024 at 3:12 PM UTC, Haitao Huang wrote:
> On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen   
> wrote:
>
> > On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:
> >> Hi Jarkko
> >>
> >> On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen 
> >> wrote:
> >>
> >> > On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
> >> >> From: Kristen Carlson Accardi 
> >> >>
> >> >> When the EPC usage of a cgroup is near its limit, the cgroup needs to
> >> >> reclaim pages used in the same cgroup to make room for new  
> >> allocations.
> >> >> This is analogous to the behavior that the global reclaimer is  
> >> triggered
> >> >> when the global usage is close to total available EPC.
> >> >>
> >> >> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
> >> >> whether synchronous reclaim is allowed or not. And trigger the
> >> >> synchronous/asynchronous reclamation flow accordingly.
> >> >>
> >> >> Note at this point, all reclaimable EPC pages are still tracked in  
> >> the
> >> >> global LRU and per-cgroup LRUs are empty. So no per-cgroup  
> >> reclamation
> >> >> is activated yet.
> >> >>
> >> >> Co-developed-by: Sean Christopherson  
> >> 
> >> >> Signed-off-by: Sean Christopherson 
> >> >> Signed-off-by: Kristen Carlson Accardi 
> >> >> Co-developed-by: Haitao Huang 
> >> >> Signed-off-by: Haitao Huang 
> >> >> ---
> >> >> V7:
> >> >> - Split this out from the big patch, #10 in V6. (Dave, Kai)
> >> >> ---
> >> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
> >> >>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
> >> >>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
> >> >>  3 files changed, 27 insertions(+), 5 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> index d399fda2b55e..abf74fdb12b4 100644
> >> >> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
> >> >> @@ -184,13 +184,35 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
> >> >>  /**
> >> >>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
> >> >>   * @epc_cg:The EPC cgroup to be charged for the page.
> >> >> + * @reclaim:   Whether or not synchronous reclaim is allowed
> >> >>   * Return:
> >> >>   * * %0 - If successfully charged.
> >> >>   * * -errno - for failures.
> >> >>   */
> >> >> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> >> >> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
> >> >>  {
> >> >> -   return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
> >> >> +   for (;;) {
> >> >> +   if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
> >> >> +   PAGE_SIZE))
> >> >> +   break;
> >> >> +
> >> >> +   if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> >> >> +   return -ENOMEM;
> >> >> +
> >> >> +   if (signal_pending(current))
> >> >> +   return -ERESTARTSYS;
> >> >> +
> >> >> +   if (!reclaim) {
> >> >> +   queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
> >> >> +   return -EBUSY;
> >> >> +   }
> >> >> +
> >> >> +   if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
> >> >> +   /* All pages were too young to reclaim, try again a little later */
> >> >> +   schedule();
> >> >
> >> > This will be total pain to backtrack after a while when something
> >> > needs to be changed so there definitely should be inline comments
> >> > addressing each branch condition.
> >> >
> >> > I'd rethink this as:
> >> >
> >> > 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
> >> >iteration with the new "reclaim" parameter.
> >> > 2. Add a new sgx_epc_group_try_charge_reclaim() function.
> >> >
> >> > There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> >> > sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> >> > same loop calling internal __sgx_epc_cgroup_try_charge() with
> >> > different parameters. That is totally acceptable.
> >> >
> >> > Please also add my suggested-by.
> >> >
> >> > BR, Jarkko
> >> >
> >> > BR, Jarkko
> >> >
> >> For #2:
> >> The only caller of this function, sgx_alloc_epc_page(), has the same
> >> boolean which is passed into this this function.
> >
> > I know. This would be a good opportunity to fix that up. Large patch
> > sets should try to make the space for their feature as good as possible
> > and thus also clean up the code base overall.
> >
> >> If we separate it into sgx_epc_cgroup_try_charge() and
> >> sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the
> >> if/else branches. So the separation here seems not to help?
> >
> > Of course it does. It makes the code in 
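Jarkko's two-wrapper suggestion quoted above could look roughly like this (a toy userspace model, not the actual EPC cgroup code; the usage/limit/reclaimable counters and all names are made up — only the one-attempt-helper-plus-two-wrappers structure comes from the thread):

```c
#include <errno.h>

/* Toy accounting state standing in for the EPC cgroup's usage counters. */
static int usage, limit = 2, reclaimable = 1;

/* Single charge attempt; the real code would name this something like
 * __sgx_epc_cgroup_try_charge(). */
static int charge_once(void)
{
	if (usage >= limit)
		return -ENOMEM;
	usage++;
	return 0;
}

/* Non-reclaiming flavour: one attempt, then give up with -EBUSY
 * (the real code would queue asynchronous reclaim at this point). */
static int try_charge(void)
{
	return charge_once() ? -EBUSY : 0;
}

/* Synchronous-reclaiming flavour: loop, modelling one reclaimed page
 * per round until the charge fits or nothing is left to reclaim. */
static int try_charge_reclaim(void)
{
	for (;;) {
		if (!charge_once())
			return 0;
		if (!reclaimable)
			return -ENOMEM;
		reclaimable--;
		usage--;
	}
}
```

The two loops are slightly redundant, as noted in the thread, but each caller now reads unambiguously without an if/else on a flag.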

Re: [PATCH v17 3/6] tracing: Add snapshot refcount

2024-02-19 Thread Steven Rostedt
On Mon, 19 Feb 2024 13:17:54 -0500
Steven Rostedt  wrote:

> On Tue, 13 Feb 2024 11:49:42 +
> Vincent Donnefort  wrote:
> 
> > @@ -9678,7 +9739,9 @@ trace_array_create_systems(const char *name, const char *systems)
> > raw_spin_lock_init(>start_lock);
> >  
> > tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> > -
> > +#ifdef CONFIG_TRCER_MAX_TRACE  
> 
> Oops!
> 
> I'll fix this too.
> 
> 
> > +   spinlock_init(>snapshot_trigger_lock);

And this too:

kernel/trace/trace.c:9245:9: error: implicit declaration of function ‘spinlock_init’; did you mean ‘spin_lock_init’? [-Werror=implicit-function-declaration]
 9245 | spinlock_init(>snapshot_trigger_lock);
  | ^

-- Steve

> > +#endif
> > tr->current_trace = _trace;
> >  
> > INIT_LIST_HEAD(>systems);  




Re: [PATCH v17 3/6] tracing: Add snapshot refcount

2024-02-19 Thread Steven Rostedt
On Tue, 13 Feb 2024 11:49:42 +
Vincent Donnefort  wrote:

> @@ -9678,7 +9739,9 @@ trace_array_create_systems(const char *name, const char *systems)
>   raw_spin_lock_init(>start_lock);
>  
>   tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> -
> +#ifdef CONFIG_TRCER_MAX_TRACE

Oops!

I'll fix this too.

-- Steve

> + spinlock_init(>snapshot_trigger_lock);
> +#endif
>   tr->current_trace = _trace;
>  
>   INIT_LIST_HEAD(>systems);



Re: [PATCH v11 0/4] add zynqmp TCM bindings

2024-02-19 Thread Tanmay Shah
Hello,

By mistake, the same set of patches was sent twice in the same git send-email
command.

Either copy can be reviewed. Please let me know if I need to take any action
to fix it.

Thanks.

On 2/19/24 11:44 AM, Tanmay Shah wrote:
> Tightly-Coupled Memories (TCMs) are low-latency memory that provides
> predictable instruction execution and predictable data load/store
> timing. Each Cortex-R5F processor contains two exclusive 64 KB memory
> banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
> In lockstep mode, both cores' 128 KB memories are accessible to the cluster.
>
> As per the ZynqMP UltraScale+ Technical Reference Manual UG1085, the
> following is the address space of TCM memory. The bindings in this patch
> series introduce properties to accommodate the following address space,
> with address translation between the Linux and Cortex-R5 views.
>
> | *Mode* | *R5 View* | *Linux view* | Notes |
> | --- | --- | --- | --- |
> | *Split Mode*   | *start addr*| *start addr* |  |
> | R5_0 ATCM (64 KB)  | 0x_ | 0xFFE0_  |  |
> | R5_0 BTCM (64 KB)  | 0x0002_ | 0xFFE2_  |  |
> | R5_1 ATCM (64 KB)  | 0x_ | 0xFFE9_  | alias of 0xFFE1_ |
> | R5_1 BTCM (64 KB)  | 0x0002_ | 0xFFEB_  | alias of 0xFFE3_ |
> |  ___   | ___ |___   |  |
> | *Lockstep Mode*| |  |  |
> | R5_0 ATCM (128 KB) | 0x_ | 0xFFE0_  |  |
> | R5_0 BTCM (128 KB) | 0x0002_ | 0xFFE2_  |  |
>
> References:
> UG1085 TCM address space:
> https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map
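For illustration, the split-mode R5_0 rows of the table above can be turned into a small lookup. This is only a toy sketch: in the real system the translation comes from the device-tree `ranges` property, not C code, and the full base addresses (e.g. 0xFFE00000) are taken from UG1085 since the table's hex columns are truncated here.

```c
#include <stdint.h>

struct tcm_map {
	uint64_t r5_base;     /* Cortex-R5 view */
	uint64_t linux_base;  /* Linux view */
	uint64_t size;
};

/* Split-mode R5_0 banks per UG1085 (64 KB each). */
static const struct tcm_map split_mode[] = {
	{ 0x00000000, 0xffe00000, 0x10000 },  /* R5_0 ATCM */
	{ 0x00020000, 0xffe20000, 0x10000 },  /* R5_0 BTCM */
};

/* Translate an R5-view address to the Linux view, or ~0 if unmapped. */
static uint64_t r5_to_linux(uint64_t addr)
{
	for (int i = 0; i < (int)(sizeof(split_mode) / sizeof(split_mode[0])); i++) {
		const struct tcm_map *m = &split_mode[i];
		if (addr >= m->r5_base && addr < m->r5_base + m->size)
			return m->linux_base + (addr - m->r5_base);
	}
	return ~(uint64_t)0;
}
```

The DTS patch later in this series encodes exactly these pairs in its `ranges` entries, which is why the driver itself never needs such a table.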
>
> Changes in v11:
>   - Fix yamllint warning and reduce indentation as needed
>   - Remove redundant initialization of the variable
>   - Return correct error code if memory allocation failed
>
> Changes in v10:
>   - Add new patch (1/4) to series that changes hardcoded TCM addresses in
> lockstep mode and removes separate handling of TCM in lockstep and
> split mode
>   - modify number of "reg", "reg-names" and "power-domains" entries
> based on cluster mode
>   - Add extra optional atcm and btcm in "reg" property for lockstep mode
>   - Add "reg-names" for extra optional atcm and btcm for lockstep mode
>   - Drop previous Ack as bindings has new change
>   - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
>   - Add each tcm's power-domains in lockstep mode
>   - Drop previous Ack as new change in dts patchset
>   - Remove redundant changes in driver to handle TCM in lockstep mode
>
> Changes in v9:
>   - Fix rproc lockstep dts
>   - Introduce new API to request and release core1 TCM power-domains in
> lockstep mode. This will be used during prepare -> add_tcm_banks
> callback to enable TCM in lockstep mode.
>   - Parse TCM from device-tree in lockstep mode and split mode in
> uniform way.
>   - Fix TCM representation in device-tree in lockstep mode.
>   - Fix comments as suggested
>
> Changes in v8:
>   - Remove use of pm_domains framework
>   - Remove checking of pm_domain_id validation to power on/off tcm
>   - Remove spurious change
>   - parse power-domains property from device-tree and use EEMI calls
> to power on/off TCM instead of using pm domains framework
>
> Changes in v7:
>   - %s/pm_dev1/pm_dev_core0/r
>   - %s/pm_dev_link1/pm_dev_core0_link/r
>   - %s/pm_dev2/pm_dev_core1/r
>   - %s/pm_dev_link2/pm_dev_core1_link/r
>   - remove pm_domain_id check to move next patch
>   - add comment about how 1st entry in pm domain list is used
>   - fix loop when jump to fail_add_pm_domains loop
>   - move checking of pm_domain_id from previous patch
>   - fix mem_bank_data memory allocation
>
> Changes in v6:
>   - Introduce new node entry for r5f cluster split mode dts and
> keep it disabled by default.
>   - Keep remoteproc lockstep mode enabled by default to maintain
> backward compatibility.
>   - Enable split mode only for zcu102 board to demo split mode use
>   - Remove spurious change
>   - Handle errors in add_pm_domains function
>   - Remove redundant code to handle errors from remove_pm_domains
>   - Missing . at the end of the commit message
>   - remove redundant initialization of variables
>   - remove fail_tcm label and relevant code to free memory
> acquired using devm_* API. As this will be freed when device free it
>   - add extra check to see if "reg" property is supported or not
>
> Changes in v5:
>   - maintain Rob's Ack on bindings patch as no changes in bindings
>   - split previous patch into multiple patches
>   - Use pm domain framework to turn on/off TCM
>   - Add support of parsing TCM information from device-tree
>   - maintain backward compatibility with previous bindings without
> TCM information available in device-tree
>
> This patch series continues 

[PATCH v11 4/4] remoteproc: zynqmp: parse TCM from device tree

2024-02-19 Thread Tanmay Shah
ZynqMP TCM information was hardcoded in the driver. Now ZynqMP TCM
information is available in the device tree. Parse TCM information in
the driver as per the new bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v11:
  - Remove redundant initialization of the variable
  - return correct error code if memory allocation failed

 drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++--
 1 file changed, 107 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 42b0384d34f2..d4a22caebaad 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
	{0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
@@ -757,6 +757,103 @@ static struct zynqmp_r5_core *zynqmp_r5_add_rproc_core(struct device *cdev)
return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+   int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+   struct of_phandle_args out_args;
+   struct zynqmp_r5_core *r5_core;
+   struct platform_device *cpdev;
+   struct mem_bank_data *tcm;
+   struct device_node *np;
+   struct resource *res;
+   u64 abs_addr, size;
+   struct device *dev;
+
+   for (i = 0; i < cluster->core_count; i++) {
+   r5_core = cluster->r5_cores[i];
+   dev = r5_core->dev;
+   np = r5_core->np;
+
+   pd_count = of_count_phandle_with_args(np, "power-domains",
+ "#power-domain-cells");
+
+   if (pd_count <= 0) {
+   dev_err(dev, "invalid power-domains property, %d\n", pd_count);
+   return -EINVAL;
+   }
+
+   /* First entry in power-domains list is for r5 core, rest for TCM. */
+   tcm_bank_count = pd_count - 1;
+
+   if (tcm_bank_count <= 0) {
+   dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+   return -EINVAL;
+   }
+
+   r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+ sizeof(struct mem_bank_data *),
+ GFP_KERNEL);
+   if (!r5_core->tcm_banks)
+   return -ENOMEM;
+
+   r5_core->tcm_bank_count = tcm_bank_count;
+   for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, tcm_pd_idx++) {
+   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+  GFP_KERNEL);
+   if (!tcm)
+   return -ENOMEM;
+
+   r5_core->tcm_banks[j] = tcm;
+
+   /* Get power-domains id of TCM. */
+   ret = of_parse_phandle_with_args(np, "power-domains",
+"#power-domain-cells",
+tcm_pd_idx, _args);
+   if (ret) {
+   dev_err(r5_core->dev,
+   "failed to get tcm %d pm domain, ret 
%d\n",
+   tcm_pd_idx, ret);
+   return ret;
+   }
+   tcm->pm_domain_id = out_args.args[0];
+   of_node_put(out_args.np);
+
+   /* Get TCM address without translation. */
+   ret = of_property_read_reg(np, j, _addr, );
+   if (ret) {
+   dev_err(dev, "failed to get reg property\n");
+   return ret;
+   }
+
+   /*
+* Remote processor can address only 32 bits
+* so convert 64-bits into 32-bits. This will discard
+* any unwanted upper 32-bits.
+*/
+   tcm->da = (u32)abs_addr;
+   tcm->size = (u32)size;
+
+   cpdev = to_platform_device(dev);
+   res = platform_get_resource(cpdev, IORESOURCE_MEM, j);
+   if (!res) {
+   dev_err(dev, "failed to get tcm resource\n");
+   return -EINVAL;
+   }
+
+   

[PATCH v11 3/4] dts: zynqmp: add properties for TCM in remoteproc

2024-02-19 Thread Tanmay Shah
Add properties as per the new bindings in the zynqmp remoteproc node
to represent TCM address and size.

This patch also adds an alternative remoteproc node to represent the
remoteproc cluster in split mode. By default lockstep mode is
enabled, and users should disable it before using the split mode
dts. Both device-tree nodes can't be used simultaneously; one
of them must be disabled. For the zcu102-1.0 and zcu102-1.1 boards
the remoteproc split mode dts node is enabled and the lockstep mode
dts is disabled.

Signed-off-by: Tanmay Shah 
---
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 65 +--
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
+_split {
+   status = "okay";
+};
+
+_lockstep {
+   status = "disabled";
+};
+
  {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index eaba466804bc..c8a7fd0f3a1e 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -248,19 +248,74 @@ fpga_full: fpga-full {
ranges;
};
 
-   remoteproc {
+   rproc_lockstep: remoteproc@ffe0 {
compatible = "xlnx,zynqmp-r5fss";
xlnx,cluster-mode = <1>;
 
-   r5f-0 {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+<0x0 0x2 0x0 0xffe2 0x0 0x1>,
+<0x0 0x1 0x0 0xffe1 0x0 0x1>,
+<0x0 0x3 0x0 0xffe3 0x0 0x1>;
+
+   r5f@0 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x0 0x0 0x0 0x1>,
+ <0x0 0x2 0x0 0x1>,
+ <0x0 0x1 0x0 0x1>,
+ <0x0 0x3 0x0 0x1>;
+   reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
+   memory-region = <_0_fw_image>;
+   };
+
+   r5f@1 {
+   compatible = "xlnx,zynqmp-r5f";
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
+   memory-region = <_1_fw_image>;
+   };
+   };
+
+   rproc_split: remoteproc-split@ffe0 {
+   status = "disabled";
+   compatible = "xlnx,zynqmp-r5fss";
+   xlnx,cluster-mode = <0>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>,
+<0x0 0x2 0x0 0xffe2 0x0 0x1>,
+<0x1 0x0 0x0 0xffe9 0x0 0x1>,
+<0x1 0x2 0x0 0xffeb 0x0 0x1>;
+
+   r5f@0 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_0>;
+   reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_0>,
+   <_firmware PD_R5_0_ATCM>,
+   <_firmware PD_R5_0_BTCM>;
memory-region = <_0_fw_image>;
};
 
-   r5f-1 {
+   r5f@1 {
compatible = "xlnx,zynqmp-r5f";
-   power-domains = <_firmware PD_RPU_1>;
+   reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>;
+   reg-names = "atcm0", "btcm0";
+   power-domains = <_firmware PD_RPU_1>,
+   <_firmware PD_R5_1_ATCM>,
+   <_firmware PD_R5_1_BTCM>;
memory-region = <_1_fw_image>;
};
};
-- 
2.25.1




[PATCH v11 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-19 Thread Tanmay Shah
From: Radhey Shyam Pandey 

Introduce bindings for the TCM memory address space on the AMD-Xilinx Zynq
UltraScale+ platform. It will help in defining TCM in the device tree
and make its access platform-agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memory that provides
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domains) are documented for
each TCM in the R5 node. The reg and reg-names are made required
properties, as we don't want to hardcode TCM addresses for future
platforms; for the zu+ legacy implementation we will ensure that the
old dts without reg/reg-names keeps working and a stable ABI is maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Tanmay Shah 
---

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
 1 file changed, 170 insertions(+), 22 deletions(-)

diff --git a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..77030edf41fa 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -20,9 +20,21 @@ properties:
   compatible:
 const: xlnx,zynqmp-r5fss
 
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
+
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
+default: 1
 description: |
   The RPU MPCore can operate in split mode (Dual-processor performance), 
Safety
   lock-step mode(Both RPU cores execute the same code in lock-step,
@@ -37,7 +49,7 @@ properties:
   2: single cpu mode
 
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r5f@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -54,9 +66,6 @@ patternProperties:
   compatible:
 const: xlnx,zynqmp-r5f
 
-  power-domains:
-maxItems: 1
-
   mboxes:
 minItems: 1
 items:
@@ -101,35 +110,174 @@ patternProperties:
 
 required:
   - compatible
-  - power-domains
 
-unevaluatedProperties: false
+allOf:
+  - if:
+  properties:
+xlnx,cluster-mode:
+  enum:
+- 1
+then:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+- description: extra ATCM memory in lockstep mode
+- description: extra BTCM memory in lockstep mode
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+- const: atcm1
+- const: btcm1
+
+power-domains:
+  minItems: 2
+  maxItems: 5
+
+  required:
+- reg
+- reg-names
+- power-domains
+
+else:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+
+power-domains:
+  minItems: 2
+  maxItems: 3
+
+  required:
+- reg
+- reg-names
+- power-domains
 
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
 
 additionalProperties: false
 
 examples:
   - |
-remoteproc {
-compatible = "xlnx,zynqmp-r5fss";
-xlnx,cluster-mode = <1>;
-
-r5f-0 {
-compatible = "xlnx,zynqmp-r5f";
-power-domains = <_firmware 0x7>;
-memory-region = <_0_fw_image>, <>, 
<>, <>;
-mboxes = <_mailbox_rpu0 0>, <_mailbox_rpu0 1>;
-mbox-names = "tx", "rx";
+#include 
+
+// Split mode configuration
+soc {
+#address-cells = <2>;
+#size-cells = <2>;
+
+remoteproc@ffe0 {
+compatible = "xlnx,zynqmp-r5fss";
+xlnx,cluster-mode = <0>;
+
+#address-cells = <2>;
+#size-cells = <2>;
+ranges = <0x0 0x0 0x0 

[PATCH v11 0/4] add zynqmp TCM bindings

2024-02-19 Thread Tanmay Shah
Tightly-Coupled Memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two exclusive 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.
In lockstep mode, both 128 KB banks are accessible to the cluster.

As per the ZynqMP UltraScale+ Technical Reference Manual (UG1085), the
following is the address space of the TCM memory. The bindings in this
patch series introduce properties to accommodate this address space,
with address translation between the Linux and Cortex-R5 views.

| *Mode*             | *R5 View*    | *Linux view* | Notes                |
| ---                | ---          | ---          | ---                  |
| *Split Mode*       | *start addr* | *start addr* |                      |
| R5_0 ATCM (64 KB)  | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (64 KB)  | 0x0002_0000  | 0xFFE2_0000  |                      |
| R5_1 ATCM (64 KB)  | 0x0000_0000  | 0xFFE9_0000  | alias of 0xFFE1_0000 |
| R5_1 BTCM (64 KB)  | 0x0002_0000  | 0xFFEB_0000  | alias of 0xFFE3_0000 |
| *Lockstep Mode*    |              |              |                      |
| R5_0 ATCM (128 KB) | 0x0000_0000  | 0xFFE0_0000  |                      |
| R5_0 BTCM (128 KB) | 0x0002_0000  | 0xFFE2_0000  |                      |

References:
UG1085 TCM address space:
https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed
  - Remove redundant initialization of the variable
  - Return correct error code if memory allocation failed

Changes in v10:
  - Add new patch (1/4) to the series that changes hardcoded TCM addresses in
lockstep mode and removes separate handling of TCM in lockstep and
split mode
  - modify number of "reg", "reg-names" and "power-domains" entries
based on cluster mode
  - Add extra optional atcm and btcm in "reg" property for lockstep mode
  - Add "reg-names" for extra optional atcm and btcm for lockstep mode
  - Drop previous Ack as bindings have new changes
  - Add individual tcm regions via "reg" and "reg-names" for lockstep mode
  - Add each tcm's power-domains in lockstep mode
  - Drop previous Ack as new change in dts patchset
  - Remove redundant changes in driver to handle TCM in lockstep mode

Changes in v9:
  - Fix rproc lockstep dts
  - Introduce new API to request and release core1 TCM power-domains in
lockstep mode. This will be used during prepare -> add_tcm_banks
callback to enable TCM in lockstep mode.
  - Parse TCM from device-tree in lockstep mode and split mode in
uniform way.
  - Fix TCM representation in device-tree in lockstep mode.
  - Fix comments as suggested

Changes in v8:
  - Remove use of pm_domains framework
  - Remove checking of pm_domain_id validation to power on/off tcm
  - Remove spurious change
  - parse power-domains property from device-tree and use EEMI calls
to power on/off TCM instead of using pm domains framework

Changes in v7:
  - %s/pm_dev1/pm_dev_core0/r
  - %s/pm_dev_link1/pm_dev_core0_link/r
  - %s/pm_dev2/pm_dev_core1/r
  - %s/pm_dev_link2/pm_dev_core1_link/r
  - remove pm_domain_id check to move next patch
  - add comment about how 1st entry in pm domain list is used
  - fix loop when jump to fail_add_pm_domains loop
  - move checking of pm_domain_id from previous patch
  - fix mem_bank_data memory allocation

Changes in v6:
  - Introduce new node entry for r5f cluster split mode dts and
keep it disabled by default.
  - Keep remoteproc lockstep mode enabled by default to maintain
backward compatibility.
  - Enable split mode only for zcu102 board to demo split mode use
  - Remove spurious change
  - Handle errors in add_pm_domains function
  - Remove redundant code to handle errors from remove_pm_domains
  - Missing . at the end of the commit message
  - remove redundant initialization of variables
  - remove fail_tcm label and relevant code to free memory
acquired using devm_* API. As this will be freed when device free it
  - add extra check to see if "reg" property is supported or not

Changes in v5:
  - maintain Rob's Ack on bindings patch as no changes in bindings
  - split previous patch into multiple patches
  - Use pm domain framework to turn on/off TCM
  - Add support of parsing TCM information from device-tree
  - maintain backward compatibility with previous bindings without
TCM information available in device-tree

This patch series continues previous effort to upstream ZynqMP
TCM bindings:
Previous v4 version link:
https://lore.kernel.org/all/20230829181900.2561194-1-tanmay.s...@amd.com/

Previous v3 version link:
https://lore.kernel.org/all/1689964908-22371-1-git-send-email-radhey.shyam.pan...@amd.com/
Radhey Shyam Pandey (1):
  dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings




[PATCH v11 1/4] remoteproc: zynqmp: fix lockstep mode memory region

2024-02-19 Thread Tanmay Shah
In lockstep mode, r5 core0 uses the TCM of R5 core1. Following is the
lockstep mode memory region as per the hardware reference manual.

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above because R5 core0 TCM and core1
TCM have different power-domains mapped to them.
Hence, the TCM address space in lockstep mode should be modeled as 64 KB
regions only, where each region has its own power-domain, as follows:

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This makes driver maintenance easy and makes the design robust for future
platforms as well.

Signed-off-by: Tanmay Shah 
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 145 ++--
 1 file changed, 12 insertions(+), 133 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..42b0384d34f2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
	{0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-	{0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */
-	{0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"},
-	{0, 0, 0, PD_R5_1_ATCM, ""},
-	{0, 0, 0, PD_R5_1_BTCM, ""},
+	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
+	{0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"},
+	{0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"},
+	{0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -540,14 +540,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * add_tcm_carveout_split_mode()
+ * add_tcm_banks()
  * @rproc: single R5 core's corresponding rproc instance
  *
- * allocate and add remoteproc carveout for TCM memory in split mode
+ * allocate and add remoteproc carveout for TCM memory
  *
  * return 0 on success, otherwise non-zero value on failure
  */
-static int add_tcm_carveout_split_mode(struct rproc *rproc)
+static int add_tcm_banks(struct rproc *rproc)
 {
struct rproc_mem_entry *rproc_mem;
struct zynqmp_r5_core *r5_core;
@@ -580,10 +580,10 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret < 0) {
dev_err(dev, "failed to turn on TCM 0x%x", 
pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
-   dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, 
size=0x%lx",
+   dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx",
bank_name, bank_addr, da, bank_size);
 
rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr,
@@ -593,7 +593,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
if (!rproc_mem) {
ret = -ENOMEM;
zynqmp_pm_release_node(pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
rproc_add_carveout(rproc, rproc_mem);
@@ -601,7 +601,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
 
return 0;
 
-release_tcm_split:
+release_tcm:
/* If failed, Turn off all TCM banks turned on before */
for (i--; i >= 0; i--) {
pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id;
@@ -610,127 +610,6 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
return ret;
 }
 
-/*
- * add_tcm_carveout_lockstep_mode()
- * @rproc: single R5 core's corresponding rproc instance
- *
- * allocate and add remoteproc carveout for TCM memory in lockstep mode
- *
- * return 0 on success, otherwise non-zero value on failure
- */
-static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
-{
-   struct rproc_mem_entry *rproc_mem;
-   struct zynqmp_r5_core *r5_core;
-   int i, num_banks, ret;
-   phys_addr_t bank_addr;
-   size_t bank_size = 0;
-   struct device *dev;
-   u32 pm_domain_id;
-   char *bank_name;
-   u32 da;
-
-   r5_core = rproc->priv;
-   dev = r5_core->dev;
-
-   /* Go through zynqmp banks for r5 node */
-   num_banks = 

[PATCH v11 4/4] remoteproc: zynqmp: parse TCM from device tree

2024-02-19 Thread Tanmay Shah
ZynqMP TCM information was hardcoded in the driver. Now ZynqMP TCM
information is available in the device-tree. Parse TCM information in
the driver as per the new bindings.

Signed-off-by: Tanmay Shah 
---

Changes in v11:
  - Remove redundant initialization of the variable
  - return correct error code if memory allocation failed

 drivers/remoteproc/xlnx_r5_remoteproc.c | 112 ++--
 1 file changed, 107 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 42b0384d34f2..d4a22caebaad 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -74,8 +74,8 @@ struct mbox_info {
 };
 
 /*
- * Hardcoded TCM bank values. This will be removed once TCM bindings are
- * accepted for system-dt specifications and upstreamed in linux kernel
+ * Hardcoded TCM bank values. This will stay in driver to maintain backward
+ * compatibility with device-tree that does not have TCM information.
  */
 static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
	{0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
@@ -757,6 +757,103 @@ static struct zynqmp_r5_core 
*zynqmp_r5_add_rproc_core(struct device *cdev)
return ERR_PTR(ret);
 }
 
+static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster *cluster)
+{
+   int i, j, tcm_bank_count, ret, tcm_pd_idx, pd_count;
+   struct of_phandle_args out_args;
+   struct zynqmp_r5_core *r5_core;
+   struct platform_device *cpdev;
+   struct mem_bank_data *tcm;
+   struct device_node *np;
+   struct resource *res;
+   u64 abs_addr, size;
+   struct device *dev;
+
+   for (i = 0; i < cluster->core_count; i++) {
+   r5_core = cluster->r5_cores[i];
+   dev = r5_core->dev;
+   np = r5_core->np;
+
+   pd_count = of_count_phandle_with_args(np, "power-domains",
+ "#power-domain-cells");
+
+   if (pd_count <= 0) {
+   dev_err(dev, "invalid power-domains property, %d\n", 
pd_count);
+   return -EINVAL;
+   }
+
+   /* First entry in power-domains list is for r5 core, rest for 
TCM. */
+   tcm_bank_count = pd_count - 1;
+
+   if (tcm_bank_count <= 0) {
+   dev_err(dev, "invalid TCM count %d\n", tcm_bank_count);
+   return -EINVAL;
+   }
+
+   r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
+ sizeof(struct mem_bank_data 
*),
+ GFP_KERNEL);
+   if (!r5_core->tcm_banks)
+   return -ENOMEM;
+
+   r5_core->tcm_bank_count = tcm_bank_count;
+   for (j = 0, tcm_pd_idx = 1; j < tcm_bank_count; j++, 
tcm_pd_idx++) {
+   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
+  GFP_KERNEL);
+   if (!tcm)
+   return -ENOMEM;
+
+   r5_core->tcm_banks[j] = tcm;
+
+   /* Get power-domains id of TCM. */
+			ret = of_parse_phandle_with_args(np, "power-domains",
+							 "#power-domain-cells",
+							 tcm_pd_idx, &out_args);
+   if (ret) {
+   dev_err(r5_core->dev,
+   "failed to get tcm %d pm domain, ret 
%d\n",
+   tcm_pd_idx, ret);
+   return ret;
+   }
+   tcm->pm_domain_id = out_args.args[0];
+   of_node_put(out_args.np);
+
+			/* Get TCM address without translation. */
+			ret = of_property_read_reg(np, j, &abs_addr, &size);
+   if (ret) {
+   dev_err(dev, "failed to get reg property\n");
+   return ret;
+   }
+
+   /*
+* Remote processor can address only 32 bits
+* so convert 64-bits into 32-bits. This will discard
+* any unwanted upper 32-bits.
+*/
+   tcm->da = (u32)abs_addr;
+   tcm->size = (u32)size;
+
+   cpdev = to_platform_device(dev);
+   res = platform_get_resource(cpdev, IORESOURCE_MEM, j);
+   if (!res) {
+   dev_err(dev, "failed to get tcm resource\n");
+   return -EINVAL;
+   }
+
+   

[PATCH v11 3/4] dts: zynqmp: add properties for TCM in remoteproc

2024-02-19 Thread Tanmay Shah
Add properties as per new bindings in zynqmp remoteproc node
to represent TCM address and size.

This patch also adds an alternative remoteproc node to represent
the remoteproc cluster in split mode. By default lockstep mode is
enabled and users should disable it before using the split mode
dts. Both device-tree nodes can't be used simultaneously; one
of them must be disabled. For the zcu102-1.0 and zcu102-1.1 boards
the remoteproc split mode dts node is enabled and the lockstep mode
dts is disabled.

Signed-off-by: Tanmay Shah 
---
 .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts  |  8 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 65 +--
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts 
b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
index c8f71a1aec89..495ca94b45db 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts
@@ -14,6 +14,14 @@ / {
compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", 
"xlnx,zynqmp";
 };
 
+&rproc_split {
+	status = "okay";
+};
+
+&rproc_lockstep {
+	status = "disabled";
+};
+
  {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index eaba466804bc..c8a7fd0f3a1e 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -248,19 +248,74 @@ fpga_full: fpga-full {
ranges;
};
 
-	remoteproc {
+	rproc_lockstep: remoteproc@ffe00000 {
 		compatible = "xlnx,zynqmp-r5fss";
 		xlnx,cluster-mode = <1>;
 
-		r5f-0 {
+		#address-cells = <2>;
+		#size-cells = <2>;
+
+		ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+			 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+			 <0x0 0x10000 0x0 0xffe10000 0x0 0x10000>,
+			 <0x0 0x30000 0x0 0xffe30000 0x0 0x10000>;
+
+		r5f@0 {
+			compatible = "xlnx,zynqmp-r5f";
+			reg = <0x0 0x0 0x0 0x10000>,
+			      <0x0 0x20000 0x0 0x10000>,
+			      <0x0 0x10000 0x0 0x10000>,
+			      <0x0 0x30000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0", "atcm1", "btcm1";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <&rproc_0_fw_image>;
+		};
+
+		r5f@1 {
+			compatible = "xlnx,zynqmp-r5f";
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
+			memory-region = <&rproc_1_fw_image>;
+		};
+	};
+
+	rproc_split: remoteproc-split@ffe00000 {
+		status = "disabled";
+		compatible = "xlnx,zynqmp-r5fss";
+		xlnx,cluster-mode = <0>;
+
+		#address-cells = <2>;
+		#size-cells = <2>;
+
+		ranges = <0x0 0x0 0x0 0xffe00000 0x0 0x10000>,
+			 <0x0 0x20000 0x0 0xffe20000 0x0 0x10000>,
+			 <0x1 0x0 0x0 0xffe90000 0x0 0x10000>,
+			 <0x1 0x20000 0x0 0xffeb0000 0x0 0x10000>;
+
+		r5f@0 {
 			compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_0>;
+			reg = <0x0 0x0 0x0 0x10000>, <0x0 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_0>,
+					<&zynqmp_firmware PD_R5_0_ATCM>,
+					<&zynqmp_firmware PD_R5_0_BTCM>;
 			memory-region = <&rproc_0_fw_image>;
 		};
 
-		r5f-1 {
+		r5f@1 {
 			compatible = "xlnx,zynqmp-r5f";
-			power-domains = <&zynqmp_firmware PD_RPU_1>;
+			reg = <0x1 0x0 0x0 0x10000>, <0x1 0x20000 0x0 0x10000>;
+			reg-names = "atcm0", "btcm0";
+			power-domains = <&zynqmp_firmware PD_RPU_1>,
+					<&zynqmp_firmware PD_R5_1_ATCM>,
+					<&zynqmp_firmware PD_R5_1_BTCM>;
 			memory-region = <&rproc_1_fw_image>;
 		};
 	};
-- 
2.25.1




[PATCH v11 2/4] dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings

2024-02-19 Thread Tanmay Shah
From: Radhey Shyam Pandey 

Introduce bindings for the TCM memory address space on the AMD-Xilinx
Zynq UltraScale+ platform. It will help in defining TCM in the
device-tree and make its access platform agnostic and data-driven.

Tightly-coupled memories (TCMs) are low-latency memories that provide
predictable instruction execution and predictable data load/store
timing. Each Cortex-R5F processor contains two 64-bit wide 64 KB memory
banks on the ATCM and BTCM ports, for a total of 128 KB of memory.

The TCM resources (reg, reg-names and power-domain) are documented for
each TCM in the R5 node. The reg and reg-names are made required
properties because we don't want to hardcode TCM addresses for future
platforms; for the zu+ legacy implementation, the driver will ensure
that old dts without reg/reg-names still works and a stable ABI is
maintained.

It also extends the examples for TCM split and lockstep modes.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Tanmay Shah 
---

Changes in v11:
  - Fix yamllint warning and reduce indentation as needed

 .../remoteproc/xlnx,zynqmp-r5fss.yaml | 192 --
 1 file changed, 170 insertions(+), 22 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml 
b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
index 78aac69f1060..77030edf41fa 100644
--- a/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/xlnx,zynqmp-r5fss.yaml
@@ -20,9 +20,21 @@ properties:
   compatible:
 const: xlnx,zynqmp-r5fss
 
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  ranges:
+description: |
+  Standard ranges definition providing address translations for
+  local R5F TCM address spaces to bus addresses.
+
   xlnx,cluster-mode:
 $ref: /schemas/types.yaml#/definitions/uint32
 enum: [0, 1, 2]
+default: 1
 description: |
   The RPU MPCore can operate in split mode (Dual-processor performance), 
Safety
   lock-step mode(Both RPU cores execute the same code in lock-step,
@@ -37,7 +49,7 @@ properties:
   2: single cpu mode
 
 patternProperties:
-  "^r5f-[a-f0-9]+$":
+  "^r5f@[0-9a-f]+$":
 type: object
 description: |
   The RPU is located in the Low Power Domain of the Processor Subsystem.
@@ -54,9 +66,6 @@ patternProperties:
   compatible:
 const: xlnx,zynqmp-r5f
 
-  power-domains:
-maxItems: 1
-
   mboxes:
 minItems: 1
 items:
@@ -101,35 +110,174 @@ patternProperties:
 
 required:
   - compatible
-  - power-domains
 
-unevaluatedProperties: false
+allOf:
+  - if:
+  properties:
+xlnx,cluster-mode:
+  enum:
+- 1
+then:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+- description: extra ATCM memory in lockstep mode
+- description: extra BTCM memory in lockstep mode
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+- const: atcm1
+- const: btcm1
+
+power-domains:
+  minItems: 2
+  maxItems: 5
+
+  required:
+- reg
+- reg-names
+- power-domains
+
+else:
+  patternProperties:
+"^r5f@[0-9a-f]+$":
+  type: object
+
+  properties:
+reg:
+  minItems: 1
+  items:
+- description: ATCM internal memory
+- description: BTCM internal memory
+
+reg-names:
+  minItems: 1
+  items:
+- const: atcm0
+- const: btcm0
+
+power-domains:
+  minItems: 2
+  maxItems: 3
+
+  required:
+- reg
+- reg-names
+- power-domains
 
 required:
   - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
 
 additionalProperties: false
 
 examples:
   - |
-    remoteproc {
-        compatible = "xlnx,zynqmp-r5fss";
-        xlnx,cluster-mode = <1>;
-
-        r5f-0 {
-            compatible = "xlnx,zynqmp-r5f";
-            power-domains = <&zynqmp_firmware 0x7>;
-            memory-region = <&rproc_0_fw_image>, <&rpu0vdev0buffer>, <&rpu0vdev0vring0>, <&rpu0vdev0vring1>;
-            mboxes = <&ipi_mailbox_rpu0 0>, <&ipi_mailbox_rpu0 1>;
-            mbox-names = "tx", "rx";
+    #include <dt-bindings/power/xlnx-zynqmp-power.h>
+
+    // Split mode configuration
+    soc {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        remoteproc@ffe00000 {
+            compatible = "xlnx,zynqmp-r5fss";
+            xlnx,cluster-mode = <0>;
+
+            #address-cells = <2>;
+            #size-cells = <2>;
+            ranges = <0x0 0x0 0x0 


Re: [v2 PATCH 0/1] ALSA: virtio: add support for audio controls

2024-02-19 Thread Marcin Radomski
Thanks Anton for the reupload.

I tested this series with a 6.1 kernel guest on a proprietary hypervisor. The
controls exposed by the host (BOOLEAN/INTEGER ones, as that was all I could
get) worked as expected when adjusted via ALSA APIs.

Reviewed-by: Marcin Radomski 
Tested-by: Marcin Radomski 



[PATCH] vduse: implement DMA sync callbacks

2024-02-19 Thread Maxime Coquelin
Since commit 295525e29a5b ("virtio_net: merge dma
operations when filling mergeable buffers"), VDUSE devices
require support for the DMA .sync_single_for_cpu() operation,
as the memory is non-coherent between the device and the CPU
because of the use of a bounce buffer.

This patch implements both the .sync_single_for_cpu() and
.sync_single_for_device() callbacks, and also skips bounce
buffer copies during DMA map and unmap operations if the
DMA_ATTR_SKIP_CPU_SYNC attribute is set, to avoid extra
copies of the same buffer.

Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/iova_domain.c | 27 ---
 drivers/vdpa/vdpa_user/iova_domain.h |  8 
 drivers/vdpa/vdpa_user/vduse_dev.c   | 22 ++
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index 5e4a77b9bae6..791d38d6284c 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -373,6 +373,26 @@ static void vduse_domain_free_iova(struct iova_domain *iovad,
 	free_iova_fast(iovad, iova >> shift, iova_len);
 }
 
+void vduse_domain_sync_single_for_device(struct vduse_iova_domain *domain,
+					 dma_addr_t dma_addr, size_t size,
+					 enum dma_data_direction dir)
+{
+	read_lock(&domain->bounce_lock);
+	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+		vduse_domain_bounce(domain, dma_addr, size, DMA_TO_DEVICE);
+	read_unlock(&domain->bounce_lock);
+}
+
+void vduse_domain_sync_single_for_cpu(struct vduse_iova_domain *domain,
+				      dma_addr_t dma_addr, size_t size,
+				      enum dma_data_direction dir)
+{
+	read_lock(&domain->bounce_lock);
+	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+		vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE);
+	read_unlock(&domain->bounce_lock);
+}
+
 dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 struct page *page, unsigned long offset,
 size_t size, enum dma_data_direction dir,
@@ -393,7 +413,8 @@ dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 	if (vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa))
 		goto err_unlock;
 
-	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE);
 
 	read_unlock(&domain->bounce_lock);
@@ -411,9 +432,9 @@ void vduse_domain_unmap_page(struct vduse_iova_domain *domain,
 			     enum dma_data_direction dir, unsigned long attrs)
 {
 	struct iova_domain *iovad = &domain->stream_iovad;
-
 	read_lock(&domain->bounce_lock);
-	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
 		vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE);
 
 	vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size);
diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
index 173e979b84a9..f92f22a7267d 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.h
+++ b/drivers/vdpa/vdpa_user/iova_domain.h
@@ -44,6 +44,14 @@ int vduse_domain_set_map(struct vduse_iova_domain *domain,
 void vduse_domain_clear_map(struct vduse_iova_domain *domain,
struct vhost_iotlb *iotlb);
 
+void vduse_domain_sync_single_for_device(struct vduse_iova_domain *domain,
+ dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction dir);
+
+void vduse_domain_sync_single_for_cpu(struct vduse_iova_domain *domain,
+ dma_addr_t dma_addr, size_t size,
+ enum dma_data_direction dir);
+
 dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain,
 struct page *page, unsigned long offset,
 size_t size, enum dma_data_direction dir,
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 1d24da79c399..75354ce186a1 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -798,6 +798,26 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = {
.free   = vduse_vdpa_free,
 };
 
+static void vduse_dev_sync_single_for_device(struct device *dev,
+					     dma_addr_t dma_addr, size_t size,
+					     enum dma_data_direction dir)
+{
+	struct vduse_dev *vdev = dev_to_vduse(dev);
+   
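The bounce-copy direction rules in the patch above, including the DMA_ATTR_SKIP_CPU_SYNC short-circuit, can be modeled in plain userspace C. This is only a sketch of the semantics with local stand-in constants and counters; none of it is the kernel code.

```c
#include <assert.h>

/* Local stand-ins for the kernel DMA enums/flags -- illustration only. */
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };
#define DMA_ATTR_SKIP_CPU_SYNC (1UL << 0)

static int copies_to_device, copies_to_cpu;

/* sync_single_for_device: the CPU finished writing, push into the bounce buffer. */
static void sync_for_device(enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		copies_to_device++;
}

/* sync_single_for_cpu: the device finished writing, pull from the bounce buffer. */
static void sync_for_cpu(enum dma_data_direction dir)
{
	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
		copies_to_cpu++;
}

/* The map-time copy is skipped when the caller promises to sync explicitly. */
static void map_page(enum dma_data_direction dir, unsigned long attrs)
{
	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
		copies_to_device++;
}
```

The point of the attribute is visible here: two maps of the same buffer with DMA_ATTR_SKIP_CPU_SYNC set cost one explicit sync instead of two implicit copies.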

Re: [PATCH v7 22/36] function_graph: Add a new entry handler with parent_ip and ftrace_regs

2024-02-19 Thread Steven Rostedt
On Wed,  7 Feb 2024 00:11:34 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Masami Hiramatsu (Google) 
> 
> Add a new entry handler to fgraph_ops as 'entryregfunc', which takes
> parent_ip and ftrace_regs. Note that 'entryfunc' and 'entryregfunc'
> are mutually exclusive; you can set only one of them.
> 
> Signed-off-by: Masami Hiramatsu (Google) 
> ---
>  Changes in v3:
>   - Update for new multiple fgraph.
> ---
>  arch/arm64/kernel/ftrace.c   |2 +
>  arch/loongarch/kernel/ftrace_dyn.c   |2 +
>  arch/powerpc/kernel/trace/ftrace.c   |2 +
>  arch/powerpc/kernel/trace/ftrace_64_pg.c |   10 ---
>  arch/x86/kernel/ftrace.c |   42 
> --
>  include/linux/ftrace.h   |   19 +++---
>  kernel/trace/fgraph.c|   30 +
>  7 files changed, 72 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> index b96740829798..779b975f03f5 100644
> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -497,7 +497,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
> parent_ip,
>   return;
>  
>   if (!function_graph_enter_ops(*parent, ip, frame_pointer,
> -   (void *)frame_pointer, gops))
> +   (void *)frame_pointer, fregs, gops))

I would like to replace that second frame_pointer with fregs.


>   *parent = (unsigned long)&return_to_handler;
>  
>   ftrace_test_recursion_unlock(bit);
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
> b/arch/loongarch/kernel/ftrace_dyn.c
> index 81d18b911cc1..45d26c6e6564 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -250,7 +250,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
> parent_ip,
>  
>   old = *parent;
>  
> - if (!function_graph_enter_ops(old, ip, 0, parent, gops))
> + if (!function_graph_enter_ops(old, ip, 0, parent, fregs, gops))

That is, to replace the parent with fregs, as the parent can be retrieved
from fregs.

We should add a fregs helper, something like:

static unsigned long *fregs_caller_addr(struct ftrace_regs *fregs)
{
	return (unsigned long *)(kernel_stack_pointer(&fregs->regs) + PT_R1);
}

That returns the address that points to the parent caller on the stack.

This was on my todo list to do. That is, replace the passing of the parent
of the stack with fregs as it is redundant information.

>   *parent = return_hooker;
>  }
>  #else
> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index 4ef8bf480279..eeaaa798f4f9 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -423,7 +423,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
> parent_ip,
>   if (bit < 0)
>   goto out;
>  
> - if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, 
> gops))
> + if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, 
> fregs, gops))
>   parent_ip = ppc_function_entry(return_to_handler);
>  
>   ftrace_test_recursion_unlock(bit);
> diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c 
> b/arch/powerpc/kernel/trace/ftrace_64_pg.c
> index 7b85c3b460a3..43f6cfaaf7db 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
> +++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
> @@ -795,7 +795,8 @@ int ftrace_disable_ftrace_graph_caller(void)
>   * in current thread info. Return the address we want to divert to.
>   */
>  static unsigned long
> -__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned 
> long sp)
> +__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned 
> long sp,
> + struct ftrace_regs *fregs)

And sp shouldn't need to be passed in either, as that should be part of the
fregs.

I'd really like to consolidate the parameters and not just keep adding to
them. All of this slows down loading the parameters.

-- Steve


>  {
>   unsigned long return_hooker;
>   int bit;



Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Dave Hansen
On 2/19/24 07:39, Haitao Huang wrote:
> Remove all boolean parameters for 'reclaim' from the function
> sgx_alloc_epc_page() and its callers by making two versions of each
> function.
> 
> Also opportunistically remove the non-static declaration of
> __sgx_alloc_epc_page() and fix a typo.
> 
> Signed-off-by: Haitao Huang 
> Suggested-by: Jarkko Sakkinen 
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
>  arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
>  arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
>  arch/x86/kernel/cpu/sgx/main.c  | 68 ++---
>  arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
>  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
>  6 files changed, 115 insertions(+), 44 deletions(-)

Jarkko, did this turn out how you expected?

I think passing around a function pointer to *only* communicate 1 bit of
information is a _bit_ overkill here.

Simply replacing the bool with:

enum sgx_reclaim {
SGX_NO_RECLAIM,
SGX_DO_RECLAIM
};

would do the same thing.  Right?

Are you sure you want a function pointer for this?
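As a plain-C illustration of why the enum reads better at call sites, here is a userspace sketch with stubbed types -- the struct, the stub allocator, and its behavior are assumptions for illustration, not the kernel SGX code:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel type -- illustration only. */
struct sgx_epc_page { int dummy; };

enum sgx_reclaim {
	SGX_NO_RECLAIM,
	SGX_DO_RECLAIM
};

static int reclaim_ran;
static struct sgx_epc_page the_page;

/* Stubbed allocator: "reclaims" (sets a flag) only when explicitly asked to. */
static struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_reclaim reclaim)
{
	(void)owner;
	if (reclaim == SGX_DO_RECLAIM)
		reclaim_ran = 1;
	return &the_page;
}
```

Compared to a bare `sgx_alloc_epc_page(NULL, false)`, the call `sgx_alloc_epc_page(NULL, SGX_NO_RECLAIM)` documents itself without duplicating every function into two variants.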



[PATCH 10/10] csky/vdso: Use generic union vdso_data_store

2024-02-19 Thread Anna-Maria Behnsen
There is already a generic union definition for vdso_data_store in vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen 
Cc: Guo Ren 
Cc: linux-c...@vger.kernel.org
---
 arch/csky/kernel/vdso.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
index e74a2504d331..2ca886e4a458 100644
--- a/arch/csky/kernel/vdso.c
+++ b/arch/csky/kernel/vdso.c
@@ -15,14 +15,8 @@ extern char vdso_start[], vdso_end[];
 static unsigned int vdso_pages;
 static struct page **vdso_pagelist;
 
-/*
- * The vDSO data page.
- */
-static union {
-	struct vdso_data	data;
-	u8			page[PAGE_SIZE];
-} vdso_data_store __page_aligned_data;
-struct vdso_data *vdso_data = &vdso_data_store.data;
+static union vdso_data_store vdso_data_store __page_aligned_data;
+struct vdso_data *vdso_data = vdso_data_store.data;
 
 static int __init vdso_init(void)
 {
-- 
2.39.2
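The shape of the generic union being adopted can be sketched in userspace C. The field layout of `struct vdso_data` below is an assumption for illustration; only the union-padding idea matters:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Illustrative stand-in for struct vdso_data -- not the real kernel layout. */
struct vdso_data {
	uint64_t seq;
	uint64_t fields[15];
};

/* The generic header pads the data to exactly one page, so every
 * architecture gets an identically sized, page-aligned store. */
union vdso_data_store {
	struct vdso_data data[2];	/* CS_BASES is 2 in the generic header */
	uint8_t page[PAGE_SIZE];
};
```

This is what lets each architecture drop its private `static union { ... }` definition, as the patch does for csky.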




[PATCH 03/10] csky/vdso: Remove superfluous ifdeffery

2024-02-19 Thread Anna-Maria Behnsen
CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent
ifdeffery is superfluous. Clean it up.

Signed-off-by: Anna-Maria Behnsen 
Cc: Guo Ren 
Cc: linux-c...@vger.kernel.org
---
 arch/csky/include/asm/vdso.h | 5 -
 arch/csky/kernel/vdso.c  | 4 
 2 files changed, 9 deletions(-)

diff --git a/arch/csky/include/asm/vdso.h b/arch/csky/include/asm/vdso.h
index bdce581b5fcb..181a15edafe8 100644
--- a/arch/csky/include/asm/vdso.h
+++ b/arch/csky/include/asm/vdso.h
@@ -5,11 +5,6 @@
 
 #include <linux/types.h>
 
-#ifndef GENERIC_TIME_VSYSCALL
-struct vdso_data {
-};
-#endif
-
 /*
  * The VDSO symbols are mapped into Linux so we can just use regular symbol
  * addressing to get their offsets in userspace.  The symbols are mapped at an
diff --git a/arch/csky/kernel/vdso.c b/arch/csky/kernel/vdso.c
index 16c20d64d165..e74a2504d331 100644
--- a/arch/csky/kernel/vdso.c
+++ b/arch/csky/kernel/vdso.c
@@ -8,11 +8,7 @@
 #include 
 
 #include 
-#ifdef GENERIC_TIME_VSYSCALL
 #include <vdso/datapage.h>
-#else
-#include <asm/vdso.h>
-#endif
 
 extern char vdso_start[], vdso_end[];
 
-- 
2.39.2




[RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Haitao Huang
Remove all boolean parameters for 'reclaim' from the function
sgx_alloc_epc_page() and its callers by making two versions of each
function.

Also opportunistically remove the non-static declaration of
__sgx_alloc_epc_page() and fix a typo.

Signed-off-by: Haitao Huang 
Suggested-by: Jarkko Sakkinen 
---
 arch/x86/kernel/cpu/sgx/encl.c  | 56 +--
 arch/x86/kernel/cpu/sgx/encl.h  |  6 ++-
 arch/x86/kernel/cpu/sgx/ioctl.c | 23 ---
 arch/x86/kernel/cpu/sgx/main.c  | 68 ++---
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 +-
 arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
 6 files changed, 115 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..07f369ae855c 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -217,7 +217,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct 
sgx_encl_page *encl_page,
struct sgx_epc_page *epc_page;
int ret;
 
-   epc_page = sgx_alloc_epc_page(encl_page, false);
+   epc_page = sgx_alloc_epc_page(encl_page);
if (IS_ERR(epc_page))
return epc_page;
 
@@ -359,14 +359,14 @@ static vm_fault_t sgx_encl_eaug_page(struct 
vm_area_struct *vma,
goto err_out_unlock;
}
 
-   epc_page = sgx_alloc_epc_page(encl_page, false);
+   epc_page = sgx_alloc_epc_page(encl_page);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret =  VM_FAULT_NOPAGE;
goto err_out_unlock;
}
 
-   va_page = sgx_encl_grow(encl, false);
+   va_page = sgx_encl_grow(encl);
if (IS_ERR(va_page)) {
if (PTR_ERR(va_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
@@ -1230,23 +1230,23 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, 
unsigned long addr)
} while (unlikely(encl->mm_list_version != mm_list_version));
 }
 
-/**
- * sgx_alloc_va_page() - Allocate a Version Array (VA) page
- * @reclaim: Reclaim EPC pages directly if none available. Enclave
- *   mutex should not be held if this is set.
- *
- * Allocate a free EPC page and convert it to a Version Array (VA) page.
+typedef struct sgx_epc_page *(*sgx_alloc_epc_page_fn_t)(void *owner);
+
+/*
+ * Allocate a Version Array (VA) page
+ * @alloc_fn: the EPC page allocation function.
  *
  * Return:
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
+static struct sgx_epc_page *__sgx_alloc_va_page(sgx_alloc_epc_page_fn_t alloc_fn)
 {
struct sgx_epc_page *epc_page;
int ret;
 
-   epc_page = sgx_alloc_epc_page(NULL, reclaim);
+   epc_page = alloc_fn(NULL);
+
if (IS_ERR(epc_page))
return ERR_CAST(epc_page);
 
@@ -1260,6 +1260,40 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
return epc_page;
 }
 
+/**
+ * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ *
+ * Allocate a free EPC page and convert it to a VA page.
+ *
+ * Do not reclaim EPC pages if none available.
+ *
+ * Return:
+ *   a VA page,
+ *   -errno otherwise
+ */
+struct sgx_epc_page *sgx_alloc_va_page(void)
+{
+	return __sgx_alloc_va_page(sgx_alloc_epc_page);
+}
+
+/**
+ * sgx_alloc_va_page_reclaim() - Allocate a Version Array (VA) page
+ *
+ * Allocate a free EPC page and convert it to a VA page.
+ *
+ * Reclaim EPC pages directly if none available. Enclave mutex should not be
+ * held.
+ *
+ * Return:
+ *   a VA page,
+ *   -errno otherwise
+ */
+
+struct sgx_epc_page *sgx_alloc_va_page_reclaim(void)
+{
+	return __sgx_alloc_va_page(sgx_alloc_epc_page_reclaim);
+}
+
 /**
  * sgx_alloc_va_slot - allocate a VA slot
  * @va_page:   a  sgx_va_page instance
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..3248ff72e573 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -116,14 +116,16 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl 
*encl,
  unsigned long offset,
  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim);
+struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page_reclaim(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
 void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 unsigned long addr);
-struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim);
+struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
+struct sgx_va_page *sgx_encl_grow_reclaim(struct sgx_encl 

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-19 Thread Haitao Huang
On Tue, 13 Feb 2024 19:52:25 -0600, Jarkko Sakkinen   
wrote:



On Tue Feb 13, 2024 at 1:15 AM EET, Haitao Huang wrote:

Hi Jarkko

On Mon, 12 Feb 2024 13:55:46 -0600, Jarkko Sakkinen 
wrote:

> On Mon Feb 5, 2024 at 11:06 PM EET, Haitao Huang wrote:
>> From: Kristen Carlson Accardi 
>>
>> When the EPC usage of a cgroup is near its limit, the cgroup needs to
>> reclaim pages used in the same cgroup to make room for new allocations.
>> This is analogous to the behavior that the global reclaimer is triggered
>> when the global usage is close to total available EPC.
>>
>> Add a Boolean parameter for sgx_epc_cgroup_try_charge() to indicate
>> whether synchronous reclaim is allowed or not. And trigger the
>> synchronous/asynchronous reclamation flow accordingly.
>>
>> Note at this point, all reclaimable EPC pages are still tracked in the
>> global LRU and per-cgroup LRUs are empty. So no per-cgroup reclamation
>> is activated yet.
>>
>> Co-developed-by: Sean Christopherson  


>> Signed-off-by: Sean Christopherson 
>> Signed-off-by: Kristen Carlson Accardi 
>> Co-developed-by: Haitao Huang 
>> Signed-off-by: Haitao Huang 
>> ---
>> V7:
>> - Split this out from the big patch, #10 in V6. (Dave, Kai)
>> ---
>>  arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 --
>>  arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
>>  arch/x86/kernel/cpu/sgx/main.c   |  2 +-
>>  3 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> index d399fda2b55e..abf74fdb12b4 100644
>> --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> @@ -184,13 +184,35 @@ static void
>> sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
>>  /**
>>   * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
>>   * @epc_cg:   The EPC cgroup to be charged for the page.
>> + * @reclaim:  Whether or not synchronous reclaim is allowed
>>   * Return:
>>   * * %0 - If successfully charged.
>>   * * -errno - for failures.
>>   */
>> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
>> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool
>> reclaim)
>>  {
>> -	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
>> +  for (;;) {
>> +  if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
>> +  PAGE_SIZE))
>> +  break;
>> +
>> +  if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
>> +  return -ENOMEM;
>> +
>> +	if (signal_pending(current))
>> +		return -ERESTARTSYS;
>> +
>> +  if (!reclaim) {
>> +  queue_work(sgx_epc_cg_wq, _cg->reclaim_work);
>> +  return -EBUSY;
>> +  }
>> +
>> +	if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
>> +		/* All pages were too young to reclaim, try again a little later */
>> +		schedule();
>
> This will be a total pain to backtrack after a while when something
> needs to be changed, so there definitely should be inline comments
> addressing each branch condition.
>
> I'd rethink this as:
>
> 1. Create static __sgx_epc_cgroup_try_charge() for addressing single
>iteration with the new "reclaim" parameter.
> 2. Add a new sgx_epc_group_try_charge_reclaim() function.
>
> There's a bit of redundancy with sgx_epc_cgroup_try_charge() and
> sgx_epc_cgroup_try_charge_reclaim() because both have almost the
> same loop calling internal __sgx_epc_cgroup_try_charge() with
> different parameters. That is totally acceptable.
>
> Please also add my suggested-by.
>
> BR, Jarkko
>
> BR, Jarkko
>
For #2:
The only caller of this function, sgx_alloc_epc_page(), has the same
boolean, which is passed into this function.


I know. This would be a good opportunity to fix that up. Large patch
sets should try to make the best possible room for their feature and
thus also clean up the code base overall.


If we separate it into sgx_epc_cgroup_try_charge() and
sgx_epc_cgroup_try_charge_reclaim(), then the caller has to have the
if/else branches. So the separation here does not seem to help?


Of course it does. It makes the code in that location self-documenting
and easier to remember what it does.

BR, Jarkko



Please let me know if this aligns with your suggestion.


static int ___sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
{
if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
PAGE_SIZE))
return 0;

if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
return -ENOMEM;

if (signal_pending(current))
return -ERESTARTSYS;

return -EBUSY;
}
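A userspace model of the resulting split shows how the two variants would differ. This is stub logic only, assuming the single-attempt helper above; none of it is the kernel implementation:

```c
#include <assert.h>

#define ENOMEM 12
#define EBUSY  16

/* Stub cgroup state: one page of budget, "reclaimable" by decrementing. */
static int charged;
static int limit = 1;

static int __sgx_epc_cgroup_try_charge(void)
{
	if (charged < limit) {
		charged++;
		return 0;
	}
	return -EBUSY;	/* over limit but reclaimable in this model */
}

/* Non-reclaiming variant: a single attempt; on -EBUSY the real code
 * would queue the async reclaim worker. */
static int sgx_epc_cgroup_try_charge(void)
{
	return __sgx_epc_cgroup_try_charge();
}

/* Reclaiming variant: loop, "reclaiming" one page between attempts. */
static int sgx_epc_cgroup_try_charge_reclaim(void)
{
	for (;;) {
		int ret = __sgx_epc_cgroup_try_charge();

		if (ret != -EBUSY)
			return ret;
		charged--;	/* stands in for synchronous reclaim */
	}
}
```

Each caller then picks one variant by name instead of carrying a boolean, which is the self-documenting property being argued for in this thread.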

/**
 * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single page
 * @epc_cg: The EPC cgroup to be charged for the page.
 *
 * Try 

[PATCH] arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs

2024-02-19 Thread Luca Weiss
-		interrupts-extended = <&pdc 6 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>,
@@ -1511,7 +1511,7 @@ cdsp: remoteproc@830 {
compatible = "qcom,sm6350-cdsp-pas";
reg = <0 0x0830 0 0x1>;
 
-		interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 2 IRQ_TYPE_EDGE_RISING>,
diff --git a/arch/arm64/boot/dts/qcom/sm6375.dtsi b/arch/arm64/boot/dts/qcom/sm6375.dtsi
index 4386f8a9c636..f40509d91bbd 100644
--- a/arch/arm64/boot/dts/qcom/sm6375.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6375.dtsi
@@ -1561,7 +1561,7 @@ remoteproc_adsp: remoteproc@a40 {
compatible = "qcom,sm6375-adsp-pas";
reg = <0 0x0a40 0 0x100>;
 
-		interrupts-extended = <&intc GIC_SPI 282 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&intc GIC_SPI 282 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>,
diff --git a/arch/arm64/boot/dts/qcom/sm8250.dtsi b/arch/arm64/boot/dts/qcom/sm8250.dtsi
index f3c70b87efad..03c7dda1d542 100644
--- a/arch/arm64/boot/dts/qcom/sm8250.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8250.dtsi
@@ -3062,7 +3062,7 @@ slpi: remoteproc@5c0 {
compatible = "qcom,sm8250-slpi-pas";
reg = <0 0x05c0 0 0x4000>;
 
-		interrupts-extended = <&pdc 9 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&pdc 9 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_slpi_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_slpi_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_slpi_in 2 IRQ_TYPE_EDGE_RISING>,
@@ -3766,7 +3766,7 @@ cdsp: remoteproc@830 {
compatible = "qcom,sm8250-cdsp-pas";
reg = <0 0x0830 0 0x1>;
 
-		interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_cdsp_in 2 IRQ_TYPE_EDGE_RISING>,
@@ -5928,7 +5928,7 @@ adsp: remoteproc@1730 {
compatible = "qcom,sm8250-adsp-pas";
reg = <0 0x1730 0 0x100>;
 
-		interrupts-extended = <&pdc 6 IRQ_TYPE_LEVEL_HIGH>,
+		interrupts-extended = <&pdc 6 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 0 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 1 IRQ_TYPE_EDGE_RISING>,
 				      <&smp2p_adsp_in 2 IRQ_TYPE_EDGE_RISING>,

---
base-commit: 35a4fdde2466b9d90af297f249436a270ef9d30e
change-id: 20240219-remoteproc-irqs-63f3293af260

Best regards,
-- 
Luca Weiss 




Re: [PATCH v7 23/36] function_graph: Add a new exit handler with parent_ip and ftrace_regs

2024-02-19 Thread Google
Hi Steve,

On Sun, 18 Feb 2024 11:53:28 +0900
Masami Hiramatsu (Google)  wrote:

> On Fri, 16 Feb 2024 17:51:08 +0900
> Masami Hiramatsu (Google)  wrote:
> 
> > > > @@ -798,10 +798,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret 
> > > > *trace, unsigned long *ret,
> > > >  
> > > > *index += FGRAPH_RET_INDEX;
> > > > *ret = ret_stack->ret;
> > > > -   trace->func = ret_stack->func;
> > > > -   trace->calltime = ret_stack->calltime;
> > > > -   trace->overrun = atomic_read(>trace_overrun);
> > > > -   trace->depth = current->curr_ret_depth;
> > > 
> > > There's a lot of information stored in the trace structure. Why not pass
> > > that to the new retregfunc?
> > > 
> > > Then you don't need to separate this code out.
> > 
> > Sorry, I couldn't catch what you meant. Do you mean to call
> > ftrace_pop_return_trace() before calling retregfunc()? Because some of the
> > information is found from ret_stack, which is popped from the shadow stack.
> 
> Ah, sorry, I got what you said. I think this `trace` is not usable for the new
> interface. Most of the information is only used by the function-graph tracer.
> For example, trace->calltime, trace->overrun, and trace->depth are used only
> by the function-graph tracer, not by the other tracers.
> 
> But yeah, this idea is considerable. It also allows us to just update
> entryfunc() and retfunc() to pass fgraph_regs and return address.

The reason why I didn't use those for *regfunc() is not only that they
carry unused information, but also that they are missing some parameters:

 - ftrace_graph_ent only has the current `func`, but entryregfunc()
   needs `parent_ip` (== return address).
 - ftrace_graph_ret only has the current `func`, but retregfunc()
   needs `ret` (== return address) too.

If I update ftrace_graph_ent/ret to add 'retaddr', we can just pass
ftrace_graph_ent/ret, ftrace_regs, and fgraph_ops to *regfunc().
Moreover, maybe we don't need *regfunc at all, but can just update
entryfunc/retfunc to pass a ftrace_regs *, which will be NULL if it is
not supported.

Thank you,

-- 
Masami Hiramatsu (Google) 



Re: [PATCH v9 04/15] x86/sgx: Implement basic EPC misc cgroup functionality

2024-02-19 Thread Huang, Kai





[PATCH] dax: constify the struct device_type usage

2024-02-19 Thread Ricardo B. Marliere
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the
dax_mapping_type variable to be a constant structure as well, placing it
into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman 
Signed-off-by: Ricardo B. Marliere 
---
 drivers/dax/bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..e265ba019785 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -763,7 +763,7 @@ static const struct attribute_group 
*dax_mapping_attribute_groups[] = {
NULL,
 };
 
-static struct device_type dax_mapping_type = {
+static const struct device_type dax_mapping_type = {
.release = dax_mapping_release,
.groups = dax_mapping_attribute_groups,
 };

---
base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
change-id: 20240219-device_cleanup-dax-d82fd0c67ffd

Best regards,
-- 
Ricardo B. Marliere 




Re: [PATCH] tracing: Inform kmemleak of saved_cmdlines allocation

2024-02-19 Thread Catalin Marinas
On Wed, Feb 14, 2024 at 10:26:14AM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> The allocation of the struct saved_cmdlines_buffer structure changed from:
> 
> s = kmalloc(sizeof(*s), GFP_KERNEL);
>   s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
> 
> to:
> 
>   orig_size = sizeof(*s) + val * TASK_COMM_LEN;
>   order = get_order(orig_size);
>   size = 1 << (order + PAGE_SHIFT);
>   page = alloc_pages(GFP_KERNEL, order);
>   if (!page)
>   return NULL;
> 
>   s = page_address(page);
>   memset(s, 0, sizeof(*s));
> 
>   s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
> 
> Where that s->saved_cmdlines allocation looks to be a dangling allocation
> to kmemleak. That's because kmemleak only keeps track of kmalloc()
> allocations. For allocations that use page_alloc() directly, the kmemleak
> needs to be explicitly informed about it.
> 
> Add kmemleak_alloc() and kmemleak_free() around the page allocation so
> that it doesn't give the following false positive:
> 
> unreferenced object 0x8881010c8000 (size 32760):
>   comm "swapper", pid 0, jiffies 4294667296
>   hex dump (first 32 bytes):
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  
>   backtrace (crc ae6ec1b9):
> [] kmemleak_alloc+0x45/0x80
> [] __kmalloc_large_node+0x10d/0x190
> [] __kmalloc+0x3b1/0x4c0
> [] allocate_cmdlines_buffer+0x113/0x230
> [] tracer_alloc_buffers.isra.0+0x124/0x460
> [] early_trace_init+0x14/0xa0
> [] start_kernel+0x12e/0x3c0
> [] x86_64_start_reservations+0x18/0x30
> [] x86_64_start_kernel+0x7b/0x80
> [] secondary_startup_64_no_verify+0x15e/0x16b
> 
> Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r@kernel.org/
> 
> Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
> Reported-by: Kalle Valo 
> Signed-off-by: Steven Rostedt (Google) 

Reviewed-by: Catalin Marinas 



Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-19 Thread maobibo




On 2024/2/19 下午5:02, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 3:37 PM maobibo  wrote:




On 2024/2/19 下午3:16, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:18 PM maobibo  wrote:




On 2024/2/19 上午10:45, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:


On LoongArch systems, the IPI hardware uses IOCSR registers: there is
one IOCSR register access on IPI sending, and two IOCSR accesses in the
IPI interrupt handler on receiving. In VM mode, every IOCSR register
access causes the VM to trap into the hypervisor, so one hardware IPI
notification costs three traps in total.

This patch adds PV IPI support for VMs: the hypercall instruction is
used by the IPI sender, and the hypervisor injects an SWI into the VM.
In the SWI interrupt handler, only the estat CSR register is written to
clear the irq, and estat CSR accesses do not trap into the hypervisor.
So with PV IPI supported, the PV IPI sender traps into the hypervisor
once and the PV IPI receiver does not trap at all: there is only one
trap in total.

This patch also adds IPI multicast support, using a method similar to
x86. With multicast support, an IPI notification can be sent to at most
128 vCPUs at one time, which greatly reduces the number of traps into
the hypervisor.

Signed-off-by: Bibo Mao 
---
arch/loongarch/include/asm/hardirq.h   |   1 +
arch/loongarch/include/asm/kvm_host.h  |   1 +
arch/loongarch/include/asm/kvm_para.h  | 124 +
arch/loongarch/include/asm/loongarch.h |   1 +
arch/loongarch/kernel/irq.c|   2 +-
arch/loongarch/kernel/paravirt.c   | 113 ++
arch/loongarch/kernel/smp.c|   2 +-
arch/loongarch/kvm/exit.c  |  73 ++-
arch/loongarch/kvm/vcpu.c  |   1 +
9 files changed, 314 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..8a611843c1f0 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
typedef struct {
   unsigned int ipi_irqs[NR_IPI];
   unsigned int __softirq_pending;
+	atomic_t messages ____cacheline_aligned_in_smp;

Do we really need atomic_t? A plain "unsigned int" can reduce cost
significantly.

For IPI, there are multiple senders and one receiver; the sender uses
atomic_fetch_or(action, &info->messages) and the receiver uses
atomic_xchg(&info->messages, 0) to clear the message.

There needs to be a sync mechanism between senders and receiver, and
atomic is the simplest method.

At least from receiver side, the native IPI doesn't need atomic for
read and clear:
static u32 ipi_read_clear(int cpu)
{
  u32 action;

  /* Load the ipi register to figure out what we're supposed to do */
  action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS);
  /* Clear the ipi register to clear the interrupt */
  iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR);
  wbflush();

It is because on physical hardware there are two IOCSR registers, and
there is no way to use an atomic read-and-clear method for IOCSR
registers.

However, if the IPI message is stored in DDR memory, atomic read/clear
can be used. You can compare the cost of one iocsr_read32 + one
iocsr_write32 + wbflush with one atomic_xchg(&info->messages, 0).

atomic ops are of course cheaper than iocsr, but unsigned int is even cheaper.

Can you give an example of how to use unsigned int for the IPI sender and
the IPI handler? When clearing the IPI message, other IPI senders need to
know immediately, else the senders will issue the hypercall notification
unconditionally.
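For reference, the atomic scheme under discussion can be modeled with C11 atomics in userspace. This is a sketch of the semantics only (single-threaded here), not the kernel code, and the skip-if-pending optimization in the comment is an inference from the thread, not something the patch necessarily does:

```c
#include <assert.h>
#include <stdatomic.h>

/* One message word per destination CPU: senders OR in action bits,
 * the receiver swaps the whole word to zero. */
static atomic_uint messages;

/* Returns the previous value: a non-zero result means a notification
 * is already pending, so the hypercall could in principle be skipped. */
static unsigned int ipi_send(unsigned int action)
{
	return atomic_fetch_or(&messages, action);
}

/* One atomic op replaces the IOCSR status read + clear write + wbflush. */
static unsigned int ipi_read_clear(void)
{
	return atomic_exchange(&messages, 0);
}
```

The atomicity of the exchange is exactly what a plain unsigned int cannot give: with a non-atomic load-then-store, a sender's OR landing between the receiver's read and clear would be lost.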


Regards
Bibo Mao





Regards
Bibo Mao


  return action;
}




} ____cacheline_aligned irq_cpustat_t;

DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 57399d7cf8b7..1bf927e2bfac 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
   u64 idle_exits;
   u64 cpucfg_exits;
   u64 signal_exits;
+   u64 hvcl_exits;

hypercall_exits is better.

yeap, hypercall_exits is better, will fix in next version.



};

#define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 41200e922a82..a25a84e372b9 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -9,6 +9,10 @@
#define HYPERVISOR_VENDOR_SHIFT8
#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)

+#define KVM_HC_CODE_SERVICE0
+#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE)
+#define  KVM_HC_FUNC_IPI   1

Changing HC to HCALL is better.

will modify in next version.



+
/*
 * 
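As a standalone illustration of the code layout above (vendor id in bits [15:8], function code in the low byte), the following userspace sketch mirrors the quoted macros; the decode helpers are assumptions added for clarity, not part of the patch:

```c
#include <assert.h>

/* Layout from the quoted kvm_para.h hunks. */
#define HYPERVISOR_KVM			1
#define HYPERVISOR_VENDOR_SHIFT		8
#define HYPERCALL_CODE(vendor, code)	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

#define KVM_HC_CODE_SERVICE	0
#define KVM_HC_SERVICE		HYPERCALL_CODE(HYPERVISOR_KVM, KVM_HC_CODE_SERVICE)

/* Illustrative decode helpers: recover the vendor and function fields. */
static unsigned int hc_vendor(unsigned int code)
{
	return code >> HYPERVISOR_VENDOR_SHIFT;
}

static unsigned int hc_func(unsigned int code)
{
	return code & ((1u << HYPERVISOR_VENDOR_SHIFT) - 1);
}
```

So KVM_HC_SERVICE encodes to 0x100: vendor KVM in the high byte, service code 0 in the low byte.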

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 下午5:38, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:




On 2024/2/19 下午4:48, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 上午10:42, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_init() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
are no effective pv functions added here.

Signed-off-by: Bibo Mao 
---
arch/loongarch/Kconfig|  9 
arch/loongarch/include/asm/kvm_para.h |  7 
arch/loongarch/include/asm/paravirt.h | 27 
.../include/asm/paravirt_api_clock.h  |  1 +
arch/loongarch/kernel/Makefile|  1 +
arch/loongarch/kernel/paravirt.c  | 41 +++
arch/loongarch/kernel/setup.c |  2 +
7 files changed, 88 insertions(+)
create mode 100644 arch/loongarch/include/asm/paravirt.h
create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
   bool
   default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
config ARCH_SUPPORTS_KEXEC
   def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
/*
 * LoongArch hypcall return code
 */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I wanted to remove this piece of code, but it fails to compile
if CONFIG_PARAVIRT is selected. Here is the reference code; function
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
   if (static_key_false(&paravirt_steal_enabled)) {
   u64 steal;

   steal = paravirt_steal_clock(smp_processor_id());
   steal -= this_rq()->prev_steal_time;
   steal = min(steal, maxtime);
   account_steal_time(steal);
   this_rq()->prev_steal_time += steal;

   return steal;
   }
#endif
   return 0;
}

OK, then keep it.
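The constraint being discussed can be modelled in userspace: generic code calls paravirt_steal_clock() unconditionally when CONFIG_PARAVIRT is on, so a default target must always exist. The kernel uses DECLARE_STATIC_CALL/static_call_update(); a plain function pointer sketches the same shape. kvm_steal_clock() and its return value below are hypothetical:

```c
#include <assert.h>

typedef unsigned long long u64;

/* Default target: no hypervisor, so never any stolen time. */
static u64 dummy_steal_clock(int cpu)
{
	(void)cpu;
	return 0;
}

/* Stands in for DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock). */
static u64 (*pv_steal_clock)(int cpu) = dummy_steal_clock;

/* What steal_account_process_time() ends up calling. */
static u64 paravirt_steal_clock(int cpu)
{
	return pv_steal_clock(cpu);
}

/* Hypothetical KVM-provided implementation, installed at pv init time
 * (static_call_update() in the real kernel). */
static u64 kvm_steal_clock(int cpu)
{
	(void)cpu;
	return 1000;	/* illustrative stolen-time value in ns */
}

static void pv_time_init(void)
{
	pv_steal_clock = kvm_steal_clock;
}
```

This is why the declaration cannot be dropped from the patch: without it, steal_account_process_time() has nothing to link against.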




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE)   += stacktrace.o


Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 5:21 PM maobibo  wrote:
>
>
>
> On 2024/2/19 下午4:48, Huacai Chen wrote:
> > On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/2/19 上午10:42, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> 
>  The patch adds paravirt interface for guest kernel, function
>  pv_guest_init() firstly checks whether system runs on VM mode. If kernel
>  runs on VM mode, it will call function kvm_para_available() to detect
>  whether current VMM is KVM hypervisor. And the paravirt function can work
>  only if current VMM is KVM hypervisor, since there is only KVM hypervisor
>  supported on LoongArch now.
> 
>  This patch only adds paravirt interface for guest kernel, however there
>  is not effective pv functions added here.
> 
>  Signed-off-by: Bibo Mao 
>  ---
> arch/loongarch/Kconfig|  9 
> arch/loongarch/include/asm/kvm_para.h |  7 
> arch/loongarch/include/asm/paravirt.h | 27 
> .../include/asm/paravirt_api_clock.h  |  1 +
> arch/loongarch/kernel/Makefile|  1 +
> arch/loongarch/kernel/paravirt.c  | 41 +++
> arch/loongarch/kernel/setup.c |  2 +
> 7 files changed, 88 insertions(+)
> create mode 100644 arch/loongarch/include/asm/paravirt.h
> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> create mode 100644 arch/loongarch/kernel/paravirt.c
> 
>  diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>  index 10959e6c3583..817a56dff80f 100644
>  --- a/arch/loongarch/Kconfig
>  +++ b/arch/loongarch/Kconfig
>  @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
>    bool
>    default y
> 
>  +config PARAVIRT
>  +   bool "Enable paravirtualization code"
>  +   depends on AS_HAS_LVZ_EXTENSION
>  +   help
>  +  This changes the kernel so it can modify itself when it is run
>  + under a hypervisor, potentially improving performance 
>  significantly
>  + over full virtualization.  However, when run without a 
>  hypervisor
>  + the kernel is theoretically slower and slightly larger.
>  +
> config ARCH_SUPPORTS_KEXEC
>    def_bool y
> 
>  diff --git a/arch/loongarch/include/asm/kvm_para.h 
>  b/arch/loongarch/include/asm/kvm_para.h
>  index 9425d3b7e486..41200e922a82 100644
>  --- a/arch/loongarch/include/asm/kvm_para.h
>  +++ b/arch/loongarch/include/asm/kvm_para.h
>  @@ -2,6 +2,13 @@
> #ifndef _ASM_LOONGARCH_KVM_PARA_H
> #define _ASM_LOONGARCH_KVM_PARA_H
> 
>  +/*
>  + * Hypcall code field
>  + */
>  +#define HYPERVISOR_KVM 1
>  +#define HYPERVISOR_VENDOR_SHIFT8
>  +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
>  HYPERVISOR_VENDOR_SHIFT) + code)
>  +
> /*
>  * LoongArch hypcall return code
>  */
>  diff --git a/arch/loongarch/include/asm/paravirt.h 
>  b/arch/loongarch/include/asm/paravirt.h
>  new file mode 100644
>  index ..b64813592ba0
>  --- /dev/null
>  +++ b/arch/loongarch/include/asm/paravirt.h
>  @@ -0,0 +1,27 @@
>  +/* SPDX-License-Identifier: GPL-2.0 */
>  +#ifndef _ASM_LOONGARCH_PARAVIRT_H
>  +#define _ASM_LOONGARCH_PARAVIRT_H
>  +
>  +#ifdef CONFIG_PARAVIRT
>  +#include 
>  +struct static_key;
>  +extern struct static_key paravirt_steal_enabled;
>  +extern struct static_key paravirt_steal_rq_enabled;
>  +
>  +u64 dummy_steal_clock(int cpu);
>  +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
>  +
>  +static inline u64 paravirt_steal_clock(int cpu)
>  +{
>  +   return static_call(pv_steal_clock)(cpu);
>  +}
> >>> The steal time code can be removed in this patch, I think.
> >>>
> >> Originally I want to remove this piece of code, but it fails to compile
> >> if CONFIG_PARAVIRT is selected. Here is reference code, function
> >> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.
> >>
> >> static __always_inline u64 steal_account_process_time(u64 maxtime)
> >> {
> >> #ifdef CONFIG_PARAVIRT
> >>   if (static_key_false(&paravirt_steal_enabled)) {
> >>   u64 steal;
> >>
> >>   steal = paravirt_steal_clock(smp_processor_id());
> >>   steal -= this_rq()->prev_steal_time;
> >>   steal = min(steal, maxtime);
> >>   account_steal_time(steal);
> >>   this_rq()->prev_steal_time += steal;
> >>
> >>   return steal;
> >>   }
> >> #endif
> >>   return 0;
> >> }
> > OK, then keep it.
> >
> >>
>  

Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-19 Thread Ankit Agrawal
>> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c
>> b/drivers/vfio/pci/nvgrace-gpu/main.c new file mode 100644
>> index ..5a251a6a782e
>> --- /dev/null
>> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
>> @@ -0,0 +1,888 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved
>> + */
>> +
>> +#include 
>> +#include 
>> +
>
> Let's keep the header inclusion in an alphabet order.
>
> With that addressed,
>
> Reviewed-by: Zhi Wang 

Yes, will adjust that. Thanks!

>> +/*
>> + * The device memory usable to the workloads running in the VM is
>> cached
>> + * and showcased as a 64b device BAR (comprising of BAR4 and BAR5
>> region)
>> + * to the VM and is represented as usemem.
>> + * Moreover, the VM GPU device driver needs a non-cacheable region to
>> + * support the MIG feature. This region is also exposed as a 64b BAR
>> + * (comprising of BAR2 and BAR3 region) and represented as resmem.
>> + */
>> +#define RESMEM_REGION_INDEX VFIO_PCI_BAR2_REGION_INDEX
>> +#define USEMEM_REGION_INDEX VFIO_PCI_BAR4_REGION_INDEX



Re: [PATCH v18 3/3] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

2024-02-19 Thread Zhi Wang
On Fri, 16 Feb 2024 08:31:28 +0530
 wrote:

> From: Ankit Agrawal 
> 
> NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
> for the on-chip GPU that is the logical OS representation of the
> internal proprietary chip-to-chip cache coherent interconnect.
> 
> The device is peculiar compared to a real PCI device in that whilst
> there is a real 64b PCI BAR1 (comprising region 2 & region 3) on the
> device, it is not used to access device memory once the faster
> chip-to-chip interconnect is initialized (occurs at the time of host
> system boot). The device memory is accessed instead using the
> chip-to-chip interconnect that is exposed as a contiguous physically
> addressable region on the host. This device memory aperture can be
> obtained from host ACPI table using device_property_read_u64(),
> according to the FW specification. Since the device memory is cache
> coherent with the CPU, it can be mmap()ed into the user VMA with a
> cacheable mapping using remap_pfn_range() and used like regular
> RAM. The device memory is not added to the host kernel, but mapped
> directly as this reduces memory wastage due to struct pages.
> 
> There is also a requirement of a minimum reserved 1G uncached region
> (termed as resmem) to support the Multi-Instance GPU (MIG) feature
> [1]. This is to work around a HW defect. Based on [2], the requisite
> properties (uncached, unaligned access) can be achieved through a VM
> mapping (S1) of NORMAL_NC and host (S2) mapping with
> MemAttr[2:0]=0b101. To provide a different non-cached property to the
> reserved 1G region, it needs to be carved out from the device memory
> and mapped as a separate region in Qemu VMA with
> pgprot_writecombine(). pgprot_writecombine() sets the Qemu VMA page
> properties (pgprot) as NORMAL_NC.
> 
> Provide a VFIO PCI variant driver that adapts the unique device memory
> representation into a more standard PCI representation facing
> userspace.
> 
> The variant driver exposes these two regions - the non-cached reserved
> (resmem) and the cached rest of the device memory (termed as usemem)
> as separate VFIO 64b BAR regions. This is divergent from the baremetal
> approach, where the device memory is exposed as a device memory
> region. The decision for a different approach was taken in view of
> the fact that it would necessitate additional code in Qemu to discover
> and insert those regions in the VM IPA, along with the additional VM
> ACPI DSDT changes to communicate the device memory region IPA to the
> VM workloads. Moreover, this behavior would have to be added to a
> variety of emulators (beyond top of tree Qemu) out there desiring
> grace hopper support.
> 
> Since the device implements 64-bit BAR0, the VFIO PCI variant driver
> maps the uncached carved out region to the next available PCI BAR
> (i.e. comprising of region 2 and 3). The cached device memory
> aperture is assigned BAR region 4 and 5. Qemu will then naturally
> generate a PCI device in the VM with the uncached aperture reported
> as BAR2 region, the cacheable as BAR4. The variant driver provides
> emulation for these fake BARs' PCI config space offset registers.
> 
> The hardware ensures that the system does not crash when the memory
> is accessed with the memory enable turned off. It synthesizes ~0 reads
> and drops writes on such access. So there is no need to support the
> disablement/enablement of BAR through PCI_COMMAND config space
> register.
> 
> The memory layout on the host looks like the following:
>devmem (memlength)
> |--|
> |-cached|--NC--|
> |   |
> usemem.memphys  resmem.memphys
> 
> PCI BARs need to be aligned to the power-of-2, but the actual memory
> on the device may not. A read or write access to the physical address
> from the last device PFN up to the next power-of-2 aligned physical
> address results in reading ~0 and dropped writes. Note that the GPU
> device driver [6] is capable of knowing the exact device memory size
> through separate means. The device memory size is primarily kept in
> the system ACPI tables for use by the VFIO PCI variant module.
> 
> Note that the usemem memory is added by the VM Nvidia device driver
> [5] to the VM kernel as memblocks. Hence make the usable memory size
> memblock (MEMBLK_SIZE) aligned. This is a hardwired ABI value between
> the GPU FW and VFIO driver. The VM device driver make use of the same
> value for its calculation to determine USEMEM size.
> 
> Currently there is no provision in KVM for a S2 mapping with
> MemAttr[2:0]=0b101, but there is an ongoing effort to provide the
> same [3]. As previously mentioned, resmem is mapped
> pgprot_writecombine(), that sets the Qemu VMA page properties
> (pgprot) as NORMAL_NC. Using the proposed changes in [3] and [4], KVM
> marks the region with MemAttr[2:0]=0b101 in S2.
> 
> If the device 
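The sizing rules described in this message (a power-of-two reported BAR size, a memblock-aligned usable size, and a ~0-reading gap between them) can be sketched as follows. MEMBLK_SIZE here is an assumed illustrative value, not the real GPU-FW/VFIO ABI constant:

```c
#include <assert.h>

typedef unsigned long long u64;

#define MEMBLK_SIZE	(512ULL << 20)	/* assumed memblock granularity */

static u64 roundup_pow_of_two64(u64 x)
{
	u64 r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

/* Usable (usemem) size: round the device memory down to the memblock
 * granularity so the VM driver can add it as memblocks. */
static u64 usemem_size(u64 memlength)
{
	return memlength & ~(MEMBLK_SIZE - 1);
}

/* Reported BAR size: next power of two, as PCI BARs require; accesses
 * between memlength and this limit read ~0 and drop writes. */
static u64 bar_size(u64 memlength)
{
	return roundup_pow_of_two64(memlength);
}
```

For example, 3 GB + 300 MB of device memory yields a 3 GB usable region and a 4 GB BAR.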

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread maobibo




On 2024/2/19 下午4:48, Huacai Chen wrote:

On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:




On 2024/2/19 上午10:42, Huacai Chen wrote:

Hi, Bibo,

On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:


The patch adds paravirt interface for guest kernel, function
pv_guest_init() firstly checks whether system runs on VM mode. If kernel
runs on VM mode, it will call function kvm_para_available() to detect
whether current VMM is KVM hypervisor. And the paravirt function can work
only if current VMM is KVM hypervisor, since there is only KVM hypervisor
supported on LoongArch now.

This patch only adds paravirt interface for guest kernel, however there
are no effective pv functions added here.

Signed-off-by: Bibo Mao 
---
   arch/loongarch/Kconfig|  9 
   arch/loongarch/include/asm/kvm_para.h |  7 
   arch/loongarch/include/asm/paravirt.h | 27 
   .../include/asm/paravirt_api_clock.h  |  1 +
   arch/loongarch/kernel/Makefile|  1 +
   arch/loongarch/kernel/paravirt.c  | 41 +++
   arch/loongarch/kernel/setup.c |  2 +
   7 files changed, 88 insertions(+)
   create mode 100644 arch/loongarch/include/asm/paravirt.h
   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
   create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
  bool
  default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
   config ARCH_SUPPORTS_KEXEC
  def_bool y

diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
   #ifndef _ASM_LOONGARCH_KVM_PARA_H
   #define _ASM_LOONGARCH_KVM_PARA_H

+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
   /*
* LoongArch hypcall return code
*/
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}

The steal time code can be removed in this patch, I think.


Originally I wanted to remove this piece of code, but it fails to compile
if CONFIG_PARAVIRT is selected. Here is the reference code; function
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
  if (static_key_false(&paravirt_steal_enabled)) {
  u64 steal;

  steal = paravirt_steal_clock(smp_processor_id());
  steal -= this_rq()->prev_steal_time;
  steal = min(steal, maxtime);
  account_steal_time(steal);
  this_rq()->prev_steal_time += steal;

  return steal;
  }
#endif
  return 0;
}

OK, then keep it.




+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include <asm/paravirt.h>
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
   obj-$(CONFIG_STACKTRACE)   += stacktrace.o

   obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

   obj-$(CONFIG_SMP)  += 

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-19 Thread Dan Carpenter
On Fri, Feb 16, 2024 at 05:17:20PM -0800, Dan Williams wrote:
> > commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> > Author: Ira Weiny 
> > Date:   Wed Feb 14 15:25:24 2024 -0800
> > 
> > Revert "acpi/ghes: Process CXL Component Events"
> > 
> > This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.
> 
> Even reverts need changelogs, I can add one. I got conflicts trying to
> apply this to current fixes branch. I think I am going to just
> surgically backout the drivers/acpi/apei/ghes.c changes.

The `git revert` command comes from ancient times and it just sets
people up for failure...  :/

regards,
dan carpenter



Re: [PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 3:37 PM maobibo  wrote:
>
>
>
> On 2024/2/19 下午3:16, Huacai Chen wrote:
> > On Mon, Feb 19, 2024 at 12:18 PM maobibo  wrote:
> >>
> >>
> >>
> >> On 2024/2/19 上午10:45, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Thu, Feb 1, 2024 at 11:20 AM Bibo Mao  wrote:
> 
>  On LoongArch system, ipi hw uses iocsr registers, there is one iocsr
>  register access on ipi sending, and two iocsr access on ipi receiving
>  which is ipi interrupt handler. On VM mode all iocsr registers
>  accessing will cause VM to trap into hypervisor. So with ipi hw
>  notification once there will be three times of trap.
> 
>  This patch adds pv ipi support for VM, hypercall instruction is used
>  to ipi sender, and hypervisor will inject SWI on the VM. During SWI
>  interrupt handler, only estat CSR register is written to clear irq.
>  Estat CSR register access will not trap into hypervisor. So with pv ipi
>  supported, pv ipi sender will trap into hypervisor one time, pv ipi
>  receiver will not trap, there is only one time of trap.
> 
>  Also this patch adds ipi multicast support, the method is similar with
>  x86. With ipi multicast support, ipi notification can be sent to at most
>  128 vcpus at one time. It reduces trap times into hypervisor greatly.
> 
>  Signed-off-by: Bibo Mao 
>  ---
> arch/loongarch/include/asm/hardirq.h   |   1 +
> arch/loongarch/include/asm/kvm_host.h  |   1 +
> arch/loongarch/include/asm/kvm_para.h  | 124 +
> arch/loongarch/include/asm/loongarch.h |   1 +
> arch/loongarch/kernel/irq.c|   2 +-
> arch/loongarch/kernel/paravirt.c   | 113 ++
> arch/loongarch/kernel/smp.c|   2 +-
> arch/loongarch/kvm/exit.c  |  73 ++-
> arch/loongarch/kvm/vcpu.c  |   1 +
> 9 files changed, 314 insertions(+), 4 deletions(-)
> 
>  diff --git a/arch/loongarch/include/asm/hardirq.h 
>  b/arch/loongarch/include/asm/hardirq.h
>  index 9f0038e19c7f..8a611843c1f0 100644
>  --- a/arch/loongarch/include/asm/hardirq.h
>  +++ b/arch/loongarch/include/asm/hardirq.h
>  @@ -21,6 +21,7 @@ enum ipi_msg_type {
> typedef struct {
>    unsigned int ipi_irqs[NR_IPI];
>    unsigned int __softirq_pending;
>  +   atomic_t messages ____cacheline_aligned_in_smp;
> >>> Do we really need atomic_t? A plain "unsigned int" can reduce cost
> >>> significantly.
> >> For IPI, there are multiple senders and one receiver; the sender uses
> >> atomic_fetch_or(action, &info->messages) and the receiver uses
> >> atomic_xchg(&info->messages, 0) to clear the messages.
> >>
> >> There needs to be a sync mechanism between senders and receiver; atomic
> >> is the simplest method.
> > At least from receiver side, the native IPI doesn't need atomic for
> > read and clear:
> > static u32 ipi_read_clear(int cpu)
> > {
> >  u32 action;
> >
> >  /* Load the ipi register to figure out what we're supposed to do */
> >  action = iocsr_read32(LOONGARCH_IOCSR_IPI_STATUS);
> >  /* Clear the ipi register to clear the interrupt */
> >  iocsr_write32(action, LOONGARCH_IOCSR_IPI_CLEAR);
> >  wbflush();
> It is because on physical hardware these are two IOCSR registers and also
> there is no method to do an atomic read-and-clear on IOCSR registers.
>
> However, if the ipi message is stored in DDR memory, atomic read/clear can
> be used. You can compare the cost of one iocsr_read32 + one iocsr_write32 +
> wbflush with one atomic_xchg(&info->messages, 0)
Atomic ops are of course cheaper than iocsr, but unsigned int is even cheaper.

>
> Regards
> Bibo Mao
> >
> >  return action;
> > }
> >
> >>>
> } ____cacheline_aligned irq_cpustat_t;
> 
> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
>  diff --git a/arch/loongarch/include/asm/kvm_host.h 
>  b/arch/loongarch/include/asm/kvm_host.h
>  index 57399d7cf8b7..1bf927e2bfac 100644
>  --- a/arch/loongarch/include/asm/kvm_host.h
>  +++ b/arch/loongarch/include/asm/kvm_host.h
>  @@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
>    u64 idle_exits;
>    u64 cpucfg_exits;
>    u64 signal_exits;
>  +   u64 hvcl_exits;
> >>> hypercall_exits is better.
> >> yeap, hypercall_exits is better, will fix in next version.
> >>>
> };
> 
> #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
>  diff --git a/arch/loongarch/include/asm/kvm_para.h 
>  b/arch/loongarch/include/asm/kvm_para.h
>  index 41200e922a82..a25a84e372b9 100644
>  --- a/arch/loongarch/include/asm/kvm_para.h
>  +++ b/arch/loongarch/include/asm/kvm_para.h
>  @@ -9,6 +9,10 @@
> #define HYPERVISOR_VENDOR_SHIFT8
> #define HYPERCALL_CODE(vendor, code)   ((vendor << 
> 
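The 128-vcpu multicast mentioned in the quoted commit message can be modelled on the same idea as x86's KVM_HC_SEND_IPI: a base cpu plus a 128-bit destination bitmap carried in two u64 values, so one hypercall signals up to 128 vcpus. The names and layout below are illustrative assumptions, not the patch itself:

```c
#include <assert.h>
#include <stdint.h>

struct ipi_dest {
	uint64_t bitmap[2];	/* cpus encoded relative to min */
	int min;		/* lowest cpu in this batch, -1 if empty */
};

/* Add @cpu to the batch (cpus assumed added in ascending order).
 * Returns 0 on success, -1 if @cpu falls outside the 128-cpu window
 * and would need a second hypercall. */
static int ipi_dest_add(struct ipi_dest *d, int cpu)
{
	int off;

	if (d->min < 0)
		d->min = cpu;
	off = cpu - d->min;
	if (off < 0 || off >= 128)
		return -1;
	d->bitmap[off / 64] |= 1ULL << (off % 64);
	return 0;
}
```

Batching this way is what reduces the trap count: one hypercall replaces up to 128 per-cpu iocsr writes.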

Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-02-19 Thread Huacai Chen
On Mon, Feb 19, 2024 at 12:11 PM maobibo  wrote:
>
>
>
> On 2024/2/19 上午10:42, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao  wrote:
> >>
> >> The patch adds paravirt interface for guest kernel, function
> >> pv_guest_init() firstly checks whether system runs on VM mode. If kernel
> >> runs on VM mode, it will call function kvm_para_available() to detect
> >> whether current VMM is KVM hypervisor. And the paravirt function can work
> >> only if current VMM is KVM hypervisor, since there is only KVM hypervisor
> >> supported on LoongArch now.
> >>
> >> This patch only adds paravirt interface for guest kernel, however there
> >> is not effective pv functions added here.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig|  9 
> >>   arch/loongarch/include/asm/kvm_para.h |  7 
> >>   arch/loongarch/include/asm/paravirt.h | 27 
> >>   .../include/asm/paravirt_api_clock.h  |  1 +
> >>   arch/loongarch/kernel/Makefile|  1 +
> >>   arch/loongarch/kernel/paravirt.c  | 41 +++
> >>   arch/loongarch/kernel/setup.c |  2 +
> >>   7 files changed, 88 insertions(+)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 10959e6c3583..817a56dff80f 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance 
> >> significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> >> b/arch/loongarch/include/asm/kvm_para.h
> >> index 9425d3b7e486..41200e922a82 100644
> >> --- a/arch/loongarch/include/asm/kvm_para.h
> >> +++ b/arch/loongarch/include/asm/kvm_para.h
> >> @@ -2,6 +2,13 @@
> >>   #ifndef _ASM_LOONGARCH_KVM_PARA_H
> >>   #define _ASM_LOONGARCH_KVM_PARA_H
> >>
> >> +/*
> >> + * Hypcall code field
> >> + */
> >> +#define HYPERVISOR_KVM 1
> >> +#define HYPERVISOR_VENDOR_SHIFT8
> >> +#define HYPERCALL_CODE(vendor, code)   ((vendor << 
> >> HYPERVISOR_VENDOR_SHIFT) + code)
> >> +
> >>   /*
> >>* LoongArch hypcall return code
> >>*/
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..b64813592ba0
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock(int cpu)
> >> +{
> >> +   return static_call(pv_steal_clock)(cpu);
> >> +}
> > The steal time code can be removed in this patch, I think.
> >
> Originally I want to remove this piece of code, but it fails to compile
> if CONFIG_PARAVIRT is selected. Here is reference code, function
> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.
>
> static __always_inline u64 steal_account_process_time(u64 maxtime)
> {
> #ifdef CONFIG_PARAVIRT
>  if (static_key_false(&paravirt_steal_enabled)) {
>  u64 steal;
>
>  steal = paravirt_steal_clock(smp_processor_id());
>  steal -= this_rq()->prev_steal_time;
>  steal = min(steal, maxtime);
>  account_steal_time(steal);
>  this_rq()->prev_steal_time += steal;
>
>  return steal;
>  }
> #endif
>  return 0;
> }
OK, then keep it.

>
> >> +
> >> +int pv_guest_init(void);
> >> +#else
> >> +static inline int pv_guest_init(void)
> >> +{
> >> +   return 0;
> >> +}
> >> +
> >> +#endif // CONFIG_PARAVIRT
> >> +#endif
> >> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> >> b/arch/loongarch/include/asm/paravirt_api_clock.h
> >> new file mode 100644
> >> index ..65ac7cee0dad
> >> --- /dev/null
> >> +++