Re: [PATCH] drm/panel/raydium-rm692e5: select CONFIG_DRM_DISPLAY_DP_HELPER

2023-12-27 Thread Luca Weiss
On Mon Oct 23, 2023 at 3:25 PM CEST, Neil Armstrong wrote:
> Hi,
>
> On 23/10/2023 13:55, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> > 
> > As with several other panel drivers, this fails to link without the DP
> > helper library:
> > 
> > ld: drivers/gpu/drm/panel/panel-raydium-rm692e5.o: in function `rm692e5_prepare':
> > panel-raydium-rm692e5.c:(.text+0x11f4): undefined reference to `drm_dsc_pps_payload_pack'
> > 
> > Select the same symbols that the others already use.
> > 
> > Fixes: 988d0ff29ecf7 ("drm/panel: Add driver for BOE RM692E5 AMOLED panel")
> > Signed-off-by: Arnd Bergmann 
> > ---
> >   drivers/gpu/drm/panel/Kconfig | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
> > index 99e14dc212ecb..a4ac4b4fe 100644
> > --- a/drivers/gpu/drm/panel/Kconfig
> > +++ b/drivers/gpu/drm/panel/Kconfig
> > @@ -530,6 +530,8 @@ config DRM_PANEL_RAYDIUM_RM692E5
> > depends on OF
> > depends on DRM_MIPI_DSI
> > depends on BACKLIGHT_CLASS_DEVICE
> > +   select DRM_DISPLAY_DP_HELPER
> > +   select DRM_DISPLAY_HELPER
> > help
> >   Say Y here if you want to enable support for Raydium RM692E5-based
> >   display panels, such as the one found in the Fairphone 5 smartphone.
>
> Will apply once drm-misc-next-fixes is synced with the last drm-misc-next PR 
> for v6.7.

Hi Neil,

I think this patch is still pending; I don't see it in linux-next.

It was also reported by a build bot today:
https://lore.kernel.org/lkml/202312281138.phn1js8s-...@intel.com/

Regards
Luca

>
> Neil



Re: [PATCH v3 00/11] Add mediate-drm secure flow for SVP

2023-12-27 Thread 胡俊光


[PATCH v9 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2023-12-27 Thread Alan Previn
This series is the result of debugging issues root-caused to
races between the GuC's destroyed_worker_func being triggered
and repeated suspend-resume cycles with concurrent delayed
fence signals for engine freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle, where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However, this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually trigger a FENCE_FREE
via __i915_sw_fence_notify, which connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..), which queues the worker after the GuC's
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on the above call stack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.
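To make the ordering problem concrete, here is a minimal single-threaded C model. It is a sketch under simplifying assumptions: every name in it (struct model, context_put_last_ref(), flush_destroy_worker(), suspend_late(), simulate()) is hypothetical and only mimics the sequence described above; it is not i915 code. It shows that a flush performed before the CT is disabled cannot catch a context reference that is dropped afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: none of these names are i915 APIs. */
struct model {
	bool ct_enabled;            /* GuC CT channel usable */
	int queued_work;            /* pending destroy-worker runs */
	int worker_runs_without_ct; /* worker ran after CT was disabled */
};

/* Dropping the last context reference queues the destroy worker. */
static void context_put_last_ref(struct model *m)
{
	m->queued_work++;
}

/* flush_work() analogue: run whatever is queued right now. */
static void flush_destroy_worker(struct model *m)
{
	while (m->queued_work > 0) {
		m->queued_work--;
		if (!m->ct_enabled)
			m->worker_runs_without_ct++;
	}
}

/* Flush first, then the CT goes away, as in the suspend-late path. */
static void suspend_late(struct model *m)
{
	flush_destroy_worker(m);
	m->ct_enabled = false;
}

/* Returns how many worker runs happened with the CT already disabled. */
static int simulate(bool late_fence_signal)
{
	struct model m = { .ct_enabled = true };

	context_put_last_ref(&m);         /* ordinary free before suspend */
	suspend_late(&m);
	if (late_fence_signal)
		context_put_last_ref(&m); /* RCU-deferred free lands after the flush */
	flush_destroy_worker(&m);         /* the workqueue runs it eventually */
	assert(m.queued_work == 0);
	return m.worker_runs_without_ct;
}
```

In this model, simulate(true) reports one worker run with the CT already down, which is the case a flush alone cannot prevent.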

Because of that, this series adds additional patches besides
the obvious one (Patch #1) that flushes the worker during the
suspend flows. It also includes Patch #2, which closes a race
between sending the context-deregistration H2G and the CTB
getting disabled in the midst of it (by detecting the failure
and unrolling the guc-lrc-unpin flow), and adds an additional
rcu_barrier in the gem-suspend flow to purge outstanding
RCU-deferred tasks that may include context destruction.

This patch was tested and confirmed to be reliably working
after running ~1500 suspend resume cycles on 4 concurrent
machines.

Changes from prior revs:
   v8: - Rebase again to resolve conflicts (Rodrigo).
   v7: - Rebase on latest drm-tip.
   v6: - Don't hold the spinlock while calling deregister_context,
         which can take a longer time. (Rodrigo)
       - Fix / improve comments. (Rodrigo)
       - Release the ce->guc_state.lock before calling
         deregister_context and retake it if we fail. (John/Daniele).
   v5: - Remove Patch #3, which doesn't solve this exact bug
         but can be a separate patch (Tvrtko).
   v4: - In Patch #2, change the position of the calls into
         rcu_barrier based on latest testing data. (Alan/Anshuman).
       - In Patch #3, fix the timeout value selection for the
         final gt-pm idle-wait that was incorrectly using a 'ns'
         #define as a millisecond timeout.
   v3: - In Patch #3, when deregister_context fails, instead
         of calling intel_gt_pm_put (that might sleep), call
         __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
       - In wait_for_suspend, add an rcu_barrier before we
         proceed to wait for idle. (Anshuman)
   v2: - Patch #2: Restructure code in guc_lrc_desc_unpin so
         it's more readable to differentiate (1) direct guc-id
         cleanup vs (2) sending the H2G ctx-destroy action
         vs (3) the unrolling steps if the H2G fails.
       - Patch #2: Add a check to close the race sooner by checking
         for intel_guc_is_ready from destroyed_worker_func.
       - Patch #2: When guc_submission_send_busy_loop gets a
         failure from intel_guc_send_busy_loop, we need to undo,
         i.e. decrement the outstanding_submission_g2h.
       - Patch #3: In wait_for_suspend, fix checking of return from
         intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
         and add documentation for intel_wakeref_wait_for_idle.
         (Rodrigo).

Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 78 +--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 4 files changed, 87 insertions(+), 5 deletions(-)


base-commit: 56c351e9f8c3ca9389a924ff2a46b34b50fb37dd
-- 
2.39.0



[PATCH v9 1/2] drm/i915/guc: Flush context destruction worker at suspend

2023-12-27 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a259f1118c5a..9c64ae0766cc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1613,6 +1613,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 3872d309ed31..b8b09b1bee3e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -690,6 +690,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[PATCH v9 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-27 Thread Alan Previn
If we are at the end of suspend or very early in resume,
it's possible that an async fence signal (via call_rcu) is triggered
to free_engines, which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.
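The role of the rcu_barrier() calls can be sketched with a toy deferred-callback queue. This is an assumption-laden model, not RCU: the real rcu_barrier() waits for already-queued call_rcu() callbacks to finish, whereas barrier_drain() below simply runs them inline, and defer_call(), free_engines_cb() and the list counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 8

/* Hypothetical stand-in for a call_rcu() callback. */
typedef void (*deferred_fn)(int *destroy_list_len);

struct deferred_queue {
	deferred_fn cbs[MAX_DEFERRED];
	size_t n;
};

static void defer_call(struct deferred_queue *q, deferred_fn fn)
{
	assert(q->n < MAX_DEFERRED);
	q->cbs[q->n++] = fn;
}

/*
 * rcu_barrier() analogue. The real barrier waits for queued callbacks to
 * finish after a grace period; this toy version simply runs them inline.
 */
static void barrier_drain(struct deferred_queue *q, int *destroy_list_len)
{
	for (size_t i = 0; i < q->n; i++)
		q->cbs[i](destroy_list_len);
	q->n = 0;
}

/* Stand-in for free_engines_rcu(): one more context to destroy. */
static void free_engines_cb(int *destroy_list_len)
{
	(*destroy_list_len)++;
}

/* How many contexts a destroy-worker flush would see at this point. */
static int contexts_seen_by_flush(int use_barrier)
{
	struct deferred_queue q = { .n = 0 };
	int destroy_list_len = 0;

	defer_call(&q, free_engines_cb); /* fence signalled just before suspend */
	if (use_barrier)
		barrier_drain(&q, &destroy_list_len);
	return destroy_list_len;
}
```

Without the barrier, the pending callback has not yet put its context on the destroy list, so a worker flush at that point would miss it.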

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring that workloads that render
something onscreen are continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(&gt->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_contexts
if guc_lrc_desc_unpin fails due to a CT send failure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.
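The unroll described above can be modeled as follows. This is a hypothetical sketch: struct ctx, struct gt, ct_send() and lrc_desc_unpin() are illustrative stand-ins, the wakeref is reduced to a counter, and the real code's locking is omitted. The point is that every side effect taken before the CT send (the DESTROYED flag and the wakeref) is undone when the send fails, and the context stays on the destroy list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names echo the text but are not i915 code. */
struct ctx {
	bool destroyed;       /* SCHED_STATE_DESTROYED analogue */
	bool on_destroy_list;
};

struct gt {
	int wakeref_count;    /* real driver drops it in the G2H reply */
	bool ct_enabled;
};

/* Returns 0 on success, -1 if the CT is already disabled. */
static int ct_send(const struct gt *gt)
{
	return gt->ct_enabled ? 0 : -1;
}

static int lrc_desc_unpin(struct gt *gt, struct ctx *ce)
{
	assert(ce->on_destroy_list);

	ce->destroyed = true;
	gt->wakeref_count++;          /* held until the G2H reply arrives */

	if (ct_send(gt) != 0) {
		/* Unroll: no reply will come, so undo what was taken above. */
		gt->wakeref_count--;
		ce->destroyed = false;
		return -1;            /* ce stays on the destroy list */
	}

	ce->on_destroy_list = false;
	return 0;
}
```

Without the unroll branch, the failed call would leave wakeref_count at 1 with no reply ever coming to release it, which corresponds to the leaked wakeref that hangs the later idle wait.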

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
Acked-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 73 +--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via call_rcu. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9c64ae0766cc..cae637fc3ead 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3288,12 +3301,13 @@ static void guc_context_close(struct intel_context *ce)
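The accounting change in guc_submission_send_busy_loop() follows a common pattern: bump a counter for an expected reply, and drop it again if the send fails, because no reply will ever arrive. Below is a minimal sketch with C11 atomics; fake_send() and send_expecting_reply() are hypothetical stand-ins. In this sketch the decrement is guarded by the same condition as the increment, so sends that expect no reply leave the counter untouched.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of outstanding-reply accounting; not i915 code. */
static atomic_int outstanding_g2h;

/* Stand-in for the CT send: 0 on success, negative on failure. */
static int fake_send(int fail)
{
	return fail ? -1 : 0;
}

static int send_expecting_reply(int expects_reply, int fail)
{
	int ret;

	if (expects_reply)
		atomic_fetch_add(&outstanding_g2h, 1);

	ret = fake_send(fail);

	/* No reply will ever arrive for a failed send, so roll back. */
	if (ret && expects_reply)
		atomic_fetch_sub(&outstanding_g2h, 1);

	assert(atomic_load(&outstanding_g2h) >= 0);
	return ret;
}
```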

Re: [PATCH v8 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-27 Thread Teres Alexis, Alan Previn
On Tue, 2023-12-26 at 10:11 -0500, Vivi, Rodrigo wrote:
> On Wed, Dec 20, 2023 at 11:08:59PM +, Teres Alexis, Alan Previn wrote:
> > On Wed, 2023-12-13 at 16:23 -0500, Vivi, Rodrigo wrote:
alan:snip

> > 
> > 
> > alan: Thanks Rodrigo for the RB last week, just quick update:
> > 
> > I can't reproduce the BAT failures, which seem to be intermittent
> > on platform and test; however, a noticeable number of failures
> > do keep occurring on i915_selftest @live @requests, where the
> > last test leaked a wakeref and the failing test hangs waiting
> > for gt to idle before starting its test.
> > 
> > I have to debug this further, although from code inspection
> > it is unrelated to the patches in this series.
> > Hopefully it's a different issue.
> 
> Yeap, likely not related. Anyway, I'm sorry for not merging
> this sooner. Could you please send a rebased version? This
> one is not applying cleanly anymore.

Hi Rodrigo, I will rebase it as soon as I do a bit more testing.
I realize I was using a slightly older GuC, and with the newer GuC I am
seeing all kinds of failures, but they are trending as not an issue with the series.



Re: [syzbot] [dri?] WARNING in drm_prime_destroy_file_private (2)

2023-12-27 Thread Qi Zheng




On 2023/12/28 04:51, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:5254c0cbc92d Merge tag 'block-6.7-2023-12-22' of git://git..
git tree:   upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10cc6995e8
kernel config:  https://syzkaller.appspot.com/x/.config?x=314e9ad033a7d3a7
dashboard link: https://syzkaller.appspot.com/bug?extid=59dcc2e7283a6f5f5ba1
compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13e35809e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=155d5fd6e8

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ebe09a5995ee/disk-5254c0cb.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/02178d7f5f98/vmlinux-5254c0cb.xz
kernel image: https://storage.googleapis.com/syzbot-assets/12307f47d87c/bzImage-5254c0cb.xz

The issue was bisected to:

commit ea4452de2ae987342fadbdd2c044034e6480daad
Author: Qi Zheng 
Date:   Fri Nov 18 10:00:11 2022 +

 mm: fix unexpected changes to {failslab|fail_page_alloc}.attr

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13027f76e8
final oops: https://syzkaller.appspot.com/x/report.txt?x=10827f76e8
console output: https://syzkaller.appspot.com/x/log.txt?x=17027f76e8

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+59dcc2e7283a6f5f5...@syzkaller.appspotmail.com
Fixes: ea4452de2ae9 ("mm: fix unexpected changes to {failslab|fail_page_alloc}.attr")

R10:  R11: 0246 R12: 7efe98069194
R13: 7efe97fd2210 R14: 0002 R15: 6972642f7665642f
  
[ cut here ]
WARNING: CPU: 0 PID: 5107 at drivers/gpu/drm/drm_prime.c:227 drm_prime_destroy_file_private+0x43/0x60 drivers/gpu/drm/drm_prime.c:227


The warning is caused by !RB_EMPTY_ROOT(&prime_fpriv->dmabufs):

drm_prime_destroy_file_private
--> WARN_ON(!RB_EMPTY_ROOT(&prime_fpriv->dmabufs));

This seems unrelated to the fault-injection logic, so I don't see
why commit ea4452de2ae9 would cause this warning. :(


Modules linked in:
CPU: 0 PID: 5107 Comm: syz-executor227 Not tainted 6.7.0-rc6-syzkaller-00248-g5254c0cbc92d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:drm_prime_destroy_file_private+0x43/0x60 drivers/gpu/drm/drm_prime.c:227
Code: 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 21 48 8b 83 90 00 00 00 48 85 c0 75 06 5b e9 13 f1 93 fc e8 0e f1 93 fc 90 <0f> 0b 90 5b e9 04 f1 93 fc e8 3f 9b ea fc eb d8 66 66 2e 0f 1f 84
RSP: 0018:c90003bdf9e0 EFLAGS: 00010293
RAX:  RBX: 888019f28378 RCX: c90003bdf9b0
RDX: 888018ff9dc0 RSI: 84f380c2 RDI: 888019f28408
RBP: 888019f28000 R08: 0001 R09: 0001
R10: 8f193a57 R11:  R12: 88814829a000
R13: 888019f282a8 R14: 88814829a068 R15: 88814829a0a0
FS:  () GS:8880b980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7efe98050410 CR3: 6d1ff000 CR4: 00350ef0
Call Trace:
  
  drm_file_free.part.0+0x738/0xb90 drivers/gpu/drm/drm_file.c:290
  drm_file_free drivers/gpu/drm/drm_file.c:247 [inline]
  drm_close_helper.isra.0+0x180/0x1f0 drivers/gpu/drm/drm_file.c:307
  drm_release+0x22a/0x4f0 drivers/gpu/drm/drm_file.c:494
  __fput+0x270/0xb70 fs/file_table.c:394
  task_work_run+0x14d/0x240 kernel/task_work.c:180
  exit_task_work include/linux/task_work.h:38 [inline]
  do_exit+0xa8a/0x2ad0 kernel/exit.c:869
  do_group_exit+0xd4/0x2a0 kernel/exit.c:1018
  get_signal+0x23b5/0x2790 kernel/signal.c:2904
  arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
  exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
  exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
  syscall_exit_to_user_mode+0x1e/0x60 kernel/entry/common.c:296
  do_syscall_64+0x4d/0x110 arch/x86/entry/common.c:89
  entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7efe98014769
Code: Unable to access opcode bytes at 0x7efe9801473f.
RSP: 002b:7efe97fd2208 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fe00 RBX: 7efe9809c408 RCX: 7efe98014769
RDX:  RSI: 0080 RDI: 7efe9809c408
RBP: 7efe9809c400 R08: 3131 R09: 3131
R10:  R11: 0246 R12: 7efe98069194
R13: 7efe97fd2210 R14: 0002 R15: 6972642f7665642f
  


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information 

[syzbot] [dri?] WARNING in drm_prime_destroy_file_private (2)

2023-12-27 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:5254c0cbc92d Merge tag 'block-6.7-2023-12-22' of git://git..
git tree:   upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10cc6995e8
kernel config:  https://syzkaller.appspot.com/x/.config?x=314e9ad033a7d3a7
dashboard link: https://syzkaller.appspot.com/bug?extid=59dcc2e7283a6f5f5ba1
compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13e35809e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=155d5fd6e8

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ebe09a5995ee/disk-5254c0cb.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/02178d7f5f98/vmlinux-5254c0cb.xz
kernel image: https://storage.googleapis.com/syzbot-assets/12307f47d87c/bzImage-5254c0cb.xz

The issue was bisected to:

commit ea4452de2ae987342fadbdd2c044034e6480daad
Author: Qi Zheng 
Date:   Fri Nov 18 10:00:11 2022 +

mm: fix unexpected changes to {failslab|fail_page_alloc}.attr

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13027f76e8
final oops: https://syzkaller.appspot.com/x/report.txt?x=10827f76e8
console output: https://syzkaller.appspot.com/x/log.txt?x=17027f76e8

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+59dcc2e7283a6f5f5...@syzkaller.appspotmail.com
Fixes: ea4452de2ae9 ("mm: fix unexpected changes to {failslab|fail_page_alloc}.attr")

R10:  R11: 0246 R12: 7efe98069194
R13: 7efe97fd2210 R14: 0002 R15: 6972642f7665642f
 
[ cut here ]
WARNING: CPU: 0 PID: 5107 at drivers/gpu/drm/drm_prime.c:227 drm_prime_destroy_file_private+0x43/0x60 drivers/gpu/drm/drm_prime.c:227
Modules linked in:
CPU: 0 PID: 5107 Comm: syz-executor227 Not tainted 6.7.0-rc6-syzkaller-00248-g5254c0cbc92d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:drm_prime_destroy_file_private+0x43/0x60 drivers/gpu/drm/drm_prime.c:227
Code: 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 21 48 8b 83 90 00 00 00 48 85 c0 75 06 5b e9 13 f1 93 fc e8 0e f1 93 fc 90 <0f> 0b 90 5b e9 04 f1 93 fc e8 3f 9b ea fc eb d8 66 66 2e 0f 1f 84
RSP: 0018:c90003bdf9e0 EFLAGS: 00010293
RAX:  RBX: 888019f28378 RCX: c90003bdf9b0
RDX: 888018ff9dc0 RSI: 84f380c2 RDI: 888019f28408
RBP: 888019f28000 R08: 0001 R09: 0001
R10: 8f193a57 R11:  R12: 88814829a000
R13: 888019f282a8 R14: 88814829a068 R15: 88814829a0a0
FS:  () GS:8880b980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7efe98050410 CR3: 6d1ff000 CR4: 00350ef0
Call Trace:
 
 drm_file_free.part.0+0x738/0xb90 drivers/gpu/drm/drm_file.c:290
 drm_file_free drivers/gpu/drm/drm_file.c:247 [inline]
 drm_close_helper.isra.0+0x180/0x1f0 drivers/gpu/drm/drm_file.c:307
 drm_release+0x22a/0x4f0 drivers/gpu/drm/drm_file.c:494
 __fput+0x270/0xb70 fs/file_table.c:394
 task_work_run+0x14d/0x240 kernel/task_work.c:180
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa8a/0x2ad0 kernel/exit.c:869
 do_group_exit+0xd4/0x2a0 kernel/exit.c:1018
 get_signal+0x23b5/0x2790 kernel/signal.c:2904
 arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
 exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
 exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x1e/0x60 kernel/entry/common.c:296
 do_syscall_64+0x4d/0x110 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7efe98014769
Code: Unable to access opcode bytes at 0x7efe9801473f.
RSP: 002b:7efe97fd2208 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fe00 RBX: 7efe9809c408 RCX: 7efe98014769
RDX:  RSI: 0080 RDI: 7efe9809c408
RBP: 7efe9809c400 R08: 3131 R09: 3131
R10:  R11: 0246 R12: 7efe98069194
R13: 7efe97fd2210 R14: 0002 R15: 6972642f7665642f
 


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will 

Re: [PATCH 2/3] drm/msm/adreno: Add A305B support

2023-12-27 Thread Luca Weiss
On Thursday, 30 November 2023 21:35:19 CET Luca Weiss wrote:
> Add support for the Adreno 305B GPU that is found in the MSM8226(v2) SoC.
> Previously this was mistakenly claimed to be supported, but using the wrong
> configuration.
> 
> In MSM8226v1 there's also an A305B, but with chip ID 0x03000510, which
> should work with the same configuration; due to lack of hardware for
> testing, however, it is not added.
> 
> Signed-off-by: Luca Weiss 

Hi all,

Any chance this can be picked up for v6.8? The dts patch has already been 
picked up :)

Regards
Luca

> ---
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c  | 15 ---
>  drivers/gpu/drm/msm/adreno/adreno_device.c | 15 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index c86b377f6f0d..5fc29801c4c7 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -134,6 +134,13 @@ static int a3xx_hw_init(struct msm_gpu *gpu)
>   /* Set up AOOO: */
>   gpu_write(gpu, REG_A3XX_VBIF_OUT_AXI_AOOO_EN, 0x003c);
>   gpu_write(gpu, REG_A3XX_VBIF_OUT_AXI_AOOO, 0x003c003c);
> + } else if (adreno_is_a305b(adreno_gpu)) {
> + gpu_write(gpu, REG_A3XX_VBIF_IN_RD_LIM_CONF0, 0x00181818);
> + gpu_write(gpu, REG_A3XX_VBIF_IN_WR_LIM_CONF0, 0x00181818);
> + gpu_write(gpu, REG_A3XX_VBIF_OUT_RD_LIM_CONF0, 0x0018);
> + gpu_write(gpu, REG_A3XX_VBIF_OUT_WR_LIM_CONF0, 0x0018);
> + gpu_write(gpu, REG_A3XX_VBIF_DDR_OUT_MAX_BURST, 0x0303);
> + gpu_write(gpu, REG_A3XX_VBIF_ROUND_ROBIN_QOS_ARB, 0x0003);
>   } else if (adreno_is_a306(adreno_gpu)) {
>   gpu_write(gpu, REG_A3XX_VBIF_ROUND_ROBIN_QOS_ARB, 0x0003);
>   gpu_write(gpu, REG_A3XX_VBIF_OUT_RD_LIM_CONF0, 0x000a);
> @@ -230,7 +237,9 @@ static int a3xx_hw_init(struct msm_gpu *gpu)
>   gpu_write(gpu, REG_A3XX_UCHE_CACHE_MODE_CONTROL_REG, 0x0001);
> 
>   /* Enable Clock gating: */
> - if (adreno_is_a306(adreno_gpu))
> + if (adreno_is_a305b(adreno_gpu))
> + gpu_write(gpu, REG_A3XX_RBBM_CLOCK_CTL, 0x);
> + else if (adreno_is_a306(adreno_gpu))
>   gpu_write(gpu, REG_A3XX_RBBM_CLOCK_CTL, 0x);
>   else if (adreno_is_a320(adreno_gpu))
>   gpu_write(gpu, REG_A3XX_RBBM_CLOCK_CTL, 0xbfff);
> @@ -333,7 +342,7 @@ static int a3xx_hw_init(struct msm_gpu *gpu)
>   AXXX_CP_QUEUE_THRESHOLDS_CSQ_IB1_START(2) |
>   AXXX_CP_QUEUE_THRESHOLDS_CSQ_IB2_START(6) |
>   AXXX_CP_QUEUE_THRESHOLDS_CSQ_ST_START(14));
> - } else if (adreno_is_a330(adreno_gpu)) {
> + } else if (adreno_is_a330(adreno_gpu) || adreno_is_a305b(adreno_gpu)) {
>   /* NOTE: this (value take from downstream android driver)
>* includes some bits outside of the known bitfields.  But
>* A330 has this "MERCIU queue" thing too, which might
> @@ -559,7 +568,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
>   goto fail;
> 
>   /* if needed, allocate gmem: */
> - if (adreno_is_a330(adreno_gpu)) {
> + if (adreno_is_a330(adreno_gpu) || adreno_is_a305b(adreno_gpu)) {
>   ret = adreno_gpu_ocmem_init(&adreno_gpu->base.pdev->dev,
>   adreno_gpu, &a3xx_gpu->ocmem);
>   if (ret)
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index f62ab5257e66..7028d5449956 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -55,10 +55,17 @@ static const struct adreno_info gpulist[] = {
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
>   .init  = a2xx_gpu_init,
>   }, {
> - .chip_ids = ADRENO_CHIP_IDS(
> - 0x03000512,
> - 0x03000520
> - ),
> + .chip_ids = ADRENO_CHIP_IDS(0x03000512),
> + .family = ADRENO_3XX,
> + .fw = {
> + [ADRENO_FW_PM4] = "a330_pm4.fw",
> + [ADRENO_FW_PFP] = "a330_pfp.fw",
> + },
> + .gmem  = SZ_128K,
> + .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> + .init  = a3xx_gpu_init,
> + }, {
> + .chip_ids = ADRENO_CHIP_IDS(0x03000520),
>   .family = ADRENO_3XX,
>   .revn  = 305,
>   .fw = {
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 80b3f6312116..c654f21499bb 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -247,6 +247,11 @@ static 
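The adreno_device.c hunk splits one gpulist[] entry so that chip ID 0x03000512 gets its own configuration. A table lookup of that shape can be sketched as below. struct gpu_entry, lookup_gpu() and gmem_bytes() are hypothetical, and the firmware and GMEM size of the remaining 0x03000520 entry are left empty because the quoted diff is cut off before them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down table in the spirit of gpulist[]. */
struct gpu_entry {
	uint32_t chip_id;
	const char *name;
	const char *pm4_fw;     /* NULL: not shown in the quoted diff */
	unsigned int gmem_kib;  /* 0: not shown in the quoted diff */
};

static const struct gpu_entry gpu_table[] = {
	{ 0x03000512, "A305B", "a330_pm4.fw", 128 }, /* new MSM8226v2 entry */
	{ 0x03000520, "A305",  NULL,          0   }, /* pre-existing entry */
};

static const struct gpu_entry *lookup_gpu(uint32_t chip_id)
{
	for (size_t i = 0; i < sizeof(gpu_table) / sizeof(gpu_table[0]); i++) {
		if (gpu_table[i].chip_id == chip_id)
			return &gpu_table[i];
	}
	/* e.g. MSM8226v1's A305B (0x03000510) is deliberately not listed */
	return NULL;
}

static size_t gmem_bytes(const struct gpu_entry *e)
{
	assert(e != NULL);
	return (size_t)e->gmem_kib * 1024;
}
```

An unknown chip ID such as 0x03000510 simply finds no entry, matching the commit message's decision not to add the untested MSM8226v1 variant.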

RPI4B: what is needed for /dev/video10 to work ( v4l_m2m )

2023-12-27 Thread AL13N

Hi,


I have an RPI4B running upstream kernel 6.1 (64-bit), and there is no
/dev/video10 present. I thought if I waited a bit longer, it would appear
in the kernel, but that was folly on my part.


Currently, watching a movie is painful, since software decoding is
far too slow and gives a very low frame rate at 1080p (and even at 720p
or 480p).


IIRC, someone told me something else has to be fixed before the codecs
can be done, but I don't remember what it was, and I couldn't find it in
my email or the archives.


Can someone tell me what exactly needs to be done (in the kernel) so that I
can take a crack at it (hopefully with some help)?


I don't remember if this is relevant, but there was some talk of
needing to use OpenGL output with a specific texture format for it to
work. Or is that separate?



Thanks in advance,

AL13N


Re: [PATCH 1/2] drm/bridge: add ->edid_read hook and drm_bridge_edid_read()

2023-12-27 Thread Jani Nikula
On Fri, 22 Dec 2023, Dmitry Baryshkov  wrote:
> On Thu, 26 Oct 2023 at 12:40, Jani Nikula  wrote:
>>
>> Add new struct drm_edid based ->edid_read hook and
>> drm_bridge_edid_read() function to call the hook.
>>
>> Signed-off-by: Jani Nikula 
>> ---
>>  drivers/gpu/drm/drm_bridge.c | 46 +++-
>>  include/drm/drm_bridge.h | 33 ++
>>  2 files changed, 78 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
>> index 30d66bee0ec6..e1cfba2ff583 100644
>> --- a/drivers/gpu/drm/drm_bridge.c
>> +++ b/drivers/gpu/drm/drm_bridge.c
>> @@ -27,8 +27,9 @@
>>  #include 
>>
>>  #include 
>> -#include 
>>  #include 
>> +#include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -1206,6 +1207,47 @@ int drm_bridge_get_modes(struct drm_bridge *bridge,
>>  }
>>  EXPORT_SYMBOL_GPL(drm_bridge_get_modes);
>>
>> +/**
>> + * drm_bridge_edid_read - read the EDID data of the connected display
>> + * @bridge: bridge control structure
>> + * @connector: the connector to read EDID for
>> + *
>> + * If the bridge supports output EDID retrieval, as reported by the
>> + * DRM_BRIDGE_OP_EDID bridge ops flag, call &drm_bridge_funcs.edid_read to get
>> + * the EDID and return it. Otherwise return NULL.
>> + *
>> + * If &drm_bridge_funcs.edid_read is not set, fall back to using
>> + * drm_bridge_get_edid() and wrapping it in struct drm_edid.
>> + *
>> + * RETURNS:
>> + * The retrieved EDID on success, or NULL otherwise.
>
> Wouldn't it be better to return an ERR_PTR instead of NULL?

I don't think so. The existing drm_bridge_get_edid() returns NULL on
errors. The ->get_edid hook returns NULL on errors. The new ->edid_read
returns NULL on errors. The drm_get_edid() and drm_edid_read() functions
return NULL on errors.

It would be surprising if one of the functions started returning error
pointers.

I don't see any added benefit with error pointers here. The ->edid_read
hook could be made to return error pointers too, but then there's quite
the burden in making all drivers return sensible values where the
difference in error code actually matters. And which error scenarios do
we want to differentiate here? How should we handle them differently?


BR,
Jani.


>
>> + */
>> +const struct drm_edid *drm_bridge_edid_read(struct drm_bridge *bridge,
>> +   struct drm_connector *connector)
>> +{
>> +   if (!(bridge->ops & DRM_BRIDGE_OP_EDID))
>> +   return NULL;
>> +
>> +   /* Transitional: Fall back to ->get_edid. */
>> +   if (!bridge->funcs->edid_read) {
>> +   const struct drm_edid *drm_edid;
>> +   struct edid *edid;
>> +
>> +   edid = drm_bridge_get_edid(bridge, connector);
>> +   if (!edid)
>> +   return NULL;
>> +
>> +   drm_edid = drm_edid_alloc(edid, (edid->extensions + 1) * EDID_LENGTH);
>> +
>> +   kfree(edid);
>> +
>> +   return drm_edid;
>> +   }
>> +
>> +   return bridge->funcs->edid_read(bridge, connector);
>> +}
>> +EXPORT_SYMBOL_GPL(drm_bridge_edid_read);
>
> [skipped the rest]

-- 
Jani Nikula, Intel
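The transitional fallback in drm_bridge_edid_read() can be sketched in plain C. Everything here (struct blob, struct bridge, bridge_edid_read(), fake_get_edid()) is hypothetical. The only EDID facts used are that a block is 128 bytes and that byte 126 of the base block holds the extension count, which is what the (edid->extensions + 1) * EDID_LENGTH size computation relies on. Errors are reported as plain NULL, in line with the convention Jani describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EDID_BLOCK_LEN 128 /* every EDID block is 128 bytes */

/* Hypothetical stand-ins; not the DRM structures. */
struct blob {
	size_t len;
	unsigned char *data;
};

struct bridge {
	int supports_edid;                /* DRM_BRIDGE_OP_EDID analogue */
	struct blob *(*edid_read)(void);  /* new hook, may be NULL */
	unsigned char *(*get_edid)(void); /* legacy hook, raw EDID bytes */
};

static struct blob *bridge_edid_read(const struct bridge *b)
{
	if (!b->supports_edid)
		return NULL;

	if (b->edid_read)
		return b->edid_read();

	/* Transitional fallback: wrap the legacy raw EDID. */
	unsigned char *raw = b->get_edid ? b->get_edid() : NULL;
	if (!raw)
		return NULL; /* errors are plain NULL, like the other helpers */

	struct blob *out = malloc(sizeof(*out));
	if (!out) {
		free(raw);
		return NULL;
	}
	/* Byte 126 of the base block is the number of extension blocks. */
	out->len = (size_t)(raw[126] + 1) * EDID_BLOCK_LEN;
	out->data = raw; /* take ownership instead of copying */
	return out;
}

/* Fake legacy source: a zeroed base block announcing one extension. */
static unsigned char *fake_get_edid(void)
{
	unsigned char *raw = calloc(2, EDID_BLOCK_LEN);
	if (raw)
		raw[126] = 1;
	return raw;
}
```

Unlike the quoted kernel code, which allocates a new struct drm_edid and frees the legacy buffer, this sketch simply takes ownership of the raw buffer to keep it short.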


[PATCH] drm/bridge: parade-ps8640: Ensure bridge is suspended in .post_disable()

2023-12-27 Thread Pin-yen Lin
Disable runtime PM autosuspend and use a completion to make sure
ps8640_suspend() is called during ps8640_atomic_post_disable().

The ps8640 bridge seems to expect everything to be power cycled during
the disable process, but sometimes ps8640_aux_transfer() holds a runtime
PM reference and prevents the bridge from suspending.

Instead of forcibly powering off the bridge and risking breaking AUX
communication, disable autosuspend, wait here for ps8640_suspend() to
be called, and re-enable autosuspend afterwards.  With this approach,
the bridge should be suspended once the current ps8640_aux_transfer()
completes.
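The reinit / wait-with-timeout / complete_all sequence can be modelled in userspace with a mutex, a condition variable and a done flag. This is only a sketch of the synchronization pattern, assuming POSIX threads: the model_* helpers are hypothetical analogues of reinit_completion(), complete_all() and wait_for_completion_timeout() (the kernel call returns remaining jiffies, 0 on timeout; the model returns a bool), and fake_suspend_thread stands in for ps8640_suspend() finishing after the last AUX transfer drops its reference.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model of struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_init_completion(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Like reinit_completion(): re-arm before starting the operation. */
static void model_reinit_completion(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = false;
	pthread_mutex_unlock(&c->lock);
}

/* Like complete_all(): mark done and wake every waiter. */
static void model_complete_all(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion_timeout(): true if completed in time. */
static bool model_wait_timeout_ms(struct completion_model *c, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop on spurious wakeups; stop on completion, timeout or error. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Stand-in for the suspend callback finishing on another thread. */
static void *fake_suspend_thread(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000000L };

	nanosleep(&delay, NULL);
	model_complete_all(arg);
	return NULL;
}
```

Re-arming before dropping the reference matters: if the completion were still marked done from a previous disable, the wait would return immediately without the bridge having suspended.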

Fixes: 826cff3f7ebb ("drm/bridge: parade-ps8640: Enable runtime power management")
Signed-off-by: Pin-yen Lin 
---

 drivers/gpu/drm/bridge/parade-ps8640.c | 33 +-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
index 8161b1a1a4b1..f8ea486a76fd 100644
--- a/drivers/gpu/drm/bridge/parade-ps8640.c
+++ b/drivers/gpu/drm/bridge/parade-ps8640.c
@@ -107,6 +107,7 @@ struct ps8640 {
struct device_link *link;
bool pre_enabled;
bool need_post_hpd_delay;
+   struct completion suspend_completion;
 };
 
 static const struct regmap_config ps8640_regmap_config[] = {
@@ -417,6 +418,8 @@ static int __maybe_unused ps8640_suspend(struct device *dev)
if (ret < 0)
dev_err(dev, "cannot disable regulators %d\n", ret);
 
+   complete_all(&ps_bridge->suspend_completion);
+
return ret;
 }
 
@@ -465,11 +468,37 @@ static void ps8640_atomic_post_disable(struct drm_bridge *bridge,
   struct drm_bridge_state *old_bridge_state)
 {
struct ps8640 *ps_bridge = bridge_to_ps8640(bridge);
+   struct device *dev = &ps_bridge->page[PAGE0_DP_CNTL]->dev;
 
ps_bridge->pre_enabled = false;
 
ps8640_bridge_vdo_control(ps_bridge, DISABLE);
-   pm_runtime_put_sync_suspend(&ps_bridge->page[PAGE0_DP_CNTL]->dev);
+
+   /*
+* The ps8640 bridge seems to expect everything to be power cycled
+* during the disable process, but sometimes ps8640_aux_transfer() holds
+* a runtime PM reference and prevents the bridge from suspending.
+* Instead of forcibly powering off the bridge and risking breaking AUX
+* communication, disable autosuspend, wait here for ps8640_suspend() to
+* be called, and re-enable autosuspend afterwards.  With this approach,
+* the bridge should be suspended once the current ps8640_aux_transfer()
+* completes.
+*/
+   reinit_completion(&ps_bridge->suspend_completion);
+   pm_runtime_dont_use_autosuspend(dev);
+   pm_runtime_put_sync_suspend(dev);
+
+   /*
+* Mostly the suspend completes under 10 ms, but sometimes it could
+* take 708 ms to complete.  Set the timeout to 2000 ms here to be
+* extra safe.
+*/
+   if (!wait_for_completion_timeout(&ps_bridge->suspend_completion,
+                                    msecs_to_jiffies(2000))) {
+   dev_warn(dev, "Failed to wait for the suspend completion\n");
+   }
+
+   pm_runtime_use_autosuspend(dev);
 }
 
 static int ps8640_bridge_attach(struct drm_bridge *bridge,
@@ -693,6 +722,8 @@ static int ps8640_probe(struct i2c_client *client)
if (ret)
return ret;
 
+   init_completion(&ps_bridge->suspend_completion);
+
ret = devm_of_dp_aux_populate_bus(&ps_bridge->aux, ps8640_bridge_link_panel);
 
/*
-- 
2.43.0.472.g3155946c3a-goog



Re: [PATCH] drm: Check output polling initialized before disabling

2023-12-27 Thread Saurabh Singh Sengar
On Tue, Dec 26, 2023 at 11:27:15PM -0800, Shradha Gupta wrote:
> In drm_mode_config_helper_suspend() check if output polling
> support is initialized before enabling/disabling polling.
> For drivers like hyperv-drm, that do not initialize connector
> polling, if suspend is called without this check, it leads to
> suspend failure with following stack
> 
> [  770.719392] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [  770.720592] printk: Suspending console(s) (use no_console_suspend to debug)
> [  770.948823] [ cut here ]
> [  770.948824] WARNING: CPU: 1 PID: 17197 at kernel/workqueue.c:3162 __flush_work.isra.0+0x212/0x230
> [  770.948831] Modules linked in: rfkill nft_counter xt_conntrack xt_owner 
> udf nft_compat crc_itu_t nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib 
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat 
> nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink 
> vfat fat mlx5_ib ib_uverbs ib_core mlx5_core intel_rapl_msr intel_rapl_common 
> kvm_amd ccp mlxfw kvm psample hyperv_drm tls drm_shmem_helper drm_kms_helper 
> irqbypass pcspkr syscopyarea sysfillrect sysimgblt hv_balloon hv_utils joydev 
> drm fuse xfs libcrc32c pci_hyperv pci_hyperv_intf sr_mod sd_mod cdrom t10_pi 
> sg hv_storvsc scsi_transport_fc hv_netvsc serio_raw hyperv_keyboard 
> hid_hyperv crct10dif_pclmul crc32_pclmul crc32c_intel hv_vmbus 
> ghash_clmulni_intel dm_mirror dm_region_hash dm_log dm_mod
> [  770.948863] CPU: 1 PID: 17197 Comm: systemd-sleep Not tainted 5.14.0-362.2.1.el9_3.x86_64 #1
> [  770.948865] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
> [  770.948866] RIP: 0010:__flush_work.isra.0+0x212/0x230
> [  770.948869] Code: 8b 4d 00 4c 8b 45 08 89 ca 48 c1 e9 04 83 e2 08 83 e1 0f 83 ca 02 89 c8 48 0f ba 6d 00 03 e9 25 ff ff ff 0f 0b e9 4e ff ff ff <0f> 0b 45 31 ed e9 44 ff ff ff e8 8f 89 b2 00 66 66 2e 0f 1f 84 00
> [  770.948870] RSP: 0018:af4ac213fb10 EFLAGS: 00010246
> [  770.948871] RAX:  RBX:  RCX: 8c992857
> [  770.948872] RDX: 0001 RSI: 0001 RDI: 9aad82b00330
> [  770.948873] RBP: 9aad82b00330 R08:  R09: 9aad87ee3d10
> [  770.948874] R10: 0200 R11:  R12: 9aad82b00330
> [  770.948874] R13: 0001 R14:  R15: 0001
> [  770.948875] FS:  7ff1b2f6bb40() GS:9aaf37d0() knlGS:
> [  770.948878] CS:  0010 DS:  ES:  CR0: 80050033
> [  770.948878] CR2: 555f345cb666 CR3: 0001462dc005 CR4: 00370ee0
> [  770.948879] Call Trace:
> [  770.948880]  <TASK>
> [  770.948881]  ? show_trace_log_lvl+0x1c4/0x2df
> [  770.948884]  ? show_trace_log_lvl+0x1c4/0x2df
> [  770.948886]  ? __cancel_work_timer+0x103/0x190
> [  770.948887]  ? __flush_work.isra.0+0x212/0x230
> [  770.948889]  ? __warn+0x81/0x110
> [  770.948891]  ? __flush_work.isra.0+0x212/0x230
> [  770.948892]  ? report_bug+0x10a/0x140
> [  770.948895]  ? handle_bug+0x3c/0x70
> [  770.948898]  ? exc_invalid_op+0x14/0x70
> [  770.948899]  ? asm_exc_invalid_op+0x16/0x20
> [  770.948903]  ? __flush_work.isra.0+0x212/0x230
> [  770.948905]  __cancel_work_timer+0x103/0x190
> [  770.948907]  ? _raw_spin_unlock_irqrestore+0xa/0x30
> [  770.948910]  drm_kms_helper_poll_disable+0x1e/0x40 [drm_kms_helper]
> [  770.948923]  drm_mode_config_helper_suspend+0x1c/0x80 [drm_kms_helper]
> [  770.948933]  ? __pfx_vmbus_suspend+0x10/0x10 [hv_vmbus]
> [  770.948942]  hyperv_vmbus_suspend+0x17/0x40 [hyperv_drm]
> [  770.948944]  ? __pfx_vmbus_suspend+0x10/0x10 [hv_vmbus]
> [  770.948951]  dpm_run_callback+0x4c/0x140
> [  770.948954]  __device_suspend_noirq+0x74/0x220
> [  770.948956]  dpm_noirq_suspend_devices+0x148/0x2a0
> [  770.948958]  dpm_suspend_end+0x54/0xe0
> [  770.948960]  create_image+0x14/0x290
> [  770.948963]  hibernation_snapshot+0xd6/0x200
> [  770.948964]  hibernate.cold+0x8b/0x1fb
> [  770.948967]  state_store+0xcd/0xd0
> [  770.948969]  kernfs_fop_write_iter+0x124/0x1b0
> [  770.948973]  new_sync_write+0xff/0x190
> [  770.948976]  vfs_write+0x1ef/0x280
> [  770.948978]  ksys_write+0x5f/0xe0
> [  770.948979]  do_syscall_64+0x5c/0x90
> [  770.948981]  ? syscall_exit_work+0x103/0x130
> [  770.948983]  ? syscall_exit_to_user_mode+0x12/0x30
> [  770.948985]  ? do_syscall_64+0x69/0x90
> [  770.948986]  ? do_syscall_64+0x69/0x90
> [  770.948987]  ? do_user_addr_fault+0x1d6/0x6a0
> [  770.948989]  ? do_syscall_64+0x69/0x90
> [  770.948990]  ? exc_page_fault+0x62/0x150
> [  770.948992]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  770.948995] RIP: 0033:0x7ff1b293eba7
> [  770.949010] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 
> f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 
> 00 f0 ff ff 77 51 
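The failure above comes from cancelling the output-poll work on a device whose polling was never initialized. The toy model below sketches the guard the quoted commit message describes, assuming a flag such as drm_mode_config's poll_enabled records whether polling was set up. Every toy_* name is hypothetical; the warning counter stands in for the WARN in __flush_work().

```c
#include <stdbool.h>

/* Toy work item: flushing it before initialization "warns". */
struct toy_work {
	bool initialized;
	int warnings;	/* counts WARN()-like events */
};

struct toy_mode_config {
	bool poll_enabled;	/* set only when polling was initialized */
	struct toy_work output_poll_work;
};

static void toy_cancel_work_sync(struct toy_work *work)
{
	if (!work->initialized)
		work->warnings++;	/* models the __flush_work warning */
}

/* Before the fix: suspend always tears down polling. */
static void toy_suspend_unguarded(struct toy_mode_config *cfg)
{
	toy_cancel_work_sync(&cfg->output_poll_work);
}

/* After the fix: skip polling teardown if it was never initialized. */
static void toy_suspend_guarded(struct toy_mode_config *cfg)
{
	if (cfg->poll_enabled)
		toy_cancel_work_sync(&cfg->output_poll_work);
}
```

A driver like hyperv-drm corresponds to a config with poll_enabled false; drivers that do initialize polling behave the same with or without the guard.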

[PATCH v9 4/6] udmabuf: Convert udmabuf driver to use folios (v2)

2023-12-27 Thread Vivek Kasireddy
This is mainly a preparatory patch to use memfd_pin_folios() API
for pinning folios. Using folios instead of pages makes sense as
the udmabuf driver needs to handle both shmem and hugetlb cases.
However, the function vmap_udmabuf() still needs a list of pages;
so, we collect all the head pages into a local array in this case.

Other changes in this patch include the addition of helpers for
checking the memfd seals and exporting dmabuf. Moving code from
udmabuf_create() into these helpers improves readability given
that udmabuf_create() is a bit long.
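The seal policy that the new helper enforces is visible in the driver as SEALS_WANTED (F_SEAL_SHRINK) and SEALS_DENIED (F_SEAL_WRITE). As a sketch only, here is a userspace analogue of that check using fcntl(F_GET_SEALS) on a memfd. check_memfd_seals_user() and make_memfd() are hypothetical names (the in-kernel helper is not shown in this excerpt), and returning -EBADF for a non-sealable fd is an assumption of the sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Seal policy as defined in drivers/dma-buf/udmabuf.c. */
#define SEALS_WANTED (F_SEAL_SHRINK)
#define SEALS_DENIED (F_SEAL_WRITE)

/* Returns 0 if the memfd's seals are acceptable, negative errno otherwise. */
static int check_memfd_seals_user(int memfd)
{
	int seals = fcntl(memfd, F_GET_SEALS);

	if (seals < 0)
		return -EBADF;	/* not a sealable memfd */
	if ((seals & SEALS_WANTED) != SEALS_WANTED ||
	    (seals & SEALS_DENIED) != 0)
		return -EINVAL;
	return 0;
}

/* Create a sealable memfd and apply the given seals (0 for none). */
static int make_memfd(unsigned int seals)
{
	int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (seals && fcntl(fd, F_ADD_SEALS, seals) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

F_SEAL_SHRINK is required so the backing pages cannot disappear under the dma-buf, while F_SEAL_WRITE is rejected because the buffer is expected to stay writable.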

v2: (Matthew)
- Use folio_pfn() on the folio instead of page_to_pfn() on head page
- Don't split the arguments to shmem_read_folio() on multiple lines

Cc: David Hildenbrand 
Cc: Matthew Wilcox 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Signed-off-by: Vivek Kasireddy 
---
 drivers/dma-buf/udmabuf.c | 140 ++
 1 file changed, 83 insertions(+), 57 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 274defd3fa3e..a8f3af61f7f2 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,7 +26,7 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
pgoff_t pagecount;
-   struct page **pages;
+   struct folio **folios;
struct sg_table *sg;
struct miscdevice *device;
pgoff_t *offsets;
@@ -42,7 +42,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
if (pgoff >= ubuf->pagecount)
return VM_FAULT_SIGBUS;
 
-   pfn = page_to_pfn(ubuf->pages[pgoff]);
+   pfn = folio_pfn(ubuf->folios[pgoff]);
pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
 
return vmf_insert_pfn(vma, vmf->address, pfn);
@@ -68,11 +68,21 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 {
struct udmabuf *ubuf = buf->priv;
+   struct page **pages;
void *vaddr;
+   pgoff_t pg;
 
dma_resv_assert_held(buf->resv);
 
-   vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
+   pages = kmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+
+   for (pg = 0; pg < ubuf->pagecount; pg++)
+   pages[pg] = &ubuf->folios[pg]->page;
+
+   vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
+   kfree(pages);
if (!vaddr)
return -EINVAL;
 
@@ -107,7 +117,8 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
goto err_alloc;
 
for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
-   sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
+   sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
+ubuf->offsets[i]);
 
ret = dma_map_sgtable(dev, sg, direction, 0);
if (ret < 0)
@@ -152,9 +163,9 @@ static void release_udmabuf(struct dma_buf *buf)
put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
for (pg = 0; pg < ubuf->pagecount; pg++)
-   put_page(ubuf->pages[pg]);
+   folio_put(ubuf->folios[pg]);
kfree(ubuf->offsets);
-   kfree(ubuf->pages);
+   kfree(ubuf->folios);
kfree(ubuf);
 }
 
@@ -215,36 +226,33 @@ static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
pgoff_t mapidx = offset >> huge_page_shift(hpstate);
pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-   struct page *hpage = NULL;
-   struct folio *folio;
+   struct folio *folio = NULL;
pgoff_t pgidx;
 
mapidx <<= huge_page_order(hpstate);
for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-   if (!hpage) {
+   if (!folio) {
folio = __filemap_get_folio(memfd->f_mapping,
mapidx,
FGP_ACCESSED, 0);
if (IS_ERR(folio))
return PTR_ERR(folio);
-
-   hpage = &folio->page;
}
 
-   get_page(hpage);
-   ubuf->pages[*pgbuf] = hpage;
+   folio_get(folio);
+   ubuf->folios[*pgbuf] = folio;
ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
(*pgbuf)++;
if (++subpgoff == maxsubpgs) {
-   put_page(hpage);
-   hpage = NULL;
+   folio_put(folio);
+   folio = NULL;
subpgoff = 0;
mapidx += pages_per_huge_page(hpstate);
}
}
 
-

[PATCH v9 6/6] selftests/dma-buf/udmabuf: Add tests to verify data after page migration

2023-12-27 Thread Vivek Kasireddy
Since the memfd pages associated with a udmabuf may be migrated
as part of udmabuf create, we need to verify the data coherency
after successful migration. The new tests added in this patch try
to do just that using 4k sized pages and also 2 MB sized huge
pages for the memfd.

Successful completion of the tests would mean that there is no
disconnect between the memfd pages and the ones associated with
a udmabuf. And, these tests can also be augmented in the future
to test newer udmabuf features (such as handling memfd hole punch).
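The coherency check in compare_chunks() relies on a fixed mapping between udmabuf pages and memfd offsets: list item i starts at i * (memfd_size / NUM_ENTRIES) and covers NUM_PAGES pages, so udmabuf page k comes from list item k / NUM_PAGES. A small sketch of that arithmetic follows; the function name is illustrative and not part of the selftest.

```c
/* Same constants as the selftest. */
#define NUM_PAGES   4
#define NUM_ENTRIES 4

/*
 * Byte offset in the memfd that backs page k of the udmabuf, given the
 * list layout the selftest builds in create_udmabuf_list().
 */
static long long memfd_offset_of_ubuf_page(long long k, long long memfd_size,
					   long long page_size)
{
	long long entry = k / NUM_PAGES;	/* which list item */
	long long page_in_entry = k % NUM_PAGES;	/* page within that item */

	return entry * (memfd_size / NUM_ENTRIES) + page_in_entry * page_size;
}
```

With MEMFD_SIZE = 1024 pages of 4 KiB, udmabuf page 4 lands at the start of the second quarter of the memfd (1 MiB), which is where compare_chunks() expects the same byte.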

Cc: Shuah Khan 
Cc: David Hildenbrand 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Based-on-patch-by: Mike Kravetz 
Signed-off-by: Vivek Kasireddy 
---
 .../selftests/drivers/dma-buf/udmabuf.c   | 151 +-
 1 file changed, 147 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c 
b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
index c812080e304e..d76c813fe652 100644
--- a/tools/testing/selftests/drivers/dma-buf/udmabuf.c
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -9,26 +9,132 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #define TEST_PREFIX"drivers/dma-buf/udmabuf"
 #define NUM_PAGES   4
+#define NUM_ENTRIES 4
+#define MEMFD_SIZE  1024 /* in pages */
 
-static int memfd_create(const char *name, unsigned int flags)
+static unsigned int page_size;
+
+static int create_memfd_with_seals(off64_t size, bool hpage)
+{
+   int memfd, ret;
+   unsigned int flags = MFD_ALLOW_SEALING;
+
+   if (hpage)
+   flags |= MFD_HUGETLB;
+
+   memfd = memfd_create("udmabuf-test", flags);
+   if (memfd < 0) {
+   printf("%s: [skip,no-memfd]\n", TEST_PREFIX);
+   exit(77);
+   }
+
+   ret = fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);
+   if (ret < 0) {
+   printf("%s: [skip,fcntl-add-seals]\n", TEST_PREFIX);
+   exit(77);
+   }
+
+   ret = ftruncate(memfd, size);
+   if (ret == -1) {
+   printf("%s: [FAIL,memfd-truncate]\n", TEST_PREFIX);
+   exit(1);
+   }
+
+   return memfd;
+}
+
+static int create_udmabuf_list(int devfd, int memfd, off64_t memfd_size)
+{
+   struct udmabuf_create_list *list;
+   int ubuf_fd, i;
+
+   list = malloc(sizeof(struct udmabuf_create_list) +
+ sizeof(struct udmabuf_create_item) * NUM_ENTRIES);
+   if (!list) {
+   printf("%s: [FAIL, udmabuf-malloc]\n", TEST_PREFIX);
+   exit(1);
+   }
+
+   for (i = 0; i < NUM_ENTRIES; i++) {
+   list->list[i].memfd  = memfd;
+   list->list[i].offset = i * (memfd_size / NUM_ENTRIES);
+   list->list[i].size   = getpagesize() * NUM_PAGES;
+   }
+
+   list->count = NUM_ENTRIES;
+   list->flags = UDMABUF_FLAGS_CLOEXEC;
+   ubuf_fd = ioctl(devfd, UDMABUF_CREATE_LIST, list);
+   free(list);
+   if (ubuf_fd < 0) {
+   printf("%s: [FAIL, udmabuf-create]\n", TEST_PREFIX);
+   exit(1);
+   }
+
+   return ubuf_fd;
+}
+
+static void write_to_memfd(void *addr, off64_t size, char chr)
+{
+   int i;
+
+   for (i = 0; i < size / page_size; i++) {
+   *((char *)addr + (i * page_size)) = chr;
+   }
+}
+
+static void *mmap_fd(int fd, off64_t size)
 {
-   return syscall(__NR_memfd_create, name, flags);
+   void *addr;
+
+   addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+   if (addr == MAP_FAILED) {
+   printf("%s: ubuf_fd mmap fail\n", TEST_PREFIX);
+   exit(1);
+   }
+
+   return addr;
+}
+
+static int compare_chunks(void *addr1, void *addr2, off64_t memfd_size)
+{
+   off64_t off;
+   int i = 0, j, k = 0, ret = 0;
+   char char1, char2;
+
+   while (i < NUM_ENTRIES) {
+   off = i * (memfd_size / NUM_ENTRIES);
+   for (j = 0; j < NUM_PAGES; j++, k++) {
+   char1 = *((char *)addr1 + off + (j * getpagesize()));
+   char2 = *((char *)addr2 + (k * getpagesize()));
+   if (char1 != char2) {
+   ret = -1;
+   goto err;
+   }
+   }
+   i++;
+   }
+err:
+   munmap(addr1, memfd_size);
+   munmap(addr2, NUM_ENTRIES * NUM_PAGES * getpagesize());
+   return ret;
 }
 
 int main(int argc, char *argv[])
 {
struct udmabuf_create create;
int devfd, memfd, buf, ret;
-   off_t size;
-   void *mem;
+   off64_t size;
+   void *addr1, *addr2;
 
devfd = open("/dev/udmabuf", O_RDWR);
if (devfd < 0) {
@@ -90,6 +196,9 @@ int main(int argc, char *argv[])
}
 
/* should work */
+   

[PATCH v9 5/6] udmabuf: Pin the pages using memfd_pin_folios() API (v7)

2023-12-27 Thread Vivek Kasireddy
Using memfd_pin_folios() will ensure that the pages are pinned
correctly using FOLL_PIN. And, this also ensures that we don't
accidentally break features such as memory hotunplug as it would
not allow pinning pages in the movable zone.

Using this new API also simplifies the code as we no longer have
to deal with extracting individual pages from their mappings or
handle shmem and hugetlb cases separately.

v2:
- Adjust to the change in signature of pin_user_pages_fd() by
  passing in file * instead of fd.

v3:
- Limit the changes in this patch only to those that are required
  for using pin_user_pages_fd()
- Slightly improve the commit message

v4:
- Adjust to the change in name of the API (memfd_pin_user_pages)

v5:
- Adjust to the changes in memfd_pin_folios which now populates
  a list of folios and offsets

v6:
- Don't unnecessarily use folio_page() (Matthew)
- Pass [start, end] and max_folios to memfd_pin_folios()
- Create another temporary array to hold the folios returned by
  memfd_pin_folios() as we populate ubuf->folios.
- Unpin the folios only once as memfd_pin_folios pins them once

v7:
- Use a list to track the folios that need to be unpinned
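Why track pinned folios in a separate list rather than unpinning ubuf->folios[] slot by slot? Several slots can reference the same folio (e.g. the base pages of one hugetlb folio), while memfd_pin_folios() pins each folio once. The toy model below illustrates this; toy_folio, unpin_node and the helpers are hypothetical stand-ins for struct folio, struct udmabuf_folio, add_to_unpin_list() and unpin_all_folios(). It prepends instead of using list_add_tail(), which does not matter for unpinning.

```c
#include <stdlib.h>

struct toy_folio {
	int pincount;
};

struct unpin_node {
	struct toy_folio *folio;
	struct unpin_node *next;
};

/* Record a folio that was pinned once and must be unpinned once. */
static int toy_add_to_unpin_list(struct unpin_node **head,
				 struct toy_folio *folio)
{
	struct unpin_node *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->folio = folio;
	node->next = *head;
	*head = node;
	return 0;
}

/* Drop exactly one pin per recorded folio and free the list. */
static void toy_unpin_all(struct unpin_node **head)
{
	while (*head) {
		struct unpin_node *node = *head;

		node->folio->pincount--;	/* models unpin_user_page() */
		*head = node->next;
		free(node);
	}
}

/* Naive alternative: one unpin per page slot (over-unpins shared folios). */
static void toy_unpin_per_slot(struct toy_folio **slots, size_t n)
{
	for (size_t i = 0; i < n; i++)
		slots[i]->pincount--;
}
```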

Cc: David Hildenbrand 
Cc: Matthew Wilcox 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Signed-off-by: Vivek Kasireddy 
---
 drivers/dma-buf/udmabuf.c | 153 +++---
 1 file changed, 78 insertions(+), 75 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index a8f3af61f7f2..8086c2b5be5a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -30,6 +30,12 @@ struct udmabuf {
struct sg_table *sg;
struct miscdevice *device;
pgoff_t *offsets;
+   struct list_head unpin_list;
+};
+
+struct udmabuf_folio {
+   struct folio *folio;
+   struct list_head list;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -153,17 +159,43 @@ static void unmap_udmabuf(struct dma_buf_attachment *at,
return put_sg_table(at->dev, sg, direction);
 }
 
+static void unpin_all_folios(struct list_head *unpin_list)
+{
+   struct udmabuf_folio *ubuf_folio;
+
+   while (!list_empty(unpin_list)) {
+   ubuf_folio = list_first_entry(unpin_list,
+ struct udmabuf_folio, list);
+   unpin_user_page(&ubuf_folio->folio->page);
+
+   list_del(&ubuf_folio->list);
+   kfree(ubuf_folio);
+   }
+}
+
+static int add_to_unpin_list(struct list_head *unpin_list,
+struct folio *folio)
+{
+   struct udmabuf_folio *ubuf_folio;
+
+   ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
+   if (!ubuf_folio)
+   return -ENOMEM;
+
+   ubuf_folio->folio = folio;
+   list_add_tail(&ubuf_folio->list, unpin_list);
+   return 0;
+}
+
 static void release_udmabuf(struct dma_buf *buf)
 {
struct udmabuf *ubuf = buf->priv;
struct device *dev = ubuf->device->this_device;
-   pgoff_t pg;
 
if (ubuf->sg)
put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
-   for (pg = 0; pg < ubuf->pagecount; pg++)
-   folio_put(ubuf->folios[pg]);
+   unpin_all_folios(&ubuf->unpin_list);
kfree(ubuf->offsets);
kfree(ubuf->folios);
kfree(ubuf);
@@ -218,64 +250,6 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
-static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
-   pgoff_t offset, pgoff_t pgcnt,
-   pgoff_t *pgbuf)
-{
-   struct hstate *hpstate = hstate_file(memfd);
-   pgoff_t mapidx = offset >> huge_page_shift(hpstate);
-   pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
-   pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-   struct folio *folio = NULL;
-   pgoff_t pgidx;
-
-   mapidx <<= huge_page_order(hpstate);
-   for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-   if (!folio) {
-   folio = __filemap_get_folio(memfd->f_mapping,
-   mapidx,
-   FGP_ACCESSED, 0);
-   if (IS_ERR(folio))
-   return PTR_ERR(folio);
-   }
-
-   folio_get(folio);
-   ubuf->folios[*pgbuf] = folio;
-   ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
-   (*pgbuf)++;
-   if (++subpgoff == maxsubpgs) {
-   folio_put(folio);
-   folio = NULL;
-   subpgoff = 0;
-   mapidx += pages_per_huge_page(hpstate);
-   }
-   }
-
-   if (folio)
-  

[PATCH v9 3/6] mm/gup: Introduce memfd_pin_folios() for pinning memfd folios (v9)

2023-12-27 Thread Vivek Kasireddy
For drivers that would like to longterm-pin the folios associated
with a memfd, the memfd_pin_folios() API provides an option to
not only pin the folios via FOLL_PIN but also to check and migrate
them if they reside in the movable zone or a CMA block. This API
currently works with memfds but it should work with any files
that belong to either shmemfs or hugetlbfs. Files belonging to
other filesystems are rejected for now.

The folios need to be located first before pinning them via FOLL_PIN.
If they are found in the page cache, they can be immediately pinned.
Otherwise, they need to be allocated using the filesystem specific
APIs and then pinned.
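The "find in the page cache, otherwise allocate, then pin" flow described above can be sketched with a toy page cache. Everything here is hypothetical and heavily simplified: a fixed array stands in for the file mapping, calloc() stands in for the filesystem-specific allocation (shmem or hugetlbfs), and the pincount models the FOLL_PIN reference. The -EEXIST retry for concurrent insertion mentioned in the changelog is not modelled.

```c
#include <stdlib.h>

#define TOY_CACHE_SLOTS 16

struct toy_folio {
	int pincount;
};

struct toy_cache {
	struct toy_folio *slots[TOY_CACHE_SLOTS];	/* indexed by page offset */
	int allocations;
};

static struct toy_folio *toy_lookup_or_alloc_and_pin(struct toy_cache *cache,
						     size_t idx)
{
	struct toy_folio *folio;

	if (idx >= TOY_CACHE_SLOTS)
		return NULL;

	folio = cache->slots[idx];
	if (!folio) {
		/* Not cached yet: allocate and insert before pinning. */
		folio = calloc(1, sizeof(*folio));
		if (!folio)
			return NULL;
		cache->slots[idx] = folio;
		cache->allocations++;
	}

	folio->pincount++;	/* models the FOLL_PIN reference */
	return folio;
}
```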

v2:
- Drop gup_flags and improve comments and commit message (David)
- Allocate a page if we cannot find it in the page cache for the hugetlbfs
  case as well (David)
- Don't unpin pages if there is a migration related failure (David)
- Drop the unnecessary nr_pages <= 0 check (Jason)
- Have the caller of the API pass in file * instead of fd (Jason)

v3: (David)
- Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE
  (Build error reported by kernel test robot )
- Don't forget memalloc_pin_restore() on non-migration related errors
- Improve the readability of the cleanup code associated with
  non-migration related errors
- Augment the comments by describing FOLL_LONGTERM like behavior
- Include the R-b tag from Jason

v4:
- Remove the local variable "page" and instead use 3 return statements
  in alloc_file_page() (David)
- Add the R-b tag from David

v5: (David)
- For hugetlb case, ensure that we only obtain head pages from the
  mapping by using __filemap_get_folio() instead of find_get_page_flags()
- Handle -EEXIST when two or more potential users try to simultaneously
  add a huge page to the mapping by forcing them to retry on failure

v6: (Christoph)
- Rename this API to memfd_pin_user_pages() to make it clear that it
  is intended for memfds
- Move the memfd page allocation helper from gup.c to memfd.c
- Fix indentation errors in memfd_pin_user_pages()
- For contiguous ranges of folios, use a helper such as
  filemap_get_folios_contig() to lookup the page cache in batches

v7:
- Rename this API to memfd_pin_folios() and make it return folios
  and offsets instead of pages (David)
- Don't continue processing the folios in the batch returned by
  filemap_get_folios_contig() if they do not have correct next_idx
- Add the R-b tag from Christoph

v8: (David)
- Have caller pass [start, end], max_folios instead of start, nr_pages
- Replace offsets array with just offset into the first page
- Add comments explaining the need for next_idx
- Pin (and return) the folio (via FOLL_PIN) only once

v9: (Matthew)
- Drop the extern while declaring memfd_alloc_folio()
- Fix memfd_alloc_folio() declaration to have it return struct folio *
  instead of struct page * when CONFIG_MEMFD_CREATE is not defined

Cc: David Hildenbrand 
Cc: Matthew Wilcox (Oracle) 
Cc: Christoph Hellwig 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Suggested-by: Jason Gunthorpe 
Reviewed-by: Jason Gunthorpe  (v2)
Reviewed-by: David Hildenbrand  (v3)
Reviewed-by: Christoph Hellwig  (v6)
Signed-off-by: Vivek Kasireddy 
---
 include/linux/memfd.h |   5 ++
 include/linux/mm.h|   3 +
 mm/gup.c  | 149 ++
 mm/memfd.c|  34 ++
 4 files changed, 191 insertions(+)

diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index e7abf6fa4c52..3f2cf339ceaf 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -6,11 +6,16 @@
 
 #ifdef CONFIG_MEMFD_CREATE
 extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg);
+struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx);
 #else
 static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
 {
return -EINVAL;
 }
+static inline struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
+{
+   return ERR_PTR(-EINVAL);
+}
 #endif
 
 #endif /* __LINUX_MEMFD_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..942d2e618253 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2472,6 +2472,9 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
+ struct folio **folios, unsigned int max_folios,
+ pgoff_t *offset);
 
 int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index 231711efa390..42eb212af73f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -5,6 +5,7 @@
 #include 
 
 

[PATCH v9 2/6] udmabuf: Add back support for mapping hugetlb pages (v6)

2023-12-27 Thread Vivek Kasireddy
A user or admin can configure a VMM (Qemu) Guest's memory to be
backed by hugetlb pages for various reasons. However, a Guest OS
would still allocate (and pin) buffers that are backed by regular
4k sized pages. In order to map these buffers and create dma-bufs
for them on the Host, we first need to find the hugetlb pages where
the buffer allocations are located and then determine the offsets
of individual chunks (within those pages) and use this information
to eventually populate a scatterlist.
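The offset bookkeeping described above reduces to a few shifts and masks, as in handle_hugetlb_pages() in this patch. The sketch below replaces the hstate helpers with fixed constants, assuming 4 KiB base pages and 2 MiB huge pages (the kernel derives both from the hstate), and also shows how the fault handler turns the head PFN plus the stored byte offset into the subpage PFN.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB huge pages. */
#define BASE_PAGE_SHIFT 12
#define HUGE_PAGE_SHIFT 21
#define HUGE_PAGE_ORDER (HUGE_PAGE_SHIFT - BASE_PAGE_SHIFT)
#define HUGE_PAGE_SIZE  (1ULL << HUGE_PAGE_SHIFT)
#define HUGE_PAGE_MASK  (~(HUGE_PAGE_SIZE - 1))

struct hugetlb_pos {
	uint64_t mapidx;	/* page-cache index of the huge page, in base pages */
	uint64_t subpgoff;	/* base-page index within that huge page */
	uint64_t maxsubpgs;	/* base pages per huge page */
};

/* Mirrors the mapidx/subpgoff/maxsubpgs computation in the patch. */
static struct hugetlb_pos hugetlb_position(uint64_t offset)
{
	struct hugetlb_pos pos;

	pos.mapidx = (offset >> HUGE_PAGE_SHIFT) << HUGE_PAGE_ORDER;
	pos.subpgoff = (offset & ~HUGE_PAGE_MASK) >> BASE_PAGE_SHIFT;
	pos.maxsubpgs = HUGE_PAGE_SIZE >> BASE_PAGE_SHIFT;
	return pos;
}

/* PFN passed to vmf_insert_pfn(): head PFN plus the subpage offset. */
static uint64_t subpage_pfn(uint64_t head_pfn, uint64_t offset_in_folio)
{
	return head_pfn + (offset_in_folio >> BASE_PAGE_SHIFT);
}
```

For an offset of 2 MiB + 3 * 4 KiB, this gives the second huge page (mapidx 512, an order-9 multiple) and subpage 3 of 512.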

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options: qemu-system-x86_64 -m 4096m
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

v2: Updated get_sg_table() to manually populate the scatterlist for
both huge page and non-huge-page cases.

v3: s/offsets/subpgoff/g
s/hpoff/mapidx/g

v4: Replaced find_get_page_flags() with __filemap_get_folio() to
ensure that we only obtain head pages from the mapping

v5: Fix the calculation of mapidx to ensure that it is an order-n
page multiple

v6:
- Split the processing of hugetlb or shmem pages into helpers to
  simplify the code in udmabuf_create() (Christoph)
- Move the creation of offsets array out of hugetlb context and
  into common code

Cc: David Hildenbrand 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Acked-by: Mike Kravetz  (v2)
Signed-off-by: Vivek Kasireddy 
---
 drivers/dma-buf/udmabuf.c | 122 +++---
 1 file changed, 101 insertions(+), 21 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 820c993c8659..274defd3fa3e 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +29,7 @@ struct udmabuf {
struct page **pages;
struct sg_table *sg;
struct miscdevice *device;
+   pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
 
pfn = page_to_pfn(ubuf->pages[pgoff]);
+   pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+
return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -90,23 +94,29 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
struct udmabuf *ubuf = buf->priv;
struct sg_table *sg;
+   struct scatterlist *sgl;
+   unsigned int i = 0;
int ret;
 
sg = kzalloc(sizeof(*sg), GFP_KERNEL);
if (!sg)
return ERR_PTR(-ENOMEM);
-   ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-   0, ubuf->pagecount << PAGE_SHIFT,
-   GFP_KERNEL);
+
+   ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
if (ret < 0)
-   goto err;
+   goto err_alloc;
+
+   for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+   sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
+
ret = dma_map_sgtable(dev, sg, direction, 0);
if (ret < 0)
-   goto err;
+   goto err_map;
return sg;
 
-err:
+err_map:
sg_free_table(sg);
+err_alloc:
kfree(sg);
return ERR_PTR(ret);
 }
@@ -143,6 +153,7 @@ static void release_udmabuf(struct dma_buf *buf)
 
for (pg = 0; pg < ubuf->pagecount; pg++)
put_page(ubuf->pages[pg]);
+   kfree(ubuf->offsets);
kfree(ubuf->pages);
kfree(ubuf);
 }
@@ -196,17 +207,77 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
+static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
+   pgoff_t offset, pgoff_t pgcnt,
+   pgoff_t *pgbuf)
+{
+   struct hstate *hpstate = hstate_file(memfd);
+   pgoff_t mapidx = offset >> huge_page_shift(hpstate);
+   pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+   pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
+   struct page *hpage = NULL;
+   struct folio *folio;
+   pgoff_t pgidx;
+
+   mapidx <<= huge_page_order(hpstate);
+   for (pgidx = 0; pgidx < pgcnt; pgidx++) {
+   if (!hpage) {
+   folio = __filemap_get_folio(memfd->f_mapping,
+   mapidx,
+   FGP_ACCESSED, 0);

[PATCH v9 1/6] udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap

2023-12-27 Thread Vivek Kasireddy
Add VM_PFNMAP to vm_flags in the mmap handler to ensure that
the mappings would be managed without using struct page.

And, in the vm_fault handler, use vmf_insert_pfn to share the
page's pfn to userspace instead of directly sharing the page
(via struct page *).

Cc: David Hildenbrand 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 
Suggested-by: David Hildenbrand 
Acked-by: David Hildenbrand 
Signed-off-by: Vivek Kasireddy 
---
 drivers/dma-buf/udmabuf.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index c40645999648..820c993c8659 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -35,12 +35,13 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct udmabuf *ubuf = vma->vm_private_data;
pgoff_t pgoff = vmf->pgoff;
+   unsigned long pfn;
 
if (pgoff >= ubuf->pagecount)
return VM_FAULT_SIGBUS;
-   vmf->page = ubuf->pages[pgoff];
-   get_page(vmf->page);
-   return 0;
+
+   pfn = page_to_pfn(ubuf->pages[pgoff]);
+   return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
 static const struct vm_operations_struct udmabuf_vm_ops = {
@@ -56,6 +57,7 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 
vma->vm_ops = &udmabuf_vm_ops;
vma->vm_private_data = ubuf;
+   vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
 }
 
-- 
2.39.2



[PATCH v9 0/6] mm/gup: Introduce memfd_pin_folios() for pinning memfd folios (v9)

2023-12-27 Thread Vivek Kasireddy
The first two patches were previously reviewed but not yet merged.
They need to be merged first, as the fourth patch depends on the
changes they introduce; they also fix bugs seen in very specific
scenarios (running Qemu with hugetlb=on and blob=true, and rebooting
the guest VM).

The third patch introduces the memfd_pin_folios() API and the fourth
patch converts the udmabuf driver to use folios. The fifth patch shows
how the udmabuf driver can use the new API to longterm-pin the
folios. The last patch adds two new udmabuf selftests to verify
data coherency after potential page migration.
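Pinning a range of a memfd backed by large folios (for example a hugetlb-backed memfd) yields fewer folios than pages, plus an offset of the range start within the first folio. The arithmetic can be sketched in plain C, assuming every folio covering the range has the same size and the range is inclusive, [start, end]; struct folio_span and folio_span_of() are hypothetical names for illustration only and are not the memfd_pin_folios() implementation.

```c
#include <stdint.h>

/* Hypothetical description of the folios covering a byte range. */
struct folio_span {
	uint64_t first_index; /* index of the first folio within the file */
	uint64_t nr_folios;   /* number of folios covering [start, end] */
	uint64_t offset;      /* byte offset of start within the first folio */
};

/*
 * Compute which equally sized folios cover the inclusive byte range
 * [start, end]. Returns 0 on success, -1 on invalid input.
 */
static int folio_span_of(uint64_t start, uint64_t end, uint64_t folio_size,
			 struct folio_span *span)
{
	if (folio_size == 0 || end < start)
		return -1;

	span->first_index = start / folio_size;
	/* Last folio index minus first folio index, inclusive of both. */
	span->nr_folios = end / folio_size - span->first_index + 1;
	span->offset = start % folio_size;
	return 0;
}
```

For instance, with 2 MiB folios the range [3 MiB, 5 MiB - 1] is covered by two folios (indices 1 and 2), starting 1 MiB into the first one.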

v2:
- Updated the first patch to include review feedback from David and
  Jason. The main change in this version is the allocation of a page
  for hugetlbfs if it is not found in the page cache.

v3:
- Incorporated review feedback from David to improve the comments and
  the readability of the code
- Enclosed the hugepage alloc code with #ifdef CONFIG_HUGETLB_PAGE

v4:
- Augmented the commit message of the udmabuf patch that uses
  pin_user_pages_fd()
- Added previously reviewed but unmerged udmabuf patches to this
  series

v5:
- Updated the patch that adds pin_user_pages_fd() to include feedback
  from David to handle simultaneous users trying to add a huge page
  to the mapping
- Replaced find_get_page_flags() with __filemap_get_folio() in the
  second and third patches to ensure that we only obtain head pages
  from the mapping

v6: (Christoph)
- Renamed the new API to memfd_pin_user_pages()
- Improved the page cache lookup efficiency by using
  filemap_get_folios_contig() which uses batches

v7:
- Renamed the new API to memfd_pin_folios() and made it return folios
  and offsets (David)
- Added a new preparatory patch to this series to convert udmabuf
  driver to use folios

v8:
- Addressed review comments from Matthew in patches 4 and 5
- Included David's suggestions to have the caller of memfd_pin_folios()
  pass a range [start, end] and max_folios instead of start and nr_pages
- Ensured that a folio is pinned and unpinned only once (David)

v9:
- Dropped the extern and fixed the return type in the declaration of
  memfd_alloc_folio() (Matthew)
- Used a list to track the folios that need to be unpinned (patch 5)
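The v8 goal of pinning and unpinning each folio only once can be illustrated by collapsing a per-page list of backing folios into a list of unique folios. This is a sketch, assuming the pages come from a contiguous file range (so all pages of one folio are adjacent) and using plain integer IDs in place of struct folio pointers; collect_unique_folios() is hypothetical and does not model the list used in patch 5.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: pages[i] identifies the folio backing page i of a
 * buffer. Consecutive pages of a large folio share the same ID, so only
 * the first page of each run is recorded; each recorded folio would then
 * be pinned, and later unpinned, exactly once.
 * Returns the number of unique folios written to out (at most max_out).
 */
static size_t collect_unique_folios(const unsigned long *pages, size_t nr_pages,
				    unsigned long *out, size_t max_out)
{
	size_t nr = 0, i;

	for (i = 0; i < nr_pages && nr < max_out; i++) {
		if (nr > 0 && out[nr - 1] == pages[i])
			continue; /* same folio as the previous page */
		out[nr++] = pages[i];
	}
	return nr;
}
```

The max_out cap loosely mirrors the max_folios bound that v8 introduced for callers.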

This series was tested using the following methods:
- Run the subtests added in the last patch
- Run Qemu (master) with the following options and a few additional
  patches to Spice:
  qemu-system-x86_64 -m 4096m \
    -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080 \
    -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264 \
    -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M \
    -machine memory-backend=mem1

Cc: David Hildenbrand 
Cc: Matthew Wilcox (Oracle) 
Cc: Christoph Hellwig 
Cc: Daniel Vetter 
Cc: Mike Kravetz 
Cc: Hugh Dickins 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: Gerd Hoffmann 
Cc: Dongwon Kim 
Cc: Junxiao Chang 

Vivek Kasireddy (6):
  udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap
  udmabuf: Add back support for mapping hugetlb pages (v6)
  mm/gup: Introduce memfd_pin_folios() for pinning memfd folios (v9)
  udmabuf: Convert udmabuf driver to use folios (v2)
  udmabuf: Pin the pages using memfd_pin_folios() API (v7)
  selftests/dma-buf/udmabuf: Add tests to verify data after page
    migration

 drivers/dma-buf/udmabuf.c                 | 231 +-
 include/linux/memfd.h                     |   5 +
 include/linux/mm.h                        |   3 +
 mm/gup.c                                  | 149 +++
 mm/memfd.c                                |  34 +++
 .../selftests/drivers/dma-buf/udmabuf.c   | 151 +++-
 6 files changed, 509 insertions(+), 64 deletions(-)

-- 
2.39.2