Re: [PATCH 0/9] drm: Annotate structs with __counted_by

2023-10-02 Thread Christian König via Virtualization

On 02.10.23 20:08, Kees Cook wrote:

On Mon, Oct 02, 2023 at 08:01:57PM +0200, Christian König wrote:

On 02.10.23 18:53, Kees Cook wrote:

On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote:

On Mon, Oct 2, 2023 at 5:20 AM Christian König wrote:

On 29.09.23 21:33, Kees Cook wrote:

On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote:

This is a batch of patches touching drm to prepare for the coming
implementation by GCC and Clang of the __counted_by attribute. Flexible
array members annotated with __counted_by can have their accesses
bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing)
and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

As found with Coccinelle[1], add __counted_by to structs that would
benefit from the annotation.
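
For readers unfamiliar with the annotation, a minimal self-contained sketch
of the pattern (plain userspace C with a no-op fallback macro; the struct
and function names are made up for illustration):

#include <stdlib.h>

/* Fall back to a no-op where the compiler lacks the attribute. */
#ifndef __counted_by
#define __counted_by(member)
#endif

struct dep_record {
	unsigned int clk;
	unsigned int vol;
};

struct dep_table {
	unsigned int count;
	/* Instrumented builds may bounds-check entries[i] against ->count. */
	struct dep_record entries[] __counted_by(count);
};

static struct dep_table *dep_table_alloc(unsigned int n)
{
	struct dep_table *t = malloc(sizeof(*t) + n * sizeof(t->entries[0]));

	if (t)
		t->count = n;	/* must be valid before entries[] is accessed */
	return t;
}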

[...]

Since this got Acks, I figure I should carry it in my tree. Let me know
if this should go via drm instead.

Applied to for-next/hardening, thanks!

[1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by
      https://git.kernel.org/kees/c/a6046ac659d6

STOP! In a follow-up discussion Alex and I figured out that this won't work.

I'm so confused; from the discussion I saw that Alex said both instances
were false positives?


The value in the structure is byte-swapped based on some firmware
endianness which does not necessarily match the CPU endianness.

SMU10 is APU-only, so the endianness of the SMU firmware and the CPU
will always match.

Which I think is what is being said here?


Please revert that one from going upstream if it's already on its way.

And because of those reasons I strongly think that patches like this
should go through the DRM tree :)

Sure, that's fine -- please let me know. It was Acked by others, etc. Who
should carry these patches?

Probably best if the relevant maintainers pick them up individually.

Some of those structures are filled in by firmware/hardware and only the
maintainers can judge if that value actually matches what the compiler
needs.

We have cases where individual bits are used as flags or when the size is
byte-swapped, etc...

Even Alex and I couldn't immediately say how and where that field is actually
used and had to dig that up. That's where the confusion came from.

Okay, I've dropped them all from my tree. Several had Acks/Reviews, so
hopefully those can get picked up for the DRM tree?


I will pick those up to go through drm-misc-next.

Going to ping maintainers once more where I'm not sure whether stuff is
correct or not.


Christian.



Thanks!

-Kees




Re: [PATCH 1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by

2023-09-25 Thread Christian König via Virtualization

On 22.09.23 19:41, Alex Deucher wrote:

On Fri, Sep 22, 2023 at 1:32 PM Kees Cook wrote:

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct 
smu10_voltage_dependency_table.

[1] 
https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Evan Quan 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: "Pan, Xinhui" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Xiaojian Du 
Cc: Huang Rui 
Cc: Kevin Wang 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Kees Cook 

Acked-by: Alex Deucher 


Mhm, I'm not sure if this is a good idea. That is a structure filled in 
by the firmware, isn't it?


That would imply that we might need to byte-swap count before it is
checkable.
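
To illustrate the concern, a hypothetical sketch (not driver code;
le32_to_host() stands in for the kernel's le32_to_cpu()):

#include <stdint.h>

struct fw_table {
	uint32_t count;		/* written little-endian by the firmware */
	uint32_t entries[];	/* candidate for __counted_by(count) */
};

static inline uint32_t le32_to_host(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

static void fw_table_fixup(struct fw_table *t)
{
	/*
	 * On a big-endian CPU, any entries[i] access before this fixup
	 * would be bounds-checked against the byte-swapped count.
	 */
	t->count = le32_to_host(t->count);
}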


Regards,
Christian.




---
  drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
index 808e0ecbe1f0..42adc2a3dcbc 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h
@@ -192,7 +192,7 @@ struct smu10_clock_voltage_dependency_record {

  struct smu10_voltage_dependency_table {
 uint32_t count;
-   struct smu10_clock_voltage_dependency_record entries[];
+	struct smu10_clock_voltage_dependency_record entries[] __counted_by(count);
  };

  struct smu10_clock_voltage_information {
--
2.34.1




Re: [PATCH v15 17/23] drm/shmem-helper: Add and use drm_gem_shmem_resv_assert_held() helper

2023-08-29 Thread Christian König via Virtualization

On 29.08.23 09:29, Boris Brezillon wrote:

On Tue, 29 Aug 2023 05:34:23 +0300
Dmitry Osipenko wrote:


On 8/28/23 13:12, Boris Brezillon wrote:

On Sun, 27 Aug 2023 20:54:43 +0300
Dmitry Osipenko wrote:

In preparation for adding the drm-shmem memory shrinker, move all
reservation-locking lockdep checks into the new
drm_gem_shmem_resv_assert_held() helper. This resolves a spurious lockdep
warning about wrong locking order vs. fs_reclaim code paths during freeing
of a shmem GEM, where lockdep isn't aware that locking contention with
fs_reclaim is impossible at this special time.

Signed-off-by: Dmitry Osipenko 
---
  drivers/gpu/drm/drm_gem_shmem_helper.c | 37 +++++++++++++++++++++++++------------
  1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index d96fee3d6166..ca5da976aafa 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -128,6 +128,23 @@ struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t
  }
  EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
  
+static void drm_gem_shmem_resv_assert_held(struct drm_gem_shmem_object *shmem)
+{
+	/*
+	 * Destroying the object is a special case.. drm_gem_shmem_free()
+	 * calls many things that WARN_ON if the obj lock is not held.  But
+	 * acquiring the obj lock in drm_gem_shmem_free() can cause a locking
+	 * order inversion between reservation_ww_class_mutex and fs_reclaim.
+	 *
+	 * This deadlock is not actually possible, because no one should
+	 * be already holding the lock when drm_gem_shmem_free() is called.
+	 * Unfortunately lockdep is not aware of this detail.  So when the
+	 * refcount drops to zero, we pretend it is already locked.
+	 */
+	if (kref_read(&shmem->base.refcount))
+		dma_resv_assert_held(shmem->base.resv);
+}
+
  /**
   * drm_gem_shmem_free - Free resources associated with a shmem GEM object
   * @shmem: shmem GEM object to free
@@ -142,8 +159,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
if (obj->import_attach) {
drm_prime_gem_destroy(obj, shmem->sgt);
} else if (!shmem->imported_sgt) {
-   dma_resv_lock(shmem->base.resv, NULL);
-
		drm_WARN_ON(obj->dev, kref_read(&shmem->vmap_use_count));
  
  		if (shmem->sgt) {

@@ -156,8 +171,6 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
drm_gem_shmem_put_pages_locked(shmem);

AFAICT, drm_gem_shmem_put_pages_locked() is the only function that's
called in the free path and would complain about resv-lock not being
held. I think I'd feel more comfortable if we were adding a
drm_gem_shmem_free_pages() function that did everything
drm_gem_shmem_put_pages_locked() does except for the lock_held() check
and the refcount dec, and have it called here (and in
drm_gem_shmem_put_pages_locked()). This way we can keep using
dma_resv_assert_held() instead of having our own variant.
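
A minimal sketch of that suggestion (the helper name comes from the mail
above; the refcount type and the teardown details are assumed):

#include <drm/drm_gem_shmem_helper.h>

/* Teardown without any locking assertion, safe for the free path. */
static void drm_gem_shmem_free_pages(struct drm_gem_shmem_object *shmem)
{
	/* ...release pages, destroy the sgt, etc... */
}

void drm_gem_shmem_put_pages_locked(struct drm_gem_shmem_object *shmem)
{
	dma_resv_assert_held(shmem->base.resv);

	if (refcount_dec_and_test(&shmem->pages_use_count))
		drm_gem_shmem_free_pages(shmem);
}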

It's not only drm_gem_shmem_free_pages(), but any drm-shmem function
that drivers may use in the GEM's freeing callback.

For example, panfrost_gem_free_object() may unpin shmem BO and then do
drm_gem_shmem_free().

Is this really a valid use case?


I haven't followed the whole discussion, but I think it isn't a valid 
use case.


That page_use_count is non-zero while the GEM object is about to be
destroyed can only happen if someone managed to grab a reference to the
page without referencing the GEM object.


This in turn usually happens when somebody incorrectly walks the CPU
page tables and grabs page references where it shouldn't. KMS used to do
this and we already had a discussion that they shouldn't do this.


Regards,
Christian.



  If the GEM refcount dropped to zero,
we should certainly not have pages_pin_count > 0 (thinking of vmap-ed
buffers that might disappear while kernel still has a pointer to the
CPU-mapped area). The only reason we have this
drm_gem_shmem_put_pages_locked() in drm_gem_shmem_free() is because of
this implicit ref hold by the sgt, and IMHO, we should be stricter and
check that pages_use_count == 1 when sgt != NULL and pages_use_count ==
0 otherwise.

I actually think it's a good thing to try and catch any attempt to call
functions trying lock the resv in a path they're not supposed to. At
least we can decide whether these actions are valid or not in this
context, and provide dedicated helpers for the free path if they are.




Re: [PATCH v15 11/23] dma-resv: Add kref_put_dma_resv()

2023-08-28 Thread Christian König via Virtualization

On 27.08.23 19:54, Dmitry Osipenko wrote:

Add a simple kref_put_dma_resv() helper that wraps around kref_put_ww_mutex()
for drivers that need to lock dma-resv on kref_put().

It's not possible to easily add this helper to kref.h because of the
headers inclusion dependency, hence add it to dma-resv.h.


I was never really a big fan of kref_put_mutex() in the first place.

The main advantage comes from the included memory barrier, but this 
actually doesn't work like most people think it works and is usually 
pretty dangerous.


And IIRC this was done because of some special behavior mutexes have
with memory barriers, and that isn't necessary with ww-mutex.


Christian.



Signed-off-by: Dmitry Osipenko 
---
  include/linux/dma-resv.h | 9 +++++++++
  1 file changed, 9 insertions(+)

diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 8d0e34dad446..c5cf302e4194 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -41,6 +41,7 @@
  
  #include 

  #include 
+#include <linux/kref.h>
  #include 
  #include 
  #include 
@@ -464,6 +465,14 @@ static inline void dma_resv_unlock(struct dma_resv *obj)
	ww_mutex_unlock(&obj->lock);
  }
  
+static inline int kref_put_dma_resv(struct kref *kref,
+				    void (*release)(struct kref *kref),
+				    struct dma_resv *resv,
+				    struct ww_acquire_ctx *ctx)
+{
+	return kref_put_ww_mutex(kref, release, &resv->lock, ctx);
+}
+
  void dma_resv_init(struct dma_resv *obj);
  void dma_resv_fini(struct dma_resv *obj);
  int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences);
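
For context, a hypothetical call site (mirroring kref_put_mutex()
semantics: the release callback runs with the lock held and is
responsible for dropping it):

#include <linux/dma-resv.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct my_bo {
	struct kref refcount;
	struct dma_resv resv;
};

static void my_bo_release(struct kref *kref)
{
	struct my_bo *bo = container_of(kref, struct my_bo, refcount);

	/* Called with bo->resv held; drop it before freeing. */
	dma_resv_unlock(&bo->resv);
	dma_resv_fini(&bo->resv);
	kfree(bo);
}

static void my_bo_put(struct my_bo *bo)
{
	kref_put_dma_resv(&bo->refcount, my_bo_release, &bo->resv, NULL);
}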




Re: [PATCH -next 1/7] drm/amdkfd: Remove unnecessary NULL values

2023-08-09 Thread Christian König via Virtualization

On 09.08.23 05:44, Ruan Jinjie wrote:

The NULL initialization of the pointers assigned by kzalloc() first is
not necessary, because if the kzalloc() fails, the pointers will be
assigned NULL; otherwise it works as usual. So remove it.

Signed-off-by: Ruan Jinjie 


Reviewed-by: Christian König  for this one, 
the amd display code and the radeon stuff.


Thanks,
Christian.


---
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index 863cf060af48..d01bb57733b3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -48,7 +48,7 @@ int pipe_priority_map[] = {
  
  struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_node *dev, struct queue_properties *q)

  {
-   struct kfd_mem_obj *mqd_mem_obj = NULL;
+   struct kfd_mem_obj *mqd_mem_obj;
  
  	mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);

if (!mqd_mem_obj)
@@ -64,7 +64,7 @@ struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_node *dev, 
struct queue_properti
  struct kfd_mem_obj *allocate_sdma_mqd(struct kfd_node *dev,
struct queue_properties *q)
  {
-   struct kfd_mem_obj *mqd_mem_obj = NULL;
+   struct kfd_mem_obj *mqd_mem_obj;
uint64_t offset;
  
  	mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);



Re: [PATCH RFC v1 00/52] drm/crtc: Rename struct drm_crtc::dev to drm_dev

2023-07-12 Thread Christian König via Virtualization

On 12.07.23 15:38, Uwe Kleine-König wrote:

Hello Maxime,

On Wed, Jul 12, 2023 at 02:52:38PM +0200, Maxime Ripard wrote:

On Wed, Jul 12, 2023 at 01:02:53PM +0200, Uwe Kleine-König wrote:

Background is that this makes merge conflicts easier to handle and detect.

Really?

FWIW, I agree with Christian here.


Each file (apart from include/drm/drm_crtc.h) is only touched once. So
unless I'm missing something you don't get fewer or easier conflicts by
doing it all in a single patch. But you gain the freedom to drop a
patch for one driver without having to drop the rest with it.

Not really, because the last patch removed the union anyway. So you have
to revert both the last patch, plus that driver one. And then you need
to add a TODO to remove that union eventually.

Yes, with a single patch you have only one revert (but 194 files changed,
1264 insertions(+), 1296 deletions(-)) instead of two (one of them: 1
file changed, 9 insertions(+), 1 deletion(-); the other maybe a bit
bigger). (And maybe you get away with just reverting the last patch.)

With a single patch the TODO after a revert is "redo it all again (and
prepare for a different set of conflicts)" while with the split series
it's only "fix that one driver that was forgotten/borked" + reapply that
10 line patch.


Yeah, but for a maintainer the size of the patches doesn't matter.
That's only interesting if you need to manually review the patch, which
you hopefully don't do in case of something auto-generated.


In other words if the patch is auto-generated re-applying it completely 
is less work than fixing things up individually.



  As the one who gets that TODO, I prefer the latter.


Yeah, but your personal preferences are not a technically relevant
argument to a maintainer.


At the end of the day Dave or Daniel need to decide, because they need 
to live with it.


Regards,
Christian.



So in sum: If your metric is "small count of reverted commits", you're
right. If however your metric is: Better get 95% of this series' change
in than maybe 0%, the split series is the way to do it.

With me having spent ~3h on this series' changes, it's maybe
understandable that I did it the way I did.

FTR: This series was created on top of v6.5-rc1. If you apply it to
drm-misc-next you get a (trivial) conflict in patch #2. If I consider to
be the responsible maintainer who applies this series, I like being able
to just do git am --skip then.

FTR#2: In drm-misc-next is a new driver
(drivers/gpu/drm/loongson/lsdc_crtc.c) so skipping the last patch for
now might indeed be a good idea.


So I still like the split version better, but I'm open to a more
verbose reasoning from your side.

You're doing only one thing here, really: you change the name of a
structure field. If it was shared between multiple maintainers, then
sure, splitting that up is easier for everyone, but this will go through
drm-misc, so I can't see the benefit it brings.

I see your argument, but I think mine weights more.

Best regards
Uwe




Re: [PATCH RFC v1 00/52] drm/crtc: Rename struct drm_crtc::dev to drm_dev

2023-07-12 Thread Christian König via Virtualization

On 12.07.23 11:46, Uwe Kleine-König wrote:

Hello,

while I debugged an issue in the imx-lcdc driver I was constantly
irritated about struct drm_device pointer variables being named "dev"
because with that name I usually expect a struct device pointer.

I think there is a big benefit when these are all renamed to "drm_dev".
I have no strong preference here though, so "drmdev" or "drm" are fine
for me, too. Let the bikeshedding begin!

Some statistics:

$ git grep -ohE 'struct drm_device *\* *[^ (),;]*' v6.5-rc1 | sort | uniq -c | 
sort -n
   1 struct drm_device *adev_to_drm
   1 struct drm_device *drm_
   1 struct drm_device  *drm_dev
   1 struct drm_device*drm_dev
   1 struct drm_device *pdev
   1 struct drm_device *rdev
   1 struct drm_device *vdev
   2 struct drm_device *dcss_drv_dev_to_drm
   2 struct drm_device **ddev
   2 struct drm_device *drm_dev_alloc
   2 struct drm_device *mock
   2 struct drm_device *p_ddev
   5 struct drm_device *device
   9 struct drm_device * dev
  25 struct drm_device *d
  95 struct drm_device *
 216 struct drm_device *ddev
 234 struct drm_device *drm_dev
 611 struct drm_device *drm
4190 struct drm_device *dev

This series starts with renaming struct drm_crtc::dev to drm_dev. If
it's not only me and others like the result of this effort it should be
followed up by adapting the other structs and the individual usages in
the different drivers.

To make this series a bit easier to handle, I first added an alias for
drm_crtc::dev, then converted the drivers one after another and the last
patch drops the "dev" name. This has the advantage of being easier to
review, and if I should have missed an instance only the last patch must
be dropped/reverted. Also this series might conflict with other patches,
in this case the remaining patches can still go in (apart from the last
one of course). Maybe it also makes sense to delay applying the last
patch by one development cycle?
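
For reference, the transitional alias boils down to something like this
(a sketch; the real first patch touches include/drm/drm_crtc.h):

struct drm_crtc {
	union {
		/* Kept so not-yet-converted drivers still compile; the
		 * final patch of the series drops this name. */
		struct drm_device *dev;
		struct drm_device *drm_dev;
	};
	/* ... */
};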


When you automatically generate the patch (with cocci for example) I 
usually prefer a single patch instead.


Background is that this makes merge conflicts easier to handle and detect.

When you have multiple patches and a merge conflict because of some 
added lines using the old field the build breaks only on the last patch 
which removes the old field.


In such cases reviewing the patch just means automatically re-generating 
it and double checking that you don't see anything funky.


Apart from that I honestly absolutely don't care what the name is.

Cheers,
Christian.



The series was compile tested for arm, arm64, powerpc and amd64 using an
allmodconfig (though I only build drivers/gpu/).

Best regards
Uwe

Uwe Kleine-König (52):
   drm/crtc: Start renaming struct drm_crtc::dev to drm_dev
   drm/core: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/amd: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/armada: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/arm: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/aspeed: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/ast: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/atmel-hlcdc: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/exynos: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/fsl-dcu: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/gma500: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/gud: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/hisilicon: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/hyperv: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/i915: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/imx: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/ingenic: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/kmb: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/logicvc: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/mcde: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/mediatek: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/meson: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/mgag200: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/msm: Use struct drm_crtc::drm_dev instead of struct drm_crtc::dev
   drm/mxsfb: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/nouveau: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/omapdrm: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/panel-ili9341: Use struct drm_crtc::drm_dev instead of struct
 drm_crtc::dev
   drm/pl111: Use struct 

Re: [PATCH] drm: Remove unnecessary (void*) conversions

2023-05-26 Thread Christian König via Virtualization

On 26.05.23 05:32, Su Hui wrote:

Pointer variables of (void *) type do not require a type cast.


Please split that up by subsystem/driver. Taking it through the misc 
tree might just cause merge conflicts.


Christian.



Signed-off-by: Su Hui 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 2 +-
  drivers/gpu/drm/amd/pm/amdgpu_pm.c| 2 +-
  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 4 ++--
  drivers/gpu/drm/nouveau/nouveau_debugfs.c | 2 +-
  drivers/gpu/drm/omapdrm/omap_debugfs.c| 6 +++---
  drivers/gpu/drm/pl111/pl111_debugfs.c | 2 +-
  drivers/gpu/drm/qxl/qxl_debugfs.c | 4 ++--
  drivers/gpu/drm/tiny/arcpgu.c | 2 +-
  drivers/gpu/drm/ttm/ttm_resource.c| 3 +--
  drivers/gpu/drm/virtio/virtgpu_debugfs.c  | 6 +++---
  drivers/gpu/drm/vmwgfx/ttm_object.c   | 5 ++---
  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c   | 2 +-
  12 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
index 827fcb4fb3b3..8a2c39927167 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
@@ -3312,7 +3312,7 @@ static ssize_t dtn_log_write(
  
  static int mst_topo_show(struct seq_file *m, void *unused)

  {
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
+   struct amdgpu_device *adev = m->private;
struct drm_device *dev = adev_to_drm(adev);
struct drm_connector *connector;
struct drm_connector_list_iter conn_iter;
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 58c2246918fd..e6c870bd307b 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -3671,7 +3671,7 @@ static void amdgpu_parse_cg_state(struct seq_file *m, u64 
flags)
  
  static int amdgpu_debugfs_pm_info_show(struct seq_file *m, void *unused)

  {
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
+   struct amdgpu_device *adev = m->private;
struct drm_device *dev = adev_to_drm(adev);
u64 flags = 0;
int r;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 31a7f59ccb49..dd57f7164e9a 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -198,7 +198,7 @@ static int etnaviv_ring_show(struct etnaviv_gpu *gpu, 
struct seq_file *m)
  
  static int show_unlocked(struct seq_file *m, void *arg)

  {
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
int (*show)(struct drm_device *dev, struct seq_file *m) =
node->info_ent->data;
@@ -208,7 +208,7 @@ static int show_unlocked(struct seq_file *m, void *arg)
  
  static int show_each_gpu(struct seq_file *m, void *arg)

  {
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct etnaviv_drm_private *priv = dev->dev_private;
struct etnaviv_gpu *gpu;
diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c 
b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
index 2a36d1ca8fda..96b59d5d68ed 100644
--- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
+++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
@@ -37,7 +37,7 @@
  static int
  nouveau_debugfs_vbios_image(struct seq_file *m, void *data)
  {
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_info_node *node = m->private;
struct nouveau_drm *drm = nouveau_drm(node->minor->dev);
int i;
  
diff --git a/drivers/gpu/drm/omapdrm/omap_debugfs.c b/drivers/gpu/drm/omapdrm/omap_debugfs.c

index a3d470468e5b..a94ce502e152 100644
--- a/drivers/gpu/drm/omapdrm/omap_debugfs.c
+++ b/drivers/gpu/drm/omapdrm/omap_debugfs.c
@@ -19,7 +19,7 @@
  
  static int gem_show(struct seq_file *m, void *arg)

  {
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct omap_drm_private *priv = dev->dev_private;
  
@@ -33,7 +33,7 @@ static int gem_show(struct seq_file *m, void *arg)
  
  static int mm_show(struct seq_file *m, void *arg)

  {
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct drm_printer p = drm_seq_file_printer(m);
  
@@ -45,7 +45,7 @@ static int mm_show(struct seq_file *m, void *arg)

  #ifdef 

Re: [PATCH] drm/gem: Expose the buffer object handle to userspace last

2023-02-23 Thread Christian König via Virtualization

On 20.02.23 11:23, Tvrtko Ursulin wrote:


On 20/02/2023 10:01, Christian König wrote:

On 20.02.23 10:55, Tvrtko Ursulin wrote:


Hi,

On 14/02/2023 13:59, Christian König wrote:

On 14.02.23 13:50, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Currently drm_gem_handle_create_tail exposes the handle to userspace
before the buffer object construction is complete. This allows working
against a partially constructed object, which may also be in the process
of having its creation fail, and can have a range of negative outcomes.

A lot of those will depend on what the individual drivers are doing in
their obj->funcs->open() callbacks, and also with a common failure mode
being -ENOMEM from drm_vma_node_allow.

We can make sure none of this can happen by allocating a handle last,
although with a downside that more of the function now runs under the
dev->object_name_lock.

Looking into the individual drivers open() hooks, we have
amdgpu_gem_object_open which seems like it could have a potential
security issue without this change.

A couple drivers like qxl_gem_object_open and vmw_gem_object_open
implement no-op hooks so no impact for them.

A bunch of others require a deeper look by individual owners to assess
the impact. Those are lima_gem_object_open, nouveau_gem_object_open,
panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.


Putting aside the risk assessment of the above, some common scenarios to
think about are along these lines:

1)
Userspace closes a handle by speculatively "guessing" it from a second
thread.

This results in an unreachable buffer object, so a memory leak.

2)
Same as 1), but object is in the process of getting closed (failed
creation).

The second thread is then able to re-cycle the handle, and idr_remove in
the first thread would then remove the handle it does not own from the
idr.

3)
Going back to the earlier per-driver problem space - individual impact
assessment of allowing a second thread to access and operate on a
partially constructed handle / object. (Can something crash? Leak
information?)

In terms of identifying when the problem started I will tag some patches
as references, but not all, if even any, of them actually point to a
broken state. I am just identifying points at which more opportunity for
issues to arise was added.


Yes I've looked into this once as well, but couldn't completely 
solve it for some reason.


Give me a day or two to get this tested and all the logic swapped 
back into my head again.


Managed to recollect what the problem with earlier attempts was?


Nope, that's way too long ago. I can only assume that I ran into
problems with the object_name_lock.


Probably best to double check if that doesn't result in a lock 
inversion when somebody grabs the reservation lock in their ->load() 
callback.


Hmm, I don't immediately follow the connection. But I have only found
radeon_driver_load_kms using the load callback. Is there any
lockdep-enabled CI for that driver which could tell us if there is a
problem there?


We don't have CI for radeon and most likely never will, that hw is just
too old. But we also have amdgpu_gem_object_open() and that looks
suspiciously like trouble.


The function makes sure that every BO is registered in the VM
housekeeping functions of the drm_file and while doing so grabs a few
locks. I'm not sure what the locking order of those is.


Could be that this will work, could be that it breaks. I will ping 
internally today if somebody from my team can take a look at this patch.


Regards,
Christian.



Regards,

Tvrtko



Regards,
Christian.



Regards,

Tvrtko


Christian.



References: 304eda32920b ("drm/gem: add hooks to notify driver when object handle is created/destroyed")

References: ca481c9b2a3a ("drm/gem: implement vma access management")
References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
Cc: dri-de...@lists.freedesktop.org
Cc: Rob Clark 
Cc: Ben Skeggs 
Cc: David Herrmann 
Cc: Noralf Trønnes 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-...@lists.freedesktop.org
Cc: l...@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Cc: Steven Price 
Cc: virtualization@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Cc: Zack Rusin 
---
  drivers/gpu/drm/drm_gem.c | 48 ++++++++++++++++++++++++------------------------
  1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index aa15c52ae182..e3d897bca0f2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 			   u32 *handlep)
  {
  struct drm_device *dev = obj->dev;
-    u32 handle;
  int ret;
	WARN_ON(!mutex_is_locked(&dev->object_name_lock));
  if (obj->handle_count++ == 0)
  drm_gem_object_get(obj);
+    ret = drm_vma_node_allow(&obj->vma_node, file_priv);
+    if (ret)
+    goto err_put;
+
+    if 

Re: [PATCH] drm/gem: Expose the buffer object handle to userspace last

2023-02-14 Thread Christian König via Virtualization

On 14.02.23 13:50, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Currently drm_gem_handle_create_tail exposes the handle to userspace
before the buffer object construction is complete. This allows working
against a partially constructed object, which may also be in the process
of having its creation fail, and can have a range of negative outcomes.

A lot of those will depend on what the individual drivers are doing in
their obj->funcs->open() callbacks, and also with a common failure mode
being -ENOMEM from drm_vma_node_allow.

We can make sure none of this can happen by allocating a handle last,
although with a downside that more of the function now runs under the
dev->object_name_lock.

Looking into the individual drivers open() hooks, we have
amdgpu_gem_object_open which seems like it could have a potential security
issue without this change.

A couple drivers like qxl_gem_object_open and vmw_gem_object_open
implement no-op hooks so no impact for them.

A bunch of others require a deeper look by individual owners to assess
the impact. Those are lima_gem_object_open, nouveau_gem_object_open,
panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.

Putting aside the risk assessment of the above, some common scenarios to
think about are along these lines:

1)
Userspace closes a handle by speculatively "guessing" it from a second
thread.

This results in an unreachable buffer object, so a memory leak.

2)
Same as 1), but object is in the process of getting closed (failed
creation).

The second thread is then able to re-cycle the handle, and idr_remove in
the first thread would then remove the handle it does not own from the
idr.

3)
Going back to the earlier per-driver problem space - individual impact
assessment of allowing a second thread to access and operate on a partially
constructed handle / object. (Can something crash? Leak information?)

In terms of identifying when the problem started I will tag some patches
as references, but not all, if even any, of them actually point to a
broken state. I am just identifying points at which more opportunity for
issues to arise was added.


Yes I've looked into this once as well, but couldn't completely solve it 
for some reason.


Give me a day or two to get this tested and all the logic swapped back 
into my head again.


Christian.



References: 304eda32920b ("drm/gem: add hooks to notify driver when object handle is created/destroyed")
References: ca481c9b2a3a ("drm/gem: implement vma access management")
References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
Cc: dri-de...@lists.freedesktop.org
Cc: Rob Clark 
Cc: Ben Skeggs 
Cc: David Herrmann 
Cc: Noralf Trønnes 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-...@lists.freedesktop.org
Cc: l...@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Cc: Steven Price 
Cc: virtualization@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Cc: Zack Rusin 
---
  drivers/gpu/drm/drm_gem.c | 48 ++++++++++++++++++++++++------------------------
  1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index aa15c52ae182..e3d897bca0f2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
   u32 *handlep)
  {
struct drm_device *dev = obj->dev;
-   u32 handle;
int ret;
  
	WARN_ON(!mutex_is_locked(&dev->object_name_lock));

if (obj->handle_count++ == 0)
drm_gem_object_get(obj);
  
+	ret = drm_vma_node_allow(&obj->vma_node, file_priv);

+   if (ret)
+   goto err_put;
+
+   if (obj->funcs->open) {
+   ret = obj->funcs->open(obj, file_priv);
+   if (ret)
+   goto err_revoke;
+   }
+
/*
-* Get the user-visible handle using idr.  Preload and perform
-* allocation under our spinlock.
+* Get the user-visible handle using idr as the _last_ step.
+* Preload and perform allocation under our spinlock.
 */
idr_preload(GFP_KERNEL);
	spin_lock(&file_priv->table_lock);
-
	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
-
	spin_unlock(&file_priv->table_lock);
idr_preload_end();
  
-	mutex_unlock(&dev->object_name_lock);

if (ret < 0)
-   goto err_unref;
-
-   handle = ret;
+   goto err_close;
  
-	ret = drm_vma_node_allow(&obj->vma_node, file_priv);

-   if (ret)
-   goto err_remove;
+	mutex_unlock(&dev->object_name_lock);
  
-	if (obj->funcs->open) {

-   ret = obj->funcs->open(obj, file_priv);
-   if (ret)
-   goto err_revoke;
-   }
+   *handlep = ret;
  
-	*handlep = handle;

return 0;
  
+err_close:

+   if (obj->funcs->close)
+   obj->funcs->close(obj, file_priv);
  err_revoke:

Re: [drm-misc:drm-misc-next] [drm/ttm] 1802537820: WARNING:at_drivers/gpu/drm/drm_gem_vram_helper.c:#drm_gem_vram_offset

2023-02-08 Thread Christian König via Virtualization

On 08.02.23 10:38, Matthew Auld wrote:

On Wed, 8 Feb 2023 at 08:32, Christian König wrote:

Hey guys,

I'm pretty sure this is a bug in bochs which happens to surface because
of a recent TTM change; we have seen similar problems in the past with
this driver.

What happens is that userspace tries to bind a BO to a CRTC before the
BO even has a backing store.

Any idea how to fix this? I can just remove the warning, but that's not
really a good solution.

IIUC this driver is just using ttm_bo_move_memcpy() underneath for its
bo_move callback, which looks to be doing:

if (!bo->resource)
 return 0;

Which doesn't make any sense to me. There should at least be a
move_null(), and maybe also a multi-hop to handle clearing. Otherwise
bo->resource is likely always NULL (and we hit the above warning),
even after the dummy move. What do you think?


Oh, good point. That should indeed be move_null().

Do you want to write a patch or should I take care of this?

Thanks for pointing that out,
Christian.
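
A sketch of what that fix could look like in a driver's move callback
(illustrative only, not the actual patch):

#include <drm/ttm/ttm_bo.h>

static int bo_driver_move(struct ttm_buffer_object *bo, bool evict,
			  struct ttm_operation_ctx *ctx,
			  struct ttm_resource *new_mem,
			  struct ttm_place *hop)
{
	if (!bo->resource) {
		/* The BO never got a backing resource: adopt new_mem
		 * directly instead of returning with bo->resource NULL. */
		ttm_bo_move_null(bo, new_mem);
		return 0;
	}

	return ttm_bo_move_memcpy(bo, ctx, new_mem);
}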




Regards,
Christian.

On 08.02.23 05:32, kernel test robot wrote:

Greeting,

FYI, we noticed 
WARNING:at_drivers/gpu/drm/drm_gem_vram_helper.c:#drm_gem_vram_offset due to 
commit (built with gcc-11):

commit: 1802537820389183dfcd814e0f6a60d1496a75ef ("drm/ttm: stop allocating dummy 
resources during BO creation")
git://anongit.freedesktop.org/drm/drm-misc drm-misc-next

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot 
| Link: 
https://lore.kernel.org/oe-lkp/202302081038.984b8c1-oliver.s...@intel.com


[   25.994992][T1] [ cut here ]
[ 25.995050][ T1] WARNING: CPU: 0 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:255 drm_gem_vram_offset (??:?)
[   25.995080][T1] Modules linked in:
[   25.995100][T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G
T  6.2.0-rc6-01191-g180253782038 #1 a8db67375c3ac749313dafaec43f39836e38fae9
[   25.995117][T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 25.995128][ T1] RIP: 0010:drm_gem_vram_offset (??:?)
[ 25.995144][ T1] Code: 02 00 00 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 80 
3c 02 00 74 05 e8 7f 1f eb fe 48 8b 9b 20 02 00 00 48 85 db 75 06 <0f> 0b 31 c0 
eb 4b 48 8d 7b 10 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a
All code

   0:	02 00                	add    (%rax),%al
   2:	00 b8 ff ff 37 00    	add    %bh,0x37(%rax)
   8:	48 89 fa             	mov    %rdi,%rdx
   b:	48 c1 e0 2a          	shl    $0x2a,%rax
   f:	48 c1 ea 03          	shr    $0x3,%rdx
  13:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
  17:	74 05                	je     0x1e
  19:	e8 7f 1f eb fe       	callq  0xfeeb1f9d
  1e:	48 8b 9b 20 02 00 00 	mov    0x220(%rbx),%rbx
  25:	48 85 db             	test   %rbx,%rbx
  28:	75 06                	jne    0x30
  2a:*	0f 0b                	ud2    <-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb 4b                	jmp    0x7b
  30:	48 8d 7b 10          	lea    0x10(%rbx),%rdi
  34:	b8 ff ff 37 00       	mov    $0x37,%eax
  39:	48 89 fa             	mov    %rdi,%rdx
  3c:	48 c1 e0 2a          	shl    $0x2a,%rax

Code starting with the faulting instruction
===
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb 4b                	jmp    0x51
   6:	48 8d 7b 10          	lea    0x10(%rbx),%rdi
   a:	b8 ff ff 37 00       	mov    $0x37,%eax
   f:	48 89 fa             	mov    %rdi,%rdx
  12:	48 c1 e0 2a          	shl    $0x2a,%rax
[   25.995156][T1] RSP: :c901f028 EFLAGS: 00210246
[   25.995174][T1] RAX: dc00 RBX:  RCX: 

[   25.995186][T1] RDX: 111026dee544 RSI: 8881372d4b10 RDI: 
888136f72a20
[   25.995196][T1] RBP: c901f030 R08:  R09: 

[   25.995206][T1] R10:  R11:  R12: 
8881372d4b00
[   25.995215][T1] R13: 888136e9ee00 R14: 888136f4a060 R15: 
0500
[   25.995225][T1] FS:  () GS:8883aee0() 
knlGS:
[   25.995236][T1] CS:  0010 DS:  ES:  CR0: 80050033
[   25.995247][T1] CR2: f7fa1cd4 CR3: 06015000 CR4: 
000406b0
[   25.995262][T1] Call Trace:
[   25.995271][T1]  
[ 25.995287][ T1] bochs_plane_update (bochs.c:?)
[ 25.995308][ T1] ? 

Re: [drm-misc:drm-misc-next] [drm/ttm] 1802537820: WARNING:at_drivers/gpu/drm/drm_gem_vram_helper.c:#drm_gem_vram_offset

2023-02-08 Thread Christian König via Virtualization

Hey guys,

I'm pretty sure this is a bug in bochs which happens to surface because
of a recent TTM change; we have seen similar problems in the past with
this driver.


What happens is that userspace tries to bind a BO to a CRTC before the
BO even has a backing store.


Any idea how to fix this? I can just remove the warning, but that's not 
really a good solution.


Regards,
Christian.

On 08.02.23 05:32, kernel test robot wrote:

Greeting,

FYI, we noticed 
WARNING:at_drivers/gpu/drm/drm_gem_vram_helper.c:#drm_gem_vram_offset due to 
commit (built with gcc-11):

commit: 1802537820389183dfcd814e0f6a60d1496a75ef ("drm/ttm: stop allocating dummy 
resources during BO creation")
git://anongit.freedesktop.org/drm/drm-misc drm-misc-next

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot 
| Link: 
https://lore.kernel.org/oe-lkp/202302081038.984b8c1-oliver.s...@intel.com


[   25.994992][T1] [ cut here ]
[ 25.995050][ T1] WARNING: CPU: 0 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:255 drm_gem_vram_offset (??:?)
[   25.995080][T1] Modules linked in:
[   25.995100][T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G
T  6.2.0-rc6-01191-g180253782038 #1 a8db67375c3ac749313dafaec43f39836e38fae9
[   25.995117][T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 25.995128][ T1] RIP: 0010:drm_gem_vram_offset (??:?)
[ 25.995144][ T1] Code: 02 00 00 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 80 
3c 02 00 74 05 e8 7f 1f eb fe 48 8b 9b 20 02 00 00 48 85 db 75 06 <0f> 0b 31 c0 
eb 4b 48 8d 7b 10 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a
All code

   0:	02 00                	add    (%rax),%al
   2:	00 b8 ff ff 37 00    	add    %bh,0x37(%rax)
   8:	48 89 fa             	mov    %rdi,%rdx
   b:	48 c1 e0 2a          	shl    $0x2a,%rax
   f:	48 c1 ea 03          	shr    $0x3,%rdx
  13:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
  17:	74 05                	je     0x1e
  19:	e8 7f 1f eb fe       	callq  0xfeeb1f9d
  1e:	48 8b 9b 20 02 00 00 	mov    0x220(%rbx),%rbx
  25:	48 85 db             	test   %rbx,%rbx
  28:	75 06                	jne    0x30
  2a:*	0f 0b                	ud2    <-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb 4b                	jmp    0x7b
  30:	48 8d 7b 10          	lea    0x10(%rbx),%rdi
  34:	b8 ff ff 37 00       	mov    $0x37,%eax
  39:	48 89 fa             	mov    %rdi,%rdx
  3c:	48 c1 e0 2a          	shl    $0x2a,%rax

Code starting with the faulting instruction
===
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb 4b                	jmp    0x51
   6:	48 8d 7b 10          	lea    0x10(%rbx),%rdi
   a:	b8 ff ff 37 00       	mov    $0x37,%eax
   f:	48 89 fa             	mov    %rdi,%rdx
  12:	48 c1 e0 2a          	shl    $0x2a,%rax
[   25.995156][T1] RSP: :c901f028 EFLAGS: 00210246
[   25.995174][T1] RAX: dc00 RBX:  RCX: 

[   25.995186][T1] RDX: 111026dee544 RSI: 8881372d4b10 RDI: 
888136f72a20
[   25.995196][T1] RBP: c901f030 R08:  R09: 

[   25.995206][T1] R10:  R11:  R12: 
8881372d4b00
[   25.995215][T1] R13: 888136e9ee00 R14: 888136f4a060 R15: 
0500
[   25.995225][T1] FS:  () GS:8883aee0() 
knlGS:
[   25.995236][T1] CS:  0010 DS:  ES:  CR0: 80050033
[   25.995247][T1] CR2: f7fa1cd4 CR3: 06015000 CR4: 
000406b0
[   25.995262][T1] Call Trace:
[   25.995271][T1]  
[ 25.995287][ T1] bochs_plane_update (bochs.c:?)
[ 25.995308][ T1] ? rcu_read_lock_bh_held (??:?)
[ 25.995337][ T1] ? bochs_pci_probe (bochs.c:?)
[ 25.995352][ T1] ? srcu_read_unlock (blk-mq.c:?)
[ 25.995396][ T1] bochs_pipe_enable (bochs.c:?)
[ 25.995410][ T1] ? drm_dev_printk (??:?)
[ 25.995435][ T1] ? bochs_pipe_update (bochs.c:?)
[ 25.995454][ T1] ? bochs_plane_update (bochs.c:?)
[ 25.995473][ T1] ? bochs_pipe_update (bochs.c:?)
[ 25.995487][ T1] ? bochs_pipe_update (bochs.c:?)
[ 25.995507][ T1] drm_simple_kms_crtc_enable (drm_simple_kms_helper.c:?)
[ 25.995533][ T1] drm_atomic_helper_commit_modeset_enables (??:?)
[ 25.995570][ T1] drm_atomic_helper_commit_tail (??:?)
[ 25.995591][ T1] commit_tail (drm_atomic_helper.c:?)
[ 25.995631][ T1] drm_atomic_helper_commit (??:?)
[ 25.995652][ T1] ? commit_work (??:?)
[ 25.995670][ T1] drm_atomic_commit (??:?)
[ 25.995689][ T1] ? 

Re: [PATCH 01/21] drm/amdgpu: Don't set struct drm_driver.lastclose

2022-10-20 Thread Christian König via Virtualization

On 20.10.22 12:37, Thomas Zimmermann wrote:

Don't set struct drm_driver.lastclose. It's used to restore the
fbdev console. But as amdgpu uses generic fbdev emulation, the
console is being restored by the DRM client helpers already. See
the call to drm_client_dev_restore() in drm_lastclose().


???

The commit message doesn't match what the patch is doing. You are 
removing output_poll_changed instead of lastclose here.


Did something get mixed up?

Cheers,
Christian.



Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   | 1 -
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 --
  2 files changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 23998f727c7f9..fb7186c5ade2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1224,7 +1224,6 @@ amdgpu_display_user_framebuffer_create(struct drm_device 
*dev,
  
  const struct drm_mode_config_funcs amdgpu_mode_funcs = {

.fb_create = amdgpu_display_user_framebuffer_create,
-   .output_poll_changed = drm_fb_helper_output_poll_changed,
  };
  
  static const struct drm_prop_enum_list amdgpu_underscan_enum_list[] =

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index f6a9e8fdd87d6..e9a28a5363b9a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -82,7 +82,6 @@
  #include 
  #include 
  #include 
-#include <drm/drm_fb_helper.h>
  #include 
  #include 
  #include 
@@ -2810,7 +2809,6 @@ const struct amdgpu_ip_block_version dm_ip_block =
  static const struct drm_mode_config_funcs amdgpu_dm_mode_funcs = {
.fb_create = amdgpu_display_user_framebuffer_create,
.get_format_info = amd_get_format_info,
-   .output_poll_changed = drm_fb_helper_output_poll_changed,
.atomic_check = amdgpu_dm_atomic_check,
.atomic_commit = drm_atomic_helper_commit,
  };




Re: [PATCH v7 00/21] Move all drivers to a common dma-buf locking convention

2022-10-18 Thread Christian König via Virtualization

On 18.10.22 01:07, Dmitry Osipenko wrote:

On 10/17/22 20:22, Dmitry Osipenko wrote:

Hello,

This series moves all drivers to a dynamic dma-buf locking specification.
From now on all dma-buf importers are made responsible for holding
dma-buf's reservation lock around all operations performed over dma-bufs
in accordance with the locking specification. This allows us to utilize
the reservation lock more broadly around the kernel without fear of
potential deadlocks.

This patchset passes all i915 selftests. It was also tested using VirtIO,
Panfrost, Lima, Tegra, udmabuf, AMDGPU and Nouveau drivers. I tested cases
of display+GPU, display+V4L and GPU+V4L dma-buf sharing (where appropriate),
which covers majority of kernel drivers since rest of the drivers share
same or similar code paths.
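
As a concrete illustration of the importer-side rule (a sketch using the
dma-buf API; error handling trimmed):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* Either take the reservation lock yourself around the locked API... */
static int importer_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
{
	int ret;

	dma_resv_lock(dmabuf->resv, NULL);
	ret = dma_buf_vmap(dmabuf, map);
	dma_resv_unlock(dmabuf->resv);

	return ret;
}

/* ...or use the _unlocked wrappers added by this series, which take and
 * drop the lock internally. */
static int importer_vmap_simple(struct dma_buf *dmabuf, struct iosys_map *map)
{
	return dma_buf_vmap_unlocked(dmabuf, map);
}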

Changelog:

v7: - Rebased on top of recent drm-misc-next.

 - Added ack from Jason Gunthorpe to the RDMA patch.

 - Added iosys_map_clear() to dma_buf_vmap_unlocked(), making it fully
   consistent with dma_buf_vmap().

v6: - Added r-b from Michael Ruhl to the i915 patch.

 - Added acks from Sumit Semwal and updated commit message of the
   "Move dma_buf_vmap() to dynamic locking specification" patch like
   was suggested by Sumit.

 - Added "!dmabuf" check to dma_buf_vmap_unlocked() to match the locked
   variant of the function, for consistency.

v5: - Added acks and r-bs that were given to v4.

 - Changed i915 preparation patch like was suggested by Michael Ruhl.
   The scope of reservation locking is smaller now.

v4: - Added dma_buf_mmap() to the "locking convention" documentation,
   which was missed by accident in v3.

 - Added acks from Christian König, Tomasz Figa and Hans Verkuil that
   they gave to couple v3 patches.

 - Dropped the "_unlocked" postfix from function names that don't have
   the locked variant, as was requested by Christian König.

 - Factored out the per-driver preparations into separate patches
   to ease reviewing of the changes, which is now doable without the
   global dma-buf functions renaming.

 - Factored out the dynamic locking convention enforcements into separate
   patches which add the final dma_resv_assert_held(dmabuf->resv) to the
   dma-buf API functions.

v3: - Factored out dma_buf_mmap_unlocked() and attachment functions
   into separate patches, like was suggested by Christian König.

 - Corrected and factored out dma-buf locking documentation into
   a separate patch, like was suggested by Christian König.

 - Intel driver dropped the reservation locking a few days ago from
   its BO-release code path, but we need that locking for the imported
   GEMs because in the end that code path unmaps the imported GEM.
   So I added back the locking needed by the imported GEMs, updating
   the "dma-buf attachment locking specification" patch appropriately.

 - Tested Nouveau+Intel dma-buf import/export combo.

 - Tested udmabuf import to i915/Nouveau/AMDGPU.

 - Fixed a few places in Etnaviv, Panfrost and Lima drivers that I missed
   to switch to locked dma-buf vmapping in the "drm/gem: Take reservation
   lock for vmap/vunmap operations" patch. As a result, invalidated the
   r-b that Christian gave to v2.

 - Added locked dma-buf vmap/vunmap functions that are needed for fixing
   vmapping of Etnaviv, Panfrost and Lima drivers mentioned above.
   I actually had this change stashed for the drm-shmem shrinker patchset,
   but then realized that it's already needed by the dma-buf patches.
   Also improved my tests to better cover these code paths.

v2: - Changed locking specification to avoid problems with a cross-driver
   ww locking, like was suggested by Christian König. Now the attach/detach
   callbacks are invoked without the held lock and exporter should take the
   lock.

 - Added "locking convention" documentation that explains which dma-buf
   functions and callbacks are locked/unlocked for importers and exporters,
   which was requested by Christian König.

 - Added ack from Tomasz Figa to the V4L patches that he gave to v1.

Dmitry Osipenko (21):
   dma-buf: Add unlocked variant of vmapping functions
   dma-buf: Add unlocked variant of attachment-mapping functions
   drm/gem: Take reservation lock for vmap/vunmap operations
   drm/prime: Prepare to dynamic dma-buf locking specification
   drm/armada: Prepare to dynamic dma-buf locking specification
   drm/i915: Prepare to dynamic dma-buf locking specification
   drm/omapdrm: Prepare to dynamic dma-buf locking specification
   drm/tegra: Prepare to dynamic dma-buf locking specification
   drm/etnaviv: Prepare to dynamic dma-buf locking specification
   RDMA/umem: Prepare to dynamic dma-buf locking specification
   misc: fastrpc: Prepare to dynamic dma-buf locking specification
   xen/gntdev: Prepare to dynamic dma-buf locking 

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-09-07 Thread Christian König via Virtualization

On 06.09.22 22:05, Daniel Vetter wrote:

On Tue, Sep 06, 2022 at 10:01:47PM +0200, Daniel Vetter wrote:

On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:

On 15.08.22 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and they
need to be refcounted; otherwise it's impossible for KVM to map them. This
patch sets the refcount of the tail pages and fixes the KVM memory mapping
faults.

Without this change guest virgl driver can't map host buffers into guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.

Well I can't count how often I have repeated this: This is an absolutely
clear NAK!

TTM pages are not reference counted in the first place and because of this
giving them to virgl is illegal.

Please immediately stop this completely broken approach. We have discussed
this multiple times now.

Yeah we need to get this stuff closed for real by tagging them all with
VM_IO or VM_PFNMAP asap.

For a bit more context: Anything mapping a bo should be VM_SPECIAL. And I
think we should add the checks to the gem and dma-buf mmap functions to
validate for that, and fix all the fallout.

Otherwise this dragon keeps resurrecting ...

VM_SPECIAL _will_ block get_user_pages, which will block everyone from
even trying to refcount this stuff.

Minimally we need to fix this for all ttm drivers, and it sounds like
that's still not yet the case :-( Iirc last time around some funky amdkfd
userspace was the hold-up because of regressions?
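
For reference, the effect of those flags is roughly this check inside
get_user_pages() (a simplified sketch of the vma check in mm/gup.c, not
verbatim kernel code):

#include <linux/mm.h>

static int check_vma_flags_sketch(struct vm_area_struct *vma)
{
	/*
	 * VM_IO | VM_PFNMAP are part of VM_SPECIAL: GUP refuses such
	 * mappings, so nobody can take struct-page references through
	 * a BO mapping that carries them.
	 */
	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;

	return 0;
}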


My recollection is that Felix and I fixed this with a KFD specific 
workaround. But I can double check with Felix on Monday.


Christian.


-Daniel


It seems to be a recurring amount of fun that people try to mmap dma-bufs
and then call get_user_pages on them.

Which just doesn't work. I guess this is also why Rob Clark sent out that
dma-buf patch to expose mapping information (i.e. wc vs wb vs uncached).

There seems to be some serious bonghits going on :-/
-Daniel


Regards,
Christian.


Cc: sta...@vger.kernel.org
Cc: Trigger Huang 
Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/
Tested-by: Dmitry Osipenko # AMDGPU (Qemu and crosvm)
Signed-off-by: Dmitry Osipenko 
---
   drivers/gpu/drm/ttm/ttm_pool.c | 25 -
   1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..11e92bb149c9 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
struct ttm_pool_dma *dma;
struct page *p;
+   unsigned int i;
void *vaddr;
/* Don't set the __GFP_COMP flag for higher order allocations.
@@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
if (!pool->use_dma_alloc) {
p = alloc_pages(gfp_flags, order);
-   if (p)
+   if (p) {
p->private = order;
+   goto ref_tail_pages;
+   }
return p;
}
@@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
dma->vaddr = (unsigned long)vaddr | order;
p->private = (unsigned long)dma;
+
+ref_tail_pages:
+   /*
+* KVM requires mapped tail pages to be refcounted because put_page()
+* is invoked on them in the end of the page fault handling, and thus,
+* tail pages need to be protected from the premature releasing.
+* In fact, KVM page fault handler refuses to map tail pages to guest
+* if they aren't refcounted because hva_to_pfn_remapped() checks the
+* refcount specifically for this case.
+*
+* In particular, unreferenced tail pages result in a KVM "Bad address"
+* failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
+* accesses mapped host TTM buffer that contains tail pages.
+*/
+   for (i = 1; i < 1 << order; i++)
+   page_ref_inc(p + i);
+
return p;
   error_free:
@@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
   {
unsigned long attr = 

Re: [PATCH v4 00/21] Move all drivers to a common dma-buf locking convention

2022-09-01 Thread Christian König via Virtualization

Hi Dmitry,

I've gone over this multiple times now and while it is still possible 
that we missed something I think that this should land now.


The whole topic is just too complicated for us to guarantee 100% that
there isn't anything wrong with the locking, but lockdep and driver
testing should allow us to quickly find remaining issues.


Do you have commit rights to drm-misc-next or should I push it?

Thanks,
Christian.

On 31.08.22 17:37, Dmitry Osipenko wrote:

Hello,

This series moves all drivers to a dynamic dma-buf locking specification.
From now on all dma-buf importers are made responsible for holding
dma-buf's reservation lock around all operations performed over dma-bufs
in accordance with the locking specification. This allows us to utilize
the reservation lock more broadly around the kernel without fear of
potential deadlocks.

This patchset passes all i915 selftests. It was also tested using VirtIO,
Panfrost, Lima, Tegra, udmabuf, AMDGPU and Nouveau drivers. I tested cases
of display+GPU, display+V4L and GPU+V4L dma-buf sharing (where appropriate),
which covers the majority of kernel drivers, since the rest share the
same or similar code paths.

Changelog:

v4: - Added dma_buf_mmap() to the "locking convention" documentation,
   which was missed by accident in v3.

 - Added acks from Christian König, Tomasz Figa and Hans Verkuil that
   they gave to couple v3 patches.

 - Dropped the "_unlocked" postfix from function names that don't have
   the locked variant, as was requested by Christian König.

 - Factored out the per-driver preparations into separate patches
   to ease reviewing of the changes, which is now doable without the
   global dma-buf functions renaming.

 - Factored out the dynamic locking convention enforcements into separate
   patches which add the final dma_resv_assert_held(dmabuf->resv) to the
   dma-buf API functions.

v3: - Factored out dma_buf_mmap_unlocked() and attachment functions
   into separate patches, as suggested by Christian König.

 - Corrected and factored out dma-buf locking documentation into
   a separate patch, as suggested by Christian König.

 - Intel driver dropped the reservation locking a few days ago from
   its BO-release code path, but we need that locking for the imported
   GEMs because in the end that code path unmaps the imported GEM.
   So I added back the locking needed by the imported GEMs, updating
   the "dma-buf attachment locking specification" patch appropriately.

 - Tested Nouveau+Intel dma-buf import/export combo.

 - Tested udmabuf import to i915/Nouveau/AMDGPU.

 - Fixed a few places in the Etnaviv, Panfrost and Lima drivers that I missed
   to switch to locked dma-buf vmapping in the "drm/gem: Take reservation
   lock for vmap/vunmap operations" patch. As a result, this invalidated the
   r-b that Christian gave to v2.

 - Added locked dma-buf vmap/vunmap functions that are needed for fixing
   vmapping of the Etnaviv, Panfrost and Lima drivers mentioned above.
   I actually had this change stashed for the drm-shmem shrinker patchset,
   but then realized that it's already needed by the dma-buf patches.
   Also improved my tests to better cover these code paths.

v2: - Changed locking specification to avoid problems with a cross-driver
   ww locking, as suggested by Christian König. Now the attach/detach
   callbacks are invoked without the held lock and exporter should take the
   lock.

 - Added "locking convention" documentation that explains which dma-buf
   functions and callbacks are locked/unlocked for importers and exporters,
   which was requested by Christian König.

 - Added ack from Tomasz Figa to the V4L patches that he gave to v1.

Dmitry Osipenko (21):
   dma-buf: Add unlocked variant of vmapping functions
   dma-buf: Add unlocked variant of attachment-mapping functions
   drm/gem: Take reservation lock for vmap/vunmap operations
   drm/prime: Prepare to dynamic dma-buf locking specification
   drm/armada: Prepare to dynamic dma-buf locking specification
   drm/i915: Prepare to dynamic dma-buf locking specification
   drm/omapdrm: Prepare to dynamic dma-buf locking specification
   drm/tegra: Prepare to dynamic dma-buf locking specification
   drm/etnaviv: Prepare to dynamic dma-buf locking specification
   RDMA/umem: Prepare to dynamic dma-buf locking specification
   misc: fastrpc: Prepare to dynamic dma-buf locking specification
   xen/gntdev: Prepare to dynamic dma-buf locking specification
   media: videobuf2: Prepare to dynamic dma-buf locking specification
   media: tegra-vde: Prepare to dynamic dma-buf locking specification
   dma-buf: Move dma_buf_vmap() to dynamic locking specification
   dma-buf: Move dma_buf_attach() to dynamic locking specification
   dma-buf: Move dma_buf_map_attachment() to dynamic locking
 specification
   

Re: [PATCH v4 21/21] dma-buf: Remove obsoleted internal lock

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

The internal dma-buf lock isn't needed anymore because the updated
locking specification claims that dma-buf reservation must be locked
by importers, and thus, the internal data is already protected by the
reservation lock. Remove the obsoleted internal lock.

Acked-by: Christian König 
Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  drivers/dma-buf/dma-buf.c | 14 --
  include/linux/dma-buf.h   |  9 -
  2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 97ce884fad76..772fdd9eeed8 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -656,7 +656,6 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
  
  	dmabuf->file = file;
  
-	mutex_init(&dmabuf->lock);

	INIT_LIST_HEAD(&dmabuf->attachments);
  
	mutex_lock(&db_list.lock);

@@ -1502,7 +1501,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
  {
struct iosys_map ptr;
-   int ret = 0;
+   int ret;
  
  	iosys_map_clear(map);
  
@@ -1514,28 +1513,25 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)

if (!dmabuf->ops->vmap)
return -EINVAL;
  
-	mutex_lock(&dmabuf->lock);

if (dmabuf->vmapping_counter) {
dmabuf->vmapping_counter++;
		BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr));
*map = dmabuf->vmap_ptr;
-   goto out_unlock;
+   return 0;
}
  
	BUG_ON(iosys_map_is_set(&dmabuf->vmap_ptr));
  
	ret = dmabuf->ops->vmap(dmabuf, &ptr);

if (WARN_ON_ONCE(ret))
-   goto out_unlock;
+   return ret;
  
  	dmabuf->vmap_ptr = ptr;

dmabuf->vmapping_counter = 1;
  
  	*map = dmabuf->vmap_ptr;
  
-out_unlock:

-	mutex_unlock(&dmabuf->lock);
-   return ret;
+   return 0;
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_vmap, DMA_BUF);
  
@@ -1577,13 +1573,11 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)

BUG_ON(dmabuf->vmapping_counter == 0);
	BUG_ON(!iosys_map_is_equal(&dmabuf->vmap_ptr, map));
  
-	mutex_lock(&dmabuf->lock);

if (--dmabuf->vmapping_counter == 0) {
if (dmabuf->ops->vunmap)
dmabuf->ops->vunmap(dmabuf, map);
		iosys_map_clear(&dmabuf->vmap_ptr);
}
-	mutex_unlock(&dmabuf->lock);
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
  
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h

index f11b5bbc2f37..6fa8d4e29719 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -326,15 +326,6 @@ struct dma_buf {
/** @ops: dma_buf_ops associated with this buffer object. */
const struct dma_buf_ops *ops;
  
-	/**
-	 * @lock:
-	 *
-	 * Used internally to serialize list manipulation, attach/detach and
-	 * vmap/unmap. Note that in many cases this is superseeded by
-	 * dma_resv_lock() on @resv.
-	 */
-	struct mutex lock;
-
/**
 * @vmapping_counter:
 *


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
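
With the internal mutex gone, the vmapping counter is serialized purely by
the caller holding the reservation lock. Callers that don't already hold it
go through a thin wrapper; roughly (a sketch of the pattern, not necessarily
the exact upstream helper):

	int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
	{
		int ret;

		if (WARN_ON(!dmabuf))
			return -EINVAL;

		/* Take the reservation lock on behalf of the caller, then
		 * reuse the locked core function.
		 */
		dma_resv_lock(dmabuf->resv, NULL);
		ret = dma_buf_vmap(dmabuf, map);
		dma_resv_unlock(dmabuf->resv);

		return ret;
	}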

Re: [PATCH v4 20/21] media: videobuf2: Stop using internal dma-buf lock

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

All drivers that use dma-bufs have been moved to the updated locking
specification and now dma-buf reservation is guaranteed to be locked
by importers during the mapping operations. There is no need to take
the internal dma-buf lock anymore. Remove locking from the videobuf2
memory allocators.

Acked-by: Tomasz Figa 
Acked-by: Hans Verkuil 
Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 11 +--
  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +--
  drivers/media/common/videobuf2/videobuf2-vmalloc.c| 11 +--
  3 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 79f4d8301fbb..555bd40fa472 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -382,18 +382,12 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
  {
struct vb2_dc_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-	struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
  
-	mutex_lock(lock);

-
	sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
  
  	/* release any previous cache */

if (attach->dma_dir != DMA_NONE) {
@@ -409,14 +403,11 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
DMA_ATTR_SKIP_CPU_SYNC)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
  
  	attach->dma_dir = dma_dir;
  
-	mutex_unlock(lock);

-
return sgt;
  }
  
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c

index 36ecdea8d707..36981a5b5c53 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -424,18 +424,12 @@ static struct sg_table *vb2_dma_sg_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
  {
struct vb2_dma_sg_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-	struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
  
-	mutex_lock(lock);

-
	sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
  
  	/* release any previous cache */

if (attach->dma_dir != DMA_NONE) {
@@ -446,14 +440,11 @@ static struct sg_table *vb2_dma_sg_dmabuf_ops_map(
/* mapping to the client with new direction */
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir, 0)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
  
  	attach->dma_dir = dma_dir;
  
-	mutex_unlock(lock);

-
return sgt;
  }
  
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c b/drivers/media/common/videobuf2/videobuf2-vmalloc.c

index 7831bf545874..41db707e43a4 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -267,18 +267,12 @@ static struct sg_table *vb2_vmalloc_dmabuf_ops_map(
struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
  {
struct vb2_vmalloc_attachment *attach = db_attach->priv;
-   /* stealing dmabuf mutex to serialize map/unmap operations */
-	struct mutex *lock = &db_attach->dmabuf->lock;
struct sg_table *sgt;
  
-	mutex_lock(lock);

-
	sgt = &attach->sgt;
/* return previously mapped sg table */
-   if (attach->dma_dir == dma_dir) {
-   mutex_unlock(lock);
+   if (attach->dma_dir == dma_dir)
return sgt;
-   }
  
  	/* release any previous cache */

if (attach->dma_dir != DMA_NONE) {
@@ -289,14 +283,11 @@ static struct sg_table *vb2_vmalloc_dmabuf_ops_map(
/* mapping to the client with new direction */
if (dma_map_sgtable(db_attach->dev, sgt, dma_dir, 0)) {
pr_err("failed to map scatterlist\n");
-   mutex_unlock(lock);
return ERR_PTR(-EIO);
}
  
  	attach->dma_dir = dma_dir;
  
-	mutex_unlock(lock);

-
return sgt;
  }
  


___
Virtualization mailing list
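
The "stolen" mutex can go away because map/unmap are now always entered with
the importer holding the dma-buf reservation lock, which serializes them just
as the mutex did. An exporter that wants to make that assumption explicit
could assert it; an illustrative sketch (not part of the patch itself):

	static struct sg_table *vb2_dc_dmabuf_ops_map(
		struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir)
	{
		struct vb2_dc_attachment *attach = db_attach->priv;

		/* The importer holds the reservation lock here per the new
		 * dynamic locking convention, so attach->dma_dir and
		 * attach->sgt cannot change underneath us.
		 */
		dma_resv_assert_held(db_attach->dmabuf->resv);

		if (attach->dma_dir == dma_dir)
			return &attach->sgt;

		/* ... release the previous cache and remap with the new
		 * direction, as in the patch above ...
		 */
		return &attach->sgt;
	}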

Re: [PATCH v4 19/21] dma-buf: Document dynamic locking convention

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Add documentation for the dynamic locking convention. The documentation
tells dma-buf API users when they should take the reservation lock and
when not.

Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  Documentation/driver-api/dma-buf.rst |  6 +++
  drivers/dma-buf/dma-buf.c| 64 
  2 files changed, 70 insertions(+)

diff --git a/Documentation/driver-api/dma-buf.rst 
b/Documentation/driver-api/dma-buf.rst
index 36a76cbe9095..622b8156d212 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -119,6 +119,12 @@ DMA Buffer ioctls
  
  .. kernel-doc:: include/uapi/linux/dma-buf.h
  
+DMA-BUF locking convention
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-buf.c
+   :doc: locking convention
+
  Kernel Functions and Structures Reference
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c

index d9130486cb8f..97ce884fad76 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -794,6 +794,70 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
return sg_table;
  }
  
+/**

+ * DOC: locking convention
+ *
+ * In order to avoid deadlock situations between dma-buf exports and importers,
+ * all dma-buf API users must follow the common dma-buf locking convention.
+ *
+ * Convention for importers
+ *
+ * 1. Importers must hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_pin()
+ * - dma_buf_unpin()
+ * - dma_buf_map_attachment()
+ * - dma_buf_unmap_attachment()
+ * - dma_buf_vmap()
+ * - dma_buf_vunmap()
+ *
+ * 2. Importers must not hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_attach()
+ * - dma_buf_dynamic_attach()
+ * - dma_buf_detach()
+ * - dma_buf_export()
+ * - dma_buf_fd()
+ * - dma_buf_get()
+ * - dma_buf_put()
+ * - dma_buf_mmap()
+ * - dma_buf_begin_cpu_access()
+ * - dma_buf_end_cpu_access()
+ * - dma_buf_map_attachment_unlocked()
+ * - dma_buf_unmap_attachment_unlocked()
+ * - dma_buf_vmap_unlocked()
+ * - dma_buf_vunmap_unlocked()
+ *
+ * Convention for exporters
+ *
+ * 1. These &dma_buf_ops callbacks are invoked with unlocked dma-buf
+ *    reservation and exporter can take the lock:
+ *
+ * - &dma_buf_ops.attach()
+ * - &dma_buf_ops.detach()
+ * - &dma_buf_ops.release()
+ * - &dma_buf_ops.begin_cpu_access()
+ * - &dma_buf_ops.end_cpu_access()
+ *
+ * 2. These &dma_buf_ops callbacks are invoked with locked dma-buf
+ *    reservation and exporter can't take the lock:
+ *
+ * - &dma_buf_ops.pin()
+ * - &dma_buf_ops.unpin()
+ * - &dma_buf_ops.map_dma_buf()
+ * - &dma_buf_ops.unmap_dma_buf()
+ * - &dma_buf_ops.mmap()
+ * - &dma_buf_ops.vmap()
+ * - &dma_buf_ops.vunmap()
+ *
+ * 3. Exporters must hold the dma-buf reservation lock when calling these
+ *    functions:
+ *
+ * - dma_buf_move_notify()
+ */
+
  /**
   * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list
   * @dmabuf:   [in]buffer to attach device to.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
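
Put together, an importer following the documented convention would look
roughly like this (a minimal sketch with error unwinding trimmed; the
function name is made up for illustration):

	static int import_and_map(struct dma_buf *dmabuf, struct device *dev)
	{
		struct dma_buf_attachment *attach;
		struct sg_table *sgt;

		/* Reservation must NOT be held around attach/detach. */
		attach = dma_buf_attach(dmabuf, dev);
		if (IS_ERR(attach))
			return PTR_ERR(attach);

		/* Reservation MUST be held around map/unmap. */
		dma_resv_lock(dmabuf->resv, NULL);
		sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
		if (!IS_ERR(sgt))
			dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
		dma_resv_unlock(dmabuf->resv);

		dma_buf_detach(dmabuf, attach);
		return PTR_ERR_OR_ZERO(sgt);
	}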

Re: [PATCH v4 17/21] dma-buf: Move dma_buf_map_attachment() to dynamic locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Move dma-buf attachment mapping functions to the dynamic locking
specification by asserting that the reservation lock is held.

Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  drivers/dma-buf/dma-buf.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 073942bf5ae9..8e928fe6e8df 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1037,8 +1037,7 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
  
-	if (dma_buf_attachment_is_dynamic(attach))
-		dma_resv_assert_held(attach->dmabuf->resv);
+	dma_resv_assert_held(attach->dmabuf->resv);
  
  	if (attach->sgt) {

/*
@@ -1053,7 +1052,6 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
}
  
  	if (dma_buf_is_dynamic(attach->dmabuf)) {

-   dma_resv_assert_held(attach->dmabuf->resv);
if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
r = attach->dmabuf->ops->pin(attach);
if (r)
@@ -1142,15 +1140,11 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
  
-	if (dma_buf_attachment_is_dynamic(attach))
-		dma_resv_assert_held(attach->dmabuf->resv);
+	dma_resv_assert_held(attach->dmabuf->resv);
  
  	if (attach->sgt == sg_table)

return;
  
-	if (dma_buf_is_dynamic(attach->dmabuf))
-		dma_resv_assert_held(attach->dmabuf->resv);
-
__unmap_dma_buf(attach, sg_table, direction);
  
  	if (dma_buf_is_dynamic(attach->dmabuf) &&


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 16/21] dma-buf: Move dma_buf_attach() to dynamic locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Move dma-buf attachment API functions to the dynamic locking specification
by taking the reservation lock around the mapping operations. The strict
locking convention prevents deadlock situations for dma-buf importers and
exporters.

Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  drivers/dma-buf/dma-buf.c | 20 
  1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index ceea4839c641..073942bf5ae9 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -858,8 +858,8 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
dma_buf_is_dynamic(dmabuf)) {
struct sg_table *sgt;
  
+		dma_resv_lock(attach->dmabuf->resv, NULL);

if (dma_buf_is_dynamic(attach->dmabuf)) {
-   dma_resv_lock(attach->dmabuf->resv, NULL);
ret = dmabuf->ops->pin(attach);
if (ret)
goto err_unlock;
@@ -872,8 +872,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
ret = PTR_ERR(sgt);
goto err_unpin;
}
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
attach->sgt = sgt;
attach->dir = DMA_BIDIRECTIONAL;
}
@@ -889,8 +888,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
dmabuf->ops->unpin(attach);
  
  err_unlock:

-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
  
  	dma_buf_detach(dmabuf, attach);

return ERR_PTR(ret);
@@ -936,21 +934,19 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
dma_buf_attachment *attach)
if (WARN_ON(!dmabuf || !attach))
return;
  
+	dma_resv_lock(attach->dmabuf->resv, NULL);

+
if (attach->sgt) {
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_lock(attach->dmabuf->resv, NULL);
  
  		__unmap_dma_buf(attach, attach->sgt, attach->dir);
  
-		if (dma_buf_is_dynamic(attach->dmabuf)) {

+   if (dma_buf_is_dynamic(attach->dmabuf))
dmabuf->ops->unpin(attach);
-   dma_resv_unlock(attach->dmabuf->resv);
-   }
}
-
-   dma_resv_lock(dmabuf->resv, NULL);
	list_del(&attach->node);
+
dma_resv_unlock(dmabuf->resv);
+
if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);
  


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
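
Note that because dma_buf_attach()/dma_buf_detach() now take the reservation
lock themselves, an importer must not already hold it when calling them --
dma_resv_lock() is not recursive. A do/don't sketch:

	/* WRONG: dma_buf_attach() takes dmabuf->resv internally. */
	dma_resv_lock(dmabuf->resv, NULL);
	attach = dma_buf_attach(dmabuf, dev);	/* would deadlock */

	/* RIGHT: attach unlocked, lock only around the mapping. */
	attach = dma_buf_attach(dmabuf, dev);
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);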

Re: [PATCH v4 14/21] media: tegra-vde: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare Tegra video decoder driver to the common dynamic dma-buf
locking convention by starting to use the unlocked versions of dma-buf
API functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c 
b/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
index 69c346148070..1c5b94989aec 100644
--- a/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
+++ b/drivers/media/platform/nvidia/tegra-vde/dmabuf-cache.c
@@ -38,7 +38,7 @@ static void tegra_vde_release_entry(struct 
tegra_vde_cache_entry *entry)
if (entry->vde->domain)
tegra_vde_iommu_unmap(entry->vde, entry->iova);
  
-	dma_buf_unmap_attachment(entry->a, entry->sgt, entry->dma_dir);

+   dma_buf_unmap_attachment_unlocked(entry->a, entry->sgt, entry->dma_dir);
dma_buf_detach(dmabuf, entry->a);
dma_buf_put(dmabuf);
  
@@ -102,7 +102,7 @@ int tegra_vde_dmabuf_cache_map(struct tegra_vde *vde,

goto err_unlock;
}
  
-	sgt = dma_buf_map_attachment(attachment, dma_dir);

+   sgt = dma_buf_map_attachment_unlocked(attachment, dma_dir);
if (IS_ERR(sgt)) {
dev_err(dev, "Failed to get dmabufs sg_table\n");
err = PTR_ERR(sgt);
@@ -152,7 +152,7 @@ int tegra_vde_dmabuf_cache_map(struct tegra_vde *vde,
  err_free:
kfree(entry);
  err_unmap:
-   dma_buf_unmap_attachment(attachment, sgt, dma_dir);
+   dma_buf_unmap_attachment_unlocked(attachment, sgt, dma_dir);
  err_detach:
dma_buf_detach(dmabuf, attachment);
  err_unlock:


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
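
The _unlocked helpers used throughout these driver conversions are thin
wrappers that take the reservation lock around the locked core function;
roughly (a sketch of the pattern):

	struct sg_table *
	dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
					enum dma_data_direction direction)
	{
		struct sg_table *sg_table;

		/* Enter the locked core function on behalf of importers
		 * that don't manage the reservation lock themselves.
		 */
		dma_resv_lock(attach->dmabuf->resv, NULL);
		sg_table = dma_buf_map_attachment(attach, direction);
		dma_resv_unlock(attach->dmabuf->resv);

		return sg_table;
	}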

Re: [PATCH v4 13/21] media: videobuf2: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare V4L2 memory allocators to the common dynamic dma-buf locking
convention by starting to use the unlocked versions of dma-buf API
functions.

Acked-by: Tomasz Figa 
Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 11 ++-
  drivers/media/common/videobuf2/videobuf2-dma-sg.c |  8 
  drivers/media/common/videobuf2/videobuf2-vmalloc.c|  6 +++---
  3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 678b359717c4..79f4d8301fbb 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -101,7 +101,7 @@ static void *vb2_dc_vaddr(struct vb2_buffer *vb, void 
*buf_priv)
if (buf->db_attach) {
struct iosys_map map;
  
-		if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
+		if (!dma_buf_vmap_unlocked(buf->db_attach->dmabuf, &map))
buf->vaddr = map.vaddr;
  
  		return buf->vaddr;

@@ -711,7 +711,7 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
}
  
  	/* get the associated scatterlist for this buffer */

-   sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);
+   sgt = dma_buf_map_attachment_unlocked(buf->db_attach, buf->dma_dir);
if (IS_ERR(sgt)) {
pr_err("Error getting dmabuf scatterlist\n");
return -EINVAL;
@@ -722,7 +722,8 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
if (contig_size < buf->size) {
pr_err("contiguous chunk is too small %lu/%lu\n",
   contig_size, buf->size);
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt,
+ buf->dma_dir);
return -EFAULT;
}
  
@@ -750,10 +751,10 @@ static void vb2_dc_unmap_dmabuf(void *mem_priv)

}
  
  	if (buf->vaddr) {

-		dma_buf_vunmap(buf->db_attach->dmabuf, &map);
+		dma_buf_vunmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = NULL;
}
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt, buf->dma_dir);
  
  	buf->dma_addr = 0;

buf->dma_sgt = NULL;
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index fa69158a65b1..36ecdea8d707 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -309,7 +309,7 @@ static void *vb2_dma_sg_vaddr(struct vb2_buffer *vb, void 
*buf_priv)
  
  	if (!buf->vaddr) {

if (buf->db_attach) {
-			ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
+			ret = dma_buf_vmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = ret ? NULL : map.vaddr;
} else {
buf->vaddr = vm_map_ram(buf->pages, buf->num_pages, -1);
@@ -565,7 +565,7 @@ static int vb2_dma_sg_map_dmabuf(void *mem_priv)
}
  
  	/* get the associated scatterlist for this buffer */

-   sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);
+   sgt = dma_buf_map_attachment_unlocked(buf->db_attach, buf->dma_dir);
if (IS_ERR(sgt)) {
pr_err("Error getting dmabuf scatterlist\n");
return -EINVAL;
@@ -594,10 +594,10 @@ static void vb2_dma_sg_unmap_dmabuf(void *mem_priv)
}
  
  	if (buf->vaddr) {

-		dma_buf_vunmap(buf->db_attach->dmabuf, &map);
+		dma_buf_vunmap_unlocked(buf->db_attach->dmabuf, &map);
buf->vaddr = NULL;
}
-   dma_buf_unmap_attachment(buf->db_attach, sgt, buf->dma_dir);
+   dma_buf_unmap_attachment_unlocked(buf->db_attach, sgt, buf->dma_dir);
  
  	buf->dma_sgt = NULL;

  }
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c 
b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 948152f1596b..7831bf545874 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -376,7 +376,7 @@ static int vb2_vmalloc_map_dmabuf(void *mem_priv)
struct iosys_map map;
int ret;
  
-	ret = dma_buf_vmap(buf->dbuf, &map);
+	ret = dma_buf_vmap_unlocked(buf->dbuf, &map);
if (ret)
return -EFAULT;
buf->vaddr = map.vaddr;
@@ -389,7 +389,7 @@ static void vb2_vmalloc_unmap_dmabuf(void *mem_priv)
struct vb2_vmalloc_buf *buf = mem_priv;
struct iosys_map map = IOSYS_MAP_INIT_VADDR(buf->vaddr);
  
-	dma_buf_vunmap(buf->dbuf, &map);
+	dma_buf_vunmap_unlocked(buf->dbuf, &map);
buf->vaddr = NULL;
  

Re: [PATCH v4 12/21] xen/gntdev: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare gntdev driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/xen/gntdev-dmabuf.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index 940e5e9e8a54..4440e626b797 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -600,7 +600,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
  
  	gntdev_dmabuf->u.imp.attach = attach;
  
-	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

+   sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
if (IS_ERR(sgt)) {
ret = ERR_CAST(sgt);
goto fail_detach;
@@ -658,7 +658,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
  fail_end_access:
dmabuf_imp_end_foreign_access(gntdev_dmabuf->u.imp.refs, count);
  fail_unmap:
-   dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
  fail_detach:
dma_buf_detach(dma_buf, attach);
  fail_free_obj:
@@ -708,8 +708,8 @@ static int dmabuf_imp_release(struct gntdev_dmabuf_priv 
*priv, u32 fd)
attach = gntdev_dmabuf->u.imp.attach;
  
  	if (gntdev_dmabuf->u.imp.sgt)

-   dma_buf_unmap_attachment(attach, gntdev_dmabuf->u.imp.sgt,
-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, 
gntdev_dmabuf->u.imp.sgt,
+ DMA_BIDIRECTIONAL);
dma_buf = attach->dmabuf;
dma_buf_detach(attach->dmabuf, attach);
dma_buf_put(dma_buf);


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 11/21] misc: fastrpc: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare fastrpc to the common dynamic dma-buf locking convention by
starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/misc/fastrpc.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 93ebd174d848..6fcfb2e9f7a7 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -310,8 +310,8 @@ static void fastrpc_free_map(struct kref *ref)
return;
}
}
-   dma_buf_unmap_attachment(map->attach, map->table,
-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(map->attach, map->table,
+ DMA_BIDIRECTIONAL);
dma_buf_detach(map->buf, map->attach);
dma_buf_put(map->buf);
}
@@ -726,7 +726,7 @@ static int fastrpc_map_create(struct fastrpc_user *fl, int 
fd,
goto attach_err;
}
  
-	map->table = dma_buf_map_attachment(map->attach, DMA_BIDIRECTIONAL);

+   map->table = dma_buf_map_attachment_unlocked(map->attach, 
DMA_BIDIRECTIONAL);
if (IS_ERR(map->table)) {
err = PTR_ERR(map->table);
goto map_err;


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 10/21] RDMA/umem: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare InfiniBand drivers to the common dynamic dma-buf locking
convention by starting to use the unlocked versions of dma-buf API
functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/infiniband/core/umem_dmabuf.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/umem_dmabuf.c 
b/drivers/infiniband/core/umem_dmabuf.c
index 04c04e6d24c3..43b26bc12288 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -26,7 +26,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf 
*umem_dmabuf)
if (umem_dmabuf->sgt)
goto wait_fence;
  
-	sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);

+   sgt = dma_buf_map_attachment_unlocked(umem_dmabuf->attach,
+ DMA_BIDIRECTIONAL);
if (IS_ERR(sgt))
return PTR_ERR(sgt);
  
@@ -102,8 +103,8 @@ void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf)

umem_dmabuf->last_sg_trim = 0;
}
  
-	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,

-DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(umem_dmabuf->attach, umem_dmabuf->sgt,
+ DMA_BIDIRECTIONAL);
  
  	umem_dmabuf->sgt = NULL;

  }


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 09/21] drm/etnaviv: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare Etnaviv driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Interesting, where is the matching vmap()?

Anyway, this patch is Acked-by: Christian König 


---
  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
index 3fa2da149639..7031db145a77 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
@@ -65,7 +65,7 @@ static void etnaviv_gem_prime_release(struct 
etnaviv_gem_object *etnaviv_obj)
struct iosys_map map = IOSYS_MAP_INIT_VADDR(etnaviv_obj->vaddr);
  
  	if (etnaviv_obj->vaddr)

-		dma_buf_vunmap(etnaviv_obj->base.import_attach->dmabuf, &map);
+		dma_buf_vunmap_unlocked(etnaviv_obj->base.import_attach->dmabuf, &map);
  
  	/* Don't drop the pages for imported dmabuf, as they are not

 * ours, just free the array we allocated:


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 08/21] drm/tegra: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare Tegra DRM driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/gpu/drm/tegra/gem.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 81991090adcc..b09b8ab40ae4 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -84,7 +84,7 @@ static struct host1x_bo_mapping *tegra_bo_pin(struct device 
*dev, struct host1x_
goto free;
}
  
-		map->sgt = dma_buf_map_attachment(map->attach, direction);

+   map->sgt = dma_buf_map_attachment_unlocked(map->attach, 
direction);
if (IS_ERR(map->sgt)) {
dma_buf_detach(buf, map->attach);
err = PTR_ERR(map->sgt);
@@ -160,7 +160,8 @@ static struct host1x_bo_mapping *tegra_bo_pin(struct device 
*dev, struct host1x_
  static void tegra_bo_unpin(struct host1x_bo_mapping *map)
  {
if (map->attach) {
-   dma_buf_unmap_attachment(map->attach, map->sgt, map->direction);
+   dma_buf_unmap_attachment_unlocked(map->attach, map->sgt,
+ map->direction);
dma_buf_detach(map->attach->dmabuf, map->attach);
} else {
dma_unmap_sgtable(map->dev, map->sgt, map->direction, 0);
@@ -181,7 +182,7 @@ static void *tegra_bo_mmap(struct host1x_bo *bo)
if (obj->vaddr) {
return obj->vaddr;
} else if (obj->gem.import_attach) {
-		ret = dma_buf_vmap(obj->gem.import_attach->dmabuf, &map);
+		ret = dma_buf_vmap_unlocked(obj->gem.import_attach->dmabuf, &map);
return ret ? NULL : map.vaddr;
} else {
return vmap(obj->pages, obj->num_pages, VM_MAP,
@@ -197,7 +198,7 @@ static void tegra_bo_munmap(struct host1x_bo *bo, void 
*addr)
if (obj->vaddr)
return;
else if (obj->gem.import_attach)
-		dma_buf_vunmap(obj->gem.import_attach->dmabuf, &map);
+		dma_buf_vunmap_unlocked(obj->gem.import_attach->dmabuf, &map);
else
vunmap(addr);
  }
@@ -461,7 +462,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device 
*drm,
  
  	get_dma_buf(buf);
  
-	bo->sgt = dma_buf_map_attachment(attach, DMA_TO_DEVICE);

+   bo->sgt = dma_buf_map_attachment_unlocked(attach, DMA_TO_DEVICE);
if (IS_ERR(bo->sgt)) {
err = PTR_ERR(bo->sgt);
goto detach;
@@ -479,7 +480,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device 
*drm,
  
  detach:

if (!IS_ERR_OR_NULL(bo->sgt))
-   dma_buf_unmap_attachment(attach, bo->sgt, DMA_TO_DEVICE);
+   dma_buf_unmap_attachment_unlocked(attach, bo->sgt, 
DMA_TO_DEVICE);
  
  	dma_buf_detach(buf, attach);

dma_buf_put(buf);
@@ -508,8 +509,8 @@ void tegra_bo_free_object(struct drm_gem_object *gem)
tegra_bo_iommu_unmap(tegra, bo);
  
  	if (gem->import_attach) {

-   dma_buf_unmap_attachment(gem->import_attach, bo->sgt,
-DMA_TO_DEVICE);
+   dma_buf_unmap_attachment_unlocked(gem->import_attach, bo->sgt,
+ DMA_TO_DEVICE);
drm_prime_gem_destroy(gem, NULL);
} else {
tegra_bo_free(gem->dev, bo);


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 06/21] drm/i915: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare i915 driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions
and handling cases where importer now holds the reservation lock.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König , but it's probably
best if somebody on the Intel side takes a look as well.



---
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 12 
  .../gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c | 16 
  3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..07eee1c09aaf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -72,7 +72,7 @@ static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf,
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
void *vaddr;
  
-	vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);

+   vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c

index 389e9f157ca5..7e2a9b02526c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -331,7 +331,19 @@ static void __i915_gem_free_objects(struct 
drm_i915_private *i915,
continue;
}
  
+		/*
+		 * dma_buf_unmap_attachment() requires reservation to be
+		 * locked. The imported GEM shouldn't share reservation lock,
+		 * so it's safe to take the lock.
+		 */
+   if (obj->base.import_attach)
+   i915_gem_object_lock(obj, NULL);
+
__i915_gem_object_pages_fini(obj);
+
+   if (obj->base.import_attach)
+   i915_gem_object_unlock(obj);
+
__i915_gem_free_object(obj);
  
  		/* But keep the pointer alive for RCU-protected lookups */

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index 62c61af77a42..9e3ed634aa0e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -213,7 +213,7 @@ static int igt_dmabuf_import_same_driver(struct 
drm_i915_private *i915,
goto out_import;
}
  
-	st = dma_buf_map_attachment(import_attach, DMA_BIDIRECTIONAL);

+   st = dma_buf_map_attachment_unlocked(import_attach, DMA_BIDIRECTIONAL);
if (IS_ERR(st)) {
err = PTR_ERR(st);
goto out_detach;
@@ -226,7 +226,7 @@ static int igt_dmabuf_import_same_driver(struct 
drm_i915_private *i915,
timeout = -ETIME;
}
err = timeout > 0 ? 0 : timeout;
-   dma_buf_unmap_attachment(import_attach, st, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(import_attach, st, DMA_BIDIRECTIONAL);
  out_detach:
dma_buf_detach(dmabuf, import_attach);
  out_import:
@@ -296,7 +296,7 @@ static int igt_dmabuf_import(void *arg)
goto out_obj;
}
  
-	err = dma_buf_vmap(dmabuf, &map);
+	err = dma_buf_vmap_unlocked(dmabuf, &map);
dma_map = err ? NULL : map.vaddr;
if (!dma_map) {
pr_err("dma_buf_vmap failed\n");
@@ -337,7 +337,7 @@ static int igt_dmabuf_import(void *arg)
  
  	err = 0;

  out_dma_map:
-	dma_buf_vunmap(dmabuf, &map);
+	dma_buf_vunmap_unlocked(dmabuf, &map);
  out_obj:
i915_gem_object_put(obj);
  out_dmabuf:
@@ -358,7 +358,7 @@ static int igt_dmabuf_import_ownership(void *arg)
if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
  
-	err = dma_buf_vmap(dmabuf, &map);
+	err = dma_buf_vmap_unlocked(dmabuf, &map);
ptr = err ? NULL : map.vaddr;
if (!ptr) {
pr_err("dma_buf_vmap failed\n");
@@ -367,7 +367,7 @@ static int igt_dmabuf_import_ownership(void *arg)
}
  
  	memset(ptr, 0xc5, PAGE_SIZE);

-	dma_buf_vunmap(dmabuf, &map);
+	dma_buf_vunmap_unlocked(dmabuf, &map);
  
	obj = to_intel_bo(i915_gem_prime_import(&i915->drm, dmabuf));

if (IS_ERR(obj)) {
@@ -418,7 +418,7 @@ static int igt_dmabuf_export_vmap(void *arg)
}
i915_gem_object_put(obj);
  
-	err = dma_buf_vmap(dmabuf, &map);
+	err = dma_buf_vmap_unlocked(dmabuf, &map);
ptr = err ? NULL : map.vaddr;
if (!ptr) {
pr_err("dma_buf_vmap failed\n");
@@ -435,7 +435,7 @@ static int igt_dmabuf_export_vmap(void *arg)
memset(ptr, 0xc5, dmabuf->size);
  
  	err = 0;

-	dma_buf_vunmap(dmabuf, &map);
+	dma_buf_vunmap_unlocked(dmabuf, &map);
  out:
dma_buf_put(dmabuf);
return err;



Re: [PATCH v4 05/21] drm/armada: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare Armada driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Acked-by: Christian König 


---
  drivers/gpu/drm/armada/armada_gem.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index 5430265ad458..26d10065d534 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -66,8 +66,8 @@ void armada_gem_free_object(struct drm_gem_object *obj)
if (dobj->obj.import_attach) {
/* We only ever display imported data */
if (dobj->sgt)
-   dma_buf_unmap_attachment(dobj->obj.import_attach,
-dobj->sgt, DMA_TO_DEVICE);
+   
dma_buf_unmap_attachment_unlocked(dobj->obj.import_attach,
+ dobj->sgt, 
DMA_TO_DEVICE);
		drm_prime_gem_destroy(&dobj->obj, NULL);
}
  
@@ -539,8 +539,8 @@ int armada_gem_map_import(struct armada_gem_object *dobj)

  {
int ret;
  
-	dobj->sgt = dma_buf_map_attachment(dobj->obj.import_attach,

-  DMA_TO_DEVICE);
+   dobj->sgt = dma_buf_map_attachment_unlocked(dobj->obj.import_attach,
+   DMA_TO_DEVICE);
if (IS_ERR(dobj->sgt)) {
ret = PTR_ERR(dobj->sgt);
dobj->sgt = NULL;


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 04/21] drm/prime: Prepare to dynamic dma-buf locking specification

2022-09-01 Thread Christian König via Virtualization

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:

Prepare DRM prime core to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions.

Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/drm_prime.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index eb09e86044c6..20e109a802ae 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -940,7 +940,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
  
  	get_dma_buf(dma_buf);
  
-	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

+   sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
goto fail_detach;
@@ -958,7 +958,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
return obj;
  
  fail_unmap:

-   dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
  fail_detach:
dma_buf_detach(dma_buf, attach);
dma_buf_put(dma_buf);
@@ -1056,7 +1056,7 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, 
struct sg_table *sg)
  
  	attach = obj->import_attach;

if (sg)
-   dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
+   dma_buf_unmap_attachment_unlocked(attach, sg, 
DMA_BIDIRECTIONAL);
dma_buf = attach->dmabuf;
dma_buf_detach(attach->dmabuf, attach);
/* remove the reference */


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Linaro-mm-sig] [PATCH v3 6/9] dma-buf: Move dma-buf attachment to dynamic locking specification

2022-08-24 Thread Christian König via Virtualization

Am 24.08.22 um 17:49 schrieb Dmitry Osipenko:

On 8/24/22 18:24, Christian König wrote:

Am 24.08.22 um 12:22 schrieb Dmitry Osipenko:

Move dma-buf attachment API functions to the dynamic locking
specification.
The strict locking convention prevents deadlock situations for dma-buf
importers and exporters.

Previously, the "unlocked" versions of the attachment API functions
weren't taking the reservation lock and this patch makes them take
the lock.

Intel and AMD GPU drivers already were mapping the attached dma-bufs
under
the held lock during attachment, hence these drivers are updated to use
the locked functions.

Signed-off-by: Dmitry Osipenko 
---
   drivers/dma-buf/dma-buf.c  | 115 ++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |   4 +-
   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   8 +-
   drivers/gpu/drm/i915/gem/i915_gem_object.c |  12 +++
   include/linux/dma-buf.h    |  20 ++--
   5 files changed, 110 insertions(+), 49 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 4556a12bd741..f2a5a122da4a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -559,7 +559,7 @@ static struct file *dma_buf_getfile(struct dma_buf
*dmabuf, int flags)
    * 2. Userspace passes this file-descriptors to all drivers it wants
this buffer
    *    to share with: First the file descriptor is converted to a
&dma_buf using
    *    dma_buf_get(). Then the buffer is attached to the device using
- *    dma_buf_attach().
+ *    dma_buf_attach_unlocked().

Now I get why this is confusing me so much.

The _unlocked postfix implies that there is another function which
should be called with the locks already held, but this is not the case
for attach/detach (because they always need to grab the lock themselves).

That's correct. The attach/detach ops of exporter can take the lock
(like i915 exporter does it), hence importer must not grab the lock
around dma_buf_attach() invocation.


So I suggest to drop the _unlocked postfix for the attach/detach
functions. Another step would then be to unify attach/detach with
dynamic_attach/dynamic_detach when both have the same locking convention
anyway.

It's not a problem to change the name, but it's unclear to me why we
should do it. The _unlocked postfix tells importer that reservation must
be unlocked and it must be unlocked in case of dma_buf_attach().

Dropping the postfix will make dma_buf_attach() inconsistent with the
rest of the _unlocked functions(?). Are you sure we need to rename it?


The idea of the postfix was to distinguish between two different 
versions of the same function, e.g. dma_buf_vmap_unlocked() vs normal 
dma_buf_vmap().


When we don't have those two types of the same function I don't think it 
makes too much sense to keep that. We should just properly document which 
functions expect what, and that's what your documentation patch does.


Regards,
Christian.




Sorry that this is going so much back and forth, it's really complicated
to keep all the stuff in my head at the moment :)

Not a problem at all, I expected that it will take some time for this
patchset to settle down.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
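
For reference, the naming question is about pairs vs. singletons; a sketch of
the resulting API surface:

	/* Paired variants: the postfix tells the caller's locking state. */
	int dma_buf_vmap(struct dma_buf *dmabuf,
			 struct iosys_map *map);		/* resv held */
	int dma_buf_vmap_unlocked(struct dma_buf *dmabuf,
				  struct iosys_map *map);	/* resv not held */

	/* Singleton: only one locking mode exists, so no postfix -- the
	 * function manages the reservation lock internally.
	 */
	struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
						  struct device *dev);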

Re: [PATCH v3 6/9] dma-buf: Move dma-buf attachment to dynamic locking specification

2022-08-24 Thread Christian König via Virtualization

Am 24.08.22 um 17:03 schrieb Dmitry Osipenko:

On 8/24/22 17:08, Christian König wrote:

Am 24.08.22 um 12:22 schrieb Dmitry Osipenko:

Move dma-buf attachment API functions to the dynamic locking
specification.
The strict locking convention prevents deadlock situations for dma-buf
importers and exporters.

Previously, the "unlocked" versions of the attachment API functions
weren't taking the reservation lock and this patch makes them take
the lock.

Didn't we conclude that we need to keep the attach and detach callbacks
without the lock and only move the map/unmap callbacks over?

Otherwise it won't be possible for drivers to lock multiple buffers if
they have to shuffle things around for a specific attachment.

We did conclude that. The attach/detach dma-buf ops are unlocked, but
the map_dma_buf/unmap_dma_buf must be invoked under lock and
dma_buf_dynamic_attach_unlocked() maps dma-buf if either importer or
exporter can't handle the dynamic mapping [1].


Ah! You are confusing me over and over again with that :)

Ok in this case that here is fine, I just need to re-read the patch.

Thanks,
Christian.



[1]
https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma-buf/dma-buf.c#L869

Hence I re-arranged the dma_resv_lock() in
dma_buf_dynamic_attach_unlocked() to move both pinning and mapping under
the held lock.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
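
For reference, the rearrangement being discussed makes the cached-mapping path
of dma_buf_dynamic_attach_unlocked() do both the pin and the initial map under
a single lock section; roughly (sketch, error unwinding omitted):

	/* Sketch of the cached-mapping path referenced in [1]: when importer
	 * and exporter disagree on dynamic mapping support, map once at
	 * attach time and cache the sg_table, pinning and mapping under one
	 * reservation lock section.
	 */
	if (dma_buf_attachment_is_dynamic(attach) != dma_buf_is_dynamic(dmabuf)) {
		dma_resv_lock(attach->dmabuf->resv, NULL);
		if (dma_buf_is_dynamic(attach->dmabuf))
			ret = dmabuf->ops->pin(attach);
		sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
		dma_resv_unlock(attach->dmabuf->resv);
		attach->sgt = sgt;
		attach->dir = DMA_BIDIRECTIONAL;
	}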

Re: [PATCH v3 9/9] dma-buf: Remove internal lock

2022-08-24 Thread Christian König via Virtualization

Am 24.08.22 um 12:22 schrieb Dmitry Osipenko:

The internal dma-buf lock isn't needed anymore because the updated
locking specification claims that dma-buf reservation must be locked
by importers, and thus, the internal data is already protected by the
reservation lock. Remove the obsoleted internal lock.

Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 


---
  drivers/dma-buf/dma-buf.c | 14 --
  include/linux/dma-buf.h   |  9 -
  2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 696d132b02f4..a0406254f0ae 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -656,7 +656,6 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
  
  	dmabuf->file = file;
  
-	mutex_init(&dmabuf->lock);

	INIT_LIST_HEAD(&dmabuf->attachments);
  
	mutex_lock(&db_list.lock);

@@ -1503,7 +1502,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_mmap_unlocked, DMA_BUF);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
  {
struct iosys_map ptr;
-   int ret = 0;
+   int ret;
  
  	iosys_map_clear(map);
  
@@ -1515,28 +1514,25 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)

if (!dmabuf->ops->vmap)
return -EINVAL;
  
-	mutex_lock(&dmabuf->lock);

if (dmabuf->vmapping_counter) {
dmabuf->vmapping_counter++;
		BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr));
*map = dmabuf->vmap_ptr;
-   goto out_unlock;
+   return 0;
}
  
	BUG_ON(iosys_map_is_set(&dmabuf->vmap_ptr));
  
	ret = dmabuf->ops->vmap(dmabuf, &ptr);

if (WARN_ON_ONCE(ret))
-   goto out_unlock;
+   return ret;
  
  	dmabuf->vmap_ptr = ptr;

dmabuf->vmapping_counter = 1;
  
  	*map = dmabuf->vmap_ptr;
  
-out_unlock:

-	mutex_unlock(&dmabuf->lock);
-   return ret;
+   return 0;
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_vmap, DMA_BUF);
  
@@ -1578,13 +1574,11 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)

BUG_ON(dmabuf->vmapping_counter == 0);
	BUG_ON(!iosys_map_is_equal(&dmabuf->vmap_ptr, map));
  
-	mutex_lock(&dmabuf->lock);

if (--dmabuf->vmapping_counter == 0) {
if (dmabuf->ops->vunmap)
dmabuf->ops->vunmap(dmabuf, map);
		iosys_map_clear(&dmabuf->vmap_ptr);
}
-	mutex_unlock(&dmabuf->lock);
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
  
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h

index d48d534dc55c..aed6695bbb50 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -326,15 +326,6 @@ struct dma_buf {
/** @ops: dma_buf_ops associated with this buffer object. */
const struct dma_buf_ops *ops;
  
-	/**
-	 * @lock:
-	 *
-	 * Used internally to serialize list manipulation, attach/detach and
-	 * vmap/unmap. Note that in many cases this is superseeded by
-	 * dma_resv_lock() on @resv.
-	 */
-	struct mutex lock;
-
/**
 * @vmapping_counter:
 *


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 6/9] dma-buf: Move dma-buf attachment to dynamic locking specification

2022-08-24 Thread Christian König via Virtualization

Am 24.08.22 um 12:22 schrieb Dmitry Osipenko:

Move dma-buf attachment API functions to the dynamic locking specification.
The strict locking convention prevents deadlock situations for dma-buf
importers and exporters.

Previously, the "unlocked" versions of the attachment API functions
weren't taking the reservation lock and this patch makes them take
the lock.


Didn't we conclude that we need to keep the attach and detach callbacks 
without the lock and only move the map/unmap callbacks over?


Otherwise it won't be possible for drivers to lock multiple buffers if 
they have to shuffle things around for a specific attachment.


Regards,
Christian.



Intel and AMD GPU drivers already were mapping the attached dma-bufs under
the held lock during attachment, hence these drivers are updated to use
the locked functions.

Signed-off-by: Dmitry Osipenko 
---
  drivers/dma-buf/dma-buf.c  | 115 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   4 +-
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   8 +-
  drivers/gpu/drm/i915/gem/i915_gem_object.c |  12 +++
  include/linux/dma-buf.h|  20 ++--
  5 files changed, 110 insertions(+), 49 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 4556a12bd741..f2a5a122da4a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -559,7 +559,7 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, 
int flags)
   * 2. Userspace passes this file-descriptors to all drivers it wants this 
buffer
*    to share with: First the file descriptor is converted to a &dma_buf 
using
   *dma_buf_get(). Then the buffer is attached to the device using
- *dma_buf_attach().
+ *dma_buf_attach_unlocked().
   *
   *Up to this stage the exporter is still free to migrate or reallocate the
   *backing storage.
@@ -569,8 +569,8 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, 
int flags)
   *dma_buf_map_attachment() and dma_buf_unmap_attachment().
   *
   * 4. Once a driver is done with a shared buffer it needs to call
- *dma_buf_detach() (after cleaning up any mappings) and then release the
- *reference acquired with dma_buf_get() by calling dma_buf_put().
+ *dma_buf_detach_unlocked() (after cleaning up any mappings) and then
+ *release the reference acquired with dma_buf_get() by calling 
dma_buf_put().
   *
   * For the detailed semantics exporters are expected to implement see
 * &dma_buf_ops.
@@ -802,7 +802,7 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
   * @importer_priv:[in]importer private pointer for the attachment
   *
   * Returns struct dma_buf_attachment pointer for this attachment. Attachments
- * must be cleaned up by calling dma_buf_detach().
+ * must be cleaned up by calling dma_buf_detach_unlocked().
   *
 * Optionally this calls &dma_buf_ops.attach to allow device-specific attach
   * functionality.
@@ -858,8 +858,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
dma_buf_is_dynamic(dmabuf)) {
struct sg_table *sgt;
  
+		dma_resv_lock(attach->dmabuf->resv, NULL);

if (dma_buf_is_dynamic(attach->dmabuf)) {
-   dma_resv_lock(attach->dmabuf->resv, NULL);
ret = dmabuf->ops->pin(attach);
if (ret)
goto err_unlock;
@@ -872,8 +872,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
ret = PTR_ERR(sgt);
goto err_unpin;
}
-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
attach->sgt = sgt;
attach->dir = DMA_BIDIRECTIONAL;
}
@@ -889,8 +888,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
dmabuf->ops->unpin(attach);
  
  err_unlock:

-   if (dma_buf_is_dynamic(attach->dmabuf))
-   dma_resv_unlock(attach->dmabuf->resv);
+   dma_resv_unlock(attach->dmabuf->resv);
  
  	dma_buf_detach_unlocked(dmabuf, attach);

return ERR_PTR(ret);
@@ -927,7 +925,7 @@ static void __unmap_dma_buf(struct dma_buf_attachment 
*attach,
   * @dmabuf:   [in]buffer to detach from.
   * @attach:   [in]attachment to be detached; is free'd after this call.
   *
- * Clean up a device attachment obtained by calling dma_buf_attach().
+ * Clean up a device attachment obtained by calling dma_buf_attach_unlocked().
   *
   * Optionally this calls _buf_ops.detach for device-specific detach.
   */
@@ -937,21 +935,19 @@ void dma_buf_detach_unlocked(struct dma_buf *dmabuf,
if (WARN_ON(!dmabuf || !attach))
return;
  
+	dma_resv_lock(attach->dmabuf->resv, NULL);

+
if (attach->sgt) {
-

Re: [PATCH v3 5/9] dma-buf: Move dma_buf_mmap_unlocked() to dynamic locking specification

2022-08-24 Thread Christian König via Virtualization
This should work, but I'm really wondering if this makes a difference 
for somebody.


Anyway the approach is fine with me: Acked-by: Christian König 



Regards,
Christian.

On 24.08.22 at 12:22, Dmitry Osipenko wrote:

Move dma_buf_mmap_unlocked() function to the dynamic locking specification
by taking the reservation lock. None of today's drivers take the
reservation lock within the mmap() callback, hence it's safe to enforce
the locking.

Signed-off-by: Dmitry Osipenko 
---
  drivers/dma-buf/dma-buf.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f358af401360..4556a12bd741 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1348,6 +1348,8 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_end_cpu_access, DMA_BUF);
  int dma_buf_mmap_unlocked(struct dma_buf *dmabuf, struct vm_area_struct *vma,
  unsigned long pgoff)
  {
+   int ret;
+
if (WARN_ON(!dmabuf || !vma))
return -EINVAL;
  
@@ -1368,7 +1370,11 @@ int dma_buf_mmap_unlocked(struct dma_buf *dmabuf, struct vm_area_struct *vma,

vma_set_file(vma, dmabuf->file);
vma->vm_pgoff = pgoff;
  
-	return dmabuf->ops->mmap(dmabuf, vma);

+   dma_resv_lock(dmabuf->resv, NULL);
+   ret = dmabuf->ops->mmap(dmabuf, vma);
+   dma_resv_unlock(dmabuf->resv);
+
+   return ret;
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_mmap_unlocked, DMA_BUF);
  



Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-18 Thread Christian König via Virtualization

On 18.08.22 at 01:13, Dmitry Osipenko wrote:

On 8/18/22 01:57, Dmitry Osipenko wrote:

On 8/15/22 18:54, Dmitry Osipenko wrote:

On 8/15/22 17:57, Dmitry Osipenko wrote:

On 8/15/22 16:53, Christian König wrote:

On 15.08.22 at 15:45, Dmitry Osipenko wrote:

[SNIP]

Well that comment sounds like KVM is doing the right thing, so I'm
wondering what exactly is going on here.

KVM actually doesn't hold the page reference, it takes a temporary
reference during the page fault and then drops the reference once the page
is mapped, IIUC. Is it still illegal for TTM? Or is there a possibility of
a race condition here?


Well the question is why does KVM grab the page reference in the first
place?

If that is to prevent the mapping from changing then yes that's illegal
and won't work. It can always happen that you grab the address, solve
the fault and then immediately fault again because the address you just
grabbed is invalidated.

If it's for some other reason than we should probably investigate if we
shouldn't stop doing this.

CC: +Paolo Bonzini who introduced this code

commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini 
Date:   Tue Jun 7 17:51:18 2016 +0200

 KVM: MMU: try to fix up page faults before giving up

 The vGPU folks would like to trap the first access to a BAR by setting
 vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
handler
 then can use remap_pfn_range to place some non-reserved pages in the
VMA.

 This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
 and fixup_user_fault together help supporting it.  The patch also
supports
 VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
 reference counting.

@Paolo,
https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154


If we need to bump the refcount only for VM_MIXEDMAP and not for
VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
code that will denote to kvm_release_page_clean whether it needs to put
the page?

The other variant that kind of works is to mark TTM pages reserved using
SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
struct. But the potential consequences of doing this are unclear to me.

Christian, do you think we can do it?

Although, no. It also doesn't work with KVM without additional changes
to KVM.


Well my fundamental problem is that I can't fit together why KVM is 
grabbing a page reference in the first place.


See, the idea of the page reference is that you count references so that 
the memory is not reused while you 
access it, e.g. for I/O or mapping it into different address spaces etc...


But none of those use cases seem to apply to KVM. If I'm not totally 
mistaken in KVM you want to make sure that the address space mapping, 
e.g. the translation between virtual and physical addresses, doesn't change 
while you handle it, but grabbing a page reference is the completely 
wrong approach for that.


Regards,
Christian.
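
To illustrate the point, a hedged sketch (a simplified pseudo-flow, not the actual KVM code) of why a temporary page reference cannot stabilize a VM_PFNMAP translation:

    /*
     * resolve_hva_to_pfn() is a hypothetical helper standing in for the
     * fault-handling path under discussion.
     */
    pfn = resolve_hva_to_pfn(vma, hva);
    get_page(pfn_to_page(pfn));         /* temporary reference */
    /*
     * Nothing stops the exporter from moving the backing store here and
     * invalidating the mapping: the reference only keeps the old pages
     * allocated, it does not keep the hva->pfn translation valid. The
     * next access simply faults again and may resolve to a new pfn.
     */
    put_page(pfn_to_page(pfn));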



Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 15:45, Dmitry Osipenko wrote:

[SNIP]

Well that comment sounds like KVM is doing the right thing, so I'm
wondering what exactly is going on here.

KVM actually doesn't hold the page reference, it takes a temporary
reference during the page fault and then drops the reference once the page
is mapped, IIUC. Is it still illegal for TTM? Or is there a possibility of
a race condition here?



Well the question is why does KVM grab the page reference in the first 
place?


If that is to prevent the mapping from changing then yes that's illegal 
and won't work. It can always happen that you grab the address, solve 
the fault and then immediately fault again because the address you just 
grabbed is invalidated.


If it's for some other reason than we should probably investigate if we 
shouldn't stop doing this.


Regards,
Christian.


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 13:50, Dmitry Osipenko wrote:

On 8/15/22 14:28, Christian König wrote:

Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.


Well this is completely unfixable. See the whole purpose of TTM is to
allow tracking where and what part of a buffer object is mapped.

If you circumvent that and increase the page reference yourself, then
that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?

Yes, and no. It's one of the issues, but there is more behind that (e.g.
what happens when TTM switches from pages to local memory for backing a
BO).

If KVM page fault could reach TTM, then it should be able to relocate
BO. I see now where the problem is, thanks. Although, I'm wondering
whether it already works somehow... I'll try to play with the AMDGPU
shrinker and see what will happen on guest mapping of a relocated BO.


Well the page fault already somehow reaches TTM, otherwise the pfn 
couldn't be filled in in the first place.


The issue is more that KVM should never ever grab a page reference to 
pages mapped with VM_IO or VM_PFNMAP.


Essentially we need to apply the same restriction as with 
get_user_pages() here.



Another question is why is KVM accessing the page structure in the first
place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
touch any of those pages.

https://elixir.bootlin.com/linux/v5.19/source/virt/kvm/kvm_main.c#L2549


Well that comment sounds like KVM is doing the right thing, so I'm 
wondering what exactly is going on here.


Regards,
Christian.
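
A hedged sketch of the restriction being asked for (illustrative, mirroring the get_user_pages() rule rather than quoting actual KVM code):

    /* Pages behind VM_IO/VM_PFNMAP vmas are not refcountable objects. */
    static bool vma_pages_refcountable(struct vm_area_struct *vma)
    {
            return !(vma->vm_flags & (VM_IO | VM_PFNMAP));
    }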






Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 13:19, Dmitry Osipenko wrote:

[SNIP]

I'll try to dig out the older discussions, thank you for the quick
reply!

Are you sure it was really discussed in public previously? All I can
find is your two answers to similar patches where you're saying that
it's a wrong solution, without in-depth explanation and further
discussions.

Yeah, that's my problem as well, I can't find that offhand.

But yes it certainly was discussed in public.

If it was only CC'd to dri-devel, then could be that emails didn't pass
the spam moderation :/


That might be possible.


Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.


Well this is completely unfixable. See the whole purpose of TTM is to
allow tracking where and what part of a buffer object is mapped.

If you circumvent that and increase the page reference yourself, then
that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?


Yes, and no. It's one of the issues, but there is more behind that (e.g. 
what happens when TTM switches from pages to local memory for backing a BO).


Another question is why is KVM accessing the page structure in the first 
place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever 
touch any of those pages.


Regards,
Christian.


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 12:47, Dmitry Osipenko wrote:

On 8/15/22 13:18, Dmitry Osipenko wrote:

On 8/15/22 13:14, Christian König wrote:

On 15.08.22 at 12:11, Christian König wrote:

On 15.08.22 at 12:09, Dmitry Osipenko wrote:

On 8/15/22 13:05, Christian König wrote:

On 15.08.22 at 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and
they
need to be refcounted, otherwise it's impossible to map them by
KVM. This
patch sets the refcount of the tail pages and fixes the KVM memory
mapping
faults.

Without this change guest virgl driver can't map host buffers into
guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.

Well I can't count how often I have repeated this: This is an
absolutely
clear NAK!

TTM pages are not reference counted in the first place and because of
this giving them to virgl is illegal.

A? The first page is refcounted when allocated, the tail pages are not.

No they aren't. The first page is just by coincidence initialized with
a refcount of 1. This refcount is completely ignored and not used at all.

Incrementing the reference count and by this mapping the page into
some other address space is illegal and corrupts the internal state
tracking of TTM.

See this comment in the source code as well:

     /* Don't set the __GFP_COMP flag for higher order allocations.
  * Mapping pages directly into an userspace process and calling
  * put_page() on a TTM allocated page is illegal.
  */

I have absolutely no idea how somebody had the idea he could do this.

I saw this comment, but it doesn't make sense because it doesn't explain
why it's illegal. Hence it looks like a bogus comment since the
refcounting certainly works, at least to some degree, because I haven't
noticed any problems in practice, maybe by luck :)

I'll try to dig out the older discussions, thank you for the quick reply!

Are you sure it was really discussed in public previously? All I can
find is your two answers to similar patches where you're saying that
it's a wrong solution, without in-depth explanation and further
discussions.


Yeah, that's my problem as well, I can't find that offhand.

But yes it certainly was discussed in public.



Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.



Well this is completely unfixable. See the whole purpose of TTM is to 
allow tracking where and what part of a buffer object is mapped.


If you circumvent that and increase the page reference yourself, then 
that whole functionality can't work correctly any more.


Regards,
Christian.

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 12:18, Dmitry Osipenko wrote:

On 8/15/22 13:14, Christian König wrote:

On 15.08.22 at 12:11, Christian König wrote:

On 15.08.22 at 12:09, Dmitry Osipenko wrote:

On 8/15/22 13:05, Christian König wrote:

On 15.08.22 at 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and
they
need to be refcounted, otherwise it's impossible to map them by
KVM. This
patch sets the refcount of the tail pages and fixes the KVM memory
mapping
faults.

Without this change guest virgl driver can't map host buffers into
guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.

Well I can't count how often I have repeated this: This is an
absolutely
clear NAK!

TTM pages are not reference counted in the first place and because of
this giving them to virgl is illegal.

A? The first page is refcounted when allocated, the tail pages are not.

No they aren't. The first page is just by coincidence initialized with
a refcount of 1. This refcount is completely ignored and not used at all.

Incrementing the reference count and by this mapping the page into
some other address space is illegal and corrupts the internal state
tracking of TTM.

See this comment in the source code as well:

     /* Don't set the __GFP_COMP flag for higher order allocations.
  * Mapping pages directly into an userspace process and calling
  * put_page() on a TTM allocated page is illegal.
  */

I have absolutely no idea how somebody had the idea he could do this.

I saw this comment, but it doesn't make sense because it doesn't explain
why it's illegal. Hence it looks like a bogus comment since the
refcounting certainly works, at least to some degree, because I haven't
noticed any problems in practice, maybe by luck :)


Well exactly that's the problem. It does not work, you are just lucky :)

I will provide a patch to set the reference count to zero even for 
non-compound pages. Maybe that will yield more backtraces from abusers of 
this interface.


Regards,
Christian.
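
For context, a minimal illustration of the refcount behaviour at issue (a sketch, not kernel code to be merged):

    /* Without __GFP_COMP, only the head page carries a refcount; the
     * tail pages stay at refcount 0, so put_page() on them is invalid. */
    struct page *p = alloc_pages(GFP_KERNEL, 2);    /* order 2 = 4 pages */
    /* page_ref_count(p) == 1, page_ref_count(p + 1..3) == 0 */

    /* With __GFP_COMP the allocation is a compound page, and
     * get_page()/put_page() on any subpage act on the head's refcount. */
    struct page *c = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);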



I'll try to dig out the older discussions, thank you for the quick reply!




Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 12:11, Christian König wrote:

On 15.08.22 at 12:09, Dmitry Osipenko wrote:

On 8/15/22 13:05, Christian König wrote:

On 15.08.22 at 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and
they
need to be refcounted, otherwise it's impossible to map them by 
KVM. This

patch sets the refcount of the tail pages and fixes the KVM memory
mapping
faults.

Without this change guest virgl driver can't map host buffers into 
guest

and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.
Well I can't count how often I have repeated this: This is an 
absolutely

clear NAK!

TTM pages are not reference counted in the first place and because of
this giving them to virgl is illegal.

A? The first page is refcounted when allocated, the tail pages are not.


No they aren't. The first page is just by coincidence initialized with 
a refcount of 1. This refcount is completely ignored and not used at all.


Incrementing the reference count and by this mapping the page into 
some other address space is illegal and corrupts the internal state 
tracking of TTM.


See this comment in the source code as well:

    /* Don't set the __GFP_COMP flag for higher order allocations.
 * Mapping pages directly into an userspace process and calling
 * put_page() on a TTM allocated page is illegal.
 */

I have absolutely no idea how somebody had the idea he could do this.

Regards,
Christian.




Please immediately stop this completely broken approach. We have
discussed this multiple times now.

Could you please give me a link to these discussions?


Not offhand, please search the dri-devel list for similar patches. 
This was brought up multiple times now.


Regards,
Christian.



Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 12:09, Dmitry Osipenko wrote:

On 8/15/22 13:05, Christian König wrote:

On 15.08.22 at 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and
they
need to be refcounted, otherwise it's impossible to map them by KVM. This
patch sets the refcount of the tail pages and fixes the KVM memory
mapping
faults.

Without this change guest virgl driver can't map host buffers into guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.

Well I can't count how often I have repeated this: This is an absolutely
clear NAK!

TTM pages are not reference counted in the first place and because of
this giving them to virgl is illegal.

A? The first page is refcounted when allocated, the tail pages are not.


No they aren't. The first page is just by coincidence initialized with a 
refcount of 1. This refcount is completely ignored and not used at all.


Incrementing the reference count and by this mapping the page into some 
other address space is illegal and corrupts the internal state tracking 
of TTM.



Please immediately stop this completely broken approach. We have
discussed this multiple times now.

Could you please give me a link to these discussions?


Not offhand, please search the dri-devel list for similar patches. This 
was brought up multiple times now.


Regards,
Christian.

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König via Virtualization

On 15.08.22 at 11:54, Dmitry Osipenko wrote:

Higher order pages allocated using alloc_pages() aren't refcounted and they
need to be refcounted, otherwise it's impossible to map them by KVM. This
patch sets the refcount of the tail pages and fixes the KVM memory mapping
faults.

Without this change guest virgl driver can't map host buffers into guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.


Well I can't count how often I have repeated this: This is an absolutely 
clear NAK!


TTM pages are not reference counted in the first place and because of 
this giving them to virgl is illegal.


Please immediately stop this completely broken approach. We have 
discussed this multiple times now.


Regards,
Christian.



Cc: sta...@vger.kernel.org
Cc: Trigger Huang 
Link: 
https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
Tested-by: Dmitry Osipenko  # AMDGPU (Qemu and 
crosvm)
Signed-off-by: Dmitry Osipenko 
---
  drivers/gpu/drm/ttm/ttm_pool.c | 25 -
  1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..11e92bb149c9 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
struct ttm_pool_dma *dma;
struct page *p;
+   unsigned int i;
void *vaddr;
  
  	/* Don't set the __GFP_COMP flag for higher order allocations.

@@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
  
  	if (!pool->use_dma_alloc) {

p = alloc_pages(gfp_flags, order);
-   if (p)
+   if (p) {
p->private = order;
+   goto ref_tail_pages;
+   }
return p;
}
  
@@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
  
  	dma->vaddr = (unsigned long)vaddr | order;

p->private = (unsigned long)dma;
+
+ref_tail_pages:
+   /*
+* KVM requires mapped tail pages to be refcounted because put_page()
+* is invoked on them in the end of the page fault handling, and thus,
+* tail pages need to be protected from the premature releasing.
+* In fact, KVM page fault handler refuses to map tail pages to guest
+* if they aren't refcounted because hva_to_pfn_remapped() checks the
+* refcount specifically for this case.
+*
+* In particular, unreferenced tail pages result in a KVM "Bad address"
+* failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
+* accesses mapped host TTM buffer that contains tail pages.
+*/
+   for (i = 1; i < 1 << order; i++)
+   page_ref_inc(p + i);
+
return p;
  
  error_free:

@@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
  {
unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
struct ttm_pool_dma *dma;
+   unsigned int i;
void *vaddr;
  
  #ifdef CONFIG_X86

@@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
if (caching != ttm_cached && !PageHighMem(p))
set_pages_wb(p, 1 << order);
  #endif
+   for (i = 1; i < 1 << order; i++)
+   page_ref_dec(p + i);
  
  	if (!pool || !pool->use_dma_alloc) {

__free_pages(p, order);




Re: [Linaro-mm-sig] [PATCH v2 3/5] dma-buf: Move all dma-bufs to dynamic locking specification

2022-08-10 Thread Christian König via Virtualization

On 10.08.22 at 19:49, Dmitry Osipenko wrote:

On 8/10/22 14:30, Christian König wrote:

On 25.07.22 at 17:18, Dmitry Osipenko wrote:

This patch moves the non-dynamic dma-buf users over to the dynamic
locking specification. The strict locking convention prevents deadlock
situations for dma-buf importers and exporters.

Previously the "unlocked" versions of the dma-buf API functions weren't
taking the reservation lock and this patch makes them take the lock.

Intel and AMD GPU drivers already were mapping imported dma-bufs under
the held lock, hence the "locked" variant of the functions are added
for them and the drivers are updated to use the "locked" versions.

In general "Yes, please", but that won't be that easy.

You not only need to change amdgpu and i915, but all drivers
implementing the map_dma_buf(), unmap_dma_buf() callbacks.

Auditing all that code is a huge amount of work.

Hm, none of the drivers take the resv lock in map_dma_buf/unmap_dma_buf.
It's easy to audit them all and I did it. So either I'm missing
something or it doesn't take much time to check them all. Am I really
missing something?


Ok, so this is only changing map/unmap now?

In this case please separate this from the documentation change.

I would also drop the _locked postfix from the function name, just 
having _unlocked on all functions which are supposed to be called without 
the lock held should be sufficient.


Thanks for looking into this,
Christian.



https://elixir.bootlin.com/linux/latest/A/ident/map_dma_buf




Re: [PATCH v8 1/2] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error

2022-08-10 Thread Christian König via Virtualization

On 10.08.22 at 10:33, Daniel Vetter wrote:

On Wed, 10 Aug 2022 at 08:52, Christian König  wrote:

On 09.08.22 at 18:44, Daniel Vetter wrote:

On Tue, Jul 05, 2022 at 01:33:51PM +0200, Christian König wrote:

On 01.07.22 at 11:02, Dmitry Osipenko wrote:

Use ww_acquire_fini() in the error code paths. Otherwise lockdep
thinks that the lock is held when the lock's memory is freed after the
drm_gem_lock_reservations() error. The ww_acquire_context needs to be
annotated as "released", which fixes the noisy "WARNING: held lock freed!"
splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep.

Cc: sta...@vger.kernel.org
Fixes: 7edc3e3b975b5 ("drm: Add helpers for locking an array of BO 
reservations.")
Reviewed-by: Thomas Hellström 
Signed-off-by: Dmitry Osipenko 

Reviewed-by: Christian König 

Also added this r-b tag when merging to drm-misc-next-fixes.

IIRC I've already pushed this to drm-misc-fixes with a CC stable tag
about 2 weeks ago.

Please double check, it probably just hasn't come down the stream again yet.

Hm, I quickly checked and didn't spot it? There are a few patches from
Dmitry in the last few pulls, and some more stuff pending, but not
these two afaics?


Mhm, there is some potential that I wanted to push it but got distracted 
by the recurring drm-tip build breakages.


Anyway what I wanted to say is that this stuff should probably go to 
drm-misc-fixes with a CC: stable tag :)


Christian.


-Daniel


Christian.


-Daniel


---
drivers/gpu/drm/drm_gem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index eb0c2d041f13..86d670c71286 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1226,7 +1226,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
 ret = dma_resv_lock_slow_interruptible(obj->resv,
  acquire_ctx);
 if (ret) {
-   ww_acquire_done(acquire_ctx);
+   ww_acquire_fini(acquire_ctx);
 return ret;
 }
 }
@@ -1251,7 +1251,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
 goto retry;
 }
-   ww_acquire_done(acquire_ctx);
+   ww_acquire_fini(acquire_ctx);
 return ret;
 }
 }





Re: [PATCH v8 1/2] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error

2022-08-10 Thread Christian König via Virtualization

On 09.08.22 at 18:44, Daniel Vetter wrote:

On Tue, Jul 05, 2022 at 01:33:51PM +0200, Christian König wrote:

On 01.07.22 at 11:02, Dmitry Osipenko wrote:

Use ww_acquire_fini() in the error code paths. Otherwise lockdep
thinks that the lock is held when the lock's memory is freed after the
drm_gem_lock_reservations() error. The ww_acquire_context needs to be
annotated as "released", which fixes the noisy "WARNING: held lock freed!"
splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep.

Cc: sta...@vger.kernel.org
Fixes: 7edc3e3b975b5 ("drm: Add helpers for locking an array of BO 
reservations.")
Reviewed-by: Thomas Hellström 
Signed-off-by: Dmitry Osipenko 

Reviewed-by: Christian König 

Also added this r-b tag when merging to drm-misc-next-fixes.


IIRC I've already pushed this to drm-misc-fixes with a CC stable tag 
about 2 weeks ago.


Please double check, it probably just hasn't come down the stream again yet.

Christian.


-Daniel


---
   drivers/gpu/drm/drm_gem.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index eb0c2d041f13..86d670c71286 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1226,7 +1226,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
ret = dma_resv_lock_slow_interruptible(obj->resv,
 acquire_ctx);
if (ret) {
-   ww_acquire_done(acquire_ctx);
+   ww_acquire_fini(acquire_ctx);
return ret;
}
}
@@ -1251,7 +1251,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
goto retry;
}
-   ww_acquire_done(acquire_ctx);
+   ww_acquire_fini(acquire_ctx);
return ret;
}
}



Re: [PATCH v1 4/6] dma-buf: Acquire wait-wound context on attachment

2022-07-20 Thread Christian König via Virtualization

On 19.07.22 at 22:05, Dmitry Osipenko wrote:

On 7/15/22 09:59, Dmitry Osipenko wrote:

On 7/15/22 09:50, Christian König wrote:

On 15.07.22 at 02:52, Dmitry Osipenko wrote:

Intel i915 GPU driver uses wait-wound mutex to lock multiple GEMs on the
attachment to the i915 dma-buf. In order to let all drivers utilize
shared
wait-wound context during attachment in a general way, make the dma-buf
core
acquire the ww context internally for the attachment operation and update
i915 driver to use the importer's ww context instead of the internal one.

  From now on all dma-buf exporters shall use the importer's ww context
for
the attachment operation.

Signed-off-by: Dmitry Osipenko 
---
   drivers/dma-buf/dma-buf.c |  8 +-
   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  2 +-
   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ++---
   drivers/gpu/drm/i915/i915_gem_evict.c |  2 +-
   drivers/gpu/drm/i915/i915_gem_ww.c    | 26 +++
   drivers/gpu/drm/i915/i915_gem_ww.h    | 15 +--
   7 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0ee588276534..37545ecb845a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct
dma_buf_attachment *attach,
 * Optionally this calls &dma_buf_ops.attach to allow
device-specific attach
    * functionality.
    *
+ * Exporters shall use ww_ctx acquired by this function.
+ *
    * Returns:
    *
 * A pointer to newly created &dma_buf_attachment on success, or a
negative
@@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
*dmabuf, struct device *dev,
   void *importer_priv)
   {
   struct dma_buf_attachment *attach;
+    struct ww_acquire_ctx ww_ctx;
   int ret;
     if (WARN_ON(!dmabuf || !dev))
@@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
*dmabuf, struct device *dev,
   attach->importer_ops = importer_ops;
   attach->importer_priv = importer_priv;
   -    dma_resv_lock(dmabuf->resv, NULL);
+    ww_acquire_init(&ww_ctx, &reservation_ww_class);
+    dma_resv_lock(dmabuf->resv, &ww_ctx);

That won't work like this. The core property of a WW context is that you
need to unwind all the locks and re-acquire them with the contended one
first.

When you statically lock the imported one here you can't do that any more.

You're right. I felt that something was missing here, but couldn't
pin it down. I'll think more about this and enable
CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Thank you!


Christian, do you think we could make an exception for the attach()
callback and make the exporter responsible for taking the resv lock? It
will be inconsistent with the rest of the callbacks, where the importer
takes the lock, but it will be the simplest and least invasive solution.
It's very messy to do cross-driver ww locking, I don't think it's the
right approach.


So to summarize, the following calls will require that the caller hold 
the resv lock:

1. dma_buf_pin()/dma_buf_unpin()
2. dma_buf_map_attachment()/dma_buf_unmap_attachment()
3. dma_buf_vmap()/dma_buf_vunmap()
4. dma_buf_move_notify()

The following calls require that the caller does not hold the resv lock:
1. dma_buf_attach()/dma_buf_dynamic_attach()/dma_buf_detach()
2. dma_buf_export()/dma_buf_fd()
3. dma_buf_get()/dma_buf_put()
4. dma_buf_begin_cpu_access()/dma_buf_end_cpu_access()

If that's correct then that would work for me as well, but we should 
probably document this.
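
As a hedged illustration of that split, an importer-side sketch (the exact entry-point names were still in flux in this series):

    /* attach/detach: resv lock NOT held by the caller */
    attach = dma_buf_dynamic_attach(dmabuf, dev, &importer_ops, priv);

    /* map/unmap, vmap/vunmap, pin/unpin, move_notify: resv lock held */
    dma_resv_lock(dmabuf->resv, NULL);
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    /* ... use the sg_table ... */
    dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
    dma_resv_unlock(dmabuf->resv);

    dma_buf_detach(dmabuf, attach);     /* resv lock NOT held */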


Or let me ask the other way around: What calls exactly do you need to 
change to solve your original issue? That was vmap/vunmap, wasn't it? If 
yes then let's concentrate on those for the moment.


Regards,
Christian.

Re: [PATCH v1 1/6] dma-buf: Add _unlocked postfix to function names

2022-07-15 Thread Christian König via Virtualization

On 15.07.22 at 11:31, Dmitry Osipenko wrote:

On 7/15/22 10:19, Christian König wrote:

-struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment
*attach,
-    enum dma_data_direction direction)
+struct sg_table *
+dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
+    enum dma_data_direction direction)

The locking state of the mapping and unmapping operations depends on whether
the attachment is dynamic or not.

So this here is not a good idea at all since it suggests that the
function is always called without holding the lock.

I had the same thought while I was working on this patch and initially was
thinking about adding an "unlocked" alias to dma_buf_map_attachment().
In the end I decided that it will create even more confusion and it's
simpler just to rename this func here since there are only two drivers
using the dynamic mapping.

Do you have suggestions how to improve it?


Just keep it as it is. The ultimate goal is to switch all drivers over 
to the dynamic mapping or at least over to assume that the map/unmap 
callbacks are called with the lock held.


Regards,
Christian.
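
A hedged sketch of what that target convention looks like on the exporter side (illustrative callback, not from any driver; my_build_and_map_sgt() is a hypothetical helper):

    static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                           enum dma_data_direction dir)
    {
            /* Under the target convention the core calls this with the
             * reservation lock already held. */
            dma_resv_assert_held(attach->dmabuf->resv);

            /* Allocate, populate and dma_map the sg_table. */
            return my_build_and_map_sgt(attach, dir);
    }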

Re: [PATCH v1 2/6] drm/gem: Take reservation lock for vmap/vunmap operations

2022-07-15 Thread Christian König via Virtualization

On 15.07.22 at 02:52, Dmitry Osipenko wrote:

The new common dma-buf locking convention will require buffer importers
to hold the reservation lock around mapping operations. Make the DRM GEM core
take the lock around the vmapping operations and update the QXL and i915
drivers to use the locked functions for the case where DRM core now holds
the lock. This patch prepares DRM core and drivers to transition to the
common dma-buf locking convention where vmapping of exported GEMs will
be done under the held reservation lock.


Oh ^^ That looks like a bug fix to me!

At least drm_gem_ttm_vmap() and drm_gem_ttm_vunmap() already expected 
that they are called with the reservation lock held.


Otherwise you could mess up internal structures in the TTM buffer object 
while vmapping it.


I will take a deeper look at this.

Regards,
Christian.



Signed-off-by: Dmitry Osipenko 
---
  drivers/gpu/drm/drm_client.c |  4 +--
  drivers/gpu/drm/drm_gem.c| 28 
  drivers/gpu/drm/drm_gem_framebuffer_helper.c |  6 ++---
  drivers/gpu/drm/drm_prime.c  |  4 +--
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  2 +-
  drivers/gpu/drm/qxl/qxl_object.c | 17 ++--
  drivers/gpu/drm/qxl/qxl_prime.c  |  4 +--
  include/drm/drm_gem.h|  3 +++
  8 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index 2b230b4d6942..fbcb1e995384 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -323,7 +323,7 @@ drm_client_buffer_vmap(struct drm_client_buffer *buffer,
 * fd_install step out of the driver backend hooks, to make that
 * final step optional for internal users.
 */
-   ret = drm_gem_vmap(buffer->gem, map);
+   ret = drm_gem_vmap_unlocked(buffer->gem, map);
if (ret)
return ret;
  
@@ -345,7 +345,7 @@ void drm_client_buffer_vunmap(struct drm_client_buffer *buffer)

  {
struct iosys_map *map = &buffer->map;
  
-	drm_gem_vunmap(buffer->gem, map);

+   drm_gem_vunmap_unlocked(buffer->gem, map);
  }
  EXPORT_SYMBOL(drm_client_buffer_vunmap);
  
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c

index eb0c2d041f13..9769c33cad99 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1155,6 +1155,8 @@ void drm_gem_print_info(struct drm_printer *p, unsigned 
int indent,
  
  int drm_gem_pin(struct drm_gem_object *obj)

  {
+   dma_resv_assert_held(obj->resv);
+
if (obj->funcs->pin)
return obj->funcs->pin(obj);
else
@@ -1163,6 +1165,8 @@ int drm_gem_pin(struct drm_gem_object *obj)
  
  void drm_gem_unpin(struct drm_gem_object *obj)

  {
+   dma_resv_assert_held(obj->resv);
+
if (obj->funcs->unpin)
obj->funcs->unpin(obj);
  }
@@ -1171,6 +1175,8 @@ int drm_gem_vmap(struct drm_gem_object *obj, struct 
iosys_map *map)
  {
int ret;
  
+	dma_resv_assert_held(obj->resv);

+
if (!obj->funcs->vmap)
return -EOPNOTSUPP;
  
@@ -1186,6 +1192,8 @@ EXPORT_SYMBOL(drm_gem_vmap);
  
  void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)

  {
+   dma_resv_assert_held(obj->resv);
+
if (iosys_map_is_null(map))
return;
  
@@ -1197,6 +1205,26 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)

  }
  EXPORT_SYMBOL(drm_gem_vunmap);
  
+int drm_gem_vmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)

+{
+   int ret;
+
+   dma_resv_lock(obj->resv, NULL);
+   ret = drm_gem_vmap(obj, map);
+   dma_resv_unlock(obj->resv);
+
+   return ret;
+}
+EXPORT_SYMBOL(drm_gem_vmap_unlocked);
+
+void drm_gem_vunmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
+{
+   dma_resv_lock(obj->resv, NULL);
+   drm_gem_vunmap(obj, map);
+   dma_resv_unlock(obj->resv);
+}
+EXPORT_SYMBOL(drm_gem_vunmap_unlocked);
+
  /**
   * drm_gem_lock_reservations - Sets up the ww context and acquires
   * the lock on an array of GEM objects.
diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c 
b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
index 880a4975507f..e35e224e6303 100644
--- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
+++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
@@ -354,7 +354,7 @@ int drm_gem_fb_vmap(struct drm_framebuffer *fb, struct 
iosys_map *map,
ret = -EINVAL;
goto err_drm_gem_vunmap;
}
-   ret = drm_gem_vmap(obj, &map[i]);
+   ret = drm_gem_vmap_unlocked(obj, &map[i]);
if (ret)
goto err_drm_gem_vunmap;
}
@@ -376,7 +376,7 @@ int drm_gem_fb_vmap(struct drm_framebuffer *fb, struct 
iosys_map *map,
obj = drm_gem_fb_get_obj(fb, i);
if (!obj)
continue;
-   

Re: [PATCH v1 1/6] dma-buf: Add _unlocked postfix to function names

2022-07-15 Thread Christian König via Virtualization

On 15.07.22 at 02:52, Dmitry Osipenko wrote:

Add an _unlocked postfix to the dma-buf API function names in preparation
for moving all non-dynamic dma-buf users over to the dynamic locking
specification. This patch only renames API functions, preparing drivers
for the common locking convention. Later on we will make the "unlocked"
functions take the reservation lock.

Suggested-by: Christian König 
Signed-off-by: Dmitry Osipenko 
---
  drivers/dma-buf/dma-buf.c | 76 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |  4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  4 +-
  drivers/gpu/drm/armada/armada_gem.c   | 14 ++--
  drivers/gpu/drm/drm_gem_cma_helper.c  |  6 +-
  drivers/gpu/drm/drm_gem_shmem_helper.c|  6 +-
  drivers/gpu/drm/drm_prime.c   | 12 +--
  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c   |  6 +-
  drivers/gpu/drm/exynos/exynos_drm_gem.c   |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 12 +--
  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 20 ++---
  drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  8 +-
  drivers/gpu/drm/tegra/gem.c   | 27 +++
  drivers/infiniband/core/umem_dmabuf.c | 11 +--
  .../common/videobuf2/videobuf2-dma-contig.c   | 15 ++--
  .../media/common/videobuf2/videobuf2-dma-sg.c | 12 +--
  .../common/videobuf2/videobuf2-vmalloc.c  |  6 +-
  .../platform/nvidia/tegra-vde/dmabuf-cache.c  | 12 +--
  drivers/misc/fastrpc.c| 12 +--
  drivers/xen/gntdev-dmabuf.c   | 14 ++--
  include/linux/dma-buf.h   | 34 +
  21 files changed, 161 insertions(+), 152 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 44574fbe7482..d16237a6ffaa 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -795,7 +795,7 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
  }
  
  /**

- * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list
+ * dma_buf_dynamic_attach_unlocked - Add the device to dma_buf's attachments 
list
   * @dmabuf:   [in]buffer to attach device to.
   * @dev:  [in]device to be attached.
   * @importer_ops: [in]importer operations for the attachment
@@ -817,9 +817,9 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
   * indicated with the error code -EBUSY.
   */
  struct dma_buf_attachment *
-dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
-  const struct dma_buf_attach_ops *importer_ops,
-  void *importer_priv)
+dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, struct device *dev,
+   const struct dma_buf_attach_ops *importer_ops,
+   void *importer_priv)
  {
struct dma_buf_attachment *attach;
int ret;
@@ -892,25 +892,25 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct 
device *dev,
if (dma_buf_is_dynamic(attach->dmabuf))
dma_resv_unlock(attach->dmabuf->resv);
  
-	dma_buf_detach(dmabuf, attach);

+   dma_buf_detach_unlocked(dmabuf, attach);
return ERR_PTR(ret);
  }
-EXPORT_SYMBOL_NS_GPL(dma_buf_dynamic_attach, DMA_BUF);
+EXPORT_SYMBOL_NS_GPL(dma_buf_dynamic_attach_unlocked, DMA_BUF);
  
  /**

- * dma_buf_attach - Wrapper for dma_buf_dynamic_attach
+ * dma_buf_attach_unlocked - Wrapper for dma_buf_dynamic_attach
   * @dmabuf:   [in]buffer to attach device to.
   * @dev:  [in]device to be attached.
   *
- * Wrapper to call dma_buf_dynamic_attach() for drivers which still use a 
static
- * mapping.
+ * Wrapper to call dma_buf_dynamic_attach_unlocked() for drivers which still
+ * use a static mapping.
   */
-struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
- struct device *dev)
+struct dma_buf_attachment *dma_buf_attach_unlocked(struct dma_buf *dmabuf,
+  struct device *dev)
  {
-   return dma_buf_dynamic_attach(dmabuf, dev, NULL, NULL);
+   return dma_buf_dynamic_attach_unlocked(dmabuf, dev, NULL, NULL);
  }
-EXPORT_SYMBOL_NS_GPL(dma_buf_attach, DMA_BUF);
+EXPORT_SYMBOL_NS_GPL(dma_buf_attach_unlocked, DMA_BUF);
  
  static void __unmap_dma_buf(struct dma_buf_attachment *attach,

struct sg_table *sg_table,
@@ -923,7 +923,7 @@ static void __unmap_dma_buf(struct dma_buf_attachment 
*attach,
  }
  
  /**

- * dma_buf_detach - Remove the given attachment from dmabuf's attachments list
+ * dma_buf_detach_unlocked - Remove the given attachment from dmabuf's 
attachments list
   * @dmabuf:   [in]buffer to detach from.
   * @attach:   [in]attachment to be detached; is free'd after this call.
   *
@@ -931,7 +931,8 @@ static void __unmap_dma_buf(struct dma_buf_attachment 
*attach,
   *
   * Optionally 

Re: [PATCH v1 4/6] dma-buf: Acquire wait-wound context on attachment

2022-07-15 Thread Christian König via Virtualization

On 15.07.22 at 02:52, Dmitry Osipenko wrote:

Intel i915 GPU driver uses wait-wound mutex to lock multiple GEMs on the
attachment to the i915 dma-buf. In order to let all drivers utilize shared
wait-wound context during attachment in a general way, make the dma-buf core
acquire the ww context internally for the attachment operation and update
i915 driver to use the importer's ww context instead of the internal one.

 From now on all dma-buf exporters shall use the importer's ww context for
the attachment operation.

Signed-off-by: Dmitry Osipenko 
---
  drivers/dma-buf/dma-buf.c |  8 +-
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  2 +-
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_object.h|  6 ++---
  drivers/gpu/drm/i915/i915_gem_evict.c |  2 +-
  drivers/gpu/drm/i915/i915_gem_ww.c| 26 +++
  drivers/gpu/drm/i915/i915_gem_ww.h| 15 +--
  7 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0ee588276534..37545ecb845a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct 
dma_buf_attachment *attach,
 * Optionally this calls &dma_buf_ops.attach to allow device-specific attach
   * functionality.
   *
+ * Exporters shall use ww_ctx acquired by this function.
+ *
   * Returns:
   *
 * A pointer to newly created &dma_buf_attachment on success, or a negative
@@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
void *importer_priv)
  {
struct dma_buf_attachment *attach;
+   struct ww_acquire_ctx ww_ctx;
int ret;
  
  	if (WARN_ON(!dmabuf || !dev))

@@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
attach->importer_ops = importer_ops;
attach->importer_priv = importer_priv;
  
-	dma_resv_lock(dmabuf->resv, NULL);

+   ww_acquire_init(&ww_ctx, &reservation_ww_class);
+   dma_resv_lock(dmabuf->resv, &ww_ctx);


That won't work like this. The core property of a WW context is that you 
need to unwind all the locks and re-acquire them with the contended one first.


When you statically lock the imported one here you can't do that any more.

Regards,
Christian.
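
For reference, a hedged sketch of the unwind/re-acquire pattern Christian refers to, for two reservation objects a and b (drm_gem_lock_reservations() generalizes this to an array):

    struct ww_acquire_ctx ctx;
    int ret;

    ww_acquire_init(&ctx, &reservation_ww_class);

    dma_resv_lock(a, &ctx);             /* first lock cannot deadlock */
    while ((ret = dma_resv_lock(b, &ctx)) == -EDEADLK) {
            /* b is contended: drop everything, take the contended lock
             * first via the slow path, then retry the other one. */
            dma_resv_unlock(a);
            dma_resv_lock_slow(b, &ctx);
            swap(a, b);
    }
    ww_acquire_done(&ctx);
    /* ... both objects locked ... */
    dma_resv_unlock(a);
    dma_resv_unlock(b);
    ww_acquire_fini(&ctx);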

  
  	if (dmabuf->ops->attach) {

ret = dmabuf->ops->attach(dmabuf, attach);
@@ -876,11 +880,13 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, 
struct device *dev,
}
  
  	dma_resv_unlock(dmabuf->resv);

+   ww_acquire_fini(&ww_ctx);
  
  	return attach;
  
  err_attach:

dma_resv_unlock(attach->dmabuf->resv);
+   ww_acquire_fini(&ww_ctx);
kfree(attach);
return ERR_PTR(ret);
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c

index c199bf71c373..9173f0232b16 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -173,7 +173,7 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
return -EOPNOTSUPP;
  
-	for_i915_gem_ww(&ww, err, true) {

+   for_i915_dmabuf_ww(&ww, dmabuf, err, true) {
err = i915_gem_object_migrate(obj, , INTEL_REGION_SMEM);
if (err)
continue;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 30fe847c6664..ad7d602fc43a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3409,7 +3409,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
goto err_vma;
}
  
-	ww_acquire_done(&eb.ww.ctx);

+   ww_acquire_done(eb.ww.ctx);
eb_capture_stage();
  
  	out_fence = eb_requests_create(, in_fence, out_fence_fd);

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index e11d82a9f7c3..5ae38f94a5c7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -178,9 +178,9 @@ static inline int __i915_gem_object_lock(struct 
drm_i915_gem_object *obj,
int ret;
  
  	if (intr)

-   ret = dma_resv_lock_interruptible(obj->base.resv, ww ? &ww->ctx 
: NULL);
+   ret = dma_resv_lock_interruptible(obj->base.resv, ww ? ww->ctx 
: NULL);
else
-   ret = dma_resv_lock(obj->base.resv, ww ? &ww->ctx : NULL);
+   ret = dma_resv_lock(obj->base.resv, ww ? ww->ctx : NULL);
  
  	if (!ret && ww) {

i915_gem_object_get(obj);
@@ -216,7 +216,7 @@ static inline bool i915_gem_object_trylock(struct 
drm_i915_gem_object *obj,
if (!ww)
return dma_resv_trylock(obj->base.resv);
else
-   return 

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-07-05 Thread Christian König via Virtualization

On 01.07.22 at 11:02, Dmitry Osipenko wrote:

Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't
handle imported dma-bufs properly, which results in mapping of something
else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup when
userspace writes to the memory mapping of a dma-buf that was imported into
Tegra's DRM GEM.

The majority of DRM drivers prohibit mapping of imported GEM objects.
Mapping of imported GEMs requires special care from userspace since it
should sync the dma-buf because the mapping coherency of the exporter device may
not match the DRM device. Let's prohibit the mapping for all DRM drivers
for consistency.

Suggested-by: Thomas Hellström 
Signed-off-by: Dmitry Osipenko 


I'm pretty sure that this is the right approach, but it's certainly more 
than possible that somebody abused this already.


Anyway, the patch is Reviewed-by: Christian König 
since you are really fixing a major stability problem here.


Regards,
Christian.
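
A hedged userspace-side note on the consequence: with mmap of imported GEM objects rejected, the portable path is to mmap the dma-buf fd itself (names illustrative):

    /* dmabuf_fd as obtained from the exporting driver / PRIME export */
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     dmabuf_fd, 0);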


---
  drivers/gpu/drm/drm_gem.c  | 4 
  drivers/gpu/drm/drm_gem_shmem_helper.c | 9 -
  2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 86d670c71286..fc9ec42fa0ab 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1034,6 +1034,10 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, 
unsigned long obj_size,
  {
int ret;
  
+	/* Don't allow imported objects to be mapped */

+   if (obj->import_attach)
+   return -EINVAL;
+
/* Check for valid size. */
if (obj_size < vma->vm_end - vma->vm_start)
return -EINVAL;
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 8ad0e02991ca..6190f5018986 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops);
   */
  int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct 
vm_area_struct *vma)
  {
-   struct drm_gem_object *obj = &shmem->base;
int ret;
  
-	if (obj->import_attach) {

-   /* Drop the reference drm_gem_mmap_obj() acquired.*/
-   drm_gem_object_put(obj);
-   vma->vm_private_data = NULL;
-
-   return dma_buf_mmap(obj->dma_buf, vma, 0);
-   }
-
ret = drm_gem_shmem_get_pages(shmem);
if (ret) {
drm_gem_vm_close(vma);



Re: [PATCH v8 1/2] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error

2022-07-05 Thread Christian König via Virtualization

On 01.07.22 at 11:02, Dmitry Osipenko wrote:

Use ww_acquire_fini() in the error code paths. Otherwise lockdep
thinks that the lock is held when the lock's memory is freed after the
drm_gem_lock_reservations() error. The ww_acquire_context needs to be
annotated as "released", which fixes the noisy "WARNING: held lock freed!"
splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep.

Cc: sta...@vger.kernel.org
Fixes: 7edc3e3b975b5 ("drm: Add helpers for locking an array of BO 
reservations.")
Reviewed-by: Thomas Hellström 
Signed-off-by: Dmitry Osipenko 


Reviewed-by: Christian König 
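
For reference, a sketch of the ww_acquire_ctx lifecycle the fix restores (per the ww_mutex API; error paths must use _fini, not _done):

    struct ww_acquire_ctx ctx;

    ww_acquire_init(&ctx, &reservation_ww_class);
    /* ... acquire the locks, backing off and retrying on -EDEADLK ... */
    ww_acquire_done(&ctx);  /* acquire phase finished, locks held */
    /* ... use the objects, then unlock them all ... */
    ww_acquire_fini(&ctx);  /* context released; its memory may be freed */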


---
  drivers/gpu/drm/drm_gem.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index eb0c2d041f13..86d670c71286 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1226,7 +1226,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
ret = dma_resv_lock_slow_interruptible(obj->resv,
 acquire_ctx);
if (ret) {
-   ww_acquire_done(acquire_ctx);
+   ww_acquire_fini(acquire_ctx);
return ret;
}
}
@@ -1251,7 +1251,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, 
int count,
goto retry;
}
  
-			ww_acquire_done(acquire_ctx);

+   ww_acquire_fini(acquire_ctx);
return ret;
}
}



Re: [Linaro-mm-sig] Re: [PATCH v6 02/22] drm/gem: Move mapping of imported dma-bufs to drm_gem_mmap_obj()

2022-07-04 Thread Christian König via Virtualization

On 30.06.22 at 01:06, Dmitry Osipenko wrote:

On 6/29/22 11:43, Thomas Hellström (Intel) wrote:

On 6/29/22 10:22, Dmitry Osipenko wrote:

On 6/29/22 09:40, Thomas Hellström (Intel) wrote:

On 5/27/22 01:50, Dmitry Osipenko wrote:

Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't
handle imported dma-bufs properly, which results in mapping of
something
else than the imported dma-buf. For example, on NVIDIA Tegra we get a
hard
lockup when userspace writes to the memory mapping of a dma-buf that
was
imported into Tegra's DRM GEM.

To fix this bug, move mapping of imported dma-bufs to
drm_gem_mmap_obj().
Now mmapping of imported dma-bufs works properly for all DRM drivers.

Same comment about Fixes: as in patch 1,

Cc: sta...@vger.kernel.org
Signed-off-by: Dmitry Osipenko 
---
    drivers/gpu/drm/drm_gem.c  | 3 +++
    drivers/gpu/drm/drm_gem_shmem_helper.c | 9 -
    drivers/gpu/drm/tegra/gem.c    | 4 
    3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 86d670c71286..7c0b025508e4 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1038,6 +1038,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
     if (obj_size < vma->vm_end - vma->vm_start)
         return -EINVAL;
 
+    if (obj->import_attach)
+        return dma_buf_mmap(obj->dma_buf, vma, 0);

If we start enabling mmapping of imported dma-bufs on a majority of
drivers in this way, how do we ensure that user-space is not blindly
using the object mmap without calling DMA_BUF_IOCTL_SYNC, which is
needed before and after CPU access of mmap'ed dma-bufs?

I was under the impression (admittedly without looking) that the few
drivers that actually called into dma_buf_mmap() had some private
user-mode driver code in place that ensured this happened.

Since it's userspace who does the mapping, it should be the
responsibility of userspace to do all the necessary syncing.

Sure, but nothing prohibits user-space from ignoring the syncing, thinking
"it works anyway" after testing on those drivers where the syncing is a NOP.
And when a driver that finally needs syncing is tested, it's too late to fix
all the broken user-space.
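
For reference, the bracketing Thomas means looks roughly like this from
userspace (a sketch assuming "fd" is a dma-buf file descriptor; this is not
code from this series):

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static int cpu_fill(int fd, size_t size)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE };
	void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (map == MAP_FAILED)
		return -1;
	ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);	/* begin CPU access */
	memset(map, 0, size);			/* CPU writes to the buffer */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);	/* end CPU access */
	return munmap(map, size);
}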


I'm not sure whether anyone in userspace really needs to map imported
dma-bufs in practice. Nevertheless, this use-case is broken and should be
fixed by either allowing to do the mapping or prohibiting it.


Then I'd vote for prohibiting it, at least for now. And moving forward we
could perhaps revisit the dma-buf need for syncing, requiring those drivers
that actually need it to implement emulated coherent memory, which can be
done not too inefficiently (vmwgfx being one example).

Alright, I'll change it to prohibit the mapping. This indeed should be a
better option.


Oh, yes please. But I would expect that some people start screaming.

Over time I've got tons of TTM patches because people illegally tried to 
mmap() imported DMA-bufs in their driver.


Anyway this is probably the right thing to do and we can work on fixing 
the fallout later on.


Regards,
Christian.



Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention

2022-05-30 Thread Christian König via Virtualization

Hi Dmitry,

Am 30.05.22 um 15:26 schrieb Dmitry Osipenko:

Hello Christian,

On 5/30/22 09:50, Christian König wrote:

Hi Dmitry,

First of all please separate out this patch from the rest of the series,
since this is a complex separate structural change.

I assume all the patches will go via the DRM tree in the end since the
rest of the DRM patches in this series depend on this dma-buf change.
But I see that separation may ease reviewing of the dma-buf changes, so
let's try it.


That sounds like you are underestimating a bit how much trouble this 
will be.



I have tried this before and failed because catching all the locks in
the right code paths is very tricky. So expect some fallout from this
and make sure the kernel test robot and CI systems are clean.

Sure, I'll fix up all the reported things in the next iteration.

BTW, have you ever posted yours version of the patch? Will be great if
we could compare the changed code paths.


No, I never even finished creating it after realizing how much work it 
would be.



This patch introduces new locking convention for dma-buf users. From now
on all dma-buf importers are responsible for holding dma-buf reservation
lock around operations performed over dma-bufs.

This patch implements the new dma-buf locking convention by:

    1. Making the dma-buf API functions take the reservation lock.

    2. Adding new locked variants of the dma-buf API functions for drivers
   that need to manage imported dma-bufs under the held lock.

Instead of adding new locked variants please mark all variants which
expect to be called without a lock with an _unlocked postfix.

This should make it easier to remove those in a follow up patch set and
then fully move the locking into the importer.
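
The suggested convention boils down to a thin wrapper pattern; a minimal
sketch with simplified signatures (not the final dma-buf API):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* The core variant expects the caller to hold the reservation lock,
 * the _unlocked wrapper takes and drops it itself. */
static int dma_buf_vmap_unlocked_sketch(struct dma_buf *dmabuf,
					struct iosys_map *map)
{
	int ret;

	dma_resv_lock(dmabuf->resv, NULL);
	ret = dma_buf_vmap(dmabuf, map);	/* resv held here */
	dma_resv_unlock(dmabuf->resv);

	return ret;
}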

Do we really want to move all the locks to the importers? Seems the
majority of drivers should be happy with the dma-buf helpers handling
the locking for them.


Yes, I clearly think so.




    3. Converting all drivers to the new locking scheme.

I have strong doubts that you got all of them. At least radeon and
nouveau should grab the reservation lock in their ->attach callbacks
somehow.

Radeon and Nouveau use gem_prime_import_sg_table() and they take resv
lock already, seems they should be okay (?)


You are looking at the wrong side. You need to fix the export code path, 
not the import ones.


See for example attach on radeon works like this 
drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock.


Same for nouveau and probably a few other exporters as well. That will 
certainly cause a deadlock if you don't fix it.
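
A hypothetical, deliberately broken sketch of that self-deadlock (not real
driver code): under the new convention dma_buf_dynamic_attach() already holds
the reservation lock when it calls ->attach, so any exporter pin path that
reserves the BO again blocks on itself.

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* BUGGY on purpose: the caller (the dma-buf core) already holds
 * dmabuf->resv. This models radeon's drm_gem_map_attach -> drm_gem_pin ->
 * radeon_bo_reserve -> ttm_bo_reserve chain ending in dma_resv_lock(). */
static int buggy_exporter_attach(struct dma_buf *dmabuf,
				 struct dma_buf_attachment *attach)
{
	dma_resv_lock(dmabuf->resv, NULL);	/* second acquisition: deadlock */
	/* ... pin the backing storage ... */
	dma_resv_unlock(dmabuf->resv);
	return 0;
}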


I strongly suggest to do this step by step, first attach/detach and then 
the rest.


Regards,
Christian.



I assume all the basics should covered in this v6. At minimum Intel,
Tegra, Panfrost, Lima and Rockchip drivers should be good. If I missed
something, then please let me know and I'll correct it.


Signed-off-by: Dmitry Osipenko 
---
   drivers/dma-buf/dma-buf.c | 270 +++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   6 +-
   drivers/gpu/drm/drm_client.c  |   4 +-
   drivers/gpu/drm/drm_gem.c |  33 +++
   drivers/gpu/drm/drm_gem_framebuffer_helper.c  |   6 +-
   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  10 +-
   drivers/gpu/drm/qxl/qxl_object.c  |  17 +-
   drivers/gpu/drm/qxl/qxl_prime.c   |   4 +-
   .../common/videobuf2/videobuf2-dma-contig.c   |  11 +-
   .../media/common/videobuf2/videobuf2-dma-sg.c |  11 +-
   .../common/videobuf2/videobuf2-vmalloc.c  |  11 +-
   include/drm/drm_gem.h |   3 +
   include/linux/dma-buf.h   |  14 +-
   13 files changed, 241 insertions(+), 159 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..64a9909ccfa2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -552,7 +552,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_mode |= FMODE_LSEEK;
 	dmabuf->file = file;
 
-	mutex_init(&dmabuf->lock);

Please make removing dmabuf->lock a separate change.

Alright




Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention

2022-05-30 Thread Christian König via Virtualization

Hi Dmitry,

First of all please separate out this patch from the rest of the series, 
since this is a complex separate structural change.


Am 27.05.22 um 01:50 schrieb Dmitry Osipenko:

All dma-bufs have a dma-reservation lock that allows drivers to perform
exclusive operations over shared dma-bufs. Today's dma-buf API has an
incomplete locking specification, which creates deadlock situations
for dma-buf importers and exporters that don't coordinate their locks.


Well please drop that sentence. The locking specifications are actually 
very well defined, it's just that some drivers are a bit broken 
regarding them.


What you do here is rather moving all the non-dynamic drivers over to 
the dynamic locking specification (which is really nice to have).


I have tried this before and failed because catching all the locks in 
the right code paths is very tricky. So expect some fallout from this 
and make sure the kernel test robot and CI systems are clean.



This patch introduces new locking convention for dma-buf users. From now
on all dma-buf importers are responsible for holding dma-buf reservation
lock around operations performed over dma-bufs.

This patch implements the new dma-buf locking convention by:

   1. Making the dma-buf API functions take the reservation lock.

   2. Adding new locked variants of the dma-buf API functions for drivers
  that need to manage imported dma-bufs under the held lock.


Instead of adding new locked variants please mark all variants which 
expect to be called without a lock with an _unlocked postfix.


This should make it easier to remove those in a follow up patch set and 
then fully move the locking into the importer.



   3. Converting all drivers to the new locking scheme.


I have strong doubts that you got all of them. At least radeon and 
nouveau should grab the reservation lock in their ->attach callbacks 
somehow.




Signed-off-by: Dmitry Osipenko 
---
  drivers/dma-buf/dma-buf.c | 270 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   6 +-
  drivers/gpu/drm/drm_client.c  |   4 +-
  drivers/gpu/drm/drm_gem.c |  33 +++
  drivers/gpu/drm/drm_gem_framebuffer_helper.c  |   6 +-
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  10 +-
  drivers/gpu/drm/qxl/qxl_object.c  |  17 +-
  drivers/gpu/drm/qxl/qxl_prime.c   |   4 +-
  .../common/videobuf2/videobuf2-dma-contig.c   |  11 +-
  .../media/common/videobuf2/videobuf2-dma-sg.c |  11 +-
  .../common/videobuf2/videobuf2-vmalloc.c  |  11 +-
  include/drm/drm_gem.h |   3 +
  include/linux/dma-buf.h   |  14 +-
  13 files changed, 241 insertions(+), 159 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..64a9909ccfa2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -552,7 +552,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_mode |= FMODE_LSEEK;
 	dmabuf->file = file;
 
-	mutex_init(&dmabuf->lock);


Please make removing dmabuf->lock a separate change.

Regards,
Christian.


	INIT_LIST_HEAD(&dmabuf->attachments);
 
 	mutex_lock(&db_list.lock);
 
@@ -737,14 +736,14 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	attach->importer_ops = importer_ops;
 	attach->importer_priv = importer_priv;
 
+	dma_resv_lock(dmabuf->resv, NULL);
+
 	if (dmabuf->ops->attach) {
 		ret = dmabuf->ops->attach(dmabuf, attach);
 		if (ret)
 			goto err_attach;
 	}
-	dma_resv_lock(dmabuf->resv, NULL);
 	list_add(&attach->node, &dmabuf->attachments);
-	dma_resv_unlock(dmabuf->resv);
 
 	/* When either the importer or the exporter can't handle dynamic
 	 * mappings we cache the mapping here to avoid issues with the
@@ -755,7 +754,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 		struct sg_table *sgt;
 
 		if (dma_buf_is_dynamic(attach->dmabuf)) {
-			dma_resv_lock(attach->dmabuf->resv, NULL);
 			ret = dmabuf->ops->pin(attach);
 			if (ret)
 				goto err_unlock;
@@ -768,15 +766,16 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 			ret = PTR_ERR(sgt);
 			goto err_unpin;
 		}
-		if (dma_buf_is_dynamic(attach->dmabuf))
-			dma_resv_unlock(attach->dmabuf->resv);
 		attach->sgt = sgt;
 		attach->dir = DMA_BIDIRECTIONAL;
 	}

Re: [PATCH v2] drm/qxl: fix qxl can't use in arm64

2022-03-25 Thread Christian König via Virtualization

Am 24.03.22 um 11:49 schrieb Cong Liu:

qxl uses ioremap to map ram_header and rom. In the arm64 implementation,
the device is mapped as DEVICE_nGnRE, which cannot support unaligned
access. Since qxl is a virtual device, it can be treated more like RAM
than actual MMIO registers, so use ioremap_wc() instead.

Signed-off-by: Cong Liu 


Looks sane to me, but I'm really not involved enough to fully judge.

Acked-by: Christian König 


---
  drivers/gpu/drm/qxl/qxl_kms.c | 4 ++--
  drivers/gpu/drm/qxl/qxl_ttm.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 4dc5ad13f12c..a054e4a00fe8 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -165,7 +165,7 @@ int qxl_device_init(struct qxl_device *qdev,
 (int)qdev->surfaceram_size / 1024,
 (sb == 4) ? "64bit" : "32bit");
  
-	qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
+	qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size);
if (!qdev->rom) {
pr_err("Unable to ioremap ROM\n");
r = -ENOMEM;
@@ -183,7 +183,7 @@ int qxl_device_init(struct qxl_device *qdev,
goto rom_unmap;
}
  
-	qdev->ram_header = ioremap(qdev->vram_base +
+	qdev->ram_header = ioremap_wc(qdev->vram_base +
   qdev->rom->ram_header_offset,
   sizeof(*qdev->ram_header));
if (!qdev->ram_header) {
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index b2e33d5ba5d0..95df5750f47f 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -82,13 +82,13 @@ int qxl_ttm_io_mem_reserve(struct ttm_device *bdev,
case TTM_PL_VRAM:
mem->bus.is_iomem = true;
mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->vram_base;
-   mem->bus.caching = ttm_cached;
+   mem->bus.caching = ttm_write_combined;
break;
case TTM_PL_PRIV:
mem->bus.is_iomem = true;
mem->bus.offset = (mem->start << PAGE_SHIFT) +
qdev->surfaceram_base;
-   mem->bus.caching = ttm_cached;
+   mem->bus.caching = ttm_write_combined;
break;
default:
return -EINVAL;



Re: Reply: Re: Reply: Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-24 Thread Christian König via Virtualization

Hi Cong,

if I understand Robin correctly, all mappings (host, guest, kernel, 
userspace etc.) must have the same caching attributes unless you use 
the S2FWB feature introduced with Armv8.4.

If you don't follow those rules you usually run into coherency issues or, 
even worse, system hangs. So you not only need to adjust the kernel 
mappings, but also the function for userspace mappings, to follow those 
rules.


In addition to that, I think I read Robin correctly that mapping the 
resource write-combined should be sufficient to solve your problem, so I 
suggest using that instead.


On the other hand if you manage to fix this completely for arm64 by 
updating all the places to use cached mappings I'm fine with it as well. 
It's then up to Dave to decide if that's something we want because QXL 
is intentionally emulating a hardware device as far as I know.


Regards,
Christian.

Am 24.03.22 um 08:04 schrieb liuco...@kylinos.cn:

Hi Christian,

Can you help explain in more detail under what circumstances the userspace
mapping will be problematic? Do you mean cached mappings also need adjustment
in the function ttm_prot_from_caching?

Regards,
Cong.


*Subject:* Re: Reply: Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64
*Date:* 2022-03-23 18:17
*From:* Christian König
*To:* liucong2@kylinos.cn, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org

Hi Cong,

yes I've seen that, but that is still not sufficient.

You need to update the check in ttm_module.c as well or otherwise your
userspace mapping might not work correctly either.

Regards,
Christian.

Am 23.03.22 um 11:00 schrieb liuco...@kylinos.cn:

Hi Christian,

another commit fixes the case in ttm. I sent two patches at the same time,
but it seems I missed '--cover-letter' when running format-patch, or made
some other bad operation.

Regards,
Cong.


*Subject:* Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64
*Date:* 2022-03-23 17:34
*From:* Christian König
*To:* liucong2@kylinos.cn, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org

Hi Cong,

well then Dave must decide what to do here.

When QXL emulates a device it should also not use memory accesses which
won't work on a physical device.

BTW: Your patch is really buggy, it misses the cases in ttm_module.c

Regards,
Christian.

Am 23.03.22 um 09:48 schrieb liuco...@kylinos.cn:

Hi Christian,

according to the 'Arm Architecture Reference Manual Armv8, for Armv8-A
architecture profile', section E2.6.5:

E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory

When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.

So it seems it is not just that some ARM boards don't allow unaligned
access to MMIO space; all PCI memory mapped as Device-nGnRE on arm64
cannot support unaligned access. And qxl is a device simulated by qemu,
so it should be able to access MMIO space in a more flexible way
(PROT_NORMAL) than a real physical graphics card.

Cong.


*Subject:* Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64
*Date:* 2022-03-23 15:15
*From:* Christian König
*To:* Cong Liu, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org

Am 22.03.22 um 10:34 schrieb Cong Liu:
> qxl use ioremap to map ram_header and rom, in the arm64 implementation,
> the device is mapped as DEVICE_nGnRE, it can not support unaligned
> access.

Well that some ARM boards don't allow unaligned access to MMIO space
is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to work around
here and I'm not 100% sure that this is correct.

Christian.

>
> [    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]

Re: Reply: Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Christian König via Virtualization

Hi Cong,

yes I've seen that, but that is still not sufficient.

You need to update the check in ttm_module.c as well or otherwise your 
userspace mapping might not work correctly either.
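
(For context, that check boils down to how ttm picks the page protection for
a userspace mapping from the buffer's caching mode; roughly the following
simplified sketch, not the exact ttm_module.c code:)

#include <linux/pgtable.h>
#include <drm/ttm/ttm_caching.h>

static pgprot_t ttm_prot_from_caching_sketch(enum ttm_caching caching,
					     pgprot_t tmp)
{
	if (caching == ttm_cached)
		return tmp;			/* cached: keep normal bits */

	if (caching == ttm_write_combined)
		return pgprot_writecombine(tmp);

	return pgprot_noncached(tmp);		/* uncached/device */
}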


Regards,
Christian.

Am 23.03.22 um 11:00 schrieb liuco...@kylinos.cn:


Hi Christian,


another commit fix the case in ttm. I send two patches at the same 
time, but seems I miss


 '--cover-letter' when run format-patch or some other bad operation.




Regards,

Cong.




*Subject:* Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

*Date:* 2022-03-23 17:34
*From:* Christian König
*To:* liucong2@kylinos.cn, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org



Hi Cong,

well then Dave must decide what to do here.

When QXL emulates a device it should also not use memory accesses which
won't work on a physical device.

BTW: Your patch is really buggy, it misses the cases in ttm_module.c

Regards,
Christian.

Am 23.03.22 um 09:48 schrieb liuco...@kylinos.cn:


Hi Christian,

according to the 'Arm Architecture Reference Manual Armv8, for Armv8-A
architecture profile', section E2.6.5:

E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory

When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.

So it seems it is not just that some ARM boards don't allow unaligned
access to MMIO space; all PCI memory mapped as Device-nGnRE on arm64
cannot support unaligned access. And qxl is a device simulated by qemu,
so it should be able to access MMIO space in a more flexible way
(PROT_NORMAL) than a real physical graphics card.

Cong.





*Subject:* Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

*Date:* 2022-03-23 15:15
*From:* Christian König
*To:* Cong Liu, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org




Am 22.03.22 um 10:34 schrieb Cong Liu:
> qxl use ioremap to map ram_header and rom, in the arm64 implementation,
> the device is mapped as DEVICE_nGnRE, it can not support unaligned
> access.

Well that some ARM boards don't allow unaligned access to MMIO space
is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to work around
here and I'm not 100% sure that this is correct.

Christian.

>
> [    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
> [    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
> [    6.621376] sp : 800012b73760
> [    6.621701] x29: 800012b73760 x28: 0001 x27: 1000
> [    6.622400] x26: 0001 x25: 0400 x24: cf376848c000
> [    6.623099] x23: c4087400 x22: cf3718e17140 x21: 
> [    6.623823] x20: c4086000 x19: c40870b0 x18: 0014
> [    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 6e771809
> [    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 696e69615f65
> [    6.625909] x11: d543656a x10:  x9 : cf3718e085a4
> [    6.626616] x8 : 006c7871 x7 : 000a x6 : 0017
> [    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 1400
> [    6.628047] x2 :  x1 : c40870b0 x0 : c4086000
> [    6.628751] Call trace:
> [    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
> [    6.629404]  setup_slot+0x34/0xf0 [qxl]
> [    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
> [    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
> [    6.630654]  local_pci_probe+0x48/0xb8
> [    6.631027]  pci_device_probe+0x194/0x208
> [    6.631464]  really_probe+0xd0/0x458
> [    6.631818]  __driver_probe_device+0x124/0x1c0
> [    6.632256]  driver_probe_device+0x48/0x130
> [    6.632669]  __driver_attach+0xc4/0x1a8
> [    6.633049]  bus_for_each_dev+0x78/0xd0
> [    6.633437]  driver_attach+0x2c/0x38
> [    6.633789]  bus_add_driver+0x154/0x248
> [    6.634168]  driver_register+0x6c/0x128
> [    6.635205]  __pci_register_driver+0x4c/0x58
> [    6.635628]  qxl_init+0x48/0x1000 [qxl]
> [    

Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Christian König via Virtualization

Am 23.03.22 um 10:45 schrieb Robin Murphy:

On 2022-03-23 07:15, Christian König wrote:

Am 22.03.22 um 10:34 schrieb Cong Liu:

qxl use ioremap to map ram_header and rom, in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, it can not support unaligned
access.


Well that some ARM boards don't allow unaligned access to MMIO
space is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to
work around here and I'm not 100% sure that this is correct.


No, this one's not a bug. The Device memory type used for iomem 
mappings is *architecturally* defined to enforce properties like 
aligned accesses, no speculation, no reordering, etc. If something 
wants to be treated more like RAM than actual MMIO registers, then 
ioremap_wc() or ioremap_cache() is the appropriate thing to do in 
general (with the former being a bit more portable according to 
Documentation/driver-api/device-io.rst).


Of course *then* you might find that on some systems the 
interconnect/PCIe implementation/endpoint doesn't actually like 
unaligned accesses once the CPU MMU starts allowing them to be sent 
out, but hey, one step at a time ;)
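
To make the distinction concrete, a minimal sketch of the two mapping
choices, assuming a PCI BAR that is really RAM on the host side (as with
qxl); this is not the actual driver code:

#include <linux/io.h>

static void __iomem *map_bar(resource_size_t base, resource_size_t size,
			     bool ram_like)
{
	if (ram_like)
		return ioremap_wc(base, size);	/* Normal-NC: relaxed, unaligned OK */

	return ioremap(base, size);		/* Device-nGnRE: aligned, ordered */
}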


Ah, good point! I was already wondering why that sometimes works and 
sometimes doesn't.


Thanks,
Christian.



Robin.



Christian.



[    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
[    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
[    6.621376] sp : 800012b73760
[    6.621701] x29: 800012b73760 x28: 0001 x27: 1000
[    6.622400] x26: 0001 x25: 0400 x24: cf376848c000
[    6.623099] x23: c4087400 x22: cf3718e17140 x21: 
[    6.623823] x20: c4086000 x19: c40870b0 x18: 0014
[    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 6e771809
[    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 696e69615f65
[    6.625909] x11: d543656a x10:  x9 : cf3718e085a4
[    6.626616] x8 : 006c7871 x7 : 000a x6 : 0017
[    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 1400
[    6.628047] x2 :  x1 : c40870b0 x0 : c4086000

[    6.628751] Call trace:
[    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
[    6.629404]  setup_slot+0x34/0xf0 [qxl]
[    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
[    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
[    6.630654]  local_pci_probe+0x48/0xb8
[    6.631027]  pci_device_probe+0x194/0x208
[    6.631464]  really_probe+0xd0/0x458
[    6.631818]  __driver_probe_device+0x124/0x1c0
[    6.632256]  driver_probe_device+0x48/0x130
[    6.632669]  __driver_attach+0xc4/0x1a8
[    6.633049]  bus_for_each_dev+0x78/0xd0
[    6.633437]  driver_attach+0x2c/0x38
[    6.633789]  bus_add_driver+0x154/0x248
[    6.634168]  driver_register+0x6c/0x128
[    6.635205]  __pci_register_driver+0x4c/0x58
[    6.635628]  qxl_init+0x48/0x1000 [qxl]
[    6.636013]  do_one_initcall+0x50/0x240
[    6.636390]  do_init_module+0x60/0x238
[    6.636768]  load_module+0x2458/0x2900
[    6.637136]  __do_sys_finit_module+0xbc/0x128
[    6.637561]  __arm64_sys_finit_module+0x28/0x38
[    6.638004]  invoke_syscall+0x74/0xf0
[    6.638366]  el0_svc_common.constprop.0+0x58/0x1a8
[    6.638836]  do_el0_svc+0x2c/0x90
[    6.639216]  el0_svc+0x40/0x190
[    6.639526]  el0t_64_sync_handler+0xb0/0xb8
[    6.639934]  el0t_64_sync+0x1a4/0x1a8
[    6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083)
[    6.640889] ---[ end trace 95615d89b7c87f95 ]---

Signed-off-by: Cong Liu 
---
  drivers/gpu/drm/qxl/qxl_kms.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 4dc5ad13f12c..0e61ac04d8ad 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev,
   (int)qdev->surfaceram_size / 1024,
   (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64
+    qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else
  qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
+#endif
  if (!qdev->rom) {
  pr_err("Unable to ioremap ROM\n");
  r = -ENOMEM;
@@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev,
  goto rom_unmap;
  }
+#ifdef CONFIG_ARM64
+    qdev->ram_header = ioremap_cache(qdev->vram_base +
+   qdev->rom->ram_header_offset,
+   sizeof(*qdev->ram_header));
+#else
  qdev->ram_header = ioremap(qdev->vram_base +
 qdev->rom->ram_header_offset,
 sizeof(*qdev->ram_header));
+#endif
  if (!qdev->ram_header) {
  DRM_ERROR("Unable to ioremap RAM header\n");
  r = -ENOMEM;





Re: Reply: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Christian König via Virtualization

Hi Cong,

well then Dave must decide what to do here.

When QXL emulates a device it should also not use memory accesses which 
won't work on a physical device.


BTW: Your patch is really buggy, it misses the cases in ttm_module.c

Regards,
Christian.

Am 23.03.22 um 09:48 schrieb liuco...@kylinos.cn:


Hi Christian,

according to the 'Arm Architecture Reference Manual Armv8, for Armv8-A
architecture profile', section E2.6.5:

E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory

When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.

So it seems it is not just that some ARM boards don't allow unaligned
access to MMIO space; all PCI memory mapped as Device-nGnRE on arm64
cannot support unaligned access. And qxl is a device simulated by qemu,
so it should be able to access MMIO space in a more flexible way
(PROT_NORMAL) than a real physical graphics card.






Cong.





*Subject:* Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

*Date:* 2022-03-23 15:15
*From:* Christian König
*To:* Cong Liu, airlied@redhat.com, kraxel@redhat.com, airlied@linux.ie, daniel@ffwll.ch, ray.huang@amd.com, virtualization@lists.linux-foundation.org, spice-devel@lists.freedesktop.org, dri-de...@lists.freedesktop.org




Am 22.03.22 um 10:34 schrieb Cong Liu:
> qxl use ioremap to map ram_header and rom, in the arm64 implementation,
> the device is mapped as DEVICE_nGnRE, it can not support unaligned
> access.

Well that some ARM boards don't allow unaligned access to MMIO space
is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to work around
here and I'm not 100% sure that this is correct.

Christian.

>
> [    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
> [    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
> [    6.621376] sp : 800012b73760
> [    6.621701] x29: 800012b73760 x28: 0001 x27: 1000
> [    6.622400] x26: 0001 x25: 0400 x24: cf376848c000
> [    6.623099] x23: c4087400 x22: cf3718e17140 x21: 
> [    6.623823] x20: c4086000 x19: c40870b0 x18: 0014
> [    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 6e771809
> [    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 696e69615f65
> [    6.625909] x11: d543656a x10:  x9 : cf3718e085a4
> [    6.626616] x8 : 006c7871 x7 : 000a x6 : 0017
> [    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 1400
> [    6.628047] x2 :  x1 : c40870b0 x0 : c4086000

> [    6.628751] Call trace:
> [    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
> [    6.629404]  setup_slot+0x34/0xf0 [qxl]
> [    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
> [    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
> [    6.630654]  local_pci_probe+0x48/0xb8
> [    6.631027]  pci_device_probe+0x194/0x208
> [    6.631464]  really_probe+0xd0/0x458
> [    6.631818]  __driver_probe_device+0x124/0x1c0
> [    6.632256]  driver_probe_device+0x48/0x130
> [    6.632669]  __driver_attach+0xc4/0x1a8
> [    6.633049]  bus_for_each_dev+0x78/0xd0
> [    6.633437]  driver_attach+0x2c/0x38
> [    6.633789]  bus_add_driver+0x154/0x248
> [    6.634168]  driver_register+0x6c/0x128
> [    6.635205]  __pci_register_driver+0x4c/0x58
> [    6.635628]  qxl_init+0x48/0x1000 [qxl]
> [    6.636013]  do_one_initcall+0x50/0x240
> [    6.636390]  do_init_module+0x60/0x238
> [    6.636768]  load_module+0x2458/0x2900
> [    6.637136]  __do_sys_finit_module+0xbc/0x128
> [    6.637561]  __arm64_sys_finit_module+0x28/0x38
> [    6.638004]  invoke_syscall+0x74/0xf0
> [    6.638366]  el0_svc_common.constprop.0+0x58/0x1a8
> [    6.638836]  do_el0_svc+0x2c/0x90
> [    6.639216]  el0_svc+0x40/0x190
> [    6.639526]  el0t_64_sync_handler+0xb0/0xb8
> [    6.639934]  el0t_64_sync+0x1a4/0x1a8
> [    6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083)
> [    6.640889] ---[ end trace 95615d89b7c87f95 ]---
>
> Signed-off-by: Cong Liu
> ---
>   drivers/gpu/drm/qxl/qxl_kms.c | 10 ++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
> index 4dc5ad13f12c..0e61ac04d8ad 100644
> --- a/drivers/gpu/drm/qxl/qxl_kms.c
> +++ b/drivers/gpu/drm/qxl/qxl_kms.c
> @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev,
>   (int)qdev->surfaceram_size / 1024,
>   (sb == 4) ? "64bit" : "32bit");
>
> +#ifdef CONFIG_ARM64
> + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
> +#else
>   qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
> +#endif
>   if (!qdev->rom) {
>   pr_err("Unable to ioremap ROM\n");

Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Christian König via Virtualization

Am 22.03.22 um 10:34 schrieb Cong Liu:

qxl use ioremap to map ram_header and rom, in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, it can not support unaligned
access.


Well that some ARM boards don't allow unaligned access to MMIO space 
is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to work around 
here and I'm not 100% sure that this is correct.


Christian.



[    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
[    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
[    6.621376] sp : 800012b73760
[    6.621701] x29: 800012b73760 x28: 0001 x27: 1000
[    6.622400] x26: 0001 x25: 0400 x24: cf376848c000
[    6.623099] x23: c4087400 x22: cf3718e17140 x21: 
[    6.623823] x20: c4086000 x19: c40870b0 x18: 0014
[    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 6e771809
[    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 696e69615f65
[    6.625909] x11: d543656a x10:  x9 : cf3718e085a4
[    6.626616] x8 : 006c7871 x7 : 000a x6 : 0017
[    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 1400
[    6.628047] x2 :  x1 : c40870b0 x0 : c4086000
[    6.628751] Call trace:
[    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
[    6.629404]  setup_slot+0x34/0xf0 [qxl]
[    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
[    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
[    6.630654]  local_pci_probe+0x48/0xb8
[    6.631027]  pci_device_probe+0x194/0x208
[    6.631464]  really_probe+0xd0/0x458
[    6.631818]  __driver_probe_device+0x124/0x1c0
[    6.632256]  driver_probe_device+0x48/0x130
[    6.632669]  __driver_attach+0xc4/0x1a8
[    6.633049]  bus_for_each_dev+0x78/0xd0
[    6.633437]  driver_attach+0x2c/0x38
[    6.633789]  bus_add_driver+0x154/0x248
[    6.634168]  driver_register+0x6c/0x128
[    6.635205]  __pci_register_driver+0x4c/0x58
[    6.635628]  qxl_init+0x48/0x1000 [qxl]
[    6.636013]  do_one_initcall+0x50/0x240
[    6.636390]  do_init_module+0x60/0x238
[    6.636768]  load_module+0x2458/0x2900
[    6.637136]  __do_sys_finit_module+0xbc/0x128
[    6.637561]  __arm64_sys_finit_module+0x28/0x38
[    6.638004]  invoke_syscall+0x74/0xf0
[    6.638366]  el0_svc_common.constprop.0+0x58/0x1a8
[    6.638836]  do_el0_svc+0x2c/0x90
[    6.639216]  el0_svc+0x40/0x190
[    6.639526]  el0t_64_sync_handler+0xb0/0xb8
[    6.639934]  el0t_64_sync+0x1a4/0x1a8
[    6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083)
[    6.640889] ---[ end trace 95615d89b7c87f95 ]---

Signed-off-by: Cong Liu 
---
  drivers/gpu/drm/qxl/qxl_kms.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 4dc5ad13f12c..0e61ac04d8ad 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev,
 (int)qdev->surfaceram_size / 1024,
 (sb == 4) ? "64bit" : "32bit");
  
+#ifdef CONFIG_ARM64
+	qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else
 	qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
+#endif
if (!qdev->rom) {
pr_err("Unable to ioremap ROM\n");
r = -ENOMEM;
@@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev,
goto rom_unmap;
}
  
+#ifdef CONFIG_ARM64
+	qdev->ram_header = ioremap_cache(qdev->vram_base +
+				   qdev->rom->ram_header_offset,
+				   sizeof(*qdev->ram_header));
+#else
 	qdev->ram_header = ioremap(qdev->vram_base +
 				   qdev->rom->ram_header_offset,
 				   sizeof(*qdev->ram_header));
+#endif
if (!qdev->ram_header) {
DRM_ERROR("Unable to ioremap RAM header\n");
r = -ENOMEM;

