[Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström

The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of sglist-represented
io memory, using io_mem_reserve() / io_mem_free() callbacks, that would
cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and consuming vmap space, and we eliminate a
critical point of failure. With a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes. Pushing out async can be done since there is no memory
allocation going on that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
v2:
- Move new TTM exports to a separate commit. (Reported by Christian König)
- Avoid having the iterator init functions inline. (Reported by Jani Nikula)
- Remove a stray comment.
---
 drivers/gpu/drm/i915/Makefile |   1 +
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 194 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 107 ++
 3 files changed, 302 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cb8823570996..958ccc1edfed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
gem/i915_gemfs.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
new file mode 100644
index ..5f347a85bf44
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+/**
+ * DOC: Usage and intentions.
+ *
+ * This file contains functionality that we might want to move into
+ * ttm_bo_util.c if there is a common interest.
+ * Currently a kmap_local only memcpy with support for page-based iomem
+ * regions, and fast memcpy from write-combined memory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "i915_memcpy.h"
+
+#include "gem/i915_gem_ttm_bo_util.h"
+
+static void i915_ttm_kmap_iter_tt_kmap_local(struct i915_ttm_kmap_iter *iter,
+					     struct dma_buf_map *dmap,
+					     pgoff_t i)
+{
+	struct i915_ttm_kmap_iter_tt *iter_tt =
+		container_of(iter, typeof(*iter_tt), base);
+
+	dma_buf_map_set_vaddr(dmap, kmap_local_page(iter_tt->tt->pages[i]));
+}
+
+static void i915_ttm_kmap_iter_iomap_kmap_local(struct i915_ttm_kmap_iter *iter,
+						struct dma_buf_map *dmap,
+						pgoff_t i)
+{
+	struct i915_ttm_kmap_iter_iomap *iter_io =
+		container_of(iter, typeof(*iter_io), base);
+	void __iomem *addr;
+
+retry:
+	while (i >= iter_io->cache.end) {
+		iter_io->cache.sg = iter_io->cache.sg ?
+			sg_next(iter_io->cache.sg) : iter_io->st->sgl;
+		iter_io->cache.i = iter_io->cache.end;
+		iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >>
+			PAGE_SHIFT;
+		iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) -
+			iter_io->start;
+	}
+
+	if (i < iter_io->cache.i) {
+		iter_io->cache.end = 0;
+		iter_io->cache.sg = NULL;
+		goto retry;
+	}
+
+	addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs +
+				       (((resource_size_t)i - iter_io->cache.i)
+					<< PAGE_SHIFT));
+	dma_buf_map_set_vaddr_iomem(dmap, addr);
+}
+
+static const struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_tt_ops = {
+	.kmap_local = i915_ttm_kmap_iter_tt_kmap_local
+};
+
+static const struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_io_ops = {
+	.kmap_local = i915_ttm_kmap_iter_iomap_kmap_local
+};
+
+static void kunmap_local_dma_buf_map(struct dma_buf_map *map)
+{
+	if (map->is_iomem)
+		io_mapping_unmap_local(map->vaddr_iomem);
+	else
+		kunmap_local(map->vaddr);
+}
+
+/**
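
The segment-cache lookup in i915_ttm_kmap_iter_iomap_kmap_local() can be
hard to follow in diff form, so here is a minimal userspace sketch of the
same idea. It is not the patch's code: struct seg, struct seg_iter and
seg_iter_page_offset() are hypothetical stand-ins for the scatterlist, the
iterator and the mapping call, and the page size is assumed to be 4 KiB.
The sketch walks forward through the segments while the requested page
lies beyond the cached one, and rewinds to the first segment when the
caller seeks backwards.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
	unsigned long long dma_addr; /* start address of the segment */
	unsigned long long len;      /* length in bytes, page aligned */
};

/* Hypothetical iterator with a cache of the last segment used. */
struct seg_iter {
	const struct seg *segs;
	size_t nsegs;
	unsigned long long start; /* address of the region start */
	ptrdiff_t cur;            /* cached segment index, -1 when empty */
	unsigned long i;          /* first page index of cached segment */
	unsigned long end;        /* one past its last page index */
	unsigned long long offs;  /* its byte offset from region start */
};

static void seg_iter_init(struct seg_iter *it, const struct seg *segs,
			  size_t nsegs, unsigned long long start)
{
	it->segs = segs;
	it->nsegs = nsegs;
	it->start = start;
	it->cur = -1;
	it->i = 0;
	it->end = 0;
	it->offs = 0;
}

/*
 * Return the byte offset into the region for page index i.
 * The caller must pass an index inside the total page count.
 */
static unsigned long long seg_iter_page_offset(struct seg_iter *it,
					       unsigned long i)
{
retry:
	/* Advance segment by segment until page i is covered. */
	while (i >= it->end) {
		assert((size_t)(it->cur + 1) < it->nsegs);
		it->cur++;
		const struct seg *s = &it->segs[it->cur];

		it->i = it->end;
		it->end += s->len >> DEMO_PAGE_SHIFT;
		it->offs = s->dma_addr - it->start;
	}

	/* Backward seek: drop the cache and walk again from the start. */
	if (i < it->i) {
		it->end = 0;
		it->cur = -1;
		goto retry;
	}

	return it->offs + ((unsigned long long)(i - it->i) << DEMO_PAGE_SHIFT);
}
```

Sequential access, which is what a page-by-page copy does, hits the cached
segment or its successor, so the rewind path only costs something for
random backward access.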
+

Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König




On 18.05.21 at 10:26, Thomas Hellström wrote:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of sglist-represented
io memory, using io_mem_reserve() / io_mem_free() callbacks, that would
cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and consuming vmap space, and we eliminate a
critical point of failure. With a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes. Pushing out async can be done since there is no memory
allocation going on that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.


In general yes please since I have that as TODO for TTM for a very long 
time.


But I would prefer to fix the implementation in TTM instead and give it 
proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.




Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 1:55 PM, Christian König wrote:




In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and give 
it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking that when we replace bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem), we could perhaps embed 
a struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean we perhaps land this in i915 now and sort out the 
unification once the struct ttm_resource / struct ttm_buffer_object 
separation has landed?


/Thomas


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
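
The iterator idea discussed above can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the i915 or TTM
interface: struct demo_iter, its ops table and demo_move_memcpy() are
hypothetical names, plain pointers stand in for kmap_local_page() and
io_mapping_map_local_wc(), and the page size is assumed to be 4 KiB. One
backend resolves a page index through an array of pages, the other through
a linear region, and the copy loop only ever sees the ops table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

struct demo_iter;

/* Hypothetical ops table: each backend knows how to reach page i. */
struct demo_iter_ops {
	void *(*map_local)(struct demo_iter *iter, size_t i);
};

struct demo_iter {
	const struct demo_iter_ops *ops;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Backend 1: an array of individually allocated pages. */
struct demo_iter_pages {
	struct demo_iter base;
	unsigned char **pages;
};

/* Backend 2: one linear region. */
struct demo_iter_linear {
	struct demo_iter base;
	unsigned char *start;
};

static void *demo_pages_map_local(struct demo_iter *iter, size_t i)
{
	struct demo_iter_pages *it =
		demo_container_of(iter, struct demo_iter_pages, base);

	return it->pages[i];
}

static void *demo_linear_map_local(struct demo_iter *iter, size_t i)
{
	struct demo_iter_linear *it =
		demo_container_of(iter, struct demo_iter_linear, base);

	return it->start + i * DEMO_PAGE_SIZE;
}

static const struct demo_iter_ops demo_pages_ops = {
	.map_local = demo_pages_map_local
};

static const struct demo_iter_ops demo_linear_ops = {
	.map_local = demo_linear_map_local
};

static void demo_iter_pages_init(struct demo_iter_pages *it,
				 unsigned char **pages)
{
	it->base.ops = &demo_pages_ops;
	it->pages = pages;
}

static void demo_iter_linear_init(struct demo_iter_linear *it,
				  unsigned char *start)
{
	it->base.ops = &demo_linear_ops;
	it->start = start;
}

/* Copy num_pages pages, mapping one source and one destination page at a time. */
static void demo_move_memcpy(struct demo_iter *dst, struct demo_iter *src,
			     size_t num_pages)
{
	for (size_t i = 0; i < num_pages; i++) {
		void *d = dst->ops->map_local(dst, i);
		void *s = src->ops->map_local(src, i);

		memcpy(d, s, DEMO_PAGE_SIZE);
		/*
		 * Nothing to unmap in this sketch. With real kmap_local
		 * mappings, the two pages would be unmapped here in the
		 * reverse order of mapping.
		 */
	}
}
```

Because only two pages are mapped at any time, the loop needs no large
contiguous virtual mapping, which is the property the patch description
relies on.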

Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König


On 18.05.21 at 14:04, Thomas Hellström wrote:



Christian,

I was thinking that when we replace bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem), we could perhaps embed 
a struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean we perhaps land this in i915 now and sort out the 
unification once the struct ttm_resource / struct ttm_buffer_object 
separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like the 
right plan to me as well.


Christian.



/Thomas





Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 2:09 PM, Christian König wrote:


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like the 
right plan to me as well.


Christian.


OK, so then are you OK with landing this in i915 for now? That would 
of course also mean the export you NAK'd, but strictly for this memcpy 
use until we merge it with TTM?


/Thomas






Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König


On 18.05.21 at 14:52, Thomas Hellström wrote:



OK, so then are you OK with landing this in i915 for now? That would 
of course also mean the export you NAK'd, but strictly for this memcpy 
use until we merge it with TTM?


Well you can of course prototype that in i915, but I really don't want 
to export the TT functions upstream.


Can we cleanly move that functionality into TTM instead?

Christian.




/Thomas






Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 3:08 PM, Christian König wrote:


Well you can of course prototype that in i915, but I really don't want 
to export the TT functions upstream.


I understand. I once had the same thoughts, trying to avoid that as far 
as possible, so this function was actually then added to the ttm_bo 
interface (hence the awkward naming) as a helper for drivers 
implementing move(), essentially a very special case of 
ttm_bo_move_accel_cleanup(). But anyway, see below:




Can we cleanly move that functionality into TTM instead?


I'll take a look at that, but I think we'd initially be having iterators 
mimicking the current move_memcpy() for the linear iomem !WC cases, hope 
that's OK.

/Thomas




Christian.





Re: [Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König


On 18.05.21 at 15:24, Thomas Hellström wrote:


On 5/18/21 3:08 PM, Christian König wrote:


Well you can of course prototype that in i915, but I really don't 
want to export the TT functions upstream.


I understand. I once had the same thoughts, trying to avoid that as far 
as possible, so this function was actually then added to the ttm_bo 
interface (hence the awkward naming) as a helper for drivers 
implementing move(), essentially a very special case of 
ttm_bo_move_accel_cleanup(). But anyway, see below:




Can we cleanly move that functionality into TTM instead?


I'll take a look at that, but I think we'd initially be having 
iterators mimicking the current move_memcpy() for the linear iomem !WC 
cases, hope that's OK.


Yeah, that's perfectly fine with me. I can tackle cleaning up all 
drivers and moving over to the new implementation when that is fully 
complete.


As I said, we already have the same problem in amdgpu and only solved it 
by avoiding memcpy altogether.


Christian.



/Thomas



