Re: [Freedreno] [PATCH] drm/msm: shake fist angrily at dma-mapping

2019-07-31 Thread Stephan Gerhold
On Tue, Jul 30, 2019 at 02:46:28PM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> So, using dma_sync_* for our cache needs works out w/ dma iommu ops, but
> it falls apart with dma direct ops.  The problem is that, depending on
> display generation, we can have either set of dma ops (mdp4 and dpu have
> iommu wired to mdss node, which maps to toplevel drm device, but mdp5
> has iommu wired up to the mdp sub-node within mdss).
> 
> [oops splat and Fixes: tags snipped; see the original patch below]

Thanks for the patch! It fixes the issue on MSM8916/A5U:

Tested-by: Stephan Gerhold 

[Freedreno] [PATCH] drm/msm: shake fist angrily at dma-mapping

2019-07-30 Thread Rob Clark
From: Rob Clark 

So, using dma_sync_* for our cache needs works out w/ dma iommu ops, but
it falls apart with dma direct ops.  The problem is that, depending on
display generation, we can have either set of dma ops (mdp4 and dpu have
iommu wired to mdss node, which maps to toplevel drm device, but mdp5
has iommu wired up to the mdp sub-node within mdss).
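
To spell out why the two sets of ops behave differently (the sketch below is
illustration only, not part of the patch, and dev/sgt are placeholder names):
with dma-direct, dma_sync_sg_for_device() derives the address to clean from
sg_dma_address() since the 449fa54d6815 change referenced in the Fixes: tags,
and only a prior dma_map_sg() ever fills that in.  msm no longer maps these
pages for DMA, so the sync walks a bogus address and faults, as in the splat
below.

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Illustration only: why the call order matters once the device ends
 * up with dma-direct ops.
 */
static void cache_clean_sketch(struct device *dev, struct sg_table *sgt)
{
	/*
	 * Broken with dma-direct: sg_dma_address() was never populated,
	 * so the sync resolves to a bogus address and faults in the
	 * arch cache routine (__clean_dcache_area_poc in the splat):
	 *
	 *	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents,
	 *			DMA_BIDIRECTIONAL);
	 */

	/*
	 * Works with dma-direct: dma_map_sg() performs the cache clean
	 * itself and populates the dma addresses as a side effect; the
	 * driver only wants the cache maintenance here.
	 */
	if (!dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL))
		return; /* map failed, nothing was cleaned */
}

With iommu DMA ops the plain dma_sync_sg_* calls are fine, which is why the
patch picks the call based on get_dma_ops().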

Fixes this splat on mdp5 devices:

   Unable to handle kernel paging request at virtual address 8000
   Mem abort info:
 ESR = 0x96000144
 Exception class = DABT (current EL), IL = 32 bits
 SET = 0, FnV = 0
 EA = 0, S1PTW = 0
   Data abort info:
 ISV = 0, ISS = 0x0144
 CM = 1, WnR = 1
   swapper pgtable: 4k pages, 48-bit VAs, pgdp=810e4000
   [8000] pgd=
   Internal error: Oops: 96000144 [#1] SMP
   Modules linked in: btqcomsmd btqca bluetooth cfg80211 ecdh_generic ecc 
rfkill libarc4 panel_simple msm wcnss_ctrl qrtr_smd drm_kms_helper venus_enc 
venus_dec videobuf2_dma_sg videobuf2_memops drm venus_core ipv6 qrtr 
qcom_wcnss_pil v4l2_mem2mem qcom_sysmon videobuf2_v4l2 qmi_helpers 
videobuf2_common crct10dif_ce mdt_loader qcom_common videodev qcom_glink_smem 
remoteproc bmc150_accel_i2c bmc150_magn_i2c bmc150_accel_core bmc150_magn 
snd_soc_lpass_apq8016 snd_soc_msm8916_analog mms114 mc nf_defrag_ipv6 
snd_soc_lpass_cpu snd_soc_apq8016_sbc industrialio_triggered_buffer kfifo_buf 
snd_soc_lpass_platform snd_soc_msm8916_digital drm_panel_orientation_quirks
   CPU: 2 PID: 33 Comm: kworker/2:1 Not tainted 5.3.0-rc2 #1
   Hardware name: Samsung Galaxy A5U (EUR) (DT)
   Workqueue: events deferred_probe_work_func
   pstate: 8005 (Nzcv daif -PAN -UAO)
   pc : __clean_dcache_area_poc+0x20/0x38
   lr : arch_sync_dma_for_device+0x28/0x30
   sp : 115736a0
   x29: 115736a0 x28: 0001
   x27: 800074830800 x26: 11478000
   x25:  x24: 0001
   x23: 11478a98 x22: 89fd1c10
   x21: 0001 x20: 800075ad0a00
   x19:  x18: 112b2000
   x17:  x16: 
   x15: fff0 x14: 11455d70
   x13:  x12: 0028
   x11: 0001 x10: 1106c000
   x9 : 7e0001d6b380 x8 : 1000
   x7 : 7e0001d6b380 x6 : 7e0001d6b382
   x5 :  x4 : 1000
   x3 : 003f x2 : 0040
   x1 : 80001000 x0 : 8000
   Call trace:
__clean_dcache_area_poc+0x20/0x38
dma_direct_sync_sg_for_device+0xb8/0xe8
get_pages+0x22c/0x250 [msm]
msm_gem_get_and_pin_iova+0xdc/0x168 [msm]
...

This fixes a regression introduced by the combination of two patches:

Fixes: 0036bc73ccbe ("drm/msm: stop abusing dma_map/unmap for cache")
Fixes: 449fa54d6815 ("dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device")
Signed-off-by: Rob Clark 
---
I think it is time to revisit drm_cache support for arm64, but we need
something for v5.3 to keep things working.

armv7 support might be harder, but hopefully no one is changing the
dma-mapping implementation much on armv7, so we can just limp along with
the current hack on 32-bit arm.
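
For orientation before the diff: the stopgap below keys on whether the device
ended up with iommu DMA ops or dma-direct, which on arm64 can be told apart
via get_dma_ops().  A minimal sketch (helper name made up; the code in the
patch just open-codes the check):

#include <linux/dma-mapping.h>

/*
 * Sketch only: on arm64, get_dma_ops() returns NULL when the device
 * falls back to dma-direct and a non-NULL ops table when iommu DMA
 * ops were attached to it.
 */
static bool uses_dma_direct(struct device *dev)
{
	return !get_dma_ops(dev);
}

sync_for_device()/sync_for_cpu() below use this check to pick between
dma_sync_sg_* (iommu ops) and dma_map_sg()/dma_unmap_sg() (dma-direct).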

 drivers/gpu/drm/msm/msm_gem.c | 47 +++
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index c2114c748c2f..8cf6362e64bf 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -32,6 +32,46 @@ static bool use_pages(struct drm_gem_object *obj)
return !msm_obj->vram_node;
 }
 
+/*
+ * Cache sync.. this is a bit over-complicated, to fit dma-mapping
+ * API.  Really GPU cache is out of scope here (handled on cmdstream)
+ * and all we need to do is invalidate newly allocated pages before
+ * mapping to CPU as uncached/writecombine.
+ *
+ * On top of this, we have the added headache, that depending on
+ * display generation, the display's iommu may be wired up to either
+ * the toplevel drm device (mdss), or to the mdp sub-node, meaning
+ * that here we either have dma-direct or iommu ops.
+ *
+ * Let this be a cautionary tale of abstraction gone wrong.
+ */
+
+static void sync_for_device(struct msm_gem_object *msm_obj)
+{
+	struct device *dev = msm_obj->base.dev->dev;
+
+	if (get_dma_ops(dev)) {
+		dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
+				msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+	} else {
+		dma_map_sg(dev, msm_obj->sgt->sgl,
+				msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+	}
+}
+
+static void sync_for_cpu(struct msm_gem_object *msm_obj)
+{
+	struct device *dev = msm_obj->base.dev->dev;
+
+	if (get_dma_ops(dev)) {
+		dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
+				msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+	} else {
+		dma_unmap_sg(dev, msm_obj->sgt->sgl,
+				msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+	}
+}