Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-10 Thread Robin Murphy

On 05/09/2019 20:05, Rob Clark wrote:

On Thu, Sep 5, 2019 at 10:03 AM Rob Clark  wrote:


On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy  wrote:


On 04/09/2019 01:12, Rob Clark wrote:

On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam  wrote:


Hi Jonathan,

On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek  wrote:


Hi,

I tried this and it works with patches 4+5 from Rob's series and
changing gpummu to use sg_phys(sg) instead of sg->dma_address
(dma_address isn't set now that dma_map_sg isn't used).


Thanks for testing it. I haven't had a chance to test it yet.

Rob,

I assume your series is targeted to 5.4, correct?


maybe, although Christoph Hellwig didn't seem like a big fan of
exposing cache ops, and would rather add a new allocation API for
uncached pages.. so I'm not entirely sure what the way forward will
be.


TBH, the use of map/unmap looked reasonable in the context of
"start/stop using these pages for stuff which may include DMA", so even
if it was cheekily ignoring sg->dma_address I'm not sure I'd really
consider it "abuse" - in comparison, using sync without a prior map
unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG
will be rendered useless with false positives if this driver is active
while trying to debug something else.

The warning referenced in 0036bc73ccbe represents something being
unmapped which didn't match a corresponding map - from what I can make
of get_pages()/put_pages() it looks like that would need msm_obj->flags
or msm_obj->sgt to change during the lifetime of the object, neither of
which sounds like a thing that should legitimately happen. Are you sure
this isn't all just hiding a subtle bug elsewhere? After all, if what
was being unmapped wasn't right, who says that what's now being synced is?



Correct, msm_obj->flags/sgt should not change.

I reverted the various patches, and went back to the original setup
that used dma_{map,unmap}_sg() to reproduce the original issue that
prompted the change in the first place.  It is a pretty massive flood
of splats, which pretty quickly overflowed the dmesg ring buffer, so I
might be missing some things, but I'll poke around some more.

One thing I wonder about is what would happen if the buffer is
allocated and dma_map_sg() called before drm/msm attaches its own
iommu_domains, and dma_unmap_sg() called afterwards.  We aren't actually
ever using the iommu domain that the DMA API is creating for the device,
so all the extra iommu_map/unmap (and TLB flush) is at best
unnecessary.  But I'm not sure if it could be having some unintended
side effects that cause this sort of problem.


Right, one of the semi-intentional side-effects of 43c5bf11a610 is that 
iommu-dma no longer interferes with unmanaged domains - it will still go 
and make its own redundant mappings in the unattached default domain, 
but as long as the DMA API usage is fundamentally sound then it 
shouldn't actually get in the way.

it seems like every time (or at least every time we splat), we end up
w/ iova=f000 .. which doesn't sound likely to be right.
Although from just looking at the dma-iommu.c code, I'm not sure how
this happens.  And adding some printk's results in enough traces that
I can't boot for some reason..


Yeah, that's a bogus IOVA for sure, so regardless of how we actually 
make Adreno happy it would still be interesting to figure out how it 
came about. Do you see any WARNs from io-pgtable-arm before the one from 
__iommu_dma_unmap()?


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-06 Thread Rob Clark
On Thu, Sep 5, 2019 at 3:30 PM Rob Clark  wrote:
> [snip]

fwiw, with https://patchwork.freedesktop.org/series/63096/ we could go
back to simply using dma_{map,unmap}_sg() in all cases, as the iommu
dma_ops would no longer get in the way.

BR,
-R

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-05 Thread Rob Clark
On Thu, Sep 5, 2019 at 12:05 PM Rob Clark  wrote:
> [snip]

Ok, I see better what is going on, at least on the kernel that I'm
using on the Yoga C630 laptop, where I have a patch[1] to skip domain
attach.  That results in to_smmu_domain(domain)->pgtbl_ops being NULL,
so arm_smmu_map() fails, and we skip __finalise_sg(), which is what
sets sg_dma_address().  That is what causes the failure on unmap.

That said, I'm pretty sure I've seen (or had reported) a similar splat
(although maybe not so frequent) on devices without that patch (where
the bootloader isn't enabling scanout).  I'll have to switch over to a
different device that doesn't light up the display from the bootloader,
so that I can drop that skip-domain-attach patch.

All that said, this would be much easier if I could do the cache
operations without all this unneeded iommu stuff.  (Not to mention the
unnecessary TLB flushes that I suspect are also happening.)

[1] https://patchwork.kernel.org/patch/11038793/

BR,
-R

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-05 Thread Rob Clark
On Thu, Sep 5, 2019 at 10:03 AM Rob Clark  wrote:
> [snip]

it seems like every time (or at least every time we splat), we end up
w/ iova=f000 .. which doesn't sound likely to be right.
Although from just looking at the dma-iommu.c code, I'm not sure how
this happens.  And adding some printk's results in enough traces that
I can't boot for some reason..

BR,
-R

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-05 Thread Rob Clark
On Wed, Sep 4, 2019 at 11:06 AM Robin Murphy  wrote:
> [snip]

Correct, msm_obj->flags/sgt should not change.

I reverted the various patches, and went back to the original setup
that used dma_{map,unmap}_sg() to reproduce the original issue that
prompted the change in the first place.  It is a pretty massive flood
of splats, which pretty quickly overflowed the dmesg ring buffer, so I
might be missing some things, but I'll poke around some more.

One thing I wonder about is what would happen if the buffer is
allocated and dma_map_sg() called before drm/msm attaches its own
iommu_domains, and dma_unmap_sg() called afterwards.  We aren't actually
ever using the iommu domain that the DMA API is creating for the device,
so all the extra iommu_map/unmap (and TLB flush) is at best
unnecessary.  But I'm not sure if it could be having some unintended
side effects that cause this sort of problem.

BR,
-R

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-04 Thread Robin Murphy

On 04/09/2019 01:12, Rob Clark wrote:

[snip]
Rob,

I assume your series is targeted to 5.4, correct?


maybe, although Christoph Hellwig didn't seem like a big fan of
exposing cache ops, and would rather add a new allocation API for
uncached pages.. so I'm not entirely sure what the way forward will
be.


TBH, the use of map/unmap looked reasonable in the context of 
"start/stop using these pages for stuff which may include DMA", so even 
if it was cheekily ignoring sg->dma_address I'm not sure I'd really 
consider it "abuse" - in comparison, using sync without a prior map 
unquestionably violates the API, and means that CONFIG_DMA_API_DEBUG 
will be rendered useless with false positives if this driver is active 
while trying to debug something else.


The warning referenced in 0036bc73ccbe represents something being 
unmapped which didn't match a corresponding map - from what I can make 
of get_pages()/put_pages() it looks like that would need msm_obj->flags 
or msm_obj->sgt to change during the lifetime of the object, neither of 
which sounds like a thing that should legitimately happen. Are you sure 
this isn't all just hiding a subtle bug elsewhere? After all, if what 
was being unmapped wasn't right, who says that what's now being synced is?


Robin.



Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-04 Thread Jonathan Marek

Hi,

I tried this and it works with patches 4+5 from Rob's series and 
changing gpummu to use sg_phys(sg) instead of sg->dma_address 
(dma_address isn't set now that dma_map_sg isn't used).


Jonathan

On 9/3/19 11:22 AM, Rob Clark wrote:

[snip]



Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-03 Thread Fabio Estevam
Hi Rob,

On Tue, Sep 3, 2019 at 9:12 PM Rob Clark  wrote:

> In the mean time, it is a bit ugly, but I guess something like this should 
> work:

Yes, this works on a i.MX53 board, thanks:

Tested-by: Fabio Estevam 

Is this something you could submit for 5.3?

Thanks

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-03 Thread Rob Clark
On Tue, Sep 3, 2019 at 12:31 PM Fabio Estevam  wrote:
> [snip]
> Rob,
>
> I assume your series is targeted to 5.4, correct?

maybe, although Christoph Hellwig didn't seem like a big fan of
exposing cache ops, and would rather add a new allocation API for
uncached pages.. so I'm not entirely sure what the way forward will
be.

In the mean time, it is a bit ugly, but I guess something like this should work:


diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 7263f4373f07..5a6a79fbc9d6 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -52,7 +52,7 @@ static void sync_for_device(struct msm_gem_object *msm_obj)
 {
 struct device *dev = msm_obj->base.dev->dev;

-if (get_dma_ops(dev)) {
+if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
 dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
 msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 } else {
@@ -65,7 +65,7 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj)
 {
 struct device *dev = msm_obj->base.dev->dev;

-if (get_dma_ops(dev)) {
+if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
 dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
 msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 } else {


BR,
-R


Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-03 Thread Fabio Estevam
Hi Jonathan,

On Tue, Sep 3, 2019 at 4:25 PM Jonathan Marek  wrote:
>
> Hi,
>
> I tried this and it works with patches 4+5 from Rob's series and
> changing gpummu to use sg_phys(sg) instead of sg->dma_address
> (dma_address isn't set now that dma_map_sg isn't used).

Thanks for testing it. I haven't had a chance to test it yet.

Rob,

I assume your series is targeted to 5.4, correct?

If this is the case, what we should do about the i.MX5 regression on 5.3?

Would a revert of the two commits be acceptable in 5.3 in order to
avoid the regression?

Please advise.

Thanks

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-03 Thread Rob Clark
On Mon, Sep 2, 2019 at 11:03 AM Fabio Estevam  wrote:
> [snip]
> Rob,
>
> What would be the recommended approach for fixing this?
>

We need a direct way to handle cache, so we can stop trying to trick
DMA API into doing what we want.

Something like this is what I had in mind:

https://patchwork.freedesktop.org/series/65211/

I guess I could respin that.  I'm not really sure of any other way to
have things working on the different combinations of archs and dma_ops
that we have.  Lately fixing one has been breaking another.

BR,
-R

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-02 Thread Fabio Estevam
Hi Robin,

On Mon, Sep 2, 2019 at 11:45 AM Robin Murphy  wrote:

> Try 0036bc73ccbe - that looks like something that CONFIG_DMA_API_DEBUG
> should have been screaming about anyway.

Thanks for your suggestion.

I can successfully boot after reverting the following commits:

commit 141db5703c887f46957615cd6616ca28fe4691e0 (HEAD)
Author: Fabio Estevam 
Date:   Mon Sep 2 14:58:18 2019 -0300

Revert "drm/msm: stop abusing dma_map/unmap for cache"

This reverts commit 0036bc73ccbe7e600a3468bf8e8879b122252274.

commit fa5b1f620f2984c254877d6049214c39c24c8207
Author: Fabio Estevam 
Date:   Mon Sep 2 14:56:01 2019 -0300

Revert "drm/msm: Use the correct dma_sync calls in msm_gem"

This reverts commit 3de433c5b38af49a5fc7602721e2ab5d39f1e69c.

Rob,

What would be the recommended approach for fixing this?

Thanks

Re: Adreno crash on i.MX53 running 5.3-rc6

2019-09-02 Thread Robin Murphy

On 02/09/2019 14:51, Fabio Estevam wrote:

Hi,

I am getting the following crash when booting the adreno driver on
i.MX53 running a 5.3-rc6 kernel.

Such error does not happen with 5.2 though.

Before I start running a bisect, I am wondering if anyone has any
ideas about this issue.


Try 0036bc73ccbe - that looks like something that CONFIG_DMA_API_DEBUG 
should have been screaming about anyway.


Robin.



Thanks,

Fabio Estevam

[2.083249] 8<--- cut here ---
[2.086460] Unable to handle kernel paging request at virtual
address 50001000
[2.094174] pgd = (ptrval)
[2.096911] [50001000] *pgd=
[2.100606] Internal error: Oops: 805 [#1] SMP ARM
[2.105412] Modules linked in:
[2.108487] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.3.0-rc6-00271-g9f159ae07f07 #4
[2.116411] Hardware name: Freescale i.MX53 (Device Tree Support)
[2.122538] PC is at v7_dma_clean_range+0x20/0x38
[2.127254] LR is at __dma_page_cpu_to_dev+0x28/0x90
[2.132226] pc : []lr : []psr: 2013
[2.138500] sp : d80b5a88  ip : de96c000  fp : d840ce6c
[2.143732] r10:   r9 : 0001  r8 : d843e010
[2.148964] r7 :   r6 : 8000  r5 : ddb6c000  r4 : 
[2.155500] r3 : 003f  r2 : 0040  r1 : 50008000  r0 : 50001000
[2.162037] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[2.169180] Control: 10c5387d  Table: 70004019  DAC: 0051
[2.174934] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[2.180949] Stack: (0xd80b5a88 to 0xd80b6000)
[2.185319] 5a80:   c011c7bc d8491780 d840ce6c
d849b380  c011822c
[2.193509] 5aa0: c0d01a18 c0118abc c0118a78 d84a0200 0008
c1308908 d838e800 d849a4a8
[2.201697] 5ac0: d8491780 c06699b4   
d8491600 d80b5b20 d84a0200
[2.209886] 5ae0: d8491780 d8491600 d80b5b20 d8491600 d849a4a8
d84a0200 0003 d84a0358
[2.218077] 5b00: c1308908 d8491600 d849a4a8 d8491780 d840ce6c
c066a55c c1308908 c066a104
[2.226266] 5b20: 01001000  d84a0200 10700ac6 d849a480
d84a0200  d8491600
[2.234455] 5b40:  e0845000 c1308908 c066a72c d849a480
d840ce6c d840ce00 c1308908
[2.242643] 5b60:  c066b584 d849a488 d849a4a8 
c1308908 d840ce6c c066ff40
[2.250832] 5b80: d849a488 d849a4a8  c1308908 
d81b4000  e0845000
[2.259021] 5ba0: d838e800 c1308908 d8491600 10700ac6 d80b5bc8
d840ce00 d840ce6c 0001
[2.267210] 5bc0:  e0845000 d838e800 c066ece4 0100
 10ff 
[2.275399] 5be0: c1308908 0001 d81b4000  0100
 0001 10700ac6
[2.283587] 5c00: c0d6d564 d840ce00 d81b4010 0001 d81b4000
c0d6d564 c1308908 d80b5c48
[2.291777] 5c20: d838e800 c061f9cc c1029dec d80b5c48 d838e800
  c13e8788
[2.299965] 5c40:  c1308928 c102a234  0100
 10ff 
[2.308154] 5c60: 0001  a013 10700ac6 c13b7658
d840ce00 d838e800 d81b4000
[2.316343] 5c80: d840ce00 c1308908 0002 d838f800 
c0620514 0001 10700ac6
[2.324531] 5ca0: d8496440  d81b4010 c1aa1c00 d838e800
c061e070  
[2.332720] 5cc0:  c0d6c534 df56cf34 00c8 
10700ac6 d81b4010 
[2.340909] 5ce0:  d8496440 d838e800 c103acd0 d8496280
 c1380488 c06a3e10
[2.349097] 5d00:    d838f800 d838e800
d843e010 d8496440 c1308908
[2.357286] 5d20:  d83f9640 c1380488 c0668554 0006
0007 c13804d4 d83f9640
[2.365475] 5d40: c1380488 c017ec18 d80c c0c43e40 d843e010
d8496440 0001 c0182a94
[2.373665] 5d60: 6013 10700ac6 d843e010 d8496280 d8496400
0018 d8496440 0001
[2.381854] 5d80: c13804d4 d83f9640 c1380488 c06a4280 c1380488
 c0d764f8 d8496440
[2.390044] 5da0: c1380488 d843e010 c0d764f8 c1308908 
 c13ef300 c06a44f0
[2.398232] 5dc0: c0d8a0dc dffcc6f0 d843e010 dffcc6f0 
d843e010  c06680b8
[2.406421] 5de0: d84988c0 d83f9640 d84988c0 d84989a0 d8498230
10700ac6 0001 d843e010
[2.414610] 5e00:  c137eec0  c137eec0 
 c13ef300 c06ac1a0
[2.422799] 5e20: d843e010 c1aa40dc c1aa40e0  c137eec0
c06aa014 d843e010 c137eec0
[2.430988] 5e40: c137eec0 c1308908 c13e9880 c13e85d4 
c06aa368 c1308908 c13e9880
[2.439178] 5e60: c13e85d4 d843e010  c137eec0 c1308908
c13e9880 c13e85d4 c06aa618
[2.447367] 5e80:  c137eec0 d843e010 c06aa6a4 
c137eec0 c06aa620 c06a844c
[2.46] 5ea0: d80888d4 d80888a4 d84914d0 10700ac6 d80888d4
c137eec0 d8494f00 c1380d28
[2.463745] 5ec0:  c06a946c c105f3d4 c1308908 
c137eec0 c1308908 
[2.471934] 5ee0: c125fdd0 c06ab304 c1308928 c1308908 
c0103178 0109 
[2.480123] 5f00: dc6e dc00 c1126860 0109 0109
c014dc88 c11253ac c10607a0