Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
On Fri, 2010-03-26 at 19:20 +0100, Michel Dänzer wrote: On Thu, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 2010/3/25 Michel Dänzer mic...@daenzer.net: On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. Care to try in pci mode? see if helps, it might be just straining AGP a bit more, maybe try 1x as well if you aren't already in it. After various more tests, the gist of it seems to be that rendering to GTT with the 3D engine tends to lock up whereas the 2D and DMA engines work fine (otherwise I probably would have run into this a long time ago at BO eviction time). Maybe some kind of synchronization issue resulting in the 3D engine trying to access GTT memory which isn't bound yet / anymore? I think I've confirmed this theory by changing the AGP driver to bind the AGP scratch page instead of really unbinding a GTT entry. That prevents the lockups from occurring. Even so though, this change hurts performance, presumably because BOs evicted from VRAM (or not going in in the first place) aren't getting (back) in. -- Earthling Michel Dänzer |http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
On Thu, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 2010/3/25 Michel Dänzer mic...@daenzer.net: On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. Care to try in pci mode? see if helps, it might be just straining AGP a bit more, maybe try 1x as well if you aren't already in it. After various more tests, the gist of it seems to be that rendering to GTT with the 3D engine tends to lock up whereas the 2D and DMA engines work fine (otherwise I probably would have run into this a long time ago at BO eviction time). Maybe some kind of synchronization issue resulting in the 3D engine trying to access GTT memory which isn't bound yet / anymore? I don't know why that is - maybe something doesn't properly deal with BOs getting placed differently in some cases now - but anyway I suspect the implications of this change haven't been fully thought through: The log message sounds as though the change was mainly written with radeon_bo_create() / radeon_bo_list_validate() in mind, but radeon_ttm_placement_from_domain() is also called from other places: * radeon_bo_pin(): The change could lead to a BO being pinned to GTT instead of VRAM, which would probably be bad. * radeon_evict_flags(): The change might have undesirable consequences here as well, not sure. The first might be bad, [...] I actually saw this happen once for the scanout BO of a second X server, and it resulted in pretty spectacular cascaded failure. This confirmed my suspicion that it's a very bad idea for radeon_bo_pin(). In summary I'm afraid this isn't really a good solution for what you were trying to achieve, and it's actually breaking things that were working before, so I think it should be reverted. -- Earthling Michel Dänzer |http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. I don't know why that is - maybe something doesn't properly deal with BOs getting placed differently in some cases now - but anyway I suspect the implications of this change haven't been fully thought through: The log message sounds as though the change was mainly written with radeon_bo_create() / radeon_bo_list_validate() in mind, but radeon_ttm_placement_from_domain() is also called from other places: * radeon_bo_pin(): The change could lead to a BO being pinned to GTT instead of VRAM, which would probably be bad. * radeon_evict_flags(): The change might have undesirable consequences here as well, not sure. I don't think the way we currently use the same arrays for normal and busy placement makes a lot of sense, but we probably need a better mechanism to specify which placements are desirable / acceptable in each case. -- Earthling Michel Dänzer |http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
2010/3/25 Michel Dänzer mic...@daenzer.net: On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. Care to try in pci mode? see if helps, it might be just straining AGP a bit more, maybe try 1x as well if you aren't already in it. I don't know why that is - maybe something doesn't properly deal with BOs getting placed differently in some cases now - but anyway I suspect the implications of this change haven't been fully thought through: The log message sounds as though the change was mainly written with radeon_bo_create() / radeon_bo_list_validate() in mind, but radeon_ttm_placement_from_domain() is also called from other places: * radeon_bo_pin(): The change could lead to a BO being pinned to GTT instead of VRAM, which would probably be bad. * radeon_evict_flags(): The change might have undesirable consequences here as well, not sure. The first might be bad, but the second should be okay, I'll take a closer look in the morning. I don't think the way we currently use the same arrays for normal and busy placement makes a lot of sense, but we probably need a better mechanism to specify which placements are desirable / acceptable in each case. Well its not really a huge matrix of choices, I'm guessing we need more info from userspace, or use the info better. Though in theory everything should work in GTT or VRAM equally well just slower. Dave. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
On Don, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 2010/3/25 Michel Dänzer mic...@daenzer.net: On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. Care to try in pci mode? see if helps, it might be just straining AGP a bit more, Ugh, k I'll try... but that incurs such a huge performance hit that the result might not be very meaningful I'm afraid. maybe try 1x as well if you aren't already in it. I am, for some reason neither 2x nor 4x has ever worked reliably with KMS although 4x was rock solid with UMS. I don't know why that is - maybe something doesn't properly deal with BOs getting placed differently in some cases now - but anyway I suspect the implications of this change haven't been fully thought through: The log message sounds as though the change was mainly written with radeon_bo_create() / radeon_bo_list_validate() in mind, but radeon_ttm_placement_from_domain() is also called from other places: * radeon_bo_pin(): The change could lead to a BO being pinned to GTT instead of VRAM, which would probably be bad. * radeon_evict_flags(): The change might have undesirable consequences here as well, not sure. The first might be bad, but the second should be okay, I'll take a closer look in the morning. What about that there are now usually no busy placements specified, couldn't that be a problem? I don't think the way we currently use the same arrays for normal and busy placement makes a lot of sense, but we probably need a better mechanism to specify which placements are desirable / acceptable in each case. Well its not really a huge matrix of choices, I'm guessing we need more info from userspace, or use the info better. Though in theory everything should work in GTT or VRAM equally well just slower. E.g. scanout from GTT could easily exceed the available bandwidth. -- Earthling Michel Dänzer |http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
On Thu, 2010-03-25 at 11:11 +0100, Michel Dänzer wrote: On Don, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 2010/3/25 Michel Dänzer mic...@daenzer.net: On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com This change seems to trigger or at least greatly expedite GPU lockups on my PowerBook. With the change applied, my normal X session locked up the GPU after just a few minutes several times. Now with it reverted it's back to the previous stability. Care to try in pci mode? see if helps, it might be just straining AGP a bit more, Ugh, k I'll try... but that incurs such a huge performance hit that the result might not be very meaningful I'm afraid. It didn't lock up in a couple of hours of suffering through PCI, so maybe it is an AGP problem, or maybe PCI is just too slow to trigger it... More likely the former though I guess. I don't know why that is - maybe something doesn't properly deal with BOs getting placed differently in some cases now - but anyway I suspect the implications of this change haven't been fully thought through: The log message sounds as though the change was mainly written with radeon_bo_create() / radeon_bo_list_validate() in mind, but radeon_ttm_placement_from_domain() is also called from other places: * radeon_bo_pin(): The change could lead to a BO being pinned to GTT instead of VRAM, which would probably be bad. * radeon_evict_flags(): The change might have undesirable consequences here as well, not sure. The first might be bad, but the second should be okay, I'll take a closer look in the morning. What about that there are now usually no busy placements specified, couldn't that be a problem? FWIW I tried re-using the normal placements for missing busy placements, didn't help. -- Earthling Michel Dänzer |http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
[PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.
From: Dave Airlie airl...@redhat.com On constrained r100 systems compiz would fail to start due to a lack of memory, we can just fallback place the objects rather than completely failing it works a lot better. Signed-off-by: Dave Airlie airl...@redhat.com --- drivers/gpu/drm/radeon/radeon.h|2 ++ drivers/gpu/drm/radeon/radeon_object.c | 10 +++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 72ed251..69585bb 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -238,7 +238,9 @@ struct radeon_bo { struct list_headlist; /* Protected by tbo.reserved */ u32 placements[3]; + u32 busy_placements[3]; struct ttm_placementplacement; + struct ttm_placementbusy_placement; struct ttm_buffer_objecttbo; struct ttm_bo_kmap_obj kmap; unsignedpin_count; diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c index fc9d00a..3580c75 100644 --- a/drivers/gpu/drm/radeon/radeon_object.c +++ b/drivers/gpu/drm/radeon/radeon_object.c @@ -65,15 +65,19 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo) void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) { - u32 c = 0; + u32 c = 0, b = 0; rbo-placement.fpfn = 0; rbo-placement.lpfn = 0; rbo-placement.placement = rbo-placements; - rbo-placement.busy_placement = rbo-placements; + rbo-placement.busy_placement = rbo-busy_placements; if (domain RADEON_GEM_DOMAIN_VRAM) rbo-placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM; + /* add busy placement to TTM if VRAM is only option */ + if (domain == RADEON_GEM_DOMAIN_VRAM) { + rbo-busy_placements[b++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_TT; + } if (domain RADEON_GEM_DOMAIN_GTT) rbo-placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_TT; if (domain RADEON_GEM_DOMAIN_CPU) @@ -81,7 +85,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) if (!c) rbo-placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; rbo-placement.num_placement = c; - rbo-placement.num_busy_placement = c; + rbo-placement.num_busy_placement = b; } int radeon_bo_create(struct radeon_device *rdev, struct drm_gem_object *gobj, -- 1.6.6.1 -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel