Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait
Hi, On Wed, Mar 20, 2024 at 04:25:48PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote: > On 08.03.24 02:08, Alex Constantino wrote: > > Fix OOM scenario by doing multiple notifications to the OOM handler through > > a busy wait logic. > > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > > timeout. > > > > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > > Link: > > https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info > > Reported-by: Timo Lindfors > > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 > > Signed-off-by: Alex Constantino > > --- > > drivers/gpu/drm/qxl/qxl_release.c | 20 ++-- > > 1 file changed, 14 insertions(+), 6 deletions(-) > > Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two > I just added to the CC), it seems to me this regression fix did not > maybe any progress since it was posted. Did I miss something, is it just > "we are busy with the merge window", or is there some other a reason? > Just wondering, I just saw someone on a Fedora IRC channel complaining > about the regression, that's why I'm asking. Would be really good to > finally get this resolved... I've ping'd Gerd last week about it, but he couldn't remember the details of why that patch was warranted in the first place. If it works, I'd prefer to revert the original patch that we know used to work instead of coming up with some less proven logic, which seems to be quite different to what it used to be. Alex, could you try reverting 5a838e5d5825c85556011478abde708251cc0776 and letting us know the result? Thanks! Maxime signature.asc Description: PGP signature
Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait
On 08.03.24 02:08, Alex Constantino wrote: > Fix OOM scenario by doing multiple notifications to the OOM handler through > a busy wait logic. > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > timeout. > > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > Link: > https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info > Reported-by: Timo Lindfors > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 > Signed-off-by: Alex Constantino > --- > drivers/gpu/drm/qxl/qxl_release.c | 20 ++-- > 1 file changed, 14 insertions(+), 6 deletions(-) Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two I just added to the CC), it seems to me this regression fix did not maybe any progress since it was posted. Did I miss something, is it just "we are busy with the merge window", or is there some other a reason? Just wondering, I just saw someone on a Fedora IRC channel complaining about the regression, that's why I'm asking. Would be really good to finally get this resolved... Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke > diff --git a/drivers/gpu/drm/qxl/qxl_release.c > b/drivers/gpu/drm/qxl/qxl_release.c > index 368d26da0d6a..51c22e7f9647 100644 > --- a/drivers/gpu/drm/qxl/qxl_release.c > +++ b/drivers/gpu/drm/qxl/qxl_release.c > @@ -20,8 +20,6 @@ > * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. > */ > > -#include > - > #include > > #include "qxl_drv.h" > @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool > intr, > { > struct qxl_device *qdev; > unsigned long cur, end = jiffies + timeout; > + signed long iterations = 1; > + signed long timeout_fraction = timeout; > > qdev = container_of(fence->lock, struct qxl_device, release_lock); > > - if (!wait_event_timeout(qdev->release_event, > + // using HZ as a factor since it is used in ttm_bo_wait_ctx too > + if (timeout_fraction > HZ) { > + iterations = timeout_fraction / HZ; > + timeout_fraction = HZ; > + } > + for (int i = 0; i < iterations; i++) { > + if (wait_event_timeout( > + qdev->release_event, > (dma_fence_is_signaled(fence) || > - (qxl_io_notify_oom(qdev), 0)), > - timeout)) > - return 0; > + (qxl_io_notify_oom(qdev), 0)), > + timeout_fraction)) > + break; > + } > > cur = jiffies; > if (time_after(cur, end))
Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait
On 08.03.24 02:08, Alex Constantino wrote: > Fix OOM scenario by doing multiple notifications to the OOM handler through > a busy wait logic. > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > timeout. Thx for working on this. > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > Link: > https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info Nitpicking: that ideally should be pointing to https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/ , as that the report and not just a reply to prod things. Ciao, Thorsten
Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait
Fix OOM scenario by doing multiple notifications to the OOM handler through a busy wait logic. Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would result in a '[TTM] Buffer eviction failed' exception whenever it reached a timeout. Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info Reported-by: Timo Lindfors Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 Signed-off-by: Alex Constantino --- drivers/gpu/drm/qxl/qxl_release.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c index 368d26da0d6a..51c22e7f9647 100644 --- a/drivers/gpu/drm/qxl/qxl_release.c +++ b/drivers/gpu/drm/qxl/qxl_release.c @@ -20,8 +20,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ -#include - #include #include "qxl_drv.h" @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr, { struct qxl_device *qdev; unsigned long cur, end = jiffies + timeout; + signed long iterations = 1; + signed long timeout_fraction = timeout; qdev = container_of(fence->lock, struct qxl_device, release_lock); - if (!wait_event_timeout(qdev->release_event, + // using HZ as a factor since it is used in ttm_bo_wait_ctx too + if (timeout_fraction > HZ) { + iterations = timeout_fraction / HZ; + timeout_fraction = HZ; + } + for (int i = 0; i < iterations; i++) { + if (wait_event_timeout( + qdev->release_event, (dma_fence_is_signaled(fence) || -(qxl_io_notify_oom(qdev), 0)), - timeout)) - return 0; + (qxl_io_notify_oom(qdev), 0)), + timeout_fraction)) + break; + } cur = jiffies; if (time_after(cur, end)) -- 2.39.2