Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-27 Thread Maxime Ripard
Hi,

On Wed, Mar 20, 2024 at 04:25:48PM +0100, Linux regression tracking (Thorsten 
Leemhuis) wrote:
> On 08.03.24 02:08, Alex Constantino wrote:
> > Fix OOM scenario by doing multiple notifications to the OOM handler through
> > a busy wait logic.
> > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> > result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> > timeout.
> > 
> > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> > Link: 
> > https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
> > Reported-by: Timo Lindfors 
> > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
> > Signed-off-by: Alex Constantino 
> > ---
> >  drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
> >  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two
> I just added to the CC), it seems to me this regression fix did not
> maybe any progress since it was posted. Did I miss something, is it just
> "we are busy with the merge window", or is there some other a reason?
> Just wondering, I just saw someone on a Fedora IRC channel complaining
> about the regression, that's why I'm asking. Would be really good to
> finally get this resolved...

I've ping'd Gerd last week about it, but he couldn't remember the
details of why that patch was warranted in the first place.

If it works, I'd prefer to revert the original patch that we know used
to work instead of coming up with some less proven logic, which seems to
be quite different to what it used to be.

Alex, could you try reverting 5a838e5d5825c85556011478abde708251cc0776
and letting us know the result?

Thanks!
Maxime


signature.asc
Description: PGP signature


Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-20 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.
> 
> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: 
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
> Reported-by: Timo Lindfors 
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
> Signed-off-by: Alex Constantino 
> ---
>  drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)

Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two
I just added to the CC), it seems to me this regression fix did not
maybe any progress since it was posted. Did I miss something, is it just
"we are busy with the merge window", or is there some other a reason?
Just wondering, I just saw someone on a Fedora IRC channel complaining
about the regression, that's why I'm asking. Would be really good to
finally get this resolved...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
> b/drivers/gpu/drm/qxl/qxl_release.c
> index 368d26da0d6a..51c22e7f9647 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -20,8 +20,6 @@
>   * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>  
> -#include 
> -
>  #include 
>  
>  #include "qxl_drv.h"
> @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool 
> intr,
>  {
>   struct qxl_device *qdev;
>   unsigned long cur, end = jiffies + timeout;
> + signed long iterations = 1;
> + signed long timeout_fraction = timeout;
>  
>   qdev = container_of(fence->lock, struct qxl_device, release_lock);
>  
> - if (!wait_event_timeout(qdev->release_event,
> + // using HZ as a factor since it is used in ttm_bo_wait_ctx too
> + if (timeout_fraction > HZ) {
> + iterations = timeout_fraction / HZ;
> + timeout_fraction = HZ;
> + }
> + for (int i = 0; i < iterations; i++) {
> + if (wait_event_timeout(
> + qdev->release_event,
>   (dma_fence_is_signaled(fence) ||
> -  (qxl_io_notify_oom(qdev), 0)),
> - timeout))
> - return 0;
> + (qxl_io_notify_oom(qdev), 0)),
> + timeout_fraction))
> + break;
> + }
>  
>   cur = jiffies;
>   if (time_after(cur, end))



Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-08 Thread Thorsten Leemhuis
On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.

Thx for working on this.

> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: 
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info

Nitpicking: that ideally should be pointing to
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/ , as
that the report and not just a reply to prod things.

Ciao, Thorsten



Bug#1054514: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-07 Thread Alex Constantino
Fix OOM scenario by doing multiple notifications to the OOM handler through
a busy wait logic.
Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
result in a '[TTM] Buffer eviction failed' exception whenever it reached a
timeout.

Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
Link: 
https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
Reported-by: Timo Lindfors 
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
Signed-off-by: Alex Constantino 
---
 drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index 368d26da0d6a..51c22e7f9647 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -20,8 +20,6 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
-#include 
-
 #include 
 
 #include "qxl_drv.h"
@@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool 
intr,
 {
struct qxl_device *qdev;
unsigned long cur, end = jiffies + timeout;
+   signed long iterations = 1;
+   signed long timeout_fraction = timeout;
 
qdev = container_of(fence->lock, struct qxl_device, release_lock);
 
-   if (!wait_event_timeout(qdev->release_event,
+   // using HZ as a factor since it is used in ttm_bo_wait_ctx too
+   if (timeout_fraction > HZ) {
+   iterations = timeout_fraction / HZ;
+   timeout_fraction = HZ;
+   }
+   for (int i = 0; i < iterations; i++) {
+   if (wait_event_timeout(
+   qdev->release_event,
(dma_fence_is_signaled(fence) ||
-(qxl_io_notify_oom(qdev), 0)),
-   timeout))
-   return 0;
+   (qxl_io_notify_oom(qdev), 0)),
+   timeout_fraction))
+   break;
+   }
 
cur = jiffies;
if (time_after(cur, end))
-- 
2.39.2