On 03/11/2015 09:14 AM, Daniel Vetter wrote: > On Wed, Mar 11, 2015 at 02:53:39PM +0000, John Harrison wrote: >> On 05/03/2015 14:49, Daniel Vetter wrote: >>> On Thu, Mar 05, 2015 at 01:57:31PM +0000, john.c.harri...@intel.com wrote: >>>> From: John Harrison <john.c.harri...@intel.com> >>>> >>>> The LRC submission code requires a request for tracking purposes. It does >>>> not >>>> actually require that request to 'complete' it simply uses it for keeping >>>> hold >>>> of reference counts on contexts and such like. >>>> >>>> In the case where the ring buffer is completely full, the LRC code looks >>>> for a >>>> pending request that would free up sufficient space upon completion and >>>> waits >>>> for it. If no such request can be found it resorts to simply polling the >>>> free >>>> space count until it is big enough. This situation should only occur when >>>> the >>>> entire buffer is filled with the request currently being generated. I.e., >>>> the >>>> user is trying to submit a single piece of work that is large than the ring >>>> buffer itself (which should be impossible because very large batch buffers >>>> don't >>>> consume any more ring buffer space). Before starting to poll, a submit >>>> call is >>>> made to make sure that the currently queued up work in the buffer will >>>> actually >>>> be submtted and thus the poll will eventually succeed. >>>> >>>> The problem here is that the 'official' request cannot be used as that >>>> could >>>> lead to multiple LRC submissions being tagged to a single request >>>> structure. >>>> Instead, the code fakes up a private request structure and uses that. >>>> >>>> This patch moves the faked request allocation higher up in the call stack >>>> to the >>>> wait code itself (rather than being at the very lowest submission level). >>>> Thus >>>> it is now obvious where the faked request is coming from and why it is >>>> necessary. The patch also replaces it with a call to the official request >>>> allocation code rather than attempting to duplicate that code. This becomes >>>> especially important in the future when the request allocation changes to >>>> accommodate a conversion to struct fence. >>>> >>>> For: VIZ-5115 >>>> Signed-off-by: John Harrison <john.c.harri...@intel.com> >>> This is only possible if you pile up tons of olr. Since your patch series >>> fixes this design issue by removing olr I think we can just put a WARN_ON >>> in here if this ever happens and bail out with -ELOSTMYMARBLES or >>> something. And then rip out all this complexity. >>> >>> Or do I miss something important? >>> -Daniel >> >> Yeah, you missed the extremely important bug in the free space calculation >> that meant this impossible code path was being hit on a regular basis. The >> LRC wait_request code differed from the legacy wait_request code in the the >> latter was updated with request->postfix changes and the former was not. >> Thus the LRC one would happily find a request that frees up enough space, >> wait on it, retire it and then find there was still not enough space! >> >> New patches to fix the space calculation bug and to completely remove the >> polling path will be forth coming... > > Hm, Jesse did dig out a regression where gem_ringfill blows up on > execlist. That igt is specifically to exercise this cornercases. I'm not > sure whith bugzilla Jesse meant, there's two: > > https://bugs.freedesktop.org/show_bug.cgi?id=89001 > https://bugs.freedesktop.org/show_bug.cgi?id=88865 > > Can you please check whether your fixes help for those issues two? > > Also adding Jesse since he's chasing these atm.
Ah cool, sounds related at least. 89001 was the one I looked at yesterday. The worrying thing was that this bug caused a hard system hang. Not even the reset button worked... I guess we should have an HSD for that too so we can root cause the system part of it. Jesse _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx