On 03/11/2015 09:14 AM, Daniel Vetter wrote:
> On Wed, Mar 11, 2015 at 02:53:39PM +0000, John Harrison wrote:
>> On 05/03/2015 14:49, Daniel Vetter wrote:
>>> On Thu, Mar 05, 2015 at 01:57:31PM +0000, john.c.harri...@intel.com wrote:
>>>> From: John Harrison <john.c.harri...@intel.com>
>>>>
>>>> The LRC submission code requires a request for tracking purposes. It does 
>>>> not
>>>> actually require that request to 'complete' it simply uses it for keeping 
>>>> hold
>>>> of reference counts on contexts and such like.
>>>>
>>>> In the case where the ring buffer is completely full, the LRC code looks 
>>>> for a
>>>> pending request that would free up sufficient space upon completion and 
>>>> waits
>>>> for it. If no such request can be found it resorts to simply polling the 
>>>> free
>>>> space count until it is big enough. This situation should only occur when 
>>>> the
>>>> entire buffer is filled with the request currently being generated. I.e., 
>>>> the
>>>> user is trying to submit a single piece of work that is large than the ring
>>>> buffer itself (which should be impossible because very large batch buffers 
>>>> don't
>>>> consume any more ring buffer space). Before starting to poll, a submit 
>>>> call is
>>>> made to make sure that the currently queued up work in the buffer will 
>>>> actually
>>>> be submtted and thus the poll will eventually succeed.
>>>>
>>>> The problem here is that the 'official' request cannot be used as that 
>>>> could
>>>> lead to multiple LRC submissions being tagged to a single request 
>>>> structure.
>>>> Instead, the code fakes up a private request structure and uses that.
>>>>
>>>> This patch moves the faked request allocation higher up in the call stack 
>>>> to the
>>>> wait code itself (rather than being at the very lowest submission level). 
>>>> Thus
>>>> it is now obvious where the faked request is coming from and why it is
>>>> necessary. The patch also replaces it with a call to the official request
>>>> allocation code rather than attempting to duplicate that code. This becomes
>>>> especially important in the future when the request allocation changes to
>>>> accommodate a conversion to struct fence.
>>>>
>>>> For: VIZ-5115
>>>> Signed-off-by: John Harrison <john.c.harri...@intel.com>
>>> This is only possible if you pile up tons of olr. Since your patch series
>>> fixes this design issue by removing olr I think we can just put a WARN_ON
>>> in here if this ever happens and bail out with -ELOSTMYMARBLES or
>>> something. And then rip out all this complexity.
>>>
>>> Or do I miss something important?
>>> -Daniel
>>
>> Yeah, you missed the extremely important bug in the free space calculation
>> that meant this impossible code path was being hit on a regular basis. The
>> LRC wait_request code differed from the legacy wait_request code in the the
>> latter was updated with request->postfix changes and the former was not.
>> Thus the LRC one would happily find a request that frees up enough space,
>> wait on it, retire it and then find there was still not enough space!
>>
>> New patches to fix the space calculation bug and to completely remove the
>> polling path will be forth coming...
> 
> Hm, Jesse did dig out a regression where gem_ringfill blows up on
> execlist. That igt is specifically to exercise this cornercases. I'm not
> sure whith bugzilla Jesse meant, there's two:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=89001
> https://bugs.freedesktop.org/show_bug.cgi?id=88865
> 
> Can you please check whether your fixes help for those issues two?
> 
> Also adding Jesse since he's chasing these atm.

Ah cool, sounds related at least.  89001 was the one I looked at
yesterday.  The worrying thing was that this bug caused a hard system
hang.  Not even the reset button worked...  I guess we should have an
HSD for that too so we can root cause the system part of it.

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to