Tomas Elf <tomas....@intel.com> writes:

> On 22/05/2015 18:05, Mika Kuoppala wrote:
>> During review of dynamic page tables series, I was able
>> to hit a lite restore bug with execlists. I assume that
>> due to incorrect pd, the batch ran out of legit address space
>> and into the scratch page area. The ACTHD was increasing
>> due to scratch being all zeroes (MI_NOOPs). And as gen8
>> address space is quite large, the hangcheck happily waited
>> for a long long time, keeping the process effectively stuck.
>>
>> According to Chris Wilson any modern gpu will grind to a halt
>> if it encounters commands of all ones. This seemed to do the
>> trick and hang was declared promptly when the gpu wandered into
>> the scratch land.
>>
>> v2: Use 0xffff00ff pattern (Chris)
>
> Just for my own benefit:
>
> 1. Is there any particular reason for this pattern rather than 0xffffffff?
>
> 2. Someone please correct me if I'm wrong here but at least based on my 
> own experiences with gen9 submitting batch buffers filled with bad 
> instructions (0xffffffff) to the GPU does not hang it. I'm guessing that 
> is because there's allegedly a hardware security parser that MI_NOOPs 
> out invalid instructions during execution. If that's the case here then 
> I guess we might have to come up with something else for gen9+ if we 
> want to induce engine hangs once the execution reaches the scratch page?
>

If that is the case with gen9, then we need more duct tape. For
example, we could always increase busyness in hangcheck (a little) so
that a hang is eventually declared even though no loops are detected.

But with this patch and gen < 9, execution grinds to a halt and
I get a hang declared within a 5 second window.
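
To make the "always increase busyness a little" idea concrete, here is a
rough userspace sketch. It is not the i915 hangcheck code: the struct, the
function name and the score constants are all made up for illustration.
The only point is that a moving ACTHD still accumulates a small penalty,
so an engine walking through scratch is eventually declared hung.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scoring constants; not the values used by i915. */
#define SCORE_ACTIVE  1   /* ACTHD moved, but no request completed */
#define SCORE_STUCK   5   /* ACTHD did not move at all */
#define SCORE_HUNG   30   /* threshold at which a hang is declared */

/* Hypothetical per-engine hangcheck state. */
struct engine_check {
	uint64_t last_acthd;
	uint32_t last_seqno;
	int score;
};

/*
 * Feed one periodic sample of (acthd, seqno) into the checker.
 * Returns true once the accumulated score reaches SCORE_HUNG.
 */
static bool hangcheck_sample(struct engine_check *ec,
			     uint64_t acthd, uint32_t seqno)
{
	if (seqno != ec->last_seqno) {
		/* A request completed: real progress, reset suspicion. */
		ec->score = 0;
	} else if (acthd != ec->last_acthd) {
		/* Head is moving but nothing completes: small penalty. */
		ec->score += SCORE_ACTIVE;
	} else {
		/* Head is not moving at all: larger penalty. */
		ec->score += SCORE_STUCK;
	}

	ec->last_acthd = acthd;
	ec->last_seqno = seqno;

	return ec->score >= SCORE_HUNG;
}
```

With these made-up constants, an engine whose ACTHD keeps advancing
without any seqno progress is declared hung after 30 samples, and a
completely stuck one after 6.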

-Mika

> On the other hand, on gen9+ page faulting is supposedly not broken 
> anymore so maybe we don't need the scratch page to begin with there so 
> maybe it's all moot at that point? Again, if I'm making no sense here 
> feel free to set things straight, I'm very curious about how all of this 
> is supposed to work.
>
> Thanks,
> Tomas
>
>>
>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 43fa543..a2a0c88 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -2168,6 +2168,8 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>>      vm->cleanup(vm);
>>   }
>>
>> +#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
>> +
>>   static int alloc_scratch_page(struct i915_address_space *vm)
>>   {
>>      struct i915_page_scratch *sp;
>> @@ -2185,6 +2187,7 @@ static int alloc_scratch_page(struct i915_address_space *vm)
>>              return ret;
>>      }
>>
>> +    fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
>>      set_pages_uc(px_page(sp), 1);
>>
>>      vm->scratch_page = sp;
>>
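
As a side note on the pattern in the patch above: the 64-bit
SCRATCH_PAGE_MAGIC has the same value in both 32-bit halves, so every
dword in the filled page reads back as 0xffff00ff. The sketch below is a
userspace stand-in to illustrate that, not the kernel's fill_px();
fill_page64(), page_is_all_dword() and PAGE_BYTES are hypothetical names,
and a 4096-byte page is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096 /* assumed page size for this sketch */
#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL

/* Fill one page with a repeating 64-bit value (stand-in for fill_px). */
static void fill_page64(void *page, uint64_t val)
{
	uint64_t *p = page;
	size_t i;

	for (i = 0; i < PAGE_BYTES / sizeof(*p); i++)
		p[i] = val;
}

/* Return 1 if every 32-bit dword in the page equals expect, else 0. */
static int page_is_all_dword(const void *page, uint32_t expect)
{
	size_t i;

	for (i = 0; i < PAGE_BYTES / sizeof(uint32_t); i++) {
		uint32_t dw;

		/* memcpy avoids aliasing problems when reading as dwords. */
		memcpy(&dw, (const uint8_t *)page + i * sizeof(dw), sizeof(dw));
		if (dw != expect)
			return 0;
	}
	return 1;
}
```

After fill_page64(page, SCRATCH_PAGE_MAGIC), a dword-by-dword check
against 0xffff00ff succeeds, which is what a command streamer reading the
page 32 bits at a time would see.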
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
