GPU lockup dumping

2012-05-24 Thread Christian König
On 23.05.2012 19:02, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 12:41 PM, Dave Airlie  wrote:
>> On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse  wrote:
>>> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie  wrote:
>>>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse wrote:
>>>>> On Wed, May 23, 2012 at 8:34 AM, Christian König wrote:
>>>>>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>>>>> On Thu, May 17, 2012 at 7:28 PM, wrote:
>>>>>>>> So here is improved patchset, where i splited ground work necessary
>>>>>>>> for the dumping into their own patch. The debugfs improvement could
>>>>>>>> probably be usefull to intel instead of having i915 have it's own
>>>>>>>> debugfs file stuff.
>>>>>>>>
>>>>>>>> The lockup dumping public api have been move into radeon_drm.h
>>>>>>>>
>>>>>>>> Stressing the fact again that dump are self contained ie they have
>>>>>>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>>>>>>> ...).
>>>>>>>>
>>>>>>>> Would really like to get this into 3.5, the new API is pretty much
>>>>>>>> straightforward and userspace tools can easily be made to convert
>>>>>>>> it to other format. The change to the driver is self contained.
>>>>>>> I really don't like introducing this at this stage into 3.5,
>>>>>>>
>>>>>>> I'd really like a good review of the API and what information we provide
>>>>>>> along with how extensible it is.
>>>>>>>
>>>>>>> I'm still not convinced replay is what we want in the field, I know its what
>>>>>>> *you* want, but I think apitrace stuff in userspace pretty much covers
>>>>>>> the replaying situation. So I'd have to look at this and see how easy
>>>>>>> it makes disecting command streams etc.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>> I agree that it might not be a good idea to push that into 3.5, since at
>>>>>> least I (and I also think Alex) didn't had time to look into it yet. On the
>>>>>> other hand the patches look quite reasonable.
>>>>>>
>>>>>> But I still wanted to throw in a requirement from my day to day work, maybe
>>>>>> that helps finding a more general solution:
>>>>>> When we start to work with more parts of the chip it might be necessary to
>>>>>> dump everything that is currently "in the fly". For example I had a whole
>>>>>> bunch of problems where copying data around with a 3D Blit and then missing
>>>>>> a sync between this job and a job on another rings causes a "hiccup" in the
>>>>>> hardware.
>>>>>>
>>>>>> I know that this isn't your focus and that is absolutely ok with me, cause
>>>>>> the format you are introducing is just used in debugfs and so not part of
>>>>>> any stable API (at least not in my understanding), but you should still keep
>>>>>> in mind that we might need to extend it into that direction in the future.
>>>>>>
>>>>>> Christian.
>>>>> Note that my format is also done with that in mind, it can capture ib
>>>>> from all rings. The only thing i don't think worth capturing are the
>>>>> ring themself because there would be no way to replay them without
>>>>> adding some new special API.
>>>> I'd like to dump the rings as well, as I said I'd rather we didn't
>>>> limit this to replay, but make it useful for getting as much info as
>>>> possible out
>>>>
>>>> Dave.
>>> Ring will contains very little, like ib schedule and fence, i don't
>>> see how useful this can be.
>>>
>> In case we have a bug in our ib scheduling or fencing :-0
>>
>> Dave.
> Well i think we have several kind of lockup, the most basic one is
> userspace sending broken shader, vertex, or something in that line.
> The more complex one is timing related, like a bo move or some cache
> invalidation that didn't happen properly and GPU endup reading either
> wrong data or old cached data. I don't see how to capture useful
> information for this second case, beside doing snapshot of memory.
>
> For multi-ring i agree that dumping the ring might prove useful spot
> inter-ring semaphore deadlock, or possibly inter-ring absence of
> synchronization (but that would be a bad kernel bug).

I don't think that we need the actual data from the rings either (at
least as long as we keep the radeon_ring_* debugfs files). But it would
still be nice to know whether or not there was a sync between the rings.
See the patches I just sent to you (sorry, I actually sent more patches
than I wanted to); storing the new sync_seq array within the debug
output should enable us to figure out the dependencies and order
between different IBs.

Cheers,
Christian.
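
The sync_seq idea above can be sketched in a few lines of userspace Python. This is only an illustration under stated assumptions: the DumpedIB record, its fields, and the rule "an IB waited for another ring up to sync_seq[ring]" are hypothetical and are not taken from the radeon patches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DumpedIB:
    """Hypothetical record for one IB found in a lockup dump."""
    ring: int                   # ring the IB was scheduled on
    seq: int                    # fence sequence number of the IB on its own ring
    sync_seq: Tuple[int, ...]   # per ring: last sequence number this IB waited for


def depends_on(a: DumpedIB, b: DumpedIB) -> bool:
    """Return True if IB a is directly ordered after IB b."""
    if a.ring == b.ring:
        # Same ring: IBs execute in submission order.
        return a.seq > b.seq
    # Different rings: a is ordered after b only if a waited for b's fence.
    return a.sync_seq[b.ring] >= b.seq


def cross_ring_dependencies(ibs: List[DumpedIB]) -> List[Tuple[int, int]]:
    """List (i, j) index pairs where ibs[i] waited for ibs[j] on another ring."""
    return [
        (i, j)
        for i, a in enumerate(ibs)
        for j, b in enumerate(ibs)
        if a.ring != b.ring and depends_on(a, b)
    ]
```

With such a list, a missing entry for two IBs that touch the same buffer would point at the kind of missing inter-ring sync described in the thread. Only direct waits are considered here; transitive ordering is left out to keep the sketch short.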




GPU lockup dumping

2012-05-23 Thread Dave Airlie
On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse  wrote:
> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie  wrote:
>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse  wrote:
>>> On Wed, May 23, 2012 at 8:34 AM, Christian König
>>>  wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:
>
> On Thu, May 17, 2012 at 7:28 PM, wrote:
>>
>> So here is improved patchset, where i splited ground work necessary
>> for the dumping into their own patch. The debugfs improvement could
>> probably be usefull to intel instead of having i915 have it's own
>> debugfs file stuff.
>>
>> The lockup dumping public api have been move into radeon_drm.h
>>
>> Stressing the fact again that dump are self contained ie they have
>> all the data needed to be replayed (vertex, indices, shader, texture,
>> ...).
>>
>> Would really like to get this into 3.5, the new API is pretty much
>> straightforward and userspace tools can easily be made to convert
>> it to other format. The change to the driver is self contained.
>
> I really don't like introducing this at this stage into 3.5,
>
> I'd really like a good review of the API and what information we provide
> along with how extensible it is.
>
> I'm still not convinced replay is what we want in the field, I know its
> what
> *you* want, but I think apitrace stuff in userspace pretty much covers
> the replaying situation. So I'd have to look at this and see how easy
> it makes disecting command streams etc.
>
> Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently "in the fly". For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a "hiccup" in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still 
 keep
 in mind that we might need to extend it into that direction in the future.

 Christian.
>>>
>>> Note that my format is also done with that in mind, it can capture ib
>>> from all rings. The only thing i don't think worth capturing are the
>>> ring themself because there would be no way to replay them without
>>> adding some new special API.
>>
>> I'd like to dump the rings as well, as I said I'd rather we didn't
>> limit this to replay, but make it useful for getting as much info as
>> possible out
>>
>> Dave.
>
> Ring will contains very little, like ib schedule and fence, i don't
> see how useful this can be.
>

In case we have a bug in our ib scheduling or fencing :-0

Dave.


GPU lockup dumping

2012-05-23 Thread Dave Airlie
On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse  wrote:
> On Wed, May 23, 2012 at 8:34 AM, Christian König
>  wrote:
>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>
>>> On Thu, May 17, 2012 at 7:28 PM, wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.
>>>
>>> I really don't like introducing this at this stage into 3.5,
>>>
>>> I'd really like a good review of the API and what information we provide
>>> along with how extensible it is.
>>>
>>> I'm still not convinced replay is what we want in the field, I know its
>>> what
>>> *you* want, but I think apitrace stuff in userspace pretty much covers
>>> the replaying situation. So I'd have to look at this and see how easy
>>> it makes disecting command streams etc.
>>>
>>> Dave.
>>
>>
>> I agree that it might not be a good idea to push that into 3.5, since at
>> least I (and I also think Alex) didn't had time to look into it yet. On the
>> other hand the patches look quite reasonable.
>>
>> But I still wanted to throw in a requirement from my day to day work, maybe
>> that helps finding a more general solution:
>> When we start to work with more parts of the chip it might be necessary to
>> dump everything that is currently "in the fly". For example I had a whole
>> bunch of problems where copying data around with a 3D Blit and then missing
>> a sync between this job and a job on another rings causes a "hiccup" in the
>> hardware.
>>
>> I know that this isn't your focus and that is absolutely ok with me, cause
>> the format you are introducing is just used in debugfs and so not part of
>> any stable API (at least not in my understanding), but you should still keep
>> in mind that we might need to extend it into that direction in the future.
>>
>> Christian.
>
> Note that my format is also done with that in mind, it can capture ib
> from all rings. The only thing i don't think worth capturing are the
> ring themself because there would be no way to replay them without
> adding some new special API.

I'd like to dump the rings as well. As I said, I'd rather we didn't
limit this to replay, but made it useful for getting as much info out
as possible.

Dave.


GPU lockup dumping

2012-05-23 Thread Christian König
On 23.05.2012 11:27, Dave Airlie wrote:
> On Thu, May 17, 2012 at 7:28 PM,  wrote:
>> So here is improved patchset, where i splited ground work necessary
>> for the dumping into their own patch. The debugfs improvement could
>> probably be usefull to intel instead of having i915 have it's own
>> debugfs file stuff.
>>
>> The lockup dumping public api have been move into radeon_drm.h
>>
>> Stressing the fact again that dump are self contained ie they have
>> all the data needed to be replayed (vertex, indices, shader, texture,
>> ...).
>>
>> Would really like to get this into 3.5, the new API is pretty much
>> straightforward and userspace tools can easily be made to convert
>> it to other format. The change to the driver is self contained.
> I really don't like introducing this at this stage into 3.5,
>
> I'd really like a good review of the API and what information we provide
> along with how extensible it is.
>
> I'm still not convinced replay is what we want in the field, I know its what
> *you* want, but I think apitrace stuff in userspace pretty much covers
> the replaying situation. So I'd have to look at this and see how easy
> it makes disecting command streams etc.
>
> Dave.

I agree that it might not be a good idea to push that into 3.5, since at
least I (and I think Alex as well) haven't had time to look into it yet.
On the other hand the patches look quite reasonable.

But I still wanted to throw in a requirement from my day to day work,
maybe that helps finding a more general solution:
When we start to work with more parts of the chip it might be necessary
to dump everything that is currently "in flight". For example I had a
whole bunch of problems where copying data around with a 3D Blit and
then missing a sync between this job and a job on another ring causes a
"hiccup" in the hardware.

I know that this isn't your focus and that is absolutely ok with me,
because the format you are introducing is just used in debugfs and so
is not part of any stable API (at least not in my understanding), but
you should still keep in mind that we might need to extend it in that
direction in the future.

Christian.


GPU lockup dumping

2012-05-23 Thread Jordan Crouse
On 05/23/2012 08:51 AM, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 5:27 AM, Dave Airlie  wrote:
>> On Thu, May 17, 2012 at 7:28 PM,  wrote:
>>> So here is improved patchset, where i splited ground work necessary
>>> for the dumping into their own patch. The debugfs improvement could
>>> probably be usefull to intel instead of having i915 have it's own
>>> debugfs file stuff.
>>>
>>> The lockup dumping public api have been move into radeon_drm.h
>>>
>>> Stressing the fact again that dump are self contained ie they have
>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>> ...).
>>>
>>> Would really like to get this into 3.5, the new API is pretty much
>>> straightforward and userspace tools can easily be made to convert
>>> it to other format. The change to the driver is self contained.
>>
>> I really don't like introducing this at this stage into 3.5,
>>
>> I'd really like a good review of the API and what information we provide
>> along with how extensible it is.
>>
>> I'm still not convinced replay is what we want in the field, I know its what
>> *you* want, but I think apitrace stuff in userspace pretty much covers
>> the replaying situation. So I'd have to look at this and see how easy
>> it makes disecting command streams etc.
>>
>> Dave.
>
> It store pciid and allow to dump all ib per ring, and all associated
> bo object. It also have a bunch of flags to help the userspace tools
> (like does userspace need to clear offset (vm vs no vm) ...  What more
> do you want to know ?

Another useful thing might be current register states. We've been doing
dumping (we call it snapshot) in the Qualcomm driver for a little bit now and
between registers, rings, command buffers and various buffers we've been able
to get a reasonably good picture of the state suitable for playback on emulators
and other silly userspace tricks.

We have structs for registers and index/data register pairs because we also dump
lots of debug registers and queues and other various HW sources.

The implementation is way different for obvious reasons but I would love to
consolidate on a single format. It's easy for us to do since we share similar
architectures, but if at least two GPUs support the same format it can be a
catalyst for others to join.

https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=blob;f=drivers/gpu/msm/kgsl_snapshot.h;hb=refs/heads/msm-3.0

Jordan
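
To make the "registers and index/data register pairs" idea concrete, here is a minimal userspace sketch of a sectioned snapshot blob. Everything in it is an assumption made for illustration: the magic value, the section ids, and the field layout are hypothetical and do not reproduce kgsl_snapshot.h or the radeon dump format.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical constants, chosen only for this sketch.
SECTION_MAGIC = 0xABCD
SECTION_REGS = 1            # plain (offset, value) register pairs
SECTION_INDEXED_REGS = 2    # values read through an index/data register pair

HEADER = struct.Struct("<HHI")   # magic, section id, payload size in bytes
REG = struct.Struct("<II")       # register offset, register value


def pack_regs(regs: Dict[int, int]) -> bytes:
    """Pack directly readable registers as (offset, value) pairs."""
    payload = b"".join(REG.pack(off, val) for off, val in sorted(regs.items()))
    return HEADER.pack(SECTION_MAGIC, SECTION_REGS, len(payload)) + payload


def pack_indexed_regs(index_reg: int, data_reg: int, values: List[int]) -> bytes:
    """Pack values that were read by writing index_reg and reading data_reg."""
    payload = struct.pack("<III", index_reg, data_reg, len(values))
    payload += struct.pack(f"<{len(values)}I", *values)
    return HEADER.pack(SECTION_MAGIC, SECTION_INDEXED_REGS, len(payload)) + payload


def parse_sections(blob: bytes) -> List[Tuple[int, bytes]]:
    """Split a snapshot blob into (section id, payload) tuples."""
    sections = []
    pos = 0
    while pos < len(blob):
        magic, section_id, size = HEADER.unpack_from(blob, pos)
        if magic != SECTION_MAGIC:
            raise ValueError(f"bad section magic at offset {pos}")
        pos += HEADER.size
        sections.append((section_id, blob[pos:pos + size]))
        pos += size
    return sections
```

A self-describing header per section is one way a shared format could stay extensible: a tool that does not know a section id can skip it by its size.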


GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:41 PM, Dave Airlie  wrote:
> On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse  wrote:
>> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie  wrote:
>>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse  
>>> wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
  wrote:
> On 23.05.2012 11:27, Dave Airlie wrote:
>>
>> On Thu, May 17, 2012 at 7:28 PM, wrote:
>>>
>>> So here is improved patchset, where i splited ground work necessary
>>> for the dumping into their own patch. The debugfs improvement could
>>> probably be usefull to intel instead of having i915 have it's own
>>> debugfs file stuff.
>>>
>>> The lockup dumping public api have been move into radeon_drm.h
>>>
>>> Stressing the fact again that dump are self contained ie they have
>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>> ...).
>>>
>>> Would really like to get this into 3.5, the new API is pretty much
>>> straightforward and userspace tools can easily be made to convert
>>> it to other format. The change to the driver is self contained.
>>
>> I really don't like introducing this at this stage into 3.5,
>>
>> I'd really like a good review of the API and what information we provide
>> along with how extensible it is.
>>
>> I'm still not convinced replay is what we want in the field, I know its
>> what
>> *you* want, but I think apitrace stuff in userspace pretty much covers
>> the replaying situation. So I'd have to look at this and see how easy
>> it makes disecting command streams etc.
>>
>> Dave.
>
>
> I agree that it might not be a good idea to push that into 3.5, since at
> least I (and I also think Alex) didn't had time to look into it yet. On 
> the
> other hand the patches look quite reasonable.
>
> But I still wanted to throw in a requirement from my day to day work, 
> maybe
> that helps finding a more general solution:
> When we start to work with more parts of the chip it might be necessary to
> dump everything that is currently "in the fly". For example I had a whole
> bunch of problems where copying data around with a 3D Blit and then 
> missing
> a sync between this job and a job on another rings causes a "hiccup" in 
> the
> hardware.
>
> I know that this isn't your focus and that is absolutely ok with me, cause
> the format you are introducing is just used in debugfs and so not part of
> any stable API (at least not in my understanding), but you should still 
> keep
> in mind that we might need to extend it into that direction in the future.
>
> Christian.

 Note that my format is also done with that in mind, it can capture ib
 from all rings. The only thing i don't think worth capturing are the
 ring themself because there would be no way to replay them without
 adding some new special API.
>>>
>>> I'd like to dump the rings as well, as I said I'd rather we didn't
>>> limit this to replay, but make it useful for getting as much info as
>>> possible out
>>>
>>> Dave.
>>
>> Ring will contains very little, like ib schedule and fence, i don't
>> see how useful this can be.
>>
>
> In case we have a bug in our ib scheduling or fencing :-0
>
> Dave.

Well, I think we have several kinds of lockup. The most basic one is
userspace sending a broken shader, vertex, or something along those lines.
The more complex one is timing related, like a bo move or some cache
invalidation that didn't happen properly, so the GPU ends up reading either
wrong data or old cached data. I don't see how to capture useful
information for this second case, besides taking a snapshot of memory.

For multi-ring I agree that dumping the ring might prove useful to spot
an inter-ring semaphore deadlock, or possibly an inter-ring absence of
synchronization (but that would be a bad kernel bug).

Cheers,
Jerome
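
As an illustration of how a ring dump could expose an inter-ring semaphore deadlock, the sketch below looks for a cycle in a "ring waits on ring" map. The map itself is a hypothetical input: it assumes a tool has already decoded, from the dumped rings, which ring each stalled ring is waiting on.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_on: Dict[int, Optional[int]]) -> List[int]:
    """Return the rings that form a wait cycle, or [] if there is none.

    waits_on maps a ring index to the ring it is blocked on (None if the
    ring is not waiting on any semaphore).
    """
    for start in waits_on:
        seen: List[int] = []
        ring: Optional[int] = start
        # Follow the chain of waits until it ends or revisits a ring.
        while ring is not None and ring not in seen:
            seen.append(ring)
            ring = waits_on.get(ring)
        if ring is not None:
            # The chain came back to a ring already visited: that is a cycle.
            return seen[seen.index(ring):]
    return []
```

For example, find_deadlock({0: 1, 1: 2, 2: 1}) reports rings 1 and 2, which wait on each other, while ring 0 is only a victim of that cycle.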


GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:08 PM, Dave Airlie  wrote:
> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse  wrote:
>> On Wed, May 23, 2012 at 8:34 AM, Christian König
>>  wrote:
>>> On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM, wrote:
>
> So here is improved patchset, where i splited ground work necessary
> for the dumping into their own patch. The debugfs improvement could
> probably be usefull to intel instead of having i915 have it's own
> debugfs file stuff.
>
> The lockup dumping public api have been move into radeon_drm.h
>
> Stressing the fact again that dump are self contained ie they have
> all the data needed to be replayed (vertex, indices, shader, texture,
> ...).
>
> Would really like to get this into 3.5, the new API is pretty much
> straightforward and userspace tools can easily be made to convert
> it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.
>>>
>>>
>>> I agree that it might not be a good idea to push that into 3.5, since at
>>> least I (and I also think Alex) didn't had time to look into it yet. On the
>>> other hand the patches look quite reasonable.
>>>
>>> But I still wanted to throw in a requirement from my day to day work, maybe
>>> that helps finding a more general solution:
>>> When we start to work with more parts of the chip it might be necessary to
>>> dump everything that is currently "in the fly". For example I had a whole
>>> bunch of problems where copying data around with a 3D Blit and then missing
>>> a sync between this job and a job on another rings causes a "hiccup" in the
>>> hardware.
>>>
>>> I know that this isn't your focus and that is absolutely ok with me, cause
>>> the format you are introducing is just used in debugfs and so not part of
>>> any stable API (at least not in my understanding), but you should still keep
>>> in mind that we might need to extend it into that direction in the future.
>>>
>>> Christian.
>>
>> Note that my format is also done with that in mind, it can capture ib
>> from all rings. The only thing i don't think worth capturing are the
>> ring themself because there would be no way to replay them without
>> adding some new special API.
>
> I'd like to dump the rings as well, as I said I'd rather we didn't
> limit this to replay, but make it useful for getting as much info as
> possible out
>
> Dave.

The ring will contain very little, just IB scheduling and fences; I don't
see how useful this can be.

Cheers,
Jerome


GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:04 PM, Alex Deucher  wrote:
> On Wed, May 23, 2012 at 8:34 AM, Christian König
>  wrote:
>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>
>>> On Thu, May 17, 2012 at 7:28 PM, wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.
>>>
>>> I really don't like introducing this at this stage into 3.5,
>>>
>>> I'd really like a good review of the API and what information we provide
>>> along with how extensible it is.
>>>
>>> I'm still not convinced replay is what we want in the field, I know its
>>> what
>>> *you* want, but I think apitrace stuff in userspace pretty much covers
>>> the replaying situation. So I'd have to look at this and see how easy
>>> it makes disecting command streams etc.
>>>
>>> Dave.
>>
>>
>> I agree that it might not be a good idea to push that into 3.5, since at
>> least I (and I also think Alex) didn't had time to look into it yet. On the
>> other hand the patches look quite reasonable.
>>
>> But I still wanted to throw in a requirement from my day to day work, maybe
>> that helps finding a more general solution:
>> When we start to work with more parts of the chip it might be necessary to
>> dump everything that is currently "in the fly". For example I had a whole
>> bunch of problems where copying data around with a 3D Blit and then missing
>> a sync between this job and a job on another rings causes a "hiccup" in the
>> hardware.
>>
>> I know that this isn't your focus and that is absolutely ok with me, cause
>> the format you are introducing is just used in debugfs and so not part of
>> any stable API (at least not in my understanding), but you should still keep
>> in mind that we might need to extend it into that direction in the future.
>>
>
> I'm ok with it as long as we have a path to implement support for the
> internal dump format so I can have the hw guys play them back on the
> simulators and such.
>
> Alex



GPU lockup dumping

2012-05-23 Thread Alex Deucher
On Wed, May 23, 2012 at 8:34 AM, Christian König
 wrote:
> On 23.05.2012 11:27, Dave Airlie wrote:
>>
>> On Thu, May 17, 2012 at 7:28 PM, wrote:
>>>
>>> So here is improved patchset, where i splited ground work necessary
>>> for the dumping into their own patch. The debugfs improvement could
>>> probably be usefull to intel instead of having i915 have it's own
>>> debugfs file stuff.
>>>
>>> The lockup dumping public api have been move into radeon_drm.h
>>>
>>> Stressing the fact again that dump are self contained ie they have
>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>> ...).
>>>
>>> Would really like to get this into 3.5, the new API is pretty much
>>> straightforward and userspace tools can easily be made to convert
>>> it to other format. The change to the driver is self contained.
>>
>> I really don't like introducing this at this stage into 3.5,
>>
>> I'd really like a good review of the API and what information we provide
>> along with how extensible it is.
>>
>> I'm still not convinced replay is what we want in the field, I know its
>> what
>> *you* want, but I think apitrace stuff in userspace pretty much covers
>> the replaying situation. So I'd have to look at this and see how easy
>> it makes disecting command streams etc.
>>
>> Dave.
>
>
> I agree that it might not be a good idea to push that into 3.5, since at
> least I (and I also think Alex) didn't had time to look into it yet. On the
> other hand the patches look quite reasonable.
>
> But I still wanted to throw in a requirement from my day to day work, maybe
> that helps finding a more general solution:
> When we start to work with more parts of the chip it might be necessary to
> dump everything that is currently "in the fly". For example I had a whole
> bunch of problems where copying data around with a 3D Blit and then missing
> a sync between this job and a job on another rings causes a "hiccup" in the
> hardware.
>
> I know that this isn't your focus and that is absolutely ok with me, cause
> the format you are introducing is just used in debugfs and so not part of
> any stable API (at least not in my understanding), but you should still keep
> in mind that we might need to extend it into that direction in the future.
>

I'm ok with it as long as we have a path to implement support for the
internal dump format so I can have the hw guys play them back on the
simulators and such.

Alex

> Christian.
>
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 5:27 AM, Dave Airlie  wrote:
> On Thu, May 17, 2012 at 7:28 PM, wrote:
>> So here is improved patchset, where i splited ground work necessary
>> for the dumping into their own patch. The debugfs improvement could
>> probably be usefull to intel instead of having i915 have it's own
>> debugfs file stuff.
>>
>> The lockup dumping public api have been move into radeon_drm.h
>>
>> Stressing the fact again that dump are self contained ie they have
>> all the data needed to be replayed (vertex, indices, shader, texture,
>> ...).
>>
>> Would really like to get this into 3.5, the new API is pretty much
>> straightforward and userspace tools can easily be made to convert
>> it to other format. The change to the driver is self contained.
>
> I really don't like introducing this at this stage into 3.5,
>
> I'd really like a good review of the API and what information we provide
> along with how extensible it is.
>
> I'm still not convinced replay is what we want in the field, I know its what
> *you* want, but I think apitrace stuff in userspace pretty much covers
> the replaying situation. So I'd have to look at this and see how easy
> it makes disecting command streams etc.
>
> Dave.

It stores the PCI ID and allows dumping all IBs per ring, along with
all associated BOs. It also has a bunch of flags to help the userspace
tools (for example, whether userspace needs to clear offsets (VM vs.
no VM)) ... What more do you want to know?

Cheers,
Jerome
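
To make the description above concrete, here is a minimal sketch of how a
userspace tool could read such a self-contained dump. The layout is purely
hypothetical (the magic string, field order, chunk kinds and FLAG_USES_VM are
all assumptions made for illustration, not the format defined by the patches
or by radeon_drm.h); it only mirrors the pieces listed in the mail: a PCI ID,
some flags, and per-ring IBs with their BOs.

```python
import struct

# Hypothetical layout, NOT the real radeon lockup dump format.
# File header: magic, PCI vendor ID, PCI device ID, flags, chunk count.
HEADER = struct.Struct("<4sHHII")
# Chunk header: kind (0 = IB, 1 = BO), ring index, payload size in bytes.
CHUNK = struct.Struct("<III")

KIND_IB = 0
KIND_BO = 1
FLAG_USES_VM = 1 << 0  # hypothetical "offsets are VM addresses" flag


def build_dump(vendor, device, flags, chunks):
    """Serialize (kind, ring, payload) tuples into one self-contained blob."""
    out = [HEADER.pack(b"DUMP", vendor, device, flags, len(chunks))]
    for kind, ring, payload in chunks:
        out.append(CHUNK.pack(kind, ring, len(payload)))
        out.append(payload)
    return b"".join(out)


def parse_dump(data):
    """Parse a blob produced by build_dump back into a dictionary."""
    magic, vendor, device, flags, count = HEADER.unpack_from(data, 0)
    if magic != b"DUMP":
        raise ValueError("not a dump file")
    offset = HEADER.size
    chunks = []
    for _ in range(count):
        kind, ring, size = CHUNK.unpack_from(data, offset)
        offset += CHUNK.size
        chunks.append((kind, ring, data[offset:offset + size]))
        offset += size
    return {"vendor": vendor, "device": device, "flags": flags,
            "chunks": chunks}
```

A converter to another format would then only need to walk the "chunks" list
and check the flags before deciding whether offsets have to be rewritten.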


GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 8:34 AM, Christian König wrote:
> On 23.05.2012 11:27, Dave Airlie wrote:
>>
>> On Thu, May 17, 2012 at 7:28 PM, ?wrote:
>>>
>>> So here is improved patchset, where i splited ground work necessary
>>> for the dumping into their own patch. The debugfs improvement could
>>> probably be usefull to intel instead of having i915 have it's own
>>> debugfs file stuff.
>>>
>>> The lockup dumping public api have been move into radeon_drm.h
>>>
>>> Stressing the fact again that dump are self contained ie they have
>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>> ...).
>>>
>>> Would really like to get this into 3.5, the new API is pretty much
>>> straightforward and userspace tools can easily be made to convert
>>> it to other format. The change to the driver is self contained.
>>
>> I really don't like introducing this at this stage into 3.5,
>>
>> I'd really like a good review of the API and what information we provide
>> along with how extensible it is.
>>
>> I'm still not convinced replay is what we want in the field, I know its
>> what
>> *you* want, but I think apitrace stuff in userspace pretty much covers
>> the replaying situation. So I'd have to look at this and see how easy
>> it makes disecting command streams etc.
>>
>> Dave.
>
>
> I agree that it might not be a good idea to push that into 3.5, since at
> least I (and I also think Alex) didn't had time to look into it yet. On the
> other hand the patches look quite reasonable.
>
> But I still wanted to throw in a requirement from my day to day work, maybe
> that helps finding a more general solution:
> When we start to work with more parts of the chip it might be necessary to
> dump everything that is currently "in the fly". For example I had a whole
> bunch of problems where copying data around with a 3D Blit and then missing
> a sync between this job and a job on another rings causes a "hiccup" in the
> hardware.
>
> I know that this isn't your focus and that is absolutely ok with me, cause
> the format you are introducing is just used in debugfs and so not part of
> any stable API (at least not in my understanding), but you should still keep
> in mind that we might need to extend it into that direction in the future.
>
> Christian.

Note that my format was also designed with that in mind: it can capture
IBs from all rings. The only thing I don't think is worth capturing is
the rings themselves, because there would be no way to replay them
without adding some new special API.

Cheers,
Jerome


GPU lockup dumping

2012-05-23 Thread Dave Airlie
On Thu, May 17, 2012 at 7:28 PM,   wrote:
> So here is improved patchset, where i splited ground work necessary
> for the dumping into their own patch. The debugfs improvement could
> probably be usefull to intel instead of having i915 have it's own
> debugfs file stuff.
>
> The lockup dumping public api have been move into radeon_drm.h
>
> Stressing the fact again that dump are self contained ie they have
> all the data needed to be replayed (vertex, indices, shader, texture,
> ...).
>
> Would really like to get this into 3.5, the new API is pretty much
> straightforward and userspace tools can easily be made to convert
> it to other format. The change to the driver is self contained.

I really don't like introducing this at this stage into 3.5,

I'd really like a good review of the API and what information we provide
along with how extensible it is.

I'm still not convinced replay is what we want in the field, I know its what
*you* want, but I think apitrace stuff in userspace pretty much covers
the replaying situation. So I'd have to look at this and see how easy
it makes disecting command streams etc.

Dave.


Re: GPU lockup dumping

2012-05-23 Thread Christian König

On 23.05.2012 11:27, Dave Airlie wrote:
> On Thu, May 17, 2012 at 7:28 PM, j.gli...@gmail.com wrote:
>> So here is improved patchset, where i splited ground work necessary
>> for the dumping into their own patch. The debugfs improvement could
>> probably be usefull to intel instead of having i915 have it's own
>> debugfs file stuff.
>>
>> The lockup dumping public api have been move into radeon_drm.h
>>
>> Stressing the fact again that dump are self contained ie they have
>> all the data needed to be replayed (vertex, indices, shader, texture,
>> ...).
>>
>> Would really like to get this into 3.5, the new API is pretty much
>> straightforward and userspace tools can easily be made to convert
>> it to other format. The change to the driver is self contained.
>
> I really don't like introducing this at this stage into 3.5,
>
> I'd really like a good review of the API and what information we provide
> along with how extensible it is.
>
> I'm still not convinced replay is what we want in the field, I know its what
> *you* want, but I think apitrace stuff in userspace pretty much covers
> the replaying situation. So I'd have to look at this and see how easy
> it makes disecting command streams etc.
>
> Dave.

I agree that it might not be a good idea to push that into 3.5, since at
least I (and I also think Alex) didn't have time to look into it yet. On
the other hand the patches look quite reasonable.

But I still wanted to throw in a requirement from my day-to-day work,
maybe that helps in finding a more general solution:
When we start to work with more parts of the chip it might be necessary
to dump everything that is currently "in flight". For example, I had a
whole bunch of problems where copying data around with a 3D blit and
then missing a sync between this job and a job on another ring caused a
"hiccup" in the hardware.

I know that this isn't your focus and that is absolutely ok with me,
because the format you are introducing is just used in debugfs and so is
not part of any stable API (at least not in my understanding), but you
should still keep in mind that we might need to extend it in that
direction in the future.

Christian.


Re: GPU lockup dumping

2012-05-23 Thread Alex Deucher
On Wed, May 23, 2012 at 8:34 AM, Christian König
deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a hiccup in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still keep
 in mind that we might need to extend it into that direction in the future.


I'm ok with it as long as we have a path to implement support for the
internal dump format so I can have the hw guys play them back on the
simulators and such.

Alex

 Christian.



Re: GPU lockup dumping

2012-05-23 Thread Dave Airlie
On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
 deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a hiccup in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still keep
 in mind that we might need to extend it into that direction in the future.

 Christian.

 Note that my format is also done with that in mind, it can capture ib
 from all rings. The only thing i don't think worth capturing are the
 ring themself because there would be no way to replay them without
 adding some new special API.

I'd like to dump the rings as well; as I said, I'd rather we didn't
limit this to replay, but make it useful for getting as much info as
possible out.

Dave.


Re: GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:04 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
 deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a hiccup in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still keep
 in mind that we might need to extend it into that direction in the future.


 I'm ok with it as long as we have a path to implement support for the
 internal dump format so I can have the hw guys play them back on the
 simulators and such.

 Alex

From what I have seen of the internal dump format, it's way more complex
and I don't think the kernel is the right place for it. For instance,
you need to set the primary surface, and I don't want to parse the cmd
stream in the kernel to figure that out. I would rather have a userspace
tool that postprocesses things and converts them to the proper AMD dump
format.

Cheers,
Jerome
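
As an illustration of the kind of command-stream parsing that is better left
to a userspace tool, here is a minimal sketch that splits the dwords of a
dumped IB into PM4 packets. It assumes the header layout that, as far as I
understand, the radeon driver's CP_PACKET_GET_TYPE, CP_PACKET_GET_COUNT and
CP_PACKET3_GET_OPCODE macros decode (type in bits 31:30, count minus one in
bits 29:16, type-3 opcode in bits 15:8). The function name and the returned
dictionaries are made up for this example, and type-1 packets are simply
rejected.

```python
def split_packets(dwords):
    """Split a list of 32-bit command dwords into PM4 packets."""
    packets = []
    i = 0
    while i < len(dwords):
        header = dwords[i]
        ptype = (header >> 30) & 0x3
        if ptype == 2:
            # Type-2 packets are single-dword fillers with no body.
            packets.append({"type": 2, "offset": i, "body": []})
            i += 1
            continue
        if ptype == 1:
            raise ValueError("type-1 packets are not handled in this sketch")
        # The count field stores the number of body dwords minus one.
        count = ((header >> 16) & 0x3FFF) + 1
        body = dwords[i + 1:i + 1 + count]
        if len(body) < count:
            raise ValueError("truncated packet at dword %d" % i)
        packet = {"type": ptype, "offset": i, "body": body}
        if ptype == 3:
            packet["opcode"] = (header >> 8) & 0xFF
        packets.append(packet)
        i += 1 + count
    return packets
```

A converter to another dump format could build on such a list of packets, for
example to look for the packets that set up the primary surface, without any
of that logic living in the kernel.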


Re: GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:08 PM, Dave Airlie airl...@gmail.com wrote:
 On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
 deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a hiccup in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still keep
 in mind that we might need to extend it into that direction in the future.

 Christian.

 Note that my format is also done with that in mind, it can capture ib
 from all rings. The only thing i don't think worth capturing are the
 ring themself because there would be no way to replay them without
 adding some new special API.

 I'd like to dump the rings as well, as I said I'd rather we didn't
 limit this to replay, but make it useful for getting as much info as
 possible out

 Dave.

The rings will contain very little, like IB scheduling and fences; I
don't see how useful this can be.

Cheers,
Jerome


Re: GPU lockup dumping

2012-05-23 Thread Dave Airlie
On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 12:08 PM, Dave Airlie airl...@gmail.com wrote:
 On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
 deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then missing
 a sync between this job and a job on another rings causes a hiccup in the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still 
 keep
 in mind that we might need to extend it into that direction in the future.

 Christian.

 Note that my format is also done with that in mind, it can capture ib
 from all rings. The only thing i don't think worth capturing are the
 ring themself because there would be no way to replay them without
 adding some new special API.

 I'd like to dump the rings as well, as I said I'd rather we didn't
 limit this to replay, but make it useful for getting as much info as
 possible out

 Dave.

 Ring will contains very little, like ib schedule and fence, i don't
 see how useful this can be.


In case we have a bug in our ib scheduling or fencing :-0

Dave.


Re: GPU lockup dumping

2012-05-23 Thread Jerome Glisse
On Wed, May 23, 2012 at 12:41 PM, Dave Airlie airl...@gmail.com wrote:
 On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 12:08 PM, Dave Airlie airl...@gmail.com wrote:
 On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Wed, May 23, 2012 at 8:34 AM, Christian König
 deathsim...@vodafone.de wrote:
 On 23.05.2012 11:27, Dave Airlie wrote:

 On Thu, May 17, 2012 at 7:28 PM,j.gli...@gmail.com  wrote:

 So here is improved patchset, where i splited ground work necessary
 for the dumping into their own patch. The debugfs improvement could
 probably be usefull to intel instead of having i915 have it's own
 debugfs file stuff.

 The lockup dumping public api have been move into radeon_drm.h

 Stressing the fact again that dump are self contained ie they have
 all the data needed to be replayed (vertex, indices, shader, texture,
 ...).

 Would really like to get this into 3.5, the new API is pretty much
 straightforward and userspace tools can easily be made to convert
 it to other format. The change to the driver is self contained.

 I really don't like introducing this at this stage into 3.5,

 I'd really like a good review of the API and what information we provide
 along with how extensible it is.

 I'm still not convinced replay is what we want in the field, I know its
 what
 *you* want, but I think apitrace stuff in userspace pretty much covers
 the replaying situation. So I'd have to look at this and see how easy
 it makes disecting command streams etc.

 Dave.


 I agree that it might not be a good idea to push that into 3.5, since at
 least I (and I also think Alex) didn't had time to look into it yet. On 
 the
 other hand the patches look quite reasonable.

 But I still wanted to throw in a requirement from my day to day work, 
 maybe
 that helps finding a more general solution:
 When we start to work with more parts of the chip it might be necessary to
 dump everything that is currently in the fly. For example I had a whole
 bunch of problems where copying data around with a 3D Blit and then 
 missing
 a sync between this job and a job on another rings causes a hiccup in 
 the
 hardware.

 I know that this isn't your focus and that is absolutely ok with me, cause
 the format you are introducing is just used in debugfs and so not part of
 any stable API (at least not in my understanding), but you should still 
 keep
 in mind that we might need to extend it into that direction in the future.

 Christian.

 Note that my format is also done with that in mind, it can capture ib
 from all rings. The only thing i don't think worth capturing are the
 ring themself because there would be no way to replay them without
 adding some new special API.

 I'd like to dump the rings as well, as I said I'd rather we didn't
 limit this to replay, but make it useful for getting as much info as
 possible out

 Dave.

 Ring will contains very little, like ib schedule and fence, i don't
 see how useful this can be.


 In case we have a bug in our ib scheduling or fencing :-0

 Dave.

Well, I think we have several kinds of lockup. The most basic one is
userspace sending a broken shader, vertex buffer, or something along
those lines. The more complex one is timing related, like a BO move or
some cache invalidation that didn't happen properly, so the GPU ends up
reading either wrong data or old cached data. I don't see how to capture
useful information for this second case, besides taking a snapshot of
memory.

For multi-ring, I agree that dumping the rings might prove useful to
spot an inter-ring semaphore deadlock, or possibly a missing inter-ring
synchronization (but that would be a bad kernel bug).

Cheers,
Jerome
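
For the inter-ring semaphore deadlock case, a userspace tool working on dumped
rings could do something like the following minimal sketch. It assumes the
tool has already reduced each ring's contents to "which ring is this ring
currently blocked on"; that reduction, the ring names and the function name
are all hypothetical and not part of the proposed dump format.

```python
def find_deadlock(waits_on):
    """Return a list of rings forming a wait cycle, or None.

    waits_on maps a ring name to the ring it is blocked on (waiting for
    that ring to signal a semaphore), or to None if it is not blocked.
    """
    for start in waits_on:
        seen = []
        ring = start
        # Follow the chain of waits until it ends or revisits a ring.
        while ring is not None and ring not in seen:
            seen.append(ring)
            ring = waits_on.get(ring)
        if ring is not None:
            # The chain came back to a ring already visited: a cycle.
            return seen[seen.index(ring):]
    return None
```

With waits_on = {"ring0": "ring1", "ring1": "ring0", "ring2": None} this
reports the cycle between ring0 and ring1; a missing synchronization, by
contrast, would not show up as a cycle and needs a different check.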


Re: GPU lockup dumping

2012-05-23 Thread Jordan Crouse

On 05/23/2012 08:51 AM, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 5:27 AM, Dave Airlie airl...@gmail.com wrote:
>> On Thu, May 17, 2012 at 7:28 PM, j.gli...@gmail.com wrote:
>>> So here is improved patchset, where i splited ground work necessary
>>> for the dumping into their own patch. The debugfs improvement could
>>> probably be usefull to intel instead of having i915 have it's own
>>> debugfs file stuff.
>>>
>>> The lockup dumping public api have been move into radeon_drm.h
>>>
>>> Stressing the fact again that dump are self contained ie they have
>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>> ...).
>>>
>>> Would really like to get this into 3.5, the new API is pretty much
>>> straightforward and userspace tools can easily be made to convert
>>> it to other format. The change to the driver is self contained.
>>
>> I really don't like introducing this at this stage into 3.5,
>>
>> I'd really like a good review of the API and what information we provide
>> along with how extensible it is.
>>
>> I'm still not convinced replay is what we want in the field, I know its what
>> *you* want, but I think apitrace stuff in userspace pretty much covers
>> the replaying situation. So I'd have to look at this and see how easy
>> it makes disecting command streams etc.
>>
>> Dave.
>
> It store pciid and allow to dump all ib per ring, and all associated
> bo object. It also have a bunch of flags to help the userspace tools
> (like does userspace need to clear offset (vm vs no vm) ...  What more
> do you want to know ?

Another useful thing might be current register states. We've been doing
dumping (we call it snapshot) in the Qualcomm driver for a little bit now and
between registers, rings, command buffers and various buffers we've been able
to get a reasonably good picture of the state suitable for playback on emulators
and other silly userspace tricks.

We have structs for registers and index/data register pairs because we also dump
lots of debug registers and queues and other various HW sources.
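
As a rough illustration of what an index/data register pair record might look like, here is a minimal sketch. The struct names, field names, and the fake register block are assumptions made up for this example; they are not copied from kgsl_snapshot.h, which is the authoritative definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for a plain register: offset plus value. */
struct snap_reg {
    uint32_t offset;
    uint32_t value;
};

/* Hypothetical record header for an index/data pair dump. */
struct snap_indexed_regs {
    uint32_t index_reg;  /* offset of the index (select) register */
    uint32_t data_reg;   /* offset of the data register */
    uint32_t start;      /* first index dumped */
    uint32_t count;      /* number of values that follow the header */
};

/* Stand-in for MMIO so the sketch runs without hardware. */
#define FAKE_INDEX_REG 0x100u
#define FAKE_DATA_REG  0x104u

struct fake_gpu {
    uint32_t index;      /* last value written to the index register */
    uint32_t bank[8];    /* values reachable through the data register */
};

static void gpu_write(struct fake_gpu *gpu, uint32_t reg, uint32_t val)
{
    if (reg == FAKE_INDEX_REG)
        gpu->index = val;
}

static uint32_t gpu_read(const struct fake_gpu *gpu, uint32_t reg)
{
    if (reg == FAKE_DATA_REG && gpu->index < 8)
        return gpu->bank[gpu->index];
    return 0;
}

/* Dump hdr->count values: select each index, then read the data register. */
static void snapshot_indexed(struct fake_gpu *gpu,
                             const struct snap_indexed_regs *hdr,
                             uint32_t *out)
{
    uint32_t i;

    assert(gpu != NULL && hdr != NULL && out != NULL);
    for (i = 0; i < hdr->count; i++) {
        gpu_write(gpu, hdr->index_reg, hdr->start + i);
        out[i] = gpu_read(gpu, hdr->data_reg);
    }
}
```

Recording the index and data register offsets in the header lets a userspace tool label each dumped value without knowing the hardware in advance.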

The implementation is way different for obvious reasons but I would love to
consolidate on a single format. It's easy for us to do since we share similar
architectures, but if at least two GPUs support the same format it can be a
catalyst for others to join.

https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=blob;f=drivers/gpu/msm/kgsl_snapshot.h;hb=refs/heads/msm-3.0

Jordan


GPU lockup dumping

2012-05-22 Thread Jerome Glisse
On Thu, May 17, 2012 at 6:07 PM, j.gli...@gmail.com wrote:
> Ok, this time it is the final version; I added a bunch of flags to the cmd
> buffer to make the userspace tools' life easier.
>
> Cheers,
> Jerome
>

So here is the updated libdrm patch:
http://people.freedesktop.org/~glisse/lockup/0001-radeon-add-rati-dumping-helper.patch

I also updated git://people.freedesktop.org/~glisse/joujou and now you
can replay a gpu lockup file captured with those patches on r6xx hw.

Example captured from the kernel (I manually edited the file to fix
what was causing the lockup so you can replay it safely):
http://people.freedesktop.org/~glisse/lockup/lockup-dump.rati

I also improved the text conversion, see:

http://people.freedesktop.org/~glisse/lockup/lockup-dump.tati

Will do r7xx and then evergreen/cayman. But I do think that the kernel
patches are good as is.

Cheers,
Jerome


GPU lockup dumping

2012-05-17 Thread j.gli...@gmail.com
Ok, this time it is the final version; I added a bunch of flags to the cmd
buffer to make the userspace tools' life easier.

Cheers,
Jerome



GPU lockup dumping

2012-05-17 Thread j.gli...@gmail.com
Make the format more future proof by adding a total chunk size field
that allows old userspace to skip over potentially new chunks. Not sure
this is really needed but hey.
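
The idea can be sketched as follows, assuming a made-up header with a type id and a total size in bytes. The `chunk_hdr` layout, the `CHUNK_*` ids, and `count_known_chunks` are hypothetical and do not reflect the actual layout in the patches; the point is only that a parser can step over a chunk type it has never heard of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk header; NOT the layout from the radeon patches. */
struct chunk_hdr {
    uint32_t id;    /* chunk type */
    uint32_t size;  /* total chunk size in bytes, header included */
};

enum { CHUNK_IB = 1, CHUNK_BO = 2 };  /* types this "old" parser knows */

/*
 * Walk a dump buffer and count the chunks whose type is known.
 * Unknown types are skipped using the size field alone.
 * Returns the number of known chunks, or -1 if the buffer is malformed.
 */
static int count_known_chunks(const uint8_t *buf, size_t len, size_t *skipped)
{
    size_t off = 0;
    int known = 0;

    assert(buf != NULL && skipped != NULL);
    *skipped = 0;

    while (off < len) {
        struct chunk_hdr hdr;

        if (len - off < sizeof(hdr))
            return -1;                        /* truncated header */
        memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */

        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;                        /* would loop or overrun */

        if (hdr.id == CHUNK_IB || hdr.id == CHUNK_BO)
            known++;
        else
            (*skipped)++;                     /* newer chunk: step over it */

        off += hdr.size;
    }
    return known;
}
```

Rejecting a size smaller than the header matters as much as the skip itself: a zero size would otherwise keep the parser spinning on the same offset.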

Jerome


GPU lockup dumping

2012-05-17 Thread j.gli...@gmail.com
So here is an improved patchset, where I split the groundwork necessary
for the dumping into its own patches. The debugfs improvement could
probably be useful to intel instead of having i915 have its own
debugfs file stuff.

The lockup dumping public API has been moved into radeon_drm.h

Stressing the fact again that dumps are self contained, i.e. they have
all the data needed to be replayed (vertex, indices, shader, texture,
...).

Would really like to get this into 3.5, the new API is pretty much
straightforward and userspace tools can easily be made to convert
it to other formats. The change to the driver is self contained.

Cheers,
Jerome

