On 23.05.2012 19:02, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 12:41 PM, Dave Airlie<airlied at gmail.com> wrote:
>> On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse<j.glisse at gmail.com> wrote:
>>> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie<airlied at gmail.com> wrote:
>>>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse<j.glisse at gmail.com> wrote:
>>>>> On Wed, May 23, 2012 at 8:34 AM, Christian König <deathsimple at vodafone.de> wrote:
>>>>>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>>>>> On Thu, May 17, 2012 at 7:28 PM,<j.glisse at gmail.com> wrote:
>>>>>>>> So here is improved patchset, where i splited ground work necessary
>>>>>>>> for the dumping into their own patch. The debugfs improvement could
>>>>>>>> probably be usefull to intel instead of having i915 have it's own
>>>>>>>> debugfs file stuff.
>>>>>>>>
>>>>>>>> The lockup dumping public api have been move into radeon_drm.h
>>>>>>>>
>>>>>>>> Stressing the fact again that dump are self contained ie they have
>>>>>>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>>>>>>> ...).
>>>>>>>>
>>>>>>>> Would really like to get this into 3.5, the new API is pretty much
>>>>>>>> straightforward and userspace tools can easily be made to convert
>>>>>>>> it to other format. The change to the driver is self contained.
>>>>>>> I really don't like introducing this at this stage into 3.5,
>>>>>>>
>>>>>>> I'd really like a good review of the API and what information we provide
>>>>>>> along with how extensible it is.
>>>>>>>
>>>>>>> I'm still not convinced replay is what we want in the field, I know its
>>>>>>> what *you* want, but I think apitrace stuff in userspace pretty much covers
>>>>>>> the replaying situation. So I'd have to look at this and see how easy
>>>>>>> it makes disecting command streams etc.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>> I agree that it might not be a good idea to push that into 3.5, since at
>>>>>> least I (and I also think Alex) didn't had time to look into it yet. On the
>>>>>> other hand the patches look quite reasonable.
>>>>>>
>>>>>> But I still wanted to throw in a requirement from my day to day work, maybe
>>>>>> that helps finding a more general solution:
>>>>>> When we start to work with more parts of the chip it might be necessary to
>>>>>> dump everything that is currently "in the fly". For example I had a whole
>>>>>> bunch of problems where copying data around with a 3D Blit and then missing
>>>>>> a sync between this job and a job on another rings causes a "hiccup" in the
>>>>>> hardware.
>>>>>>
>>>>>> I know that this isn't your focus and that is absolutely ok with me, cause
>>>>>> the format you are introducing is just used in debugfs and so not part of
>>>>>> any stable API (at least not in my understanding), but you should still keep
>>>>>> in mind that we might need to extend it into that direction in the future.
>>>>>>
>>>>>> Christian.
>>>>> Note that my format is also done with that in mind, it can capture ib
>>>>> from all rings. The only thing i don't think worth capturing are the
>>>>> ring themself because there would be no way to replay them without
>>>>> adding some new special API.
>>>> I'd like to dump the rings as well, as I said I'd rather we didn't
>>>> limit this to replay, but make it useful for getting as much info as
>>>> possible out
>>>>
>>>> Dave.
>>> Ring will contains very little, like ib schedule and fence, i don't
>>> see how useful this can be.
>>>
>> In case we have a bug in our ib scheduling or fencing :-0
>>
>> Dave.
> Well i think we have several kind of lockup, the most basic one is
> userspace sending broken shader, vertex, or something in that line.
> The more complex one is timing related, like a bo move or some cache
> invalidation that didn't happen properly and GPU endup reading either
> wrong data or old cached data. I don't see how to capture useful
> information for this second case, beside doing snapshot of memory.
>
> For multi-ring i agree that dumping the ring might prove useful spot
> inter-ring semaphore deadlock, or possibly inter-ring absence of
> synchronization (but that would be a bad kernel bug).
I don't think that we need the actual data from the rings either (at
least as long as we keep the radeon_ring_* debugfs files). But it would
still be nice to know whether or not there was a sync between the rings.

See the patches I just sent to you (sorry, I actually sent more patches
than I wanted to): storing the new sync_seq array within the debug
output should enable us to figure out the dependencies and order
between different IBs.

Cheers,
Christian.