Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote: > On 05.02.2013 01:15, Thierry Reding wrote: > > On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: > >> This is used by debugfs code to direct to debugfs, and > >> nvhost_debug_dump() to send via printk. > > > > Yes, that was precisely my point. Why bother providing the same data via > > several output methods. debugfs is good for showing large amounts of > > data such as register dumps or a tabular representation of syncpoints > > for instance. > > > > If, however, you want to interactively show debug information using > > printk the same format isn't very useful and something more reduced is > > often better. > > debugfs is there to be able to get a reliable dump of host1x state > (f.ex. no lines intermixed with other output). > > printk output is there because often we get just UART logs from failure > cases, and having as much information as possible in the logs speeds up > debugging. > > Both of them need to output the values of sync points, and the channel > state. Dumping all of that consists of a lot of code, and I wouldn't > want to duplicate that for two output formats. I'm still not convinced, but I think I could live with it. =) > >> I have another suggestion. In downstream I removed the decoding part and > >> I just print out a string of hex. That removes quite a bit bunch of code > >> from kernel. It makes the debug output also more compact. > > I don't know. I think if you use in-kernel debugging facilities such as > > debugfs or printk, then the output should be self-explanatory. However I > > do see the usefulness of having a binary dump that can be decoded in > > userspace. But I think if we want to go that way we should make that a > > separate interface. USB provides something like that, which can then be > > fed to libpcap or wireshark to capture and analyze USB traffic. If done > > properly you get replay functionality for free. I don't know what infra- > > structure exists to help with implementing something similar. > > It's not actually binary. I think I misrepresented the suggestion. > > I'm suggesting that we'd display only the contents of command FIFO and > contents of gathers (i.e. all opcodes) in hex format, not decoded. All > other text would remain as is, so syncpt values, etc would be readable > by a glance. > > The user space tool can then take the streams and decode them if needed. > > We've noticed that the decoded opcodes format can be very long and > sometimes takes a minute to dump out via a slow console. The hex output > is much more compact and faster to dump. > > Actual tracing or wireshark kind of capability would come via decoding > the ftrace log. When enabled, everything that is written to the channel, > is also written to ftrace. Okay, I'll have to take a closer look at ftrace since I've never used it before. It sounds like extra infrastructure won't be necessary then. > +static void show_channel_word(struct output *o, int *state, int *count, > +u32 addr, u32 val, struct host1x_cdma *cdma) > +{ > +static int start_count, dont_print; > >>> > >>> What if two processes read debug information at the same time? > >> > >> show_channels() acquires cdma.lock, so that shouldn't happen. > > > > Okay. Another solution would be to pass around a debug context which > > keeps track of the variables. But if we opt for a more involved dump > > interface as discussed above this will no longer be relevant. > > Actually, debugging process needs cdma.lock, because it goes through the > cdma queue. Also command FIFO dumping is something that must be done by > a single thread at a time. > > > Okay. In the context of a channel dump interface this may not be > > relevant anymore. Can you think of any issue that wouldn't be detectable > > or debuggable by analyzing a binary dump of the data within a channel? > > I'm asking because at that point we wouldn't be able to access any of > > the in-kernel data structures but would have to rely on the data itself > > for diagnostics. IOMMU virtual addresses won't be available and so on. > > In many cases, looking at syncpt values, and channel state > (active/waiting on a syncpt, etc) gives an indication on what is the > current state of hardware. But, very often problems are ripple effects > on something that happened earlier and the job that caused the problem > has already been freed and is not visible in the dump. > > To get a full history, we need often need the ftrace log. So that's already covered. Great! Thierry pgpcnj7iSskoz.pgp Description: PGP signature
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote: On 05.02.2013 01:15, Thierry Reding wrote: On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk. Yes, that was precisely my point. Why bother providing the same data via several output methods. debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints for instance. If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better. debugfs is there to be able to get a reliable dump of host1x state (f.ex. no lines intermixed with other output). printk output is there because often we get just UART logs from failure cases, and having as much information as possible in the logs speeds up debugging. Both of them need to output the values of sync points, and the channel state. Dumping all of that consists of a lot of code, and I wouldn't want to duplicate that for two output formats. I'm still not convinced, but I think I could live with it. =) I have another suggestion. In downstream I removed the decoding part and I just print out a string of hex. That removes quite a bit bunch of code from kernel. It makes the debug output also more compact. I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infra- structure exists to help with implementing something similar. It's not actually binary. I think I misrepresented the suggestion. I'm suggesting that we'd display only the contents of command FIFO and contents of gathers (i.e. all opcodes) in hex format, not decoded. All other text would remain as is, so syncpt values, etc would be readable by a glance. The user space tool can then take the streams and decode them if needed. We've noticed that the decoded opcodes format can be very long and sometimes takes a minute to dump out via a slow console. The hex output is much more compact and faster to dump. Actual tracing or wireshark kind of capability would come via decoding the ftrace log. When enabled, everything that is written to the channel, is also written to ftrace. Okay, I'll have to take a closer look at ftrace since I've never used it before. It sounds like extra infrastructure won't be necessary then. +static void show_channel_word(struct output *o, int *state, int *count, +u32 addr, u32 val, struct host1x_cdma *cdma) +{ +static int start_count, dont_print; What if two processes read debug information at the same time? show_channels() acquires cdma.lock, so that shouldn't happen. Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant. Actually, debugging process needs cdma.lock, because it goes through the cdma queue. Also command FIFO dumping is something that must be done by a single thread at a time. Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on. In many cases, looking at syncpt values, and channel state (active/waiting on a syncpt, etc) gives an indication on what is the current state of hardware. But, very often problems are ripple effects on something that happened earlier and the job that caused the problem has already been freed and is not visible in the dump. To get a full history, we need often need the ftrace log. So that's already covered. Great! Thierry pgpcnj7iSskoz.pgp Description: PGP signature
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On 05.02.2013 01:15, Thierry Reding wrote: > On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: >> This is used by debugfs code to direct to debugfs, and >> nvhost_debug_dump() to send via printk. > > Yes, that was precisely my point. Why bother providing the same data via > several output methods. debugfs is good for showing large amounts of > data such as register dumps or a tabular representation of syncpoints > for instance. > > If, however, you want to interactively show debug information using > printk the same format isn't very useful and something more reduced is > often better. debugfs is there to be able to get a reliable dump of host1x state (f.ex. no lines intermixed with other output). printk output is there because often we get just UART logs from failure cases, and having as much information as possible in the logs speeds up debugging. Both of them need to output the values of sync points, and the channel state. Dumping all of that consists of a lot of code, and I wouldn't want to duplicate that for two output formats. >> I have another suggestion. In downstream I removed the decoding part and >> I just print out a string of hex. That removes quite a bit bunch of code >> from kernel. It makes the debug output also more compact. > I don't know. I think if you use in-kernel debugging facilities such as > debugfs or printk, then the output should be self-explanatory. However I > do see the usefulness of having a binary dump that can be decoded in > userspace. But I think if we want to go that way we should make that a > separate interface. USB provides something like that, which can then be > fed to libpcap or wireshark to capture and analyze USB traffic. If done > properly you get replay functionality for free. I don't know what infra- > structure exists to help with implementing something similar. It's not actually binary. I think I misrepresented the suggestion. I'm suggesting that we'd display only the contents of command FIFO and contents of gathers (i.e. all opcodes) in hex format, not decoded. All other text would remain as is, so syncpt values, etc would be readable by a glance. The user space tool can then take the streams and decode them if needed. We've noticed that the decoded opcodes format can be very long and sometimes takes a minute to dump out via a slow console. The hex output is much more compact and faster to dump. Actual tracing or wireshark kind of capability would come via decoding the ftrace log. When enabled, everything that is written to the channel, is also written to ftrace. +static void show_channel_word(struct output *o, int *state, int *count, + u32 addr, u32 val, struct host1x_cdma *cdma) +{ + static int start_count, dont_print; >>> >>> What if two processes read debug information at the same time? >> >> show_channels() acquires cdma.lock, so that shouldn't happen. > > Okay. Another solution would be to pass around a debug context which > keeps track of the variables. But if we opt for a more involved dump > interface as discussed above this will no longer be relevant. Actually, debugging process needs cdma.lock, because it goes through the cdma queue. Also command FIFO dumping is something that must be done by a single thread at a time. > Okay. In the context of a channel dump interface this may not be > relevant anymore. Can you think of any issue that wouldn't be detectable > or debuggable by analyzing a binary dump of the data within a channel? > I'm asking because at that point we wouldn't be able to access any of > the in-kernel data structures but would have to rely on the data itself > for diagnostics. IOMMU virtual addresses won't be available and so on. In many cases, looking at syncpt values, and channel state (active/waiting on a syncpt, etc) gives an indication on what is the current state of hardware. But, very often problems are ripple effects on something that happened earlier and the job that caused the problem has already been freed and is not visible in the dump. To get a full history, we need often need the ftrace log. Terje -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On 05.02.2013 01:15, Thierry Reding wrote: On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk. Yes, that was precisely my point. Why bother providing the same data via several output methods. debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints for instance. If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better. debugfs is there to be able to get a reliable dump of host1x state (f.ex. no lines intermixed with other output). printk output is there because often we get just UART logs from failure cases, and having as much information as possible in the logs speeds up debugging. Both of them need to output the values of sync points, and the channel state. Dumping all of that consists of a lot of code, and I wouldn't want to duplicate that for two output formats. I have another suggestion. In downstream I removed the decoding part and I just print out a string of hex. That removes quite a bit bunch of code from kernel. It makes the debug output also more compact. I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infra- structure exists to help with implementing something similar. It's not actually binary. I think I misrepresented the suggestion. I'm suggesting that we'd display only the contents of command FIFO and contents of gathers (i.e. all opcodes) in hex format, not decoded. All other text would remain as is, so syncpt values, etc would be readable by a glance. The user space tool can then take the streams and decode them if needed. We've noticed that the decoded opcodes format can be very long and sometimes takes a minute to dump out via a slow console. The hex output is much more compact and faster to dump. Actual tracing or wireshark kind of capability would come via decoding the ftrace log. When enabled, everything that is written to the channel, is also written to ftrace. +static void show_channel_word(struct output *o, int *state, int *count, + u32 addr, u32 val, struct host1x_cdma *cdma) +{ + static int start_count, dont_print; What if two processes read debug information at the same time? show_channels() acquires cdma.lock, so that shouldn't happen. Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant. Actually, debugging process needs cdma.lock, because it goes through the cdma queue. Also command FIFO dumping is something that must be done by a single thread at a time. Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on. In many cases, looking at syncpt values, and channel state (active/waiting on a syncpt, etc) gives an indication on what is the current state of hardware. But, very often problems are ripple effects on something that happened earlier and the job that caused the problem has already been freed and is not visible in the dump. To get a full history, we need often need the ftrace log. Terje -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: > On 04.02.2013 03:03, Thierry Reding wrote: > > On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: > >> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h > > [...] > >> +struct output { > >> + void (*fn)(void *ctx, const char *str, size_t len); > >> + void *ctx; > >> + char buf[256]; > >> +}; > > > > Do we really need this kind of abstraction? There really should be only > > one location where debug information is obtained, so I don't see a need > > for this. > > This is used by debugfs code to direct to debugfs, and > nvhost_debug_dump() to send via printk. Yes, that was precisely my point. Why bother providing the same data via several output methods. debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints for instance. If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better. > >> diff --git a/drivers/gpu/host1x/hw/debug_hw.c > >> b/drivers/gpu/host1x/hw/debug_hw.c > > > >> +static int show_channel_command(struct output *o, u32 addr, u32 val, int > >> *count) > >> +{ > >> + unsigned mask; > >> + unsigned subop; > >> + > >> + switch (val >> 28) { > >> + case 0x0: > > > > These can easily be derived by looking at the debug output, but it may > > still make sense to assign symbolic names to them. > > I have another suggestion. In downstream I removed the decoding part and > I just print out a string of hex. That removes quite a bit bunch of code > from kernel. It makes the debug output also more compact. > > It's much easier to write a user space program to decode than maintain > it in kernel. I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infra- structure exists to help with implementing something similar. So I think having debugfs output some data about syncpoints or the state of channels might be useful to quickly diagnose a certain set of problems but for anything more involved maybe a complete binary dump may be better. I'm not sure whether doing this would be acceptable though. Maybe Dave or somebody else on the lists can answer that. An alternative way to achieve the same would be to hook ioctl() from userspace and not do any of it in kernel space. > >> +static void show_channel_word(struct output *o, int *state, int *count, > >> + u32 addr, u32 val, struct host1x_cdma *cdma) > >> +{ > >> + static int start_count, dont_print; > > > > What if two processes read debug information at the same time? > > show_channels() acquires cdma.lock, so that shouldn't happen. Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant. > >> +static void do_show_channel_gather(struct output *o, > >> + phys_addr_t phys_addr, > >> + u32 words, struct host1x_cdma *cdma, > >> + phys_addr_t pin_addr, u32 *map_addr) > >> +{ > >> + /* Map dmaget cursor to corresponding mem handle */ > >> + u32 offset; > >> + int state, count, i; > >> + > >> + offset = phys_addr - pin_addr; > >> + /* > >> + * Sometimes we're given different hardware address to the same > >> + * page - in these cases the offset will get an invalid number and > >> + * we just have to bail out. > >> + */ > > > > Why's that? > > Because of a race - memory might've been unpinned and unmapped from > IOMMU and when we re-map (pin), we are given a new address. > > But, I think this comment is a bit stale - we used to dump also old > gathers. The latest code only dumps jobs in sync queue, so the race > shouldn't happen. Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on. > >> +static void host1x_debug_show_channel_cdma(struct host1x *m, > >> + struct host1x_channel *ch, struct output *o, int chid) > >> +{ > > [...] > >> + switch (cbstat) { > >> + case 0x00010008: > > > > Again, symbolic names would be nice. > > I propose I remove the decoding from kernel, and save 200 lines. I think it could
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote: On 04.02.2013 03:03, Thierry Reding wrote: On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h [...] +struct output { + void (*fn)(void *ctx, const char *str, size_t len); + void *ctx; + char buf[256]; +}; Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this. This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk. Yes, that was precisely my point. Why bother providing the same data via several output methods. debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints for instance. If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better. diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count) +{ + unsigned mask; + unsigned subop; + + switch (val 28) { + case 0x0: These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them. I have another suggestion. In downstream I removed the decoding part and I just print out a string of hex. That removes quite a bit bunch of code from kernel. It makes the debug output also more compact. It's much easier to write a user space program to decode than maintain it in kernel. I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infra- structure exists to help with implementing something similar. So I think having debugfs output some data about syncpoints or the state of channels might be useful to quickly diagnose a certain set of problems but for anything more involved maybe a complete binary dump may be better. I'm not sure whether doing this would be acceptable though. Maybe Dave or somebody else on the lists can answer that. An alternative way to achieve the same would be to hook ioctl() from userspace and not do any of it in kernel space. +static void show_channel_word(struct output *o, int *state, int *count, + u32 addr, u32 val, struct host1x_cdma *cdma) +{ + static int start_count, dont_print; What if two processes read debug information at the same time? show_channels() acquires cdma.lock, so that shouldn't happen. Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant. +static void do_show_channel_gather(struct output *o, + phys_addr_t phys_addr, + u32 words, struct host1x_cdma *cdma, + phys_addr_t pin_addr, u32 *map_addr) +{ + /* Map dmaget cursor to corresponding mem handle */ + u32 offset; + int state, count, i; + + offset = phys_addr - pin_addr; + /* + * Sometimes we're given different hardware address to the same + * page - in these cases the offset will get an invalid number and + * we just have to bail out. + */ Why's that? Because of a race - memory might've been unpinned and unmapped from IOMMU and when we re-map (pin), we are given a new address. But, I think this comment is a bit stale - we used to dump also old gathers. The latest code only dumps jobs in sync queue, so the race shouldn't happen. Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on. +static void host1x_debug_show_channel_cdma(struct host1x *m, + struct host1x_channel *ch, struct output *o, int chid) +{ [...] + switch (cbstat) { + case 0x00010008: Again, symbolic names would be nice. I propose I remove the decoding from kernel, and save 200 lines. I think it could be more than 200 lines. If all we provide in the kernel is some statistics about syncpoint usage or channel state that should be a lot less code than we have now. However that
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On 04.02.2013 03:03, Thierry Reding wrote: > * PGP Signed by an unknown key > > On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: >> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c > [...] >> +static pid_t host1x_debug_null_kickoff_pid; >> +unsigned int host1x_debug_trace_cmdbuf; >> + >> +static pid_t host1x_debug_force_timeout_pid; >> +static u32 host1x_debug_force_timeout_val; >> +static u32 host1x_debug_force_timeout_channel; > > Please group static and non-static variables. Will do. > >> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h > [...] >> +struct output { >> +void (*fn)(void *ctx, const char *str, size_t len); >> +void *ctx; >> +char buf[256]; >> +}; > > Do we really need this kind of abstraction? There really should be only > one location where debug information is obtained, so I don't see a need > for this. This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk. > >> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h > [...] >> struct host1x_syncpt_ops { >> void (*reset)(struct host1x_syncpt *); >> void (*reset_wait_base)(struct host1x_syncpt *); >> @@ -117,6 +133,7 @@ struct host1x { >> struct host1x_channel_ops channel_op; >> struct host1x_cdma_ops cdma_op; >> struct host1x_pushbuffer_ops cdma_pb_op; >> +struct host1x_debug_ops debug_op; >> struct host1x_syncpt_ops syncpt_op; >> struct host1x_intr_ops intr_op; > > Again, better to pass in a const pointer to the ops structure. Ok. > >> diff --git a/drivers/gpu/host1x/hw/debug_hw.c >> b/drivers/gpu/host1x/hw/debug_hw.c > >> +static int show_channel_command(struct output *o, u32 addr, u32 val, int >> *count) >> +{ >> +unsigned mask; >> +unsigned subop; >> + >> +switch (val >> 28) { >> +case 0x0: > > These can easily be derived by looking at the debug output, but it may > still make sense to assign symbolic names to them. I have another suggestion. In downstream I removed the decoding part and I just print out a string of hex. That removes quite a bit bunch of code from kernel. It makes the debug output also more compact. It's much easier to write a user space program to decode than maintain it in kernel. > >> +static void show_channel_word(struct output *o, int *state, int *count, >> +u32 addr, u32 val, struct host1x_cdma *cdma) >> +{ >> +static int start_count, dont_print; > > What if two processes read debug information at the same time? show_channels() acquires cdma.lock, so that shouldn't happen. > >> +static void do_show_channel_gather(struct output *o, >> +phys_addr_t phys_addr, >> +u32 words, struct host1x_cdma *cdma, >> +phys_addr_t pin_addr, u32 *map_addr) >> +{ >> +/* Map dmaget cursor to corresponding mem handle */ >> +u32 offset; >> +int state, count, i; >> + >> +offset = phys_addr - pin_addr; >> +/* >> + * Sometimes we're given different hardware address to the same >> + * page - in these cases the offset will get an invalid number and >> + * we just have to bail out. >> + */ > > Why's that? Because of a race - memory might've been unpinned and unmapped from IOMMU and when we re-map (pin), we are given a new address. But, I think this comment is a bit stale - we used to dump also old gathers. The latest code only dumps jobs in sync queue, so the race shouldn't happen. > >> +map_addr = host1x_memmgr_mmap(mem); >> +if (!map_addr) { >> +host1x_debug_output(o, "[could not mmap]\n"); >> +return; >> +} >> + >> +/* Get base address from mem */ >> +sgt = host1x_memmgr_pin(mem); >> +if (IS_ERR(sgt)) { >> +host1x_debug_output(o, "[couldn't pin]\n"); >> +host1x_memmgr_munmap(mem, map_addr); >> +return; >> +} > > Maybe you should stick with one of "could not" or "couldn't". Makes it > easier to search for. I prefer "could not", so I'll use that. > >> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma) >> +{ >> +struct host1x_job *job; >> + >> +list_for_each_entry(job, >sync_queue, list) { >> +int i; >> +host1x_debug_output(o, >> +"\n%p: JOB, syncpt_id=%d, syncpt_val=%d," >> +" first_get=%08x, timeout=%d" >> +" num_slots=%d, num_handles=%d\n", >> +job, >> +job->syncpt_id, >> +job->syncpt_end, >> +job->first_get, >> +job->timeout, >> +job->num_slots, >> +job->num_unpins); > > This could go on fewer lines. Yes, will merge. > >> +static void host1x_debug_show_channel_cdma(struct host1x *m, >> +struct host1x_channel *ch, struct
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: > diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c [...] > +static pid_t host1x_debug_null_kickoff_pid; > +unsigned int host1x_debug_trace_cmdbuf; > + > +static pid_t host1x_debug_force_timeout_pid; > +static u32 host1x_debug_force_timeout_val; > +static u32 host1x_debug_force_timeout_channel; Please group static and non-static variables. > diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h [...] > +struct output { > + void (*fn)(void *ctx, const char *str, size_t len); > + void *ctx; > + char buf[256]; > +}; Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this. > diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h [...] > struct host1x_syncpt_ops { > void (*reset)(struct host1x_syncpt *); > void (*reset_wait_base)(struct host1x_syncpt *); > @@ -117,6 +133,7 @@ struct host1x { > struct host1x_channel_ops channel_op; > struct host1x_cdma_ops cdma_op; > struct host1x_pushbuffer_ops cdma_pb_op; > + struct host1x_debug_ops debug_op; > struct host1x_syncpt_ops syncpt_op; > struct host1x_intr_ops intr_op; Again, better to pass in a const pointer to the ops structure. > diff --git a/drivers/gpu/host1x/hw/debug_hw.c > b/drivers/gpu/host1x/hw/debug_hw.c > +static int show_channel_command(struct output *o, u32 addr, u32 val, int > *count) > +{ > + unsigned mask; > + unsigned subop; > + > + switch (val >> 28) { > + case 0x0: These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them. > +static void show_channel_word(struct output *o, int *state, int *count, > + u32 addr, u32 val, struct host1x_cdma *cdma) > +{ > + static int start_count, dont_print; What if two processes read debug information at the same time? > +static void do_show_channel_gather(struct output *o, > + phys_addr_t phys_addr, > + u32 words, struct host1x_cdma *cdma, > + phys_addr_t pin_addr, u32 *map_addr) > +{ > + /* Map dmaget cursor to corresponding mem handle */ > + u32 offset; > + int state, count, i; > + > + offset = phys_addr - pin_addr; > + /* > + * Sometimes we're given different hardware address to the same > + * page - in these cases the offset will get an invalid number and > + * we just have to bail out. > + */ Why's that? > + map_addr = host1x_memmgr_mmap(mem); > + if (!map_addr) { > + host1x_debug_output(o, "[could not mmap]\n"); > + return; > + } > + > + /* Get base address from mem */ > + sgt = host1x_memmgr_pin(mem); > + if (IS_ERR(sgt)) { > + host1x_debug_output(o, "[couldn't pin]\n"); > + host1x_memmgr_munmap(mem, map_addr); > + return; > + } Maybe you should stick with one of "could not" or "couldn't". Makes it easier to search for. > +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma) > +{ > + struct host1x_job *job; > + > + list_for_each_entry(job, >sync_queue, list) { > + int i; > + host1x_debug_output(o, > + "\n%p: JOB, syncpt_id=%d, syncpt_val=%d," > + " first_get=%08x, timeout=%d" > + " num_slots=%d, num_handles=%d\n", > + job, > + job->syncpt_id, > + job->syncpt_end, > + job->first_get, > + job->timeout, > + job->num_slots, > + job->num_unpins); This could go on fewer lines. > +static void host1x_debug_show_channel_cdma(struct host1x *m, > + struct host1x_channel *ch, struct output *o, int chid) > +{ [...] > + switch (cbstat) { > + case 0x00010008: Again, symbolic names would be nice. Thierry pgpycVdq9BIWK.pgp Description: PGP signature
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c [...] +static pid_t host1x_debug_null_kickoff_pid; +unsigned int host1x_debug_trace_cmdbuf; + +static pid_t host1x_debug_force_timeout_pid; +static u32 host1x_debug_force_timeout_val; +static u32 host1x_debug_force_timeout_channel; Please group static and non-static variables. diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h [...] +struct output { + void (*fn)(void *ctx, const char *str, size_t len); + void *ctx; + char buf[256]; +}; Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this. diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h [...] struct host1x_syncpt_ops { void (*reset)(struct host1x_syncpt *); void (*reset_wait_base)(struct host1x_syncpt *); @@ -117,6 +133,7 @@ struct host1x { struct host1x_channel_ops channel_op; struct host1x_cdma_ops cdma_op; struct host1x_pushbuffer_ops cdma_pb_op; + struct host1x_debug_ops debug_op; struct host1x_syncpt_ops syncpt_op; struct host1x_intr_ops intr_op; Again, better to pass in a const pointer to the ops structure. diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count) +{ + unsigned mask; + unsigned subop; + + switch (val 28) { + case 0x0: These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them. +static void show_channel_word(struct output *o, int *state, int *count, + u32 addr, u32 val, struct host1x_cdma *cdma) +{ + static int start_count, dont_print; What if two processes read debug information at the same time? +static void do_show_channel_gather(struct output *o, + phys_addr_t phys_addr, + u32 words, struct host1x_cdma *cdma, + phys_addr_t pin_addr, u32 *map_addr) +{ + /* Map dmaget cursor to corresponding mem handle */ + u32 offset; + int state, count, i; + + offset = phys_addr - pin_addr; + /* + * Sometimes we're given different hardware address to the same + * page - in these cases the offset will get an invalid number and + * we just have to bail out. + */ Why's that? + map_addr = host1x_memmgr_mmap(mem); + if (!map_addr) { + host1x_debug_output(o, [could not mmap]\n); + return; + } + + /* Get base address from mem */ + sgt = host1x_memmgr_pin(mem); + if (IS_ERR(sgt)) { + host1x_debug_output(o, [couldn't pin]\n); + host1x_memmgr_munmap(mem, map_addr); + return; + } Maybe you should stick with one of could not or couldn't. Makes it easier to search for. +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma) +{ + struct host1x_job *job; + + list_for_each_entry(job, cdma-sync_queue, list) { + int i; + host1x_debug_output(o, + \n%p: JOB, syncpt_id=%d, syncpt_val=%d, + first_get=%08x, timeout=%d + num_slots=%d, num_handles=%d\n, + job, + job-syncpt_id, + job-syncpt_end, + job-first_get, + job-timeout, + job-num_slots, + job-num_unpins); This could go on fewer lines. +static void host1x_debug_show_channel_cdma(struct host1x *m, + struct host1x_channel *ch, struct output *o, int chid) +{ [...] + switch (cbstat) { + case 0x00010008: Again, symbolic names would be nice. Thierry pgpycVdq9BIWK.pgp Description: PGP signature
Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
On 04.02.2013 03:03, Thierry Reding wrote: * PGP Signed by an unknown key On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote: diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c [...] +static pid_t host1x_debug_null_kickoff_pid; +unsigned int host1x_debug_trace_cmdbuf; + +static pid_t host1x_debug_force_timeout_pid; +static u32 host1x_debug_force_timeout_val; +static u32 host1x_debug_force_timeout_channel; Please group static and non-static variables. Will do. diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h [...] +struct output { +void (*fn)(void *ctx, const char *str, size_t len); +void *ctx; +char buf[256]; +}; Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this. This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk. diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h [...] struct host1x_syncpt_ops { void (*reset)(struct host1x_syncpt *); void (*reset_wait_base)(struct host1x_syncpt *); @@ -117,6 +133,7 @@ struct host1x { struct host1x_channel_ops channel_op; struct host1x_cdma_ops cdma_op; struct host1x_pushbuffer_ops cdma_pb_op; +struct host1x_debug_ops debug_op; struct host1x_syncpt_ops syncpt_op; struct host1x_intr_ops intr_op; Again, better to pass in a const pointer to the ops structure. Ok. diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count) +{ +unsigned mask; +unsigned subop; + +switch (val 28) { +case 0x0: These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them. I have another suggestion. In downstream I removed the decoding part and I just print out a string of hex. That removes quite a bit bunch of code from kernel. It makes the debug output also more compact. It's much easier to write a user space program to decode than maintain it in kernel. +static void show_channel_word(struct output *o, int *state, int *count, +u32 addr, u32 val, struct host1x_cdma *cdma) +{ +static int start_count, dont_print; What if two processes read debug information at the same time? show_channels() acquires cdma.lock, so that shouldn't happen. +static void do_show_channel_gather(struct output *o, +phys_addr_t phys_addr, +u32 words, struct host1x_cdma *cdma, +phys_addr_t pin_addr, u32 *map_addr) +{ +/* Map dmaget cursor to corresponding mem handle */ +u32 offset; +int state, count, i; + +offset = phys_addr - pin_addr; +/* + * Sometimes we're given different hardware address to the same + * page - in these cases the offset will get an invalid number and + * we just have to bail out. + */ Why's that? Because of a race - memory might've been unpinned and unmapped from IOMMU and when we re-map (pin), we are given a new address. But, I think this comment is a bit stale - we used to dump also old gathers. The latest code only dumps jobs in sync queue, so the race shouldn't happen. +map_addr = host1x_memmgr_mmap(mem); +if (!map_addr) { +host1x_debug_output(o, [could not mmap]\n); +return; +} + +/* Get base address from mem */ +sgt = host1x_memmgr_pin(mem); +if (IS_ERR(sgt)) { +host1x_debug_output(o, [couldn't pin]\n); +host1x_memmgr_munmap(mem, map_addr); +return; +} Maybe you should stick with one of could not or couldn't. Makes it easier to search for. I prefer could not, so I'll use that. +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma) +{ +struct host1x_job *job; + +list_for_each_entry(job, cdma-sync_queue, list) { +int i; +host1x_debug_output(o, +\n%p: JOB, syncpt_id=%d, syncpt_val=%d, + first_get=%08x, timeout=%d + num_slots=%d, num_handles=%d\n, +job, +job-syncpt_id, +job-syncpt_end, +job-first_get, +job-timeout, +job-num_slots, +job-num_unpins); This could go on fewer lines. Yes, will merge. +static void host1x_debug_show_channel_cdma(struct host1x *m, +struct host1x_channel *ch, struct output *o, int chid) +{ [...] +switch (cbstat) { +case 0x00010008: Again, symbolic names would be nice. I propose I remove the decoding from kernel, and save 200 lines. Terje -- To unsubscribe from this
[PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
Add support for host1x debugging. Adds debugfs entries, and dumps channel state to UART in case of stuck job. Signed-off-by: Terje Bergstrom --- drivers/gpu/host1x/Makefile |1 + drivers/gpu/host1x/cdma.c | 34 +++ drivers/gpu/host1x/debug.c | 215 ++ drivers/gpu/host1x/debug.h | 50 drivers/gpu/host1x/dev.c|3 + drivers/gpu/host1x/dev.h| 17 ++ drivers/gpu/host1x/hw/cdma_hw.c |3 + drivers/gpu/host1x/hw/debug_hw.c| 400 +++ drivers/gpu/host1x/hw/host1x01.c|2 + drivers/gpu/host1x/hw/hw_host1x01_channel.h | 18 ++ drivers/gpu/host1x/hw/hw_host1x01_sync.h| 115 drivers/gpu/host1x/hw/syncpt_hw.c |1 + drivers/gpu/host1x/syncpt.c |3 + 13 files changed, 862 insertions(+) create mode 100644 drivers/gpu/host1x/debug.c create mode 100644 drivers/gpu/host1x/debug.h create mode 100644 drivers/gpu/host1x/hw/debug_hw.c diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index cdd87c8..697d49a 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -7,6 +7,7 @@ host1x-y = \ cdma.o \ channel.o \ job.o \ + debug.o \ memmgr.o \ hw/host1x01.o diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c index d6a38d2..12dd46c 100644 --- a/drivers/gpu/host1x/cdma.c +++ b/drivers/gpu/host1x/cdma.c @@ -19,6 +19,7 @@ #include "cdma.h" #include "channel.h" #include "dev.h" +#include "debug.h" #include "memmgr.h" #include "job.h" #include @@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job) return 0; } +static void trace_write_gather(struct host1x_cdma *cdma, + struct mem_handle *ref, + u32 offset, u32 words) +{ + void *mem = NULL; + + if (host1x_debug_trace_cmdbuf) + mem = host1x_memmgr_mmap(ref); + + if (mem) { + u32 i; + /* +* Write in batches of 128 as there seems to be a limit +* of how much you can output to ftrace at once. +*/ + for (i = 0; i < words; i += TRACE_MAX_LENGTH) { + trace_host1x_cdma_push_gather( + cdma_to_channel(cdma)->dev->name, + (u32)ref, + min(words - i, TRACE_MAX_LENGTH), + offset + i * sizeof(u32), + mem); + } + host1x_memmgr_munmap(ref, mem); + } +} + /* * Push two words into a push buffer slot * Blocks as necessary if the push buffer is full. */ void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2) { + if (host1x_debug_trace_cmdbuf) + trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name, + op1, op2); host1x_cdma_push_gather(cdma, NULL, 0, op1, op2); } @@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma, u32 slots_free = cdma->slots_free; struct push_buffer *pb = >push_buffer; + if (handle) + trace_write_gather(cdma, handle, offset, op1 & 0x); + if (slots_free == 0) { host1x->cdma_op.kick(cdma); slots_free = host1x_cdma_wait_locked(cdma, diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c new file mode 100644 index 000..29cbe93 --- /dev/null +++ b/drivers/gpu/host1x/debug.c @@ -0,0 +1,215 @@ +/* + * Copyright (C) 2010 Google, Inc. + * Author: Erik Gilling + * + * Copyright (C) 2011-2012 NVIDIA Corporation + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include + +#include + +#include "dev.h" +#include "debug.h" +#include "channel.h" + +static pid_t host1x_debug_null_kickoff_pid; +unsigned int host1x_debug_trace_cmdbuf; + +static pid_t host1x_debug_force_timeout_pid; +static u32 host1x_debug_force_timeout_val; +static u32 host1x_debug_force_timeout_channel; + +void host1x_debug_output(struct output *o, const char *fmt, ...) +{ + va_list args; + int len; + + va_start(args, fmt); + len = vsnprintf(o->buf, sizeof(o->buf), fmt, args); + va_end(args); + o->fn(o->ctx, o->buf, len); +} + +static int show_channels(struct host1x_channel *ch, void *data) +{ +
[PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
Add support for host1x debugging. Adds debugfs entries, and dumps channel state to UART in case of stuck job. Signed-off-by: Terje Bergstrom tbergst...@nvidia.com --- drivers/gpu/host1x/Makefile |1 + drivers/gpu/host1x/cdma.c | 34 +++ drivers/gpu/host1x/debug.c | 215 ++ drivers/gpu/host1x/debug.h | 50 drivers/gpu/host1x/dev.c|3 + drivers/gpu/host1x/dev.h| 17 ++ drivers/gpu/host1x/hw/cdma_hw.c |3 + drivers/gpu/host1x/hw/debug_hw.c| 400 +++ drivers/gpu/host1x/hw/host1x01.c|2 + drivers/gpu/host1x/hw/hw_host1x01_channel.h | 18 ++ drivers/gpu/host1x/hw/hw_host1x01_sync.h| 115 drivers/gpu/host1x/hw/syncpt_hw.c |1 + drivers/gpu/host1x/syncpt.c |3 + 13 files changed, 862 insertions(+) create mode 100644 drivers/gpu/host1x/debug.c create mode 100644 drivers/gpu/host1x/debug.h create mode 100644 drivers/gpu/host1x/hw/debug_hw.c diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index cdd87c8..697d49a 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -7,6 +7,7 @@ host1x-y = \ cdma.o \ channel.o \ job.o \ + debug.o \ memmgr.o \ hw/host1x01.o diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c index d6a38d2..12dd46c 100644 --- a/drivers/gpu/host1x/cdma.c +++ b/drivers/gpu/host1x/cdma.c @@ -19,6 +19,7 @@ #include cdma.h #include channel.h #include dev.h +#include debug.h #include memmgr.h #include job.h #include asm/cacheflush.h @@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job) return 0; } +static void trace_write_gather(struct host1x_cdma *cdma, + struct mem_handle *ref, + u32 offset, u32 words) +{ + void *mem = NULL; + + if (host1x_debug_trace_cmdbuf) + mem = host1x_memmgr_mmap(ref); + + if (mem) { + u32 i; + /* +* Write in batches of 128 as there seems to be a limit +* of how much you can output to ftrace at once. +*/ + for (i = 0; i words; i += TRACE_MAX_LENGTH) { + trace_host1x_cdma_push_gather( + cdma_to_channel(cdma)-dev-name, + (u32)ref, + min(words - i, TRACE_MAX_LENGTH), + offset + i * sizeof(u32), + mem); + } + host1x_memmgr_munmap(ref, mem); + } +} + /* * Push two words into a push buffer slot * Blocks as necessary if the push buffer is full. */ void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2) { + if (host1x_debug_trace_cmdbuf) + trace_host1x_cdma_push(cdma_to_channel(cdma)-dev-name, + op1, op2); host1x_cdma_push_gather(cdma, NULL, 0, op1, op2); } @@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma, u32 slots_free = cdma-slots_free; struct push_buffer *pb = cdma-push_buffer; + if (handle) + trace_write_gather(cdma, handle, offset, op1 0x); + if (slots_free == 0) { host1x-cdma_op.kick(cdma); slots_free = host1x_cdma_wait_locked(cdma, diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c new file mode 100644 index 000..29cbe93 --- /dev/null +++ b/drivers/gpu/host1x/debug.c @@ -0,0 +1,215 @@ +/* + * Copyright (C) 2010 Google, Inc. + * Author: Erik Gilling konk...@android.com + * + * Copyright (C) 2011-2012 NVIDIA Corporation + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include linux/debugfs.h +#include linux/seq_file.h +#include linux/uaccess.h + +#include linux/io.h + +#include dev.h +#include debug.h +#include channel.h + +static pid_t host1x_debug_null_kickoff_pid; +unsigned int host1x_debug_trace_cmdbuf; + +static pid_t host1x_debug_force_timeout_pid; +static u32 host1x_debug_force_timeout_val; +static u32 host1x_debug_force_timeout_channel; + +void host1x_debug_output(struct output *o, const char *fmt, ...) +{ + va_list args; + int len; + + va_start(args, fmt); + len = vsnprintf(o-buf, sizeof(o-buf), fmt, args); + va_end(args); + o-fn(o-ctx,