Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-07 Thread Thierry Reding
On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote:
> On 05.02.2013 01:15, Thierry Reding wrote:
> > On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
> >> This is used by debugfs code to direct to debugfs, and
> >> nvhost_debug_dump() to send via printk.
> > 
> > Yes, that was precisely my point. Why bother providing the same data via
> > several output methods. debugfs is good for showing large amounts of
> > data such as register dumps or a tabular representation of syncpoints
> > for instance.
> > 
> > If, however, you want to interactively show debug information using
> > printk the same format isn't very useful and something more reduced is
> > often better.
> 
> debugfs is there to be able to get a reliable dump of host1x state
> (f.ex. no lines intermixed with other output).
> 
> printk output is there because often we get just UART logs from failure
> cases, and having as much information as possible in the logs speeds up
> debugging.
> 
> Both of them need to output the values of sync points, and the channel
> state. Dumping all of that consists of a lot of code, and I wouldn't
> want to duplicate that for two output formats.

I'm still not convinced, but I think I could live with it. =)

> >> I have another suggestion. In downstream I removed the decoding part and
> >> I just print out a string of hex. That removes quite a bit bunch of code
> >> from kernel. It makes the debug output also more compact.
> > I don't know. I think if you use in-kernel debugging facilities such as
> > debugfs or printk, then the output should be self-explanatory. However I
> > do see the usefulness of having a binary dump that can be decoded in
> > userspace. But I think if we want to go that way we should make that a
> > separate interface. USB provides something like that, which can then be
> > fed to libpcap or wireshark to capture and analyze USB traffic. If done
> > properly you get replay functionality for free. I don't know what infra-
> > structure exists to help with implementing something similar.
> 
> It's not actually binary. I think I misrepresented the suggestion.
> 
> I'm suggesting that we'd display only the contents of command FIFO and
> contents of gathers (i.e. all opcodes) in hex format, not decoded. All
> other text would remain as is, so syncpt values, etc would be readable
> by a glance.
> 
> The user space tool can then take the streams and decode them if needed.
> 
> We've noticed that the decoded opcodes format can be very long and
> sometimes takes a minute to dump out via a slow console. The hex output
> is much more compact and faster to dump.
> 
> Actual tracing or wireshark kind of capability would come via decoding
> the ftrace log. When enabled, everything that is written to the channel,
> is also written to ftrace.

Okay, I'll have to take a closer look at ftrace since I've never used it
before. It sounds like extra infrastructure won't be necessary then.

>  +static void show_channel_word(struct output *o, int *state, int *count,
>  +u32 addr, u32 val, struct host1x_cdma *cdma)
>  +{
>  +static int start_count, dont_print;
> >>>
> >>> What if two processes read debug information at the same time?
> >>
> >> show_channels() acquires cdma.lock, so that shouldn't happen.
> > 
> > Okay. Another solution would be to pass around a debug context which
> > keeps track of the variables. But if we opt for a more involved dump
> > interface as discussed above this will no longer be relevant.
> 
> Actually, debugging process needs cdma.lock, because it goes through the
> cdma queue. Also command FIFO dumping is something that must be done by
> a single thread at a time.
> 
> > Okay. In the context of a channel dump interface this may not be
> > relevant anymore. Can you think of any issue that wouldn't be detectable
> > or debuggable by analyzing a binary dump of the data within a channel?
> > I'm asking because at that point we wouldn't be able to access any of
> > the in-kernel data structures but would have to rely on the data itself
> > for diagnostics. IOMMU virtual addresses won't be available and so on.
> 
> In many cases, looking at syncpt values, and channel state
> (active/waiting on a syncpt, etc) gives an indication on what is the
> current state of hardware. But, very often problems are ripple effects
> on something that happened earlier and the job that caused the problem
> has already been freed and is not visible in the dump.
> 
> To get a full history, we need often need the ftrace log.

So that's already covered. Great!

Thierry


pgpcnj7iSskoz.pgp
Description: PGP signature


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-07 Thread Thierry Reding
On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote:
 On 05.02.2013 01:15, Thierry Reding wrote:
  On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
  This is used by debugfs code to direct to debugfs, and
  nvhost_debug_dump() to send via printk.
  
  Yes, that was precisely my point. Why bother providing the same data via
  several output methods. debugfs is good for showing large amounts of
  data such as register dumps or a tabular representation of syncpoints
  for instance.
  
  If, however, you want to interactively show debug information using
  printk the same format isn't very useful and something more reduced is
  often better.
 
 debugfs is there to be able to get a reliable dump of host1x state
 (f.ex. no lines intermixed with other output).
 
 printk output is there because often we get just UART logs from failure
 cases, and having as much information as possible in the logs speeds up
 debugging.
 
 Both of them need to output the values of sync points, and the channel
 state. Dumping all of that consists of a lot of code, and I wouldn't
 want to duplicate that for two output formats.

I'm still not convinced, but I think I could live with it. =)

  I have another suggestion. In downstream I removed the decoding part and
  I just print out a string of hex. That removes quite a bit bunch of code
  from kernel. It makes the debug output also more compact.
  I don't know. I think if you use in-kernel debugging facilities such as
  debugfs or printk, then the output should be self-explanatory. However I
  do see the usefulness of having a binary dump that can be decoded in
  userspace. But I think if we want to go that way we should make that a
  separate interface. USB provides something like that, which can then be
  fed to libpcap or wireshark to capture and analyze USB traffic. If done
  properly you get replay functionality for free. I don't know what infra-
  structure exists to help with implementing something similar.
 
 It's not actually binary. I think I misrepresented the suggestion.
 
 I'm suggesting that we'd display only the contents of command FIFO and
 contents of gathers (i.e. all opcodes) in hex format, not decoded. All
 other text would remain as is, so syncpt values, etc would be readable
 by a glance.
 
 The user space tool can then take the streams and decode them if needed.
 
 We've noticed that the decoded opcodes format can be very long and
 sometimes takes a minute to dump out via a slow console. The hex output
 is much more compact and faster to dump.
 
 Actual tracing or wireshark kind of capability would come via decoding
 the ftrace log. When enabled, everything that is written to the channel,
 is also written to ftrace.

Okay, I'll have to take a closer look at ftrace since I've never used it
before. It sounds like extra infrastructure won't be necessary then.

  +static void show_channel_word(struct output *o, int *state, int *count,
  +u32 addr, u32 val, struct host1x_cdma *cdma)
  +{
  +static int start_count, dont_print;
 
  What if two processes read debug information at the same time?
 
  show_channels() acquires cdma.lock, so that shouldn't happen.
  
  Okay. Another solution would be to pass around a debug context which
  keeps track of the variables. But if we opt for a more involved dump
  interface as discussed above this will no longer be relevant.
 
 Actually, debugging process needs cdma.lock, because it goes through the
 cdma queue. Also command FIFO dumping is something that must be done by
 a single thread at a time.
 
  Okay. In the context of a channel dump interface this may not be
  relevant anymore. Can you think of any issue that wouldn't be detectable
  or debuggable by analyzing a binary dump of the data within a channel?
  I'm asking because at that point we wouldn't be able to access any of
  the in-kernel data structures but would have to rely on the data itself
  for diagnostics. IOMMU virtual addresses won't be available and so on.
 
 In many cases, looking at syncpt values, and channel state
 (active/waiting on a syncpt, etc) gives an indication on what is the
 current state of hardware. But, very often problems are ripple effects
 on something that happened earlier and the job that caused the problem
 has already been freed and is not visible in the dump.
 
 To get a full history, we need often need the ftrace log.

So that's already covered. Great!

Thierry


pgpcnj7iSskoz.pgp
Description: PGP signature


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:15, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
>> This is used by debugfs code to direct to debugfs, and
>> nvhost_debug_dump() to send via printk.
> 
> Yes, that was precisely my point. Why bother providing the same data via
> several output methods. debugfs is good for showing large amounts of
> data such as register dumps or a tabular representation of syncpoints
> for instance.
> 
> If, however, you want to interactively show debug information using
> printk the same format isn't very useful and something more reduced is
> often better.

debugfs is there to be able to get a reliable dump of host1x state
(f.ex. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.

>> I have another suggestion. In downstream I removed the decoding part and
>> I just print out a string of hex. That removes quite a bit bunch of code
>> from kernel. It makes the debug output also more compact.
> I don't know. I think if you use in-kernel debugging facilities such as
> debugfs or printk, then the output should be self-explanatory. However I
> do see the usefulness of having a binary dump that can be decoded in
> userspace. But I think if we want to go that way we should make that a
> separate interface. USB provides something like that, which can then be
> fed to libpcap or wireshark to capture and analyze USB traffic. If done
> properly you get replay functionality for free. I don't know what infra-
> structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values, etc would be readable
by a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.

Actual tracing or wireshark kind of capability would come via decoding
the ftrace log. When enabled, everything that is written to the channel,
is also written to ftrace.

 +static void show_channel_word(struct output *o, int *state, int *count,
 +  u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 +  static int start_count, dont_print;
>>>
>>> What if two processes read debug information at the same time?
>>
>> show_channels() acquires cdma.lock, so that shouldn't happen.
> 
> Okay. Another solution would be to pass around a debug context which
> keeps track of the variables. But if we opt for a more involved dump
> interface as discussed above this will no longer be relevant.

Actually, debugging process needs cdma.lock, because it goes through the
cdma queue. Also command FIFO dumping is something that must be done by
a single thread at a time.

> Okay. In the context of a channel dump interface this may not be
> relevant anymore. Can you think of any issue that wouldn't be detectable
> or debuggable by analyzing a binary dump of the data within a channel?
> I'm asking because at that point we wouldn't be able to access any of
> the in-kernel data structures but would have to rely on the data itself
> for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values, and channel state
(active/waiting on a syncpt, etc) gives an indication on what is the
current state of hardware. But, very often problems are ripple effects
on something that happened earlier and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we need often need the ftrace log.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:15, Thierry Reding wrote:
 On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
 This is used by debugfs code to direct to debugfs, and
 nvhost_debug_dump() to send via printk.
 
 Yes, that was precisely my point. Why bother providing the same data via
 several output methods. debugfs is good for showing large amounts of
 data such as register dumps or a tabular representation of syncpoints
 for instance.
 
 If, however, you want to interactively show debug information using
 printk the same format isn't very useful and something more reduced is
 often better.

debugfs is there to be able to get a reliable dump of host1x state
(f.ex. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.

 I have another suggestion. In downstream I removed the decoding part and
 I just print out a string of hex. That removes quite a bit bunch of code
 from kernel. It makes the debug output also more compact.
 I don't know. I think if you use in-kernel debugging facilities such as
 debugfs or printk, then the output should be self-explanatory. However I
 do see the usefulness of having a binary dump that can be decoded in
 userspace. But I think if we want to go that way we should make that a
 separate interface. USB provides something like that, which can then be
 fed to libpcap or wireshark to capture and analyze USB traffic. If done
 properly you get replay functionality for free. I don't know what infra-
 structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values, etc would be readable
by a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.

Actual tracing or wireshark kind of capability would come via decoding
the ftrace log. When enabled, everything that is written to the channel,
is also written to ftrace.

 +static void show_channel_word(struct output *o, int *state, int *count,
 +  u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 +  static int start_count, dont_print;

 What if two processes read debug information at the same time?

 show_channels() acquires cdma.lock, so that shouldn't happen.
 
 Okay. Another solution would be to pass around a debug context which
 keeps track of the variables. But if we opt for a more involved dump
 interface as discussed above this will no longer be relevant.

Actually, debugging process needs cdma.lock, because it goes through the
cdma queue. Also command FIFO dumping is something that must be done by
a single thread at a time.

 Okay. In the context of a channel dump interface this may not be
 relevant anymore. Can you think of any issue that wouldn't be detectable
 or debuggable by analyzing a binary dump of the data within a channel?
 I'm asking because at that point we wouldn't be able to access any of
 the in-kernel data structures but would have to rely on the data itself
 for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values, and channel state
(active/waiting on a syncpt, etc) gives an indication on what is the
current state of hardware. But, very often problems are ripple effects
on something that happened earlier and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we need often need the ftrace log.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-05 Thread Thierry Reding
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
> On 04.02.2013 03:03, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
> >> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> > [...]
> >> +struct output {
> >> +  void (*fn)(void *ctx, const char *str, size_t len);
> >> +  void *ctx;
> >> +  char buf[256];
> >> +};
> > 
> > Do we really need this kind of abstraction? There really should be only
> > one location where debug information is obtained, so I don't see a need
> > for this.
> 
> This is used by debugfs code to direct to debugfs, and
> nvhost_debug_dump() to send via printk.

Yes, that was precisely my point. Why bother providing the same data via
several output methods. debugfs is good for showing large amounts of
data such as register dumps or a tabular representation of syncpoints
for instance.

If, however, you want to interactively show debug information using
printk the same format isn't very useful and something more reduced is
often better.

> >> diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
> >> b/drivers/gpu/host1x/hw/debug_hw.c
> > 
> >> +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
> >> *count)
> >> +{
> >> +  unsigned mask;
> >> +  unsigned subop;
> >> +
> >> +  switch (val >> 28) {
> >> +  case 0x0:
> > 
> > These can easily be derived by looking at the debug output, but it may
> > still make sense to assign symbolic names to them.
> 
> I have another suggestion. In downstream I removed the decoding part and
> I just print out a string of hex. That removes quite a bit bunch of code
> from kernel. It makes the debug output also more compact.
> 
> It's much easier to write a user space program to decode than maintain
> it in kernel.

I don't know. I think if you use in-kernel debugging facilities such as
debugfs or printk, then the output should be self-explanatory. However I
do see the usefulness of having a binary dump that can be decoded in
userspace. But I think if we want to go that way we should make that a
separate interface. USB provides something like that, which can then be
fed to libpcap or wireshark to capture and analyze USB traffic. If done
properly you get replay functionality for free. I don't know what infra-
structure exists to help with implementing something similar.

So I think having debugfs output some data about syncpoints or the state
of channels might be useful to quickly diagnose a certain set of
problems but for anything more involved maybe a complete binary dump may
be better.

I'm not sure whether doing this would be acceptable though. Maybe Dave
or somebody else on the lists can answer that. An alternative way to
achieve the same would be to hook ioctl() from userspace and not do any
of it in kernel space.

> >> +static void show_channel_word(struct output *o, int *state, int *count,
> >> +  u32 addr, u32 val, struct host1x_cdma *cdma)
> >> +{
> >> +  static int start_count, dont_print;
> > 
> > What if two processes read debug information at the same time?
> 
> show_channels() acquires cdma.lock, so that shouldn't happen.

Okay. Another solution would be to pass around a debug context which
keeps track of the variables. But if we opt for a more involved dump
interface as discussed above this will no longer be relevant.

> >> +static void do_show_channel_gather(struct output *o,
> >> +  phys_addr_t phys_addr,
> >> +  u32 words, struct host1x_cdma *cdma,
> >> +  phys_addr_t pin_addr, u32 *map_addr)
> >> +{
> >> +  /* Map dmaget cursor to corresponding mem handle */
> >> +  u32 offset;
> >> +  int state, count, i;
> >> +
> >> +  offset = phys_addr - pin_addr;
> >> +  /*
> >> +   * Sometimes we're given different hardware address to the same
> >> +   * page - in these cases the offset will get an invalid number and
> >> +   * we just have to bail out.
> >> +   */
> > 
> > Why's that?
> 
> Because of a race - memory might've been unpinned and unmapped from
> IOMMU and when we re-map (pin), we are given a new address.
> 
> But, I think this comment is a bit stale - we used to dump also old
> gathers. The latest code only dumps jobs in sync queue, so the race
> shouldn't happen.

Okay. In the context of a channel dump interface this may not be
relevant anymore. Can you think of any issue that wouldn't be detectable
or debuggable by analyzing a binary dump of the data within a channel?
I'm asking because at that point we wouldn't be able to access any of
the in-kernel data structures but would have to rely on the data itself
for diagnostics. IOMMU virtual addresses won't be available and so on.

> >> +static void host1x_debug_show_channel_cdma(struct host1x *m,
> >> +  struct host1x_channel *ch, struct output *o, int chid)
> >> +{
> > [...]
> >> +  switch (cbstat) {
> >> +  case 0x00010008:
> > 
> > Again, symbolic names would be nice.
> 
> I propose I remove the decoding from kernel, and save 200 lines.

I think it could 

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-05 Thread Thierry Reding
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
 On 04.02.2013 03:03, Thierry Reding wrote:
  On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
  diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
  [...]
  +struct output {
  +  void (*fn)(void *ctx, const char *str, size_t len);
  +  void *ctx;
  +  char buf[256];
  +};
  
  Do we really need this kind of abstraction? There really should be only
  one location where debug information is obtained, so I don't see a need
  for this.
 
 This is used by debugfs code to direct to debugfs, and
 nvhost_debug_dump() to send via printk.

Yes, that was precisely my point. Why bother providing the same data via
several output methods. debugfs is good for showing large amounts of
data such as register dumps or a tabular representation of syncpoints
for instance.

If, however, you want to interactively show debug information using
printk the same format isn't very useful and something more reduced is
often better.

  diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
  b/drivers/gpu/host1x/hw/debug_hw.c
  
  +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
  *count)
  +{
  +  unsigned mask;
  +  unsigned subop;
  +
  +  switch (val  28) {
  +  case 0x0:
  
  These can easily be derived by looking at the debug output, but it may
  still make sense to assign symbolic names to them.
 
 I have another suggestion. In downstream I removed the decoding part and
 I just print out a string of hex. That removes quite a bit bunch of code
 from kernel. It makes the debug output also more compact.
 
 It's much easier to write a user space program to decode than maintain
 it in kernel.

I don't know. I think if you use in-kernel debugging facilities such as
debugfs or printk, then the output should be self-explanatory. However I
do see the usefulness of having a binary dump that can be decoded in
userspace. But I think if we want to go that way we should make that a
separate interface. USB provides something like that, which can then be
fed to libpcap or wireshark to capture and analyze USB traffic. If done
properly you get replay functionality for free. I don't know what infra-
structure exists to help with implementing something similar.

So I think having debugfs output some data about syncpoints or the state
of channels might be useful to quickly diagnose a certain set of
problems but for anything more involved maybe a complete binary dump may
be better.

I'm not sure whether doing this would be acceptable though. Maybe Dave
or somebody else on the lists can answer that. An alternative way to
achieve the same would be to hook ioctl() from userspace and not do any
of it in kernel space.

  +static void show_channel_word(struct output *o, int *state, int *count,
  +  u32 addr, u32 val, struct host1x_cdma *cdma)
  +{
  +  static int start_count, dont_print;
  
  What if two processes read debug information at the same time?
 
 show_channels() acquires cdma.lock, so that shouldn't happen.

Okay. Another solution would be to pass around a debug context which
keeps track of the variables. But if we opt for a more involved dump
interface as discussed above this will no longer be relevant.

  +static void do_show_channel_gather(struct output *o,
  +  phys_addr_t phys_addr,
  +  u32 words, struct host1x_cdma *cdma,
  +  phys_addr_t pin_addr, u32 *map_addr)
  +{
  +  /* Map dmaget cursor to corresponding mem handle */
  +  u32 offset;
  +  int state, count, i;
  +
  +  offset = phys_addr - pin_addr;
  +  /*
  +   * Sometimes we're given different hardware address to the same
  +   * page - in these cases the offset will get an invalid number and
  +   * we just have to bail out.
  +   */
  
  Why's that?
 
 Because of a race - memory might've been unpinned and unmapped from
 IOMMU and when we re-map (pin), we are given a new address.
 
 But, I think this comment is a bit stale - we used to dump also old
 gathers. The latest code only dumps jobs in sync queue, so the race
 shouldn't happen.

Okay. In the context of a channel dump interface this may not be
relevant anymore. Can you think of any issue that wouldn't be detectable
or debuggable by analyzing a binary dump of the data within a channel?
I'm asking because at that point we wouldn't be able to access any of
the in-kernel data structures but would have to rely on the data itself
for diagnostics. IOMMU virtual addresses won't be available and so on.

  +static void host1x_debug_show_channel_cdma(struct host1x *m,
  +  struct host1x_channel *ch, struct output *o, int chid)
  +{
  [...]
  +  switch (cbstat) {
  +  case 0x00010008:
  
  Again, symbolic names would be nice.
 
 I propose I remove the decoding from kernel, and save 200 lines.

I think it could be more than 200 lines. If all we provide in the kernel
is some statistics about syncpoint usage or channel state that should be
a lot less code than we have now.

However that 

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:03, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> [...]
>> +static pid_t host1x_debug_null_kickoff_pid;
>> +unsigned int host1x_debug_trace_cmdbuf;
>> +
>> +static pid_t host1x_debug_force_timeout_pid;
>> +static u32 host1x_debug_force_timeout_val;
>> +static u32 host1x_debug_force_timeout_channel;
> 
> Please group static and non-static variables.

Will do.

> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> [...]
>> +struct output {
>> +void (*fn)(void *ctx, const char *str, size_t len);
>> +void *ctx;
>> +char buf[256];
>> +};
> 
> Do we really need this kind of abstraction? There really should be only
> one location where debug information is obtained, so I don't see a need
> for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>>  struct host1x_syncpt_ops {
>>  void (*reset)(struct host1x_syncpt *);
>>  void (*reset_wait_base)(struct host1x_syncpt *);
>> @@ -117,6 +133,7 @@ struct host1x {
>>  struct host1x_channel_ops channel_op;
>>  struct host1x_cdma_ops cdma_op;
>>  struct host1x_pushbuffer_ops cdma_pb_op;
>> +struct host1x_debug_ops debug_op;
>>  struct host1x_syncpt_ops syncpt_op;
>>  struct host1x_intr_ops intr_op;
> 
> Again, better to pass in a const pointer to the ops structure.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
>> b/drivers/gpu/host1x/hw/debug_hw.c
> 
>> +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
>> *count)
>> +{
>> +unsigned mask;
>> +unsigned subop;
>> +
>> +switch (val >> 28) {
>> +case 0x0:
> 
> These can easily be derived by looking at the debug output, but it may
> still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bit bunch of code
from kernel. It makes the debug output also more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.

> 
>> +static void show_channel_word(struct output *o, int *state, int *count,
>> +u32 addr, u32 val, struct host1x_cdma *cdma)
>> +{
>> +static int start_count, dont_print;
> 
> What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

> 
>> +static void do_show_channel_gather(struct output *o,
>> +phys_addr_t phys_addr,
>> +u32 words, struct host1x_cdma *cdma,
>> +phys_addr_t pin_addr, u32 *map_addr)
>> +{
>> +/* Map dmaget cursor to corresponding mem handle */
>> +u32 offset;
>> +int state, count, i;
>> +
>> +offset = phys_addr - pin_addr;
>> +/*
>> + * Sometimes we're given different hardware address to the same
>> + * page - in these cases the offset will get an invalid number and
>> + * we just have to bail out.
>> + */
> 
> Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

> 
>> +map_addr = host1x_memmgr_mmap(mem);
>> +if (!map_addr) {
>> +host1x_debug_output(o, "[could not mmap]\n");
>> +return;
>> +}
>> +
>> +/* Get base address from mem */
>> +sgt = host1x_memmgr_pin(mem);
>> +if (IS_ERR(sgt)) {
>> +host1x_debug_output(o, "[couldn't pin]\n");
>> +host1x_memmgr_munmap(mem, map_addr);
>> +return;
>> +}
> 
> Maybe you should stick with one of "could not" or "couldn't". Makes it
> easier to search for.

I prefer "could not", so I'll use that.

> 
>> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
>> +{
>> +struct host1x_job *job;
>> +
>> +list_for_each_entry(job, >sync_queue, list) {
>> +int i;
>> +host1x_debug_output(o,
>> +"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
>> +" first_get=%08x, timeout=%d"
>> +" num_slots=%d, num_handles=%d\n",
>> +job,
>> +job->syncpt_id,
>> +job->syncpt_end,
>> +job->first_get,
>> +job->timeout,
>> +job->num_slots,
>> +job->num_unpins);
> 
> This could go on fewer lines.

Yes, will merge.

> 
>> +static void host1x_debug_show_channel_cdma(struct host1x *m,
>> +struct host1x_channel *ch, struct 

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Thierry Reding
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
[...]
> +static pid_t host1x_debug_null_kickoff_pid;
> +unsigned int host1x_debug_trace_cmdbuf;
> +
> +static pid_t host1x_debug_force_timeout_pid;
> +static u32 host1x_debug_force_timeout_val;
> +static u32 host1x_debug_force_timeout_channel;

Please group static and non-static variables.

> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
> +struct output {
> + void (*fn)(void *ctx, const char *str, size_t len);
> + void *ctx;
> + char buf[256];
> +};

Do we really need this kind of abstraction? There really should be only
one location where debug information is obtained, so I don't see a need
for this.

> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
>  struct host1x_syncpt_ops {
>   void (*reset)(struct host1x_syncpt *);
>   void (*reset_wait_base)(struct host1x_syncpt *);
> @@ -117,6 +133,7 @@ struct host1x {
>   struct host1x_channel_ops channel_op;
>   struct host1x_cdma_ops cdma_op;
>   struct host1x_pushbuffer_ops cdma_pb_op;
> + struct host1x_debug_ops debug_op;
>   struct host1x_syncpt_ops syncpt_op;
>   struct host1x_intr_ops intr_op;

Again, better to pass in a const pointer to the ops structure.

> diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
> b/drivers/gpu/host1x/hw/debug_hw.c

> +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
> *count)
> +{
> + unsigned mask;
> + unsigned subop;
> +
> + switch (val >> 28) {
> + case 0x0:

These can easily be derived by looking at the debug output, but it may
still make sense to assign symbolic names to them.

> +static void show_channel_word(struct output *o, int *state, int *count,
> + u32 addr, u32 val, struct host1x_cdma *cdma)
> +{
> + static int start_count, dont_print;

What if two processes read debug information at the same time?

> +static void do_show_channel_gather(struct output *o,
> + phys_addr_t phys_addr,
> + u32 words, struct host1x_cdma *cdma,
> + phys_addr_t pin_addr, u32 *map_addr)
> +{
> + /* Map dmaget cursor to corresponding mem handle */
> + u32 offset;
> + int state, count, i;
> +
> + offset = phys_addr - pin_addr;
> + /*
> +  * Sometimes we're given different hardware address to the same
> +  * page - in these cases the offset will get an invalid number and
> +  * we just have to bail out.
> +  */

Why's that?

> + map_addr = host1x_memmgr_mmap(mem);
> + if (!map_addr) {
> + host1x_debug_output(o, "[could not mmap]\n");
> + return;
> + }
> +
> + /* Get base address from mem */
> + sgt = host1x_memmgr_pin(mem);
> + if (IS_ERR(sgt)) {
> + host1x_debug_output(o, "[couldn't pin]\n");
> + host1x_memmgr_munmap(mem, map_addr);
> + return;
> + }

Maybe you should stick with one of "could not" or "couldn't". Makes it
easier to search for.

> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
> +{
> + struct host1x_job *job;
> +
> + list_for_each_entry(job, >sync_queue, list) {
> + int i;
> + host1x_debug_output(o,
> + "\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
> + " first_get=%08x, timeout=%d"
> + " num_slots=%d, num_handles=%d\n",
> + job,
> + job->syncpt_id,
> + job->syncpt_end,
> + job->first_get,
> + job->timeout,
> + job->num_slots,
> + job->num_unpins);

This could go on fewer lines.

> +static void host1x_debug_show_channel_cdma(struct host1x *m,
> + struct host1x_channel *ch, struct output *o, int chid)
> +{
[...]
> + switch (cbstat) {
> + case 0x00010008:

Again, symbolic names would be nice.

Thierry


pgpycVdq9BIWK.pgp
Description: PGP signature


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Thierry Reding
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
 diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
[...]
 +static pid_t host1x_debug_null_kickoff_pid;
 +unsigned int host1x_debug_trace_cmdbuf;
 +
 +static pid_t host1x_debug_force_timeout_pid;
 +static u32 host1x_debug_force_timeout_val;
 +static u32 host1x_debug_force_timeout_channel;

Please group static and non-static variables.

 diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
 +struct output {
 + void (*fn)(void *ctx, const char *str, size_t len);
 + void *ctx;
 + char buf[256];
 +};

Do we really need this kind of abstraction? There really should be only
one location where debug information is obtained, so I don't see a need
for this.

 diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
  struct host1x_syncpt_ops {
   void (*reset)(struct host1x_syncpt *);
   void (*reset_wait_base)(struct host1x_syncpt *);
 @@ -117,6 +133,7 @@ struct host1x {
   struct host1x_channel_ops channel_op;
   struct host1x_cdma_ops cdma_op;
   struct host1x_pushbuffer_ops cdma_pb_op;
 + struct host1x_debug_ops debug_op;
   struct host1x_syncpt_ops syncpt_op;
   struct host1x_intr_ops intr_op;

Again, better to pass in a const pointer to the ops structure.

 diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
 b/drivers/gpu/host1x/hw/debug_hw.c

 +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
 *count)
 +{
 + unsigned mask;
 + unsigned subop;
 +
 + switch (val  28) {
 + case 0x0:

These can easily be derived by looking at the debug output, but it may
still make sense to assign symbolic names to them.

 +static void show_channel_word(struct output *o, int *state, int *count,
 + u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 + static int start_count, dont_print;

What if two processes read debug information at the same time?

 +static void do_show_channel_gather(struct output *o,
 + phys_addr_t phys_addr,
 + u32 words, struct host1x_cdma *cdma,
 + phys_addr_t pin_addr, u32 *map_addr)
 +{
 + /* Map dmaget cursor to corresponding mem handle */
 + u32 offset;
 + int state, count, i;
 +
 + offset = phys_addr - pin_addr;
 + /*
 +  * Sometimes we're given different hardware address to the same
 +  * page - in these cases the offset will get an invalid number and
 +  * we just have to bail out.
 +  */

Why's that?

 + map_addr = host1x_memmgr_mmap(mem);
 + if (!map_addr) {
 + host1x_debug_output(o, [could not mmap]\n);
 + return;
 + }
 +
 + /* Get base address from mem */
 + sgt = host1x_memmgr_pin(mem);
 + if (IS_ERR(sgt)) {
 + host1x_debug_output(o, [couldn't pin]\n);
 + host1x_memmgr_munmap(mem, map_addr);
 + return;
 + }

Maybe you should stick with one of could not or couldn't. Makes it
easier to search for.

 +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 +{
 + struct host1x_job *job;
 +
 + list_for_each_entry(job, cdma-sync_queue, list) {
 + int i;
 + host1x_debug_output(o,
 + \n%p: JOB, syncpt_id=%d, syncpt_val=%d,
 +  first_get=%08x, timeout=%d
 +  num_slots=%d, num_handles=%d\n,
 + job,
 + job-syncpt_id,
 + job-syncpt_end,
 + job-first_get,
 + job-timeout,
 + job-num_slots,
 + job-num_unpins);

This could go on fewer lines.

 +static void host1x_debug_show_channel_cdma(struct host1x *m,
 + struct host1x_channel *ch, struct output *o, int chid)
 +{
[...]
 + switch (cbstat) {
 + case 0x00010008:

Again, symbolic names would be nice.

Thierry


pgpycVdq9BIWK.pgp
Description: PGP signature


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:03, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
 diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
 [...]
 +static pid_t host1x_debug_null_kickoff_pid;
 +unsigned int host1x_debug_trace_cmdbuf;
 +
 +static pid_t host1x_debug_force_timeout_pid;
 +static u32 host1x_debug_force_timeout_val;
 +static u32 host1x_debug_force_timeout_channel;
 
 Please group static and non-static variables.

Will do.

 
 diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
 [...]
 +struct output {
 +void (*fn)(void *ctx, const char *str, size_t len);
 +void *ctx;
 +char buf[256];
 +};
 
 Do we really need this kind of abstraction? There really should be only
 one location where debug information is obtained, so I don't see a need
 for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.

 
 diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
 [...]
  struct host1x_syncpt_ops {
  void (*reset)(struct host1x_syncpt *);
  void (*reset_wait_base)(struct host1x_syncpt *);
 @@ -117,6 +133,7 @@ struct host1x {
  struct host1x_channel_ops channel_op;
  struct host1x_cdma_ops cdma_op;
  struct host1x_pushbuffer_ops cdma_pb_op;
 +struct host1x_debug_ops debug_op;
  struct host1x_syncpt_ops syncpt_op;
  struct host1x_intr_ops intr_op;
 
 Again, better to pass in a const pointer to the ops structure.

Ok.

 
 diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
 b/drivers/gpu/host1x/hw/debug_hw.c
 
 +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
 *count)
 +{
 +unsigned mask;
 +unsigned subop;
 +
 +switch (val  28) {
 +case 0x0:
 
 These can easily be derived by looking at the debug output, but it may
 still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bit bunch of code
from kernel. It makes the debug output also more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.

 
 +static void show_channel_word(struct output *o, int *state, int *count,
 +u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 +static int start_count, dont_print;
 
 What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

 
 +static void do_show_channel_gather(struct output *o,
 +phys_addr_t phys_addr,
 +u32 words, struct host1x_cdma *cdma,
 +phys_addr_t pin_addr, u32 *map_addr)
 +{
 +/* Map dmaget cursor to corresponding mem handle */
 +u32 offset;
 +int state, count, i;
 +
 +offset = phys_addr - pin_addr;
 +/*
 + * Sometimes we're given different hardware address to the same
 + * page - in these cases the offset will get an invalid number and
 + * we just have to bail out.
 + */
 
 Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

 
 +map_addr = host1x_memmgr_mmap(mem);
 +if (!map_addr) {
 +host1x_debug_output(o, [could not mmap]\n);
 +return;
 +}
 +
 +/* Get base address from mem */
 +sgt = host1x_memmgr_pin(mem);
 +if (IS_ERR(sgt)) {
 +host1x_debug_output(o, [couldn't pin]\n);
 +host1x_memmgr_munmap(mem, map_addr);
 +return;
 +}
 
 Maybe you should stick with one of could not or couldn't. Makes it
 easier to search for.

I prefer could not, so I'll use that.

 
 +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 +{
 +struct host1x_job *job;
 +
 +list_for_each_entry(job, cdma-sync_queue, list) {
 +int i;
 +host1x_debug_output(o,
 +\n%p: JOB, syncpt_id=%d, syncpt_val=%d,
 + first_get=%08x, timeout=%d
 + num_slots=%d, num_handles=%d\n,
 +job,
 +job-syncpt_id,
 +job-syncpt_end,
 +job-first_get,
 +job-timeout,
 +job-num_slots,
 +job-num_unpins);
 
 This could go on fewer lines.

Yes, will merge.

 
 +static void host1x_debug_show_channel_cdma(struct host1x *m,
 +struct host1x_channel *ch, struct output *o, int chid)
 +{
 [...]
 +switch (cbstat) {
 +case 0x00010008:
 
 Again, symbolic names would be nice.

I propose I remove the decoding from kernel, and save 200 lines.

Terje
--
To unsubscribe from this 

[PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-01-15 Thread Terje Bergstrom
Add support for host1x debugging. Adds debugfs entries, and dumps
channel state to UART in case of stuck job.

Signed-off-by: Terje Bergstrom 
---
 drivers/gpu/host1x/Makefile |1 +
 drivers/gpu/host1x/cdma.c   |   34 +++
 drivers/gpu/host1x/debug.c  |  215 ++
 drivers/gpu/host1x/debug.h  |   50 
 drivers/gpu/host1x/dev.c|3 +
 drivers/gpu/host1x/dev.h|   17 ++
 drivers/gpu/host1x/hw/cdma_hw.c |3 +
 drivers/gpu/host1x/hw/debug_hw.c|  400 +++
 drivers/gpu/host1x/hw/host1x01.c|2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   18 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h|  115 
 drivers/gpu/host1x/hw/syncpt_hw.c   |1 +
 drivers/gpu/host1x/syncpt.c |3 +
 13 files changed, 862 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index cdd87c8..697d49a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -7,6 +7,7 @@ host1x-y = \
cdma.o \
channel.o \
job.o \
+   debug.o \
memmgr.o \
hw/host1x01.o
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index d6a38d2..12dd46c 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 #include "job.h"
 #include 
@@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct 
host1x_job *job)
return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+   struct mem_handle *ref,
+   u32 offset, u32 words)
+{
+   void *mem = NULL;
+
+   if (host1x_debug_trace_cmdbuf)
+   mem = host1x_memmgr_mmap(ref);
+
+   if (mem) {
+   u32 i;
+   /*
+* Write in batches of 128 as there seems to be a limit
+* of how much you can output to ftrace at once.
+*/
+   for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
+   trace_host1x_cdma_push_gather(
+   cdma_to_channel(cdma)->dev->name,
+   (u32)ref,
+   min(words - i, TRACE_MAX_LENGTH),
+   offset + i * sizeof(u32),
+   mem);
+   }
+   host1x_memmgr_munmap(ref, mem);
+   }
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+   if (host1x_debug_trace_cmdbuf)
+   trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name,
+   op1, op2);
host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
u32 slots_free = cdma->slots_free;
struct push_buffer *pb = >push_buffer;
 
+   if (handle)
+   trace_write_gather(cdma, handle, offset, op1 & 0x);
+
if (slots_free == 0) {
host1x->cdma_op.kick(cdma);
slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 000..29cbe93
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,215 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling 
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+#include "dev.h"
+#include "debug.h"
+#include "channel.h"
+
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+   va_list args;
+   int len;
+
+   va_start(args, fmt);
+   len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+   va_end(args);
+   o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct host1x_channel *ch, void *data)
+{
+ 

[PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-01-15 Thread Terje Bergstrom
Add support for host1x debugging. Adds debugfs entries, and dumps
channel state to UART in case of stuck job.

Signed-off-by: Terje Bergstrom tbergst...@nvidia.com
---
 drivers/gpu/host1x/Makefile |1 +
 drivers/gpu/host1x/cdma.c   |   34 +++
 drivers/gpu/host1x/debug.c  |  215 ++
 drivers/gpu/host1x/debug.h  |   50 
 drivers/gpu/host1x/dev.c|3 +
 drivers/gpu/host1x/dev.h|   17 ++
 drivers/gpu/host1x/hw/cdma_hw.c |3 +
 drivers/gpu/host1x/hw/debug_hw.c|  400 +++
 drivers/gpu/host1x/hw/host1x01.c|2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   18 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h|  115 
 drivers/gpu/host1x/hw/syncpt_hw.c   |1 +
 drivers/gpu/host1x/syncpt.c |3 +
 13 files changed, 862 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index cdd87c8..697d49a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -7,6 +7,7 @@ host1x-y = \
cdma.o \
channel.o \
job.o \
+   debug.o \
memmgr.o \
hw/host1x01.o
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index d6a38d2..12dd46c 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include cdma.h
 #include channel.h
 #include dev.h
+#include debug.h
 #include memmgr.h
 #include job.h
 #include asm/cacheflush.h
@@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct 
host1x_job *job)
return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+   struct mem_handle *ref,
+   u32 offset, u32 words)
+{
+   void *mem = NULL;
+
+   if (host1x_debug_trace_cmdbuf)
+   mem = host1x_memmgr_mmap(ref);
+
+   if (mem) {
+   u32 i;
+   /*
+* Write in batches of 128 as there seems to be a limit
+* of how much you can output to ftrace at once.
+*/
+   for (i = 0; i  words; i += TRACE_MAX_LENGTH) {
+   trace_host1x_cdma_push_gather(
+   cdma_to_channel(cdma)-dev-name,
+   (u32)ref,
+   min(words - i, TRACE_MAX_LENGTH),
+   offset + i * sizeof(u32),
+   mem);
+   }
+   host1x_memmgr_munmap(ref, mem);
+   }
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+   if (host1x_debug_trace_cmdbuf)
+   trace_host1x_cdma_push(cdma_to_channel(cdma)-dev-name,
+   op1, op2);
host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
u32 slots_free = cdma-slots_free;
struct push_buffer *pb = cdma-push_buffer;
 
+   if (handle)
+   trace_write_gather(cdma, handle, offset, op1  0x);
+
if (slots_free == 0) {
host1x-cdma_op.kick(cdma);
slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 000..29cbe93
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,215 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling konk...@android.com
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include linux/debugfs.h
+#include linux/seq_file.h
+#include linux/uaccess.h
+
+#include linux/io.h
+
+#include dev.h
+#include debug.h
+#include channel.h
+
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+   va_list args;
+   int len;
+
+   va_start(args, fmt);
+   len = vsnprintf(o-buf, sizeof(o-buf), fmt, args);
+   va_end(args);
+   o-fn(o-ctx,