Re: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing

2020-05-01 Thread Joe Perches
On Thu, 2020-04-30 at 03:03 -0700, Alan Maguire wrote:
> On Mon, 20 Apr 2020, Joe Perches wrote:

> > Here as well the individual field types don't contain
> > enough information to determine if a field should be
> > output as %x or %u.
> Right, we could add some more format modifiers for cases
> like that I guess.  Currently the display formats used are
> - numbers are represented as decimal
> - bitfields are represented in hex
> - pointers are obfuscated unless the 'x' option is used
> - char arrays are printed as chars if printable,
>   [ 'l', 'i', 'k', 'e', ' ', 't', 'h', 'i', 's' ]
> 
> I'd be happy to add more format specifiers to control
> these options, or tweak the defaults if needed. A
> "print numbers in hex" option seems worthwhile perhaps?

Or maybe add and use new typedefs like x8,x16,x32,x64 to the
bpf struct definitions where u8,u16,u32,u64 are %u and
x8,x16,x32,x64 are %x




Re: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing

2020-04-30 Thread Alan Maguire
On Mon, 20 Apr 2020, Joe Perches wrote:

> On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:
> > On Sat, 18 Apr 2020, Alexei Starovoitov wrote:
> > 
> > > On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> > > > The printk family of functions support printing specific pointer types
> > > > using %p format specifiers (MAC addresses, IP addresses, etc).  For
> > > > full details see Documentation/core-api/printk-formats.rst.
> > > > 
> > > > This RFC patchset proposes introducing a "print typed pointer" format
> > > > specifier "%pT"; the type specified is then looked up in the BPF
> > > > Type Format (BTF) information provided for vmlinux to support display.
> > > 
> > > This is great idea! Love it.
> > > 
> > 
> > Thanks for taking a look!
> >  
> > > > The above potential use cases hint at a potential reply to
> > > > a reasonable objection that such typed display should be
> > > > solved by tracing programs, where the in kernel tracing records
> > > > data and the userspace program prints it out.  While this
> > > > is certainly the recommended approach for most cases, I
> > > > believe having an in-kernel mechanism would be valuable
> > > > also.
> > > 
> > > yep. This is useful for general purpose printk.
> > > The only piece that must be highlighted in the printk documentation
> > > that unlike the rest of BPF there are zero safety guarantees here.
> > > The programmer can pass wrong pointer to printk() and the kernel _will_ 
> > > crash.
> > > 
> > 
> > Good point; I'll highlight the fact that we aren't
> > executing in BPF context, no verifier etc.
> > 
> > > >   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> > > > 
> > > >   pr_info("%pTN", skb);
> > > 
> > > why follow "TN" convention?
> > > I think "%p" is much more obvious, unambiguous, and
> > > equally easy to parse.
> > > 
> > 
> > That was my first choice, but the first character
> > after the 'p' in the '%p' specifier signifies the
> > pointer format specifier. If we use '<', and have
> > '%p<', where do we put the modifiers? '%p'
> > seems clunky to me.
> 
> While I don't really like the %p block,
> it's at least obvious what's being attempted.
> 
> Modifiers could easily go after the  block.
>

Good idea; I'll go with that approach for v2.
 
> It appears a %p output might be a lot of
> total characters so another potential issue might be
> the maximum length of each individual printk.
>

Right, that's a concern. On the other side, it turns out
that with the "omit zeroed values" behaviour by default,
the output trims down nicely.  The omit zero functionality
works for nested types too, so a freshly-allocated skb
fits easily inside the printk limit now.  The in-progress
output looks like this now (v2 coming shortly I hope):

printk(KERN_INFO "%p", skb);

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)7524fd8b,
 .data = (unsigned char *)7524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Of course as structures get used and values get set
(precisely when we need to see them!) more fields will
be displayed and we will bump against printk limits.

The modifiers I'm working on for v2 are

'c' - compact mode (no pretty-printing), i.e.

(struct sk_buff){.transport_header = (__u16)65535,.mac_header = 
(__u16)65535,.end = (sk_buff_data_t)192,.head = (unsigned char 
*)7524fd8b,.data = (unsigned char *)7524fd8b,.truesize = 
(unsigned int)768,.users = (refcount_t){.refs = (atomic_t){.counter = 
(int)1,},},}

  This saves a fair few characters, especially for highly-indented
  data structures.

'T' - omit type/member names.
'x' - avoid pointer obfuscation
'0' - show zeroed values, as suggested by Arnaldo and Alexei

...so the default output of "%p"
will be like the above example.

I was hoping to punt on pointer obfuscation and just
use %p[K] since obfuscation can be sysctl-controlled;
however that control assumes a controlling process context
as I understand it.  Since BTF printk can be invoked
anywhere (especially via trace_printk() in BPF programs),
those settings aren't hugely relevant, so I _think_ we need
a way to directly control obfuscation for this use case.

> > > I like the choice of C style output, but please format it similar to 
> > > drgn. Like:
> > > *(struct task_struct *)0x889ff8a08000 = {
> > >   .thread_info = (struct thread_info){
> > >   .flags = (unsigned long)0,
> > >   .status = (u32)0,
> > >   },
> > >   .state = (volatile long)1,
> > >   .stack = (void *)0xc9000c4dc000,
> > >   .usage = (refcount_t){
> > >   .refs = (atomic_t){
> > >   .counter = (int)2,
> > >   },
> > >   },
> > >   .flags = (unsigned int)4194560,
> > >   .ptrace = (unsigned int)0,
> 
> And here, the issue might be duplicating the log level
> for each line of output and/or prefixing 

Re: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing

2020-04-29 Thread Rasmus Villemoes
On 20/04/2020 18.32, Joe Perches wrote:
> On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:

   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);

   pr_info("%pTN", skb);
>>>
>>> why follow "TN" convention?
>>> I think "%p" is much more obvious, unambiguous, and
>>> equally easy to parse.
>>>
>>
>> That was my first choice, but the first character
>> after the 'p' in the '%p' specifier signifies the
>> pointer format specifier. If we use '<', and have
>> '%p<', where do we put the modifiers? '%p'
>> seems clunky to me.

There's also the issue that %p followed by alnum has been understood to
be reserved/specially handled by the kernel's printf implementation for
a decade, while other characters have so far been treated as "OK, this
was just a normal %p". A quick grep for %p< only gives a hit in
drivers/scsi/dc395x.c, but there could be others (with field width
modifier between % and p), and in any case I think it's a bad idea to
extend the set of characters that cannot follow %p.

Rasmus