Brendan Gregg - Sun Microsystems wrote:

>[...]
>  
>
>>>There might be an easier way to drop the duplicate probe overhead, which
>>>currently looks like,
>>>
>>> 14223                 DTRACE_PROBE4(ip4__physical__out__start,
>>> 14224                     ill_t *, NULL, ill_t *, stq_ill,
>>> 14225                     ipha_t *, ipha, mblk_t *, mp);
>>> 14226                 FW_HOOKS(ipst->ips_ip4_physical_out_event,
>>> 14227                     ipst->ips_ipv4firewall_physical_out,
>>> 14228                     NULL, stq_ill, ipha, mp, mpip, ipst);
>>> 14229                 DTRACE_PROBE1(ip4__physical__out__end, mblk_t *,
>>> 14230                     mp);
>>> 14231                 if (mp == NULL)
>>> 14232                         goto drop;
>>> 14233 
>>> 14234                 DTRACE_IP5(send, mblk_t *, mp, void_ip_t *, ipha,
>>> 14235                     ill_t *, stq_ill, ipha_t *, ipha, ip6_t *, 
>>> NULL);
>>>
>>>Why have sdt:::ip4-physical-out-start and sdt:::ip4-physical-out-end 
>>>anyway?
>>>
>>>
>>>      
>>>
>>For -start, the point is to have a dtrace probe that is particular to the
>>location of the hook in the stack relative to packet processing.
>>    
>>
>
>That sounds doable from fbt,
>
>   # dtrace -n 'fbt::hook_run:entry { 
> trace(stringof(args[0]->hei_event->he_name)); stack(); }'
>   dtrace: description 'fbt::hook_run:entry ' matched 1 probe
>   CPU     ID                    FUNCTION:NAME
>     0  73492                   hook_run:entry   PHYSICAL_OUT                  
>    
>                 ip`tcp_send_data+0x78e
>                 ip`tcp_output+0x78f
>                 ip`squeue_enter+0x41a
>                 ip`tcp_wput+0xf8
>                 unix`putnext+0x2f1
>                 rpcmod`mir_wput+0x1aa
>                 rpcmod`rmm_wput+0x1e
>                 unix`put+0x270
>                 rpcmod`clnt_dispatch_send+0x11a
>                 rpcmod`clnt_cots_kcallit+0x596
>                 nfs`nfs4_rfscall+0x4e2
>                 nfs`rfs4call+0x102
>                 nfs`nfs4lookupvalidate_otw+0x2d5
>                 nfs`nfs4lookup+0x212
>                 nfs`nfs4_lookup+0xe5
>                 genunix`fop_lookup+0x53
>                 genunix`lookuppnvp+0x2e5
>                 genunix`lookuppnat+0x125
>                 genunix`lookupnameat+0x82
>                 genunix`cstatat_getvp+0x160
>   [...]
>  
>

When it is possible to write a dtrace script that use a probe named
"ip4-physical-out-end" and there is no requirement for me to know
about fbt, all will be good.


>>For -end, it is useful to see the mp after the FW_HOOKS() is complete.
>>Is it NULL?  Has it been changed?  Both of these are possible here.
>>Or in other words, the complexity of using dtrace to achieve something
>>meaningful is why there are two hooks.
>>    
>>
>
>A minute reading FW_HOOKS() shows that it will set mp to NULL and complete
>if hook_run() returns zero, which fbt can trace.
>  
>

So long as hook run is being used in this context.

If hook_run() is ever used outside of networking (and there's no reason
why it couldn't) then this becomes harder - more matching is required on
entry of fbt's and saving parameters too.

The use of "?" in D to achieve "if-else" assignments is not obvious at first
and the lack of "if-else" is the biggest obstacle I run into with D.


>>Plus, none of the standard fbt things will show us what happens to mp on
>>the successful return of the call to hook_run.
>>    
>>
>
>This is also doable from fbt, although a few lines of code,
>
>       fbt::hook_run:entry
>       {
>               self->info = (hook_pkt_event_t *)arg1;
>       }
>
>       fbt::hook_run:return
>       /self->info/
>       {
>               this->mp = arg1 != 0 ? NULL : (uint64_t)*self->info->hpe_mp;
>               printf("mp: %x", this->mp);
>               self->info = 0;
>       }
>
>The above shows the value of mp after FW_HOOKS(), and accounts for both
>hook_run() and FW_HOOKS() itself changing mp.
>  
>

But it requires me to use 2 probes, not 1, and in a more complex
method than simply doing:

dtrace -n 'sdt:::ip4-physical-out-end{printf("mp: %x", this->mp);}'

People able to put it all on one command line has its advantages ;)


>...
>  
>
>>Or to put it another way, it it was possible to use a probes that
>>presented just as simple an interface but implemented it using
>>other means then sure, the probes could be eliminated.
>>    
>>
>
>Who is this for? Who would be working on this code and not have read the
>contents of FW_HOOKS()?
>  
>

Using the existing probes doesn't require someone to read the
FW_HOOKS macro, today.  If I write up a blog or something else
about this SDT probes and mention what the parameters are,
that should be enough for someone to go and use them for
something.

I'd also reject the assertion that using SDT probes implies that
someone is "working on code."


>>Or to use the syntax from above, I would be just as happy if I could
>>write a probe like this:
>>
>>dtrace -I provider/pfh -n 'pfh:::ip4-physical-out-end{...}'
>>
>>where "-I" is an "include" function, and dtrace somehow built the
>>ip4-physical-out-end using fbt probes like you did, why do I care
>>how the probe is implemented?  A consideration that needs to be
>>taken into account is the performance cost of this approach.
>>    
>>
>
>DTrace already hase some functionality that could be used; People can
>build custom /usr/lib/dtrace include files containing translators,
>inlines and #defines to make life easier for tracing certain targets.
>  
>

Hmm, and I see you're using translators to build up the data
structures for your provider...  I've never looked at a translator
before - the syntax seems to borrow from C++ - I'm sure that
will make instant enemies of some ;)

They look useful, especially for networking...


>It might make sense to add a little functionality, such as provider
>aliasing as you suggested, to tie this together as a user customised
>view of fbt. Especially for fbt probes that are complex to use.
>
>Of course, anyone writing these include files will need to maintain them
>as the kernel changes.
>  
>

Of course.

Darren

_______________________________________________
networking-discuss mailing list
[email protected]

Reply via email to