Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

2018-01-02 Thread Brendan Gregg
On Sat, Dec 30, 2017 at 7:06 PM, Yafang Shao <laoar.s...@gmail.com> wrote:
> On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
> <brendan.d.gr...@gmail.com> wrote:
>> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <laoar.s...@gmail.com> wrote:
>>> As sk_state is a common field for struct sock, so the state
>>> transition tracepoint should not be a TCP specific feature.
>>> Currently it traces all AF_INET state transition, so I rename this
>>> tracepoint to inet_sock_set_state tracepoint with some minor changes and 
>>> move it
>>> into trace/events/sock.h.
>>
>> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
>> fire for TCP sessions. It's not broken, and we could add a
>> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
>> inet_sk_set_state is feeling like we might be baking too much
>> implementation detail into the tracepoint API.
>>
>> If we must have inet_sk_set_state, then must we also delete 
>> tcp:tcp_set_state?
>>
>
> Hi Brendan,
>
> The reason we have to make this change could be got from this mail
> thread, https://patchwork.kernel.org/patch/10099243/ .
>
> The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
> There're some state transitions in inet_connection_sock.c and
> inet_hashtables.c are missed.
> So we have to place this probe into these two files to fix the issue.
> But as inet_connection_sock.c and inet_hashtables.c are common files
> for all IPv4 protocols, not only for TCP, so it is not proper to place
> a tcp_ function in these two files.
> That's why we decide to rename tcp:tcp_set_state probe to
> sock:inet_sock_set_state.

It kinda feels like we are fixing one exposing-implementation problem
(the missing state changes, which I'm happy to see fixed), by exposing
another (there's no tcp:tcp_set_state because we don't want to put tcp
functions in inet*.c files). Anyway...

If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
sk->sk_protocol exposed as a tracepoint argument so I can match on
IPPROTO_TCP. Otherwise I'll have to keep digging it out of (void
*)skaddr. (And if we're adding arguments, maybe consider sk_family as
well, to make it easier to see which address arguments to use).

Brendan


Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

2017-12-30 Thread Brendan Gregg
On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao  wrote:
> As sk_state is a common field for struct sock, so the state
> transition tracepoint should not be a TCP specific feature.
> Currently it traces all AF_INET state transition, so I rename this
> tracepoint to inet_sock_set_state tracepoint with some minor changes and move 
> it
> into trace/events/sock.h.

The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
fire for TCP sessions. It's not broken, and we could add a
sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
inet_sk_set_state is feeling like we might be baking too much
implementation detail into the tracepoint API.

If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?

Brendan


> We dont need to create a file named trace/events/inet_sock.h for this one 
> single
> tracepoint.
>
> Two helpers are introduced to trace sk_state transition
> - void inet_sk_state_store(struct sock *sk, int newstate);
> - void inet_sk_set_state(struct sock *sk, int state);
> As trace header should not be included in other header files,
> so they are defined in sock.c.
>
> The protocol such as SCTP maybe compiled as a ko, hence export
> inet_sk_set_state().
>[...]


Re: [Patch net-next v3] tcp: add a tracepoint for tcp retransmission

2017-10-13 Thread Brendan Gregg
On Fri, Oct 13, 2017 at 3:09 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Fri, Oct 13, 2017 at 01:50:44PM -0700, Brendan Gregg wrote:
>> On Fri, Oct 13, 2017 at 1:03 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> > We need a real-time notification for tcp retransmission
>> > for monitoring.
>> >
>> > Of course we could use ftrace to dynamically instrument this
>> > kernel function too, however we can't retrieve the connection
>> > information at the same time, for example perf-tools [1] reads
>> > /proc/net/tcp for socket details, which is slow when we have
>> > a lots of connections.
>> >
>> > Therefore, this patch adds a tracepoint for __tcp_retransmit_skb()
>> > and exposes src/dst IP addresses and ports of the connection.
>> > This also makes it easier to integrate into perf.
>> >
>> > Note, I expose both IPv4 and IPv6 addresses at the same time:
>> > for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
>> > for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
>> > Also, add sk and skb pointers as they are useful for BPF.
>>
>> Thanks, a TCP retransmit tracepoint would be great. (tcp_set_state
>> would be highly useful too, which Alexei already has in his list).
>>
>> Should skp->__sk_common.skc_state be included in the format string, so
>> we don't have to always dig it out of the skaddr? For retransmits I
>> always want to know the TCP state, to determine if it is ESTABLISHED
>> (packet drop) or SYN_SENT (backlog full) or something else.
>
> let's not expose internal socket fields into tp fields.
> Few people still believe that tp fields are abi, so to be safe
> no such fields should be exposed.
> It's trivial enough to read sk_state from bpf program
> with bpf_probe_read().

Ah, right, the number mapping for TCP_ESTABLISHED and friends is a
Linux implementation detail, and not from the RFCs. Ok, I can dig it
from the skp instead.

>
>> We probably need a tracepoint for tcp_send_loss_probe() (TLP) as well,
>> for tracing at the same time as retransmits (like my tools do), but
>> that can be added later.
>
> hmm. why?
> This single tracepoint will cover both cases of retransmits.

I don't think tcp_send_loss_probe() TLP goes through
__tcp_retransmit_skb(): look at the path to bumping
LINUX_MIB_TCPLOSSPROBES. I was thinking that later on we might want to
add a tcp:tcp_send_tlp tracepoint, in addition to this
tcp:tcp_retransmit_skb tracepoint, for investigating the same kind of
issues: packet loss. This existing tcp:tcp_retransmit_skb tracepoint
patch is ok.

Acked-by: Brendan Gregg <bgr...@netflix.com>

(with or without %pI6c)

Brendan


Re: [Patch net-next v3] tcp: add a tracepoint for tcp retransmission

2017-10-13 Thread Brendan Gregg
On Fri, Oct 13, 2017 at 1:03 PM, Cong Wang  wrote:
> We need a real-time notification for tcp retransmission
> for monitoring.
>
> Of course we could use ftrace to dynamically instrument this
> kernel function too, however we can't retrieve the connection
> information at the same time, for example perf-tools [1] reads
> /proc/net/tcp for socket details, which is slow when we have
> a lots of connections.
>
> Therefore, this patch adds a tracepoint for __tcp_retransmit_skb()
> and exposes src/dst IP addresses and ports of the connection.
> This also makes it easier to integrate into perf.
>
> Note, I expose both IPv4 and IPv6 addresses at the same time:
> for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
> for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
> Also, add sk and skb pointers as they are useful for BPF.

Thanks, a TCP retransmit tracepoint would be great. (tcp_set_state
would be highly useful too, which Alexei already has in his list).

Should skp->__sk_common.skc_state be included in the format string, so
we don't have to always dig it out of the skaddr? For retransmits I
always want to know the TCP state, to determine if it is ESTABLISHED
(packet drop) or SYN_SENT (backlog full) or something else.

We probably need a tracepoint for tcp_send_loss_probe() (TLP) as well,
for tracing at the same time as retransmits (like my tools do), but
that can be added later.

Brendan


Re: [PATCH v2 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-08-31 Thread Brendan Gregg
On Wed, Aug 31, 2016 at 2:50 PM, Alexei Starovoitov <a...@fb.com> wrote:
> Hi Peter, Dave,
>
> this patch set is a follow up to the discussion:
> https://lkml.kernel.org/r/20160804142853.GO6862%20()%20twins%20!%20programming%20!%20kicks-ass%20!%20net
> It turned out to be simpler than what we discussed.
>
> Patches 1-3 is bpf-side prep for the main patch 4
> that adds bpf program as an overflow_handler to sw and hw perf_events.
> Peter, please review.
>
> Patches 5 and 6 are examples from myself and Brendan.
>
> v1-v2: fixed issues spotted by Peter and Daniel.

Thanks Alexei!

Tested-by: Brendan Gregg <bgr...@netflix.com>

Brendan


Re: [PATCH net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-08-29 Thread Brendan Gregg
On Mon, Aug 29, 2016 at 5:19 AM, Peter Zijlstra  wrote:
>
> On Fri, Aug 26, 2016 at 07:31:18PM -0700, Alexei Starovoitov wrote:
> > Hi Peter, Dave,
> >
> > this patch set is a follow up to the discussion:
> > https://lkml.org/lkml/2016/8/4/304
> > It turned out to be simpler than what we discussed.
> >
> > Patches 1-3 is a bpf-side prep for the main patch 4
> > that adds bpf program as an overflow_handler to sw and hw perf_events.
> > Peter, please review.
> >
> > Patches 5 and 6 are tests/examples from myself and Brendan.
>
> Brendan, so this works for you without extra hacks required?

Yes, thanks for checking, I've done both IP and stack sampling so far with it.

Brendan